I have 777 .doc files, each of which contains one big Excel table, like the one here and in Fig. 1. For now, consider only one .doc file. I want to split the Excel table in the .doc file into CSV files using any Unix programming or scripting language. I cannot find a way to convert Microsoft file formats into CSV files. Pseudocode:
- Extract the Excel table from the .doc file; this step is expanded in the thread How to extract many .doc text + tabular elements into CSV by any Unix tool?
- Split the Excel table (maybe converting it to CSV already at this point) into separate .csv files by this rule: new bold text indicates a new table, i.e. a new CSV file.
- Apply the implicit columns Location (bottom/top) and Date (dd.mm.yyyy), given in the first two lines of the .doc file, to each separate CSV file. Use the Time column (morning/evening/night).
Target files and their columns, following the rule above:
- Assistants.csv - Name, Date, Location, Time
- Other.Assistants.csv - Name, Date, Location, Time
- General.csv - Event, Date, Location, Time
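To make the splitting rule concrete, here is a minimal Python sketch of what I have in mind. It does not solve the extraction from .doc. It assumes the table rows have already been extracted somehow as `(text, is_bold, time)` tuples, and that Date and Location have been read from the first two lines. The tuple format, the function names and the bold heading texts in `TARGETS` are all hypothetical; only the file names and columns come from the list above.

```python
import csv

# Assumed mapping: bold heading text -> (target file, name of first column).
# The heading texts are guesses; the real ones depend on the .doc file.
TARGETS = {
    "Assistants": ("Assistants.csv", "Name"),
    "Other Assistants": ("Other.Assistants.csv", "Name"),
    "General": ("General.csv", "Event"),
}


def split_tables(rows, date, location):
    """Group rows into tables; a bold row starts a new table.

    rows: iterable of (text, is_bold, time) tuples (assumed input format).
    date, location: the implicit columns taken from the top of the document.
    Returns a dict {file name: [header row, data row, ...]}.
    Raises KeyError if a bold heading is not listed in TARGETS.
    """
    tables = {}
    current = None
    for text, is_bold, time in rows:
        if is_bold:
            # New bold text means a new table, i.e. a new CSV file.
            filename, first_col = TARGETS[text]
            header = [first_col, "Date", "Location", "Time"]
            current = tables.setdefault(filename, [header])
        elif current is not None:
            # Ordinary row: append the implicit Date and Location columns.
            current.append([text, date, location, time])
        # Rows that appear before the first bold heading are skipped.
    return tables


def write_tables(tables):
    """Write each table to its own CSV file in the current directory."""
    for filename, table in tables.items():
        with open(filename, "w", newline="") as f:
            csv.writer(f).writerows(table)
```

Calling `write_tables(split_tables(rows, "01.02.2017", "top"))` would then produce one CSV file per bold heading found in `rows`; the date and location values here are made-up examples.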
Fig. 1 Example of Excel Table in .doc file
OS: Debian 9 (Stretch) Linux and others
Data: .odt file here
