How to split Excel table into CSV files in .doc by Bold text?

You have 777 .doc files, each containing one big Excel table like the one here and in Fig. 1. Here, only consider a single .doc file. I want to divide the Excel table of the .doc file into CSV files with any Unix programming language and/or scripting, but I cannot find a way to convert the Microsoft file formats into CSV files. Pseudocode:

  1. Extract the Excel table from the .doc file; this step is expanded in the thread How to extract many .doc text + tabular elements into CSV by any Unix tool? (a conversion sketch follows this list).

  2. Split the Excel table (perhaps converting to CSV already at this point) into separate .CSV files by the Rule:

    new bolding indicates a new table, i.e. a new CSV file.

  3. Apply the implicit columns Location (bottom/top) and Date (dd.mm.yyyy), found in the first two lines of the .doc file, to each separate CSV file, and add a Time column (morning/evening/night). A sketch for steps 2 and 3 follows the target-file list below.
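
For step 1, one possible route (my assumption, not a tool named above) is LibreOffice in headless mode: converting the .doc to HTML keeps bold runs as <b>/<strong> tags, which the Rule in step 2 needs, whereas plain-text extractors such as antiword or catdoc drop formatting. A minimal sketch in Python, with report.doc standing in for any one of the 777 files:

    import subprocess

    # Convert one .doc to HTML; unlike plain text, HTML preserves the
    # bold formatting that marks table boundaries in step 2.
    subprocess.run(
        ["libreoffice", "--headless", "--convert-to", "html",
         "--outdir", "out", "report.doc"],
        check=True,
    )
    # Produces out/report.html with the embedded table as <table> markup.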

Target files with their columns, by the Rule:

  1. Assistants.csv - Name, Date, Location, Time
  2. Other.Assistants.csv - Name, Date, Location, Time
  3. General.csv - Event, Date, Location, Time
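
Putting steps 2 and 3 together, here is a sketch that reads the HTML produced above, starts a new target file at each bold header, and fills in the implicit Date, Location and Time columns. It assumes LibreOffice emits bold as <b> or <strong> tags and that, as in Fig. 1, data column 2 holds the bottom values and column 3 the top values; treat those cell indices as assumptions about the layout:

    import csv
    import re
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

    # The Rule: each bold section header opens a new target CSV file.
    TARGETS = {
        "Assistants": ("Assistants.csv", "Name"),
        "Other assistants": ("Other.Assistants.csv", "Name"),
        "General": ("General.csv", "Event"),
    }
    LOCATIONS = ("bottom", "top")   # assumed: data column 2 = bottom, 3 = top

    with open("out/report.html", encoding="utf-8") as fh:
        rows = BeautifulSoup(fh, "html.parser").find("table").find_all("tr")

    def cells(tr):
        return [td.get_text(" ", strip=True) for td in tr.find_all(["td", "th"])]

    # The implicit Date (dd.mm.yyyy) sits in the first table row.
    date = re.search(r"\d{2}\.\d{2}\.\d{4}", " ".join(cells(rows[0]))).group(0)

    collected = {name: [] for name in TARGETS}
    section = time = None
    for tr in rows[2:]:                    # rows 1-2 hold Date and Location
        text = cells(tr)
        head = text[0].rstrip(":") if text else ""
        if tr.find(["b", "strong"]) and head in TARGETS:
            section = head                 # new bolding -> new CSV file
            continue
        if section is None or len(text) < 3:
            continue
        if head in ("morning", "evening", "night"):
            time = head                    # value for the Time column
        for loc, value in zip(LOCATIONS, text[1:3]):
            if value:
                collected[section].append([value, date, loc, time])

    for name, (fname, first_col) in TARGETS.items():
        with open(fname, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow([first_col, "Date", "Location", "Time"])
            writer.writerows(collected[name])

Cells such as "Sat, Kat" still hold several names after this pass; splitting them is shown in the CSV sketch further down.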

Fig. 1 Example of Excel Table in .doc file

OS: Linux Debian Stretch 9 and others
Data: .odt file here

Copy-pasted data from the tabular content:

Report, date: 11.11.2011 bottom: top: Assistants morning: Ilk Vir evening: Adr Ris night: Sai Pir Other assistants: morning: Sat, Kat Joh, Juh evening: Sam, Mar & Sel Kir, Kar night: Osk Sam General: morning: Mainly peaceful.

Loudy boys. Peaceful.

2 customers home. evening: Peaceful. Peaceful atmosphere. night:
One customer special help, but mainly peaceful. Extra care for one customer.

Exported as CSV

"Report, date:  11.11.2011              ",,
"bottom:            top:",,
Assistants,,
morning:,Ilk ,Vir 
evening:,Adr,Ris 
night:,Sai,Pir
Other assistants:,,
morning:,"Sat, Kat","Joh, Juh"
evening: ,"Sam, Mar & Sel","Kir, Kar "
night:,Osk,Sam
General:,,
morning:,Mainly peaceful. ,Peaceful.
,,
,Loudy boys. ,2 customers home. 
evening:,Peaceful. ,Peaceful atmosphere. 
,,
night:,,Extra care for one customer. 
,"One customer special help, but mainly peaceful. ",
,,

Expected output: the three target CSV files listed above, i.e. tall tables of normalised data.
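
Bold formatting does not survive a plain CSV export, but in this sample the section labels (Assistants, Other assistants:, General:) can stand in for it. A sketch that normalises the export above into tall rows, including the split of comma-separated name lists; report.csv is a hypothetical name for the export, and printing stands in for the CSV writing shown earlier:

    import csv
    import re

    # Section labels stand in for the bold formatting lost in the export.
    SECTIONS = {"Assistants", "Other assistants", "General"}

    with open("report.csv", newline="") as fh:
        rows = list(csv.reader(fh))

    # Implicit Date from line 1: "Report, date:  11.11.2011"
    date = re.search(r"\d{2}\.\d{2}\.\d{4}", rows[0][0]).group(0)

    section = time = None
    for row in rows[2:]:                    # skip the date and location lines
        head = row[0].strip().rstrip(":")
        if head in SECTIONS:
            section = head                  # a new label -> a new target file
            continue
        if head in ("morning", "evening", "night"):
            time = head
        if section is None:
            continue
        for loc, value in zip(("bottom", "top"), row[1:3]):
            value = value.strip()
            if not value:
                continue
            if section == "General":
                print([value, date, loc, time])        # row for General.csv
            else:
                # "Sat, Kat" and "Sam, Mar & Sel" are lists of names
                for name in re.split(r"[,&]", value):
                    if name.strip():
                        print([name.strip(), date, loc, time])

One caveat: the multi-line General cells ("Mainly peaceful." followed by "Loudy boys.") come out as separate Event rows here; stitching them back into a single event would need an extra pass.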

