added 113 characters in body

edited Oct 27, 2017 at 17:41

You have 777 .doc files, each containing a big Excel table like the one here and in Fig. 1. Here, consider only one .doc file. I want to split the Excel table of the .doc file into CSV files using any Unix programming language and/or scripting tool. I cannot find a way to convert Microsoft file formats into CSV files. Pseudocode:

Extract the Excel table from the .doc file, which is expanded in the thread How to extract many .doc text + tabular elements into CSV by any Unix tool?
Split the Excel table (maybe convert it to CSV already here) into separate .csv files by Rule:

new bolding indicates a new table, i.e. a new CSV file.
Apply the implicit columns Location (bottom/top) and Date (dd.mm.yyyy), given in the first two lines of the .doc file, to each separate CSV file. Use the Time column (morning/evening/night).

Target files with their columns by Rule

Assistants.csv - Name, Date, Location, Time
Other.Assistants.csv - Name, Date, Location, Time
General.csv - Event, Date, Location, Time
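
The Rule and the target files above can be sketched in Python, assuming the table has already been exported to CSV (as in the export posted in an earlier revision of this question). Bold formatting does not survive a CSV export, so this sketch assumes that the section rows can be recognised by their labels instead. The mapping of the second column to bottom and the third to top is inferred from the sample, and the names `split_report`, `write_tables` and `SAMPLE` are hypothetical.

```python
import csv
import io
import os
import re

# Shortened copy of the CSV export; an assumption about what every file looks like.
SAMPLE = """\
"Report, date:  11.11.2011              ",,
"bottom:            top:",,
Assistants,,
morning:,Ilk ,Vir
evening:,Adr,Ris
Other assistants:,,
morning:,"Sat, Kat","Joh, Juh"
General:,,
morning:,Mainly peaceful. ,Peaceful.
,,
,Loudy boys. ,2 customers home.
"""

# Section label (lower case, colon removed) -> target file.
SECTIONS = {
    "assistants": "Assistants.csv",
    "other assistants": "Other.Assistants.csv",
    "general": "General.csv",
}
TIMES = {"morning", "evening", "night"}
LOCATIONS = ("bottom", "top")  # assumed order of the two data columns


def split_report(text):
    """Return {target file: [[value, date, location, time], ...]}."""
    rows = list(csv.reader(io.StringIO(text)))
    match = re.search(r"\d{2}\.\d{2}\.\d{4}", rows[0][0])
    date = match.group(0) if match else ""
    tables = {name: [] for name in SECTIONS.values()}
    current = None  # file of the section being read
    time = None     # last time label seen; rows without a label continue it
    for row in rows[2:]:
        cells = [c.strip() for c in row] + ["", "", ""]
        label = cells[0].rstrip(":").strip().lower()
        if label in SECTIONS:
            current, time = SECTIONS[label], None
            continue
        if label in TIMES:
            time = label
        if current is None or time is None:
            continue
        for location, value in zip(LOCATIONS, cells[1:3]):
            if value:
                tables[current].append([value, date, location, time])
    return tables


def write_tables(tables, out_dir="."):
    """Write one CSV file per section with the columns listed above."""
    for filename, rows in tables.items():
        first = "Event" if filename == "General.csv" else "Name"
        path = os.path.join(out_dir, filename)
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow([first, "Date", "Location", "Time"])
            writer.writerows(rows)


if __name__ == "__main__":
    write_tables(split_report(SAMPLE))
```

Rows whose first cell is empty are treated as continuations of the previous time label, which matches how the General section wraps over several rows in the sample. Detecting the sections from real bold formatting would need a reader for the .doc/.odt markup and is not covered here.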

Fig. 1 Example of Excel Table in .doc file

OS: Linux Debian Stretch 9 and others
Data: .odt file here

clearer

edited Oct 26, 2017 at 21:19

Léo Léopold Hertz 준영

How to convert varying 3x3 Spreadsheet table into many Tall arrays by SED/AWK/Perl/Zsh/...?

I have horribly formatted data stored in Excel sheets of a Doc document here, which I am thinking of parsing into many tall arrays so that I can analyse some cells systematically. Steps:

Extraction of the Excel table from the .doc file,
Split of the Excel table into 3 separate tables as .csv files, and
Application of your proposal:
- Testing soon RubberStamp's proposal by PostgreSQL

The problem is that I get many such Excel tables every day. I wonder whether I can analyse them without a heavy manual workload. There are many parameters, which I would store in at least 3 tall tables:

by location: bottom and top

by time: morning, evening and night

by workers: assistants and other assistants

by general description of events: General, which is then again grouped by time: morning, evening and night. Examples of data expanded into normalised tables:

  location    | general | time 
  bottom      | Mainly peaceful. ... | morning

  location | time    | assistants
  bottom   | morning | Ilk

  location | time    | other assistants
  bottom   | morning | Sat
  bottom   | morning | Kat

I think the initial step can be conversion to CSV, but I wonder whether there is any tool that helps to design how many tables the data needs.

Fig. 1 Table in WPS Spreadsheet

Copy pasted data from tabular content:

Report, date: 11.11.2011 bottom: top: Assistants morning: Ilk Vir evening: Adr Ris night: Sai Pir Other assistants: morning: Sat, Kat Joh, Juh evening: Sam, Mar & Sel Kir, Kar night: Osk Sam General: morning: Mainly peaceful.

Loudy boys. Peaceful.

2 customers home. evening: Peaceful. Peaceful atmosphere. night:
One customer special help, but mainly peaceful. Extra care for one customer.

Exported as CSV

"Report, date:  11.11.2011              ",,
"bottom:            top:",,
Assistants,,
morning:,Ilk ,Vir 
evening:,Adr,Ris 
night:,Sai,Pir
Other assistants:,,
morning:,"Sat, Kat","Joh, Juh"
evening: ,"Sam, Mar & Sel","Kir, Kar "
night:,Osk,Sam
General:,,
morning:,Mainly peaceful. ,Peaceful.
,,
,Loudy boys. ,2 customers home. 
evening:,Peaceful. ,Peaceful atmosphere. 
,,
night:,,Extra care for one customer. 
,"One customer special help, but mainly peaceful. ",
,,

Expected output: normalised data in one or more tall tables
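
A tall table in the sense of the examples earlier needs one row per name, whereas the export keeps several names in one cell (for example "Sat, Kat" or "Sam, Mar & Sel"). A minimal Python sketch of that expansion follows, assuming that commas and ampersands are the only separators used between names; `split_names` and `to_tall` are hypothetical helper names.

```python
import re


def split_names(cell):
    """Split a cell such as 'Sam, Mar & Sel' into individual names."""
    parts = re.split(r"[,&]", cell)
    return [part.strip() for part in parts if part.strip()]


def to_tall(location, time, cell):
    """Return one (location, time, name) row per name in the cell."""
    return [(location, time, name) for name in split_names(cell)]


if __name__ == "__main__":
    for row in to_tall("bottom", "morning", "Sat, Kat"):
        print(" | ".join(row))
```

This should only be applied to the name sections; the free text in General contains commas that are part of the sentence ("One customer special help, but mainly peaceful.") and would be split wrongly.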

OS: Linux Debian Stretch 9.1
Doc file: here (Google Drive link), which contains the Excel table in its original setting

How to split Excel table into CSV files in .doc by Bold text?

You have 777 .doc files, each containing a big Excel table like the one here and in Fig. 1. Here, consider only one .doc file. I want to split the Excel table of the .doc file into CSV files using any Unix programming language and/or scripting tool. I cannot find a way to convert Microsoft file formats into CSV files. Pseudocode:

Extract the Excel table from the .doc file.
Split the Excel table (maybe convert it to CSV already here) into separate .csv files by Rule:

new bolding indicates a new table, i.e. a new CSV file.
Apply the implicit columns Location (bottom/top) and Date (dd.mm.yyyy), given in the first two lines of the .doc file, to each separate CSV file. Use the Time column (morning/evening/night).

Target files with their columns by Rule

Assistants.csv - Name, Date, Location, Time

Other.Assistants.csv - Name, Date, Location, Time

General.csv - Event, Date, Location, Time

Fig. 1 Example of Excel Table in .doc file

OS: Linux Debian Stretch 9 and others
Data: .odt file here

added 181 characters in body

edited Oct 24, 2017 at 13:32

Léo Léopold Hertz 준영

added 71 characters in body

edited Oct 23, 2017 at 14:24

Léo Léopold Hertz 준영

edited title

edited Oct 23, 2017 at 12:13

Léo Léopold Hertz 준영

asked Oct 23, 2017 at 11:56

Léo Léopold Hertz 준영