Converting text data file to csv format via shell/bash

Question

I'm looking for a straightforward console solution to change a text file which looks like this:

...
Gender: M
Age: 46
History: 01305
Gender: F
Age: 46
History: 01306
Gender: M
Age: 19
History: 01307
Gender: M
Age: 19
History: 01308
....

into a CSV file like this one:

Gender,Age,History
M,46,01305
F,46,01306
M,19,01307
M,19,01308

Any help appreciated

With the following solution I get the output below. Am I doing something wrong?

awk 'BEGIN{printf "Gender,Age,History%s",ORS;FS=":"}{c++} {sub(/^   */,"",$2);printf "%s%s",$2,(c==3)?ORS:","}c==3{c=0}' data.txt >> 1.csv

Gender,Age,History
M
,37
,00001
M
,37
,00001
M
,41
,00001

And what have you tried so far? Also, you'll find a lot of duplicate questions with this exact requirement :-)
– sjsam, Commented Jan 31, 2018 at 8:14
This might help anyway: awk 'BEGIN{printf "Gender,Age,History%s",ORS} {FS=":";count++} {sub(/^ */,"",$2);printf "%s%s",$2,(cou nt==3)?ORS:","}count==3{count=0}' file
– sjsam, Commented Jan 31, 2018 at 8:18
Note the inadvertent space in count above; it should be removed. It is even better to set the field separator in the BEGIN block, as in awk 'BEGIN{printf "Gender,Age,History%s",ORS;FS=":"}{c++} {sub(/^ */,"",$2);printf "%s%s",$2,(c==3)?ORS:","}c==3{c=0}' file
– sjsam, Commented Jan 31, 2018 at 8:38
@sjsam can you take a look again? I tried your solution; it works, but there are some issues. I'm using the OSX shell, in case that matters. Thanks a lot though.
– user8086348, Commented Jan 31, 2018 at 10:15
I am puzzled by what you got, especially the newlines before the commas. I am not sure it has anything to do with you using OSX. Let me look into it.
– sjsam, Commented Jan 31, 2018 at 12:17

Kent · Accepted Answer · 2018-01-31 08:23:46Z

1

This line should help:

awk 'BEGIN{FS=":|\n";RS="Gender";OFS=",";print "Gender,Age,History"}$0{print $2,$4,$6}' file

With your example as input, it gives:

Gender,Age,History
 M, 46, 01305
 F, 46, 01306
 M, 19, 01307
 M, 19, 01308
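Note that the output above keeps the space after each colon, and that POSIX leaves a multi-character RS unspecified, so the one-liner may behave differently in an awk other than gawk or mawk. As a rough sketch of an alternative (not the answer's method), the following keys on the labels instead of on RS and splits on a colon followed by optional spaces. The function name records_to_csv is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads "Label: value" lines from the given file(s)
# or from stdin and prints one CSV row per Gender/Age/History triple.
records_to_csv() {
  awk -F': *' '
    BEGIN { print "Gender,Age,History" }
    $1 == "Gender"  { row = $2 }           # first field starts a new row
    $1 == "Age"     { row = row "," $2 }   # append the second column
    $1 == "History" { print row "," $2 }   # last column: emit the row
  ' "$@"
}

# Usage (assuming the input file is called data.txt):
#   records_to_csv data.txt > 1.csv
```

Because the separator regex swallows the spaces after the colon, the values come out without the leading blank. Lines that do not start with one of the three labels are ignored.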

answered Jan 31, 2018 at 8:23

Kent

tshiono · Accepted Answer · 2018-01-31 09:15:45Z

1

With bash builtin commands only, I would say:

#!/bin/bash

echo "Gender,Age,History"
while read line; do
    if [[ $line =~ ^Gender:\ *([^\ ]+) ]]; then
        r=${BASH_REMATCH[1]}
    elif [[ $line =~ ^Age:\ *([^\ ]+) ]]; then
        r+=,${BASH_REMATCH[1]}
    elif [[ $line =~ ^History:\ *([^\ ]+) ]]; then
        echo $r,${BASH_REMATCH[1]}
    fi
done < data.txt

answered Jan 31, 2018 at 9:15

tshiono

1 Comment

sjsam Over a year ago

read without -r will mangle backslashes. Also double quote $r & ${BASH_REMATCH[1]} to prevent globbing and word splitting.
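For reference, here is a sketch of the loop with the suggestions from the comment applied: read -r, quoted expansions, and printf instead of an unquoted echo. The wrapper function and its name are assumptions for illustration. The matching logic is unchanged from the answer.

```shell
#!/bin/bash
# Hypothetical wrapper around the loop above; takes the input file as $1.
records_to_csv_bash() {
  local line r=""
  echo "Gender,Age,History"
  # IFS= and -r keep leading whitespace and backslashes intact
  while IFS= read -r line; do
    if [[ $line =~ ^Gender:\ *([^\ ]+) ]]; then
      r=${BASH_REMATCH[1]}
    elif [[ $line =~ ^Age:\ *([^\ ]+) ]]; then
      r+=,${BASH_REMATCH[1]}
    elif [[ $line =~ ^History:\ *([^\ ]+) ]]; then
      # quoted arguments: no globbing or word splitting on the values
      printf '%s,%s\n' "$r" "${BASH_REMATCH[1]}"
    fi
  done < "$1"
}

# Usage (assuming the input file is called data.txt):
#   records_to_csv_bash data.txt > 1.csv
```

With the quoting in place, a value such as 0130* is written out literally instead of being expanded against the files in the current directory.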

stalet · Accepted Answer · 2018-01-31 10:13:10Z

1

Here is a way to do it in bash, assuming your data file is called data.txt:

#!/bin/bash

echo "Gender,Age,History"
while read -r line; do
  printf '%s' "$(cut -d ' ' -f2 <<< "$line")"
  if [[ "$line" =~ ^History.* ]]; then
    printf "\n"
  else
    printf ","
  fi
done < data.txt

Outputs:

Gender,Age,History
M,46,01305
F,46,01306
M,19,01307
M,19,01308

edited Jan 31, 2018 at 10:13

answered Jan 31, 2018 at 8:50

stalet

2 Comments

sjsam Over a year ago

You should double quote $(cut -d ' ' -f2 <<< $line ). Also, read without -r will mangle backslashes. Here strings (<<< WORD) appeared in bash 2.05b-alpha1, though we can be pretty sure the OP has a later version of bash.

stalet Over a year ago

Updated with @sjsam comments. Thanks :)

user8086348 · Accepted Answer · 2018-02-01 15:04:27Z

I still don't know exactly where the problem was, so I decided to clean the data of all characters except the ones that are supposed to be there (the culprit was most probably an unusual end-of-line symbol):

sed -e 's/[^a-zA-Z*0-9:]/ /g;s/  */ /g' history.txt > output.txt

After that I successfully used the solution from @sjsam:

awk 'BEGIN{printf "Gender,Age,History%s",ORS;FS=":"}{c++} {sub(/^   */,"",$2);printf "%s%s",$2,(c==3)?ORS:","}c==3{c=0}' data.txt >> 1.csv

Thanks everyone!
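A line break before each comma, as in the output in the question, is what one would typically see when every input line ends in a carriage return, for example in a file saved with Windows (CRLF) line endings. The thread never confirms this, so treat it as an assumption. Under that assumption, a narrower cleanup than the sed above is to delete only the carriage returns before running the same awk program. The function name is made up for illustration.

```shell
#!/bin/sh
# Assumption: the stray character is a carriage return (\r) at the end of
# each line. Hypothetical helper: takes the input file as $1, prints CSV.
crlf_records_to_csv() {
  tr -d '\r' < "$1" |
    awk 'BEGIN { FS = ":"; print "Gender,Age,History" }
         # every third value ends the row, the others are followed by a comma
         { c++; sub(/^ */, "", $2); printf "%s%s", $2, (c == 3 ? ORS : ",") }
         c == 3 { c = 0 }'
}

# To check for carriage returns first (look for \r before \n):
#   od -c data.txt | head
# Usage (assuming the input file is called data.txt):
#   crlf_records_to_csv data.txt > 1.csv
```

Like the original one-liner, this counts lines in groups of three, so it expects the file to contain nothing but complete Gender/Age/History triples.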
