
I'm looking for a straightforward console solution to convert a text file that looks like this:

...
Gender: M
Age: 46
History: 01305
Gender: F
Age: 46
History: 01306
Gender: M
Age: 19
History: 01307
Gender: M
Age: 19
History: 01308
....

into a CSV file like this one:

Gender,Age,History
M,46,01305
F,46,01306
M,19,01307
M,19,01308

Any help is appreciated.


Using the following solution, I get this output. Am I doing something wrong?

awk 'BEGIN{printf "Gender,Age,History%s",ORS;FS=":"}{c++} {sub(/^   */,"",$2);printf "%s%s",$2,(c==3)?ORS:","}c==3{c=0}' data.txt >> 1.csv

Gender,Age,History
M
,37
,00001
M
,37
,00001
M
,41
,00001
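Output where every value appears on its own line, with the comma at the start of the next line, is consistent with each input line ending in a carriage return (\r, as in Windows-style CRLF files): the \r stays at the end of $2 and is printed just before the comma. That is an assumption here, although the asker's own answer further down also suspects an unusual end-of-line character. A quick check, sketched as a shell function with a hypothetical name:

```bash
# Hypothetical helper: report whether a file contains carriage returns (\r),
# e.g. from CRLF (Windows-style) line endings.
has_cr() {
  # $'\r' is bash ANSI-C quoting for a literal carriage-return character.
  if grep -q $'\r' "$1"; then
    echo "$1: contains carriage returns"
    return 0
  fi
  echo "$1: no carriage returns"
  return 1
}
```

For example, has_cr data.txt. Alternatively, od -c data.txt | head shows \r \n at the end of each line when carriage returns are present.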
  • And what have you tried so far? Also, you'll find a lot of duplicate questions with the exact same requirement :-) Commented Jan 31, 2018 at 8:14
  • This might help anyway: awk 'BEGIN{printf "Gender,Age,History%s",ORS} {FS=":";count++} {sub(/^ */,"",$2);printf "%s%s",$2,(cou nt==3)?ORS:","}count==3{count=0}' file Commented Jan 31, 2018 at 8:18
  • Note the inadvertent space in count above; it should be removed. It is even better to put the field separator in the BEGIN block, as in awk 'BEGIN{printf "Gender,Age,History%s",ORS;FS=":"}{c++} {sub(/^ */,"",$2);printf "%s%s",$2,(c==3)?ORS:","}c==3{c=0}' file Commented Jan 31, 2018 at 8:38
  • @sjsam, can you take a look again? I tried your solution and it works, but there are some issues. I'm using the OSX shell, in case that matters. Thanks a lot, though. Commented Jan 31, 2018 at 10:15
  • I am puzzled by what you got, especially the newlines before the commas. I am not sure it has anything to do with you using OSX... let me do some research. Commented Jan 31, 2018 at 12:17
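The one-liner in the comments above packs the whole idea into one line: a counter c tracks the position within each three-line Gender/Age/History group, a comma is printed after the first two values and a newline after the third. The same program spread out with comments, wrapped in a shell function whose name is hypothetical:

```bash
# Counter-based awk program from the comments, one statement per line.
# Reads the files given as arguments, or stdin if there are none.
to_csv_counter() {
  awk '
    BEGIN {
      FS = ":"                      # split "Key: value" at the colon
      print "Gender,Age,History"    # header row
    }
    {
      c++                           # position of this line within its 3-line group
      sub(/^ */, "", $2)            # drop the space(s) after the colon
      printf "%s%s", $2, ((c == 3) ? ORS : ",")   # comma after values 1 and 2, newline after 3
    }
    c == 3 { c = 0 }                # start the next group
  ' "$@"
}
```

For example, to_csv_counter data.txt > 1.csv. It relies on the input being strictly three lines per record; any extra line (such as the ... placeholders in the question) shifts every following row.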

4 Answers


This line should help:

awk 'BEGIN{FS=":|\n";RS="Gender";OFS=",";print "Gender,Age,History"}$0{print $2,$4,$6}' file

With your example as input, it gives:

Gender,Age,History
 M, 46, 01305
 F, 46, 01306
 M, 19, 01307
 M, 19, 01308
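The leading space in each column comes from FS splitting only at the colon. A variant that also swallows the spaces after each colon, plus an optional \r before each newline, is sketched below. It keeps this answer's idea of using "Gender" as the record separator, which assumes an awk that treats a multi-character RS as a regular expression (gawk and mawk do; POSIX leaves it unspecified, and older BSD awk on macOS may use only the first character). The function name is hypothetical.

```bash
# Record-separator approach from the answer above, with a wider field separator.
# Each record is the text between two "Gender" keywords, e.g. ": M\nAge: 46\nHistory: 01305\n",
# so the fields are: "", M, Age, 46, History, 01305.
rs_to_csv() {
  awk 'BEGIN {
         RS = "Gender"              # multi-character RS: needs gawk or mawk
         FS = ": *|\r?\n"           # split at a colon plus following spaces, and at each line end
         OFS = ","
         print "Gender,Age,History"
       }
       $0 { print $2, $4, $6 }      # skip the empty record before the first "Gender"
  ' "$@"
}
```

With the sample input, rs_to_csv file prints M,46,01305 and so on, without the leading spaces.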

With bash builtin commands only, I would say:

#!/bin/bash

echo "Gender,Age,History"
while read line; do
    if [[ $line =~ ^Gender:\ *([^\ ]+) ]]; then
        r=${BASH_REMATCH[1]}
    elif [[ $line =~ ^Age:\ *([^\ ]+) ]]; then
        r+=,${BASH_REMATCH[1]}
    elif [[ $line =~ ^History:\ *([^\ ]+) ]]; then
        echo $r,${BASH_REMATCH[1]}
    fi
done < data.txt

1 Comment

read without -r will mangle backslashes. Also, double-quote $r and ${BASH_REMATCH[1]} to prevent globbing and word splitting.
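The regex-based loop above with the comment's two suggestions applied (read -r and quoted expansions) might look like this. Reading from stdin instead of a fixed file and stripping a trailing \r are additions for this sketch, not part of the original answer, and the function name is hypothetical.

```bash
# Regex-based loop from the answer above, with read -r and quoted expansions.
# Reads records from stdin and writes CSV to stdout.
regex_to_csv() {
  local line r
  echo "Gender,Age,History"
  while read -r line; do
    line=${line%$'\r'}                 # drop a trailing carriage return (CRLF input), if any
    if [[ $line =~ ^Gender:\ *([^\ ]+) ]]; then
      r=${BASH_REMATCH[1]}
    elif [[ $line =~ ^Age:\ *([^\ ]+) ]]; then
      r+=,${BASH_REMATCH[1]}
    elif [[ $line =~ ^History:\ *([^\ ]+) ]]; then
      echo "$r,${BASH_REMATCH[1]}"     # History closes the record
    fi
  done
}
```

For example, regex_to_csv < data.txt > 1.csv.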

Here is a way to do it in bash, assuming your data file is called data.txt:

#!/bin/bash

echo "Gender,Age,History"
while read -r line; do
  printf '%s' "$(cut -d ' ' -f2 <<< "$line")"
  if [[ "$line" =~ ^History.* ]]; then
    printf "\n"
  else
    printf ","
  fi
done < data.txt

Output:

Gender,Age,History
M,46,01305
F,46,01306
M,19,01307
M,19,01308

2 Comments

You should double-quote $(cut -d ' ' -f2 <<< $line ). Also, read without -r will mangle backslashes. Here strings (<<< WORD) appeared in bash 2.05b-alpha1, though we can be pretty sure the OP has a later version of bash.
Updated with @sjsam's comments. Thanks :)
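The cut call in the loop above runs once per input line, starting a new process each time. A sketch of the same loop that extracts the value with bash parameter expansion instead follows; it assumes values contain no internal whitespace (true for the sample data), because it deletes all whitespace from the value, which also removes a stray \r. The function name is hypothetical.

```bash
# Same loop as the cut-based answer above, without spawning cut for every line.
# Reads records from stdin and writes CSV to stdout.
expand_to_csv() {
  local line value
  echo "Gender,Age,History"
  while read -r line; do
    value=${line#*:}                 # drop everything up to and including the first colon
    value=${value//[[:space:]]/}     # drop all whitespace, including a trailing \r
    printf '%s' "$value"
    if [[ $line == History* ]]; then
      printf '\n'                    # History is the last field of each record
    else
      printf ','
    fi
  done
}
```

For example, expand_to_csv < data.txt > 1.csv.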

I still don't know exactly where the problem was, so I decided to clean up the data by removing every character except the ones that are supposed to be there (the culprit was most probably an unusual end-of-line character):

sed -e 's/[^a-zA-Z*0-9:]/ /g;s/  */ /g' history.txt > output.txt

After that, I successfully used the solution from @sjsam:

awk 'BEGIN{printf "Gender,Age,History%s",ORS;FS=":"}{c++} {sub(/^ */,"",$2);printf "%s%s",$2,(c==3)?ORS:","}c==3{c=0}' output.txt > 1.csv
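If the unusual end-of-line character is a carriage return, as is typical for files saved on Windows (an assumption; this answer does not confirm it), a narrower cleanup is to delete only \r with tr and pipe the result straight into the same awk program, with no intermediate file. Unlike the sed command above, which turns each \r into a space that survives at the end of every value, tr -d removes it entirely. A minimal sketch with a hypothetical function name:

```bash
# Hypothetical helper: strip carriage returns, then run @sjsam's counter-based awk.
# Usage: crlf_to_csv history.txt > 1.csv
crlf_to_csv() {
  tr -d '\r' < "$1" |
    awk 'BEGIN{printf "Gender,Age,History%s",ORS;FS=":"}{c++} {sub(/^ */,"",$2);printf "%s%s",$2,(c==3)?ORS:","}c==3{c=0}'
}
```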

Thanks everyone!
