Reshaping tab separated file

Question

I have a file that looks like this:

 Time   Flag
 0.65   5.885581e-01
 0.56   5.847484e-01
 0.58   5.278409e-01
 0.57   1.140746e+00
 1.00   0
 0.00   0
 1.00   1
 0.00   0

But this is because I forgot to transpose something when it was written. So half of the timings ended up in the flag column and half of the flags ended up in the Time column. It should have been

So in this small example, the first four rows should become the first column. The last four rows should become the second column. Notice that half of the elements in each column are already correctly formatted (in alternating fashion) in the original file. What would be the easiest way to reshape and reformat this?

you mean to keep only the last character/digit from the second column? can you edit your question and say what is your first file and your final expected output instead? — αғsнιη
– αғsнιη, Commented May 19, 2021 at 9:46
reading your last paragraph of your question, your given output is not what you described as. based on your saying output should have 4lines with the same data not sure how you converted them. please edit as I said above again — αғsнιη
– αғsнιη, Commented May 19, 2021 at 10:30

Kusalananda · Accepted Answer · 2021-05-19 14:56:10Z

$ awk 'NR > 1 { d[++n] = $1; d[++n] = $2 } END { print "Time", "Flag"; for (i = 1; 2*i <= n; ++i) printf "%.2f%s%d%s", d[i], OFS, d[n/2 + i], ORS }' file
Time Flag
0.65 1
0.59 0
0.56 0
0.58 0
0.58 1
0.53 1
0.57 0
1.14 0

The awk code reads all the data, row-wise, into consecutive elements in the d array (skipping the header).

In the END block, the header is printed, then we iterate with i from 1 to half the length of our d array, printing the value at index i (the time value) and at n/2 + i (the flag value). The formatting of the output is done so that the time value is printed as a floating point value with two decimals, and the flag value is printed as an integer.

The OFS and ORS variables hold the default output field and record separators (a space and a newline by default). For tab-delimited output, use awk -v OFS='\t' '{ ... }' file.

For reference, the awk code as a free-standing script:

#!/usr/bin/awk -f

NR > 1 {
    d[++n] = $1
    d[++n] = $2
}

END {
    print "Time", "Flag"

    for (i = 1; 2*i <= n; ++i)
        printf "%.2f%s%d%s", d[i], OFS, d[n/2 + i], ORS
}

Zach Young · Accepted Answer · 2021-05-25 08:30:02Z

OP didn't state that the problem required anything other than dividing the file in half, fixing up each half, and joining back togther, but I was wondering how it might be done without that knowledge.

And I'm really liking csvkit and especially GoCSV for declarative pipeline processing.

Convert/set-up

The first step is to convert the fixed-width example data to CSV. csvkit lets me specify a column-width schema...

schema.csv

column,start,length
Time,0,4
Flag,7,13

and then convert to CSV:

in2csv -s schema.csv input.txt > input.csv

Separate and transpose

I want to split the input by category: flags and times. And then for each category make a set of one-line CSVs (for transposing in the next step):

gocsv filter -c 2 --regex "^[01]$" input.csv | 
gocsv split --max-rows 1 --filename-base flags

and for the complementary timings, use the same pattern with --exclude:

gocsv filter -c 2 --regex "^[01]$" --exclude input.csv | 
gocsv split --max-rows 1 --filename-base times

This produces a set of files like:

ls flags-*.csv                                     
flags-1.csv  flags-2.csv  flags-3.csv  flags-4.csv

and each looks something like:

flags-1.csv

Time,Flag
1.00,0

or:

times-4.csv

Time,Flag
0.57,1.140746e+00

The Time,Flag header isn't really important, but it's important to see that it's there and will affect the structure after the transpose.

For each of those files, iterate: transposing, selecting the second column (because the bogus header becomes the first column), and cap-ping with the appropriate header:

for FILE in flags-*; do 
  gocsv transpose $FILE | 
  gocsv select -c 2 | 
  gocsv cap --names Flag \
  > trans_$FILE; 
done

for FILE in times-*; do
  gocsv transpose $FILE |
  gocsv select -c 2 |
  gocsv cap --names Time \
  > trans_$FILE;
done

Stack into columns

Stack the flags and times into their own columns:

gocsv stack trans_flags*  > col_flag.csv

gocsv stack trans_times*  > col_time.csv

And then zip together (and I was just going to use csvlook for its prettier Markdown table, but it also normalized the numbers!):

gocsv zip col_time.csv col_flag.csv > output.csv
csvlook output.csv

Time	Flag
0.650…	1
0.589…	0
0.560…	0
0.585…	0
0.580…	1
0.528…	1
0.570…	0
1.141…	0

which is cute.

But here's the final step using GoCSV's built-in (and new-to-me) SPRIG templates and the round function to clean up floats, integers, and scientific notation. I add new columns based on the originals, then select only the new/formatted columns:

cat output.csv |
gocsv add -n Time -t "{{round .Time 2}}" |
gocsv add -n Flag -t "{{round .Flag 0}}" |
gocsv select -c 3- |
gocsv tsv

and we get:

Time    Flag
0.65    1
0.59    0
0.56    0
0.58    0
0.58    1
0.53    1
0.57    0
1.14    0

Stack Exchange Network

Reshaping tab separated file

2 Answers 2

Convert/set-up

Separate and transpose

Stack into columns

You must log in to answer this question.

Hot Network Questions

Reshaping tab separated file

2 Answers 2

Convert/set-up

Separate and transpose

Stack into columns

You must log in to answer this question.

Related

Hot Network Questions