awk select columns from input list

Question

I'd like an awk script to select the columns from a file based on a list of columns in another file. For example:

$ cat cols
3 2 6 4

$ cat text
a b c d e f g
h i j k l m n

$ awk_script cols text
c b f d
j i m k

So the 3rd, 2nd, 6th and 4th columns have been selected, in that order.

Thanks

hek2mgl · Accepted Answer · 2015-06-06 15:36:46Z

You can use this:

awk 'NR==FNR{n=split($0,c);next}{for(i=1;i<=n;i++){printf "%s%s", $c[i], OFS};print ""}' cols text

We are passing two input files to awk, first cols and then text. awk counts the total number of input records read so far in the built-in variable NR, while FNR is the record number within the current file. The two are equal only while awk is reading the first file, so the NR==FNR block runs for the first (and only) line of cols, where both are 1, and for no line of text.
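To see the NR==FNR idiom in action, the sketch below prints FILENAME, NR and FNR for every record. It recreates the question's cols and text files in a scratch directory created with mktemp -d (an assumption for the demo; any directory works), and the variable names dir and result are illustrative only.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory.
dir=$(mktemp -d)
printf '3 2 6 4\n' > "$dir/cols"
printf 'a b c d e f g\nh i j k l m n\n' > "$dir/text"

# For each record show which file it came from, NR, FNR and whether
# the NR==FNR test would select it.
result=$(cd "$dir" && awk '{
    print FILENAME, NR, FNR, (NR == FNR ? "first-file" : "later-file")
}' cols text)
printf '%s\n' "$result"
```

This prints "cols 1 1 first-file", then "text 2 1 later-file" and "text 3 2 later-file": only the record from cols satisfies NR==FNR, so both lines of text fall through to the printing block.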

{n=split($0,c);next} splits the whole line, which is stored in $0, into the array c using the field separator FS, and saves the number of columns to print in n. We will later use n in a for loop. next tells awk to stop processing the current line and read the next line of input.
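A quick way to check what split() hands back is to run it on the cols line by itself. This sketch (the variable names result and spaced are illustrative) shows the returned count and the array contents, and that the default FS also ignores extra blanks.

```shell
#!/bin/sh
# split() stores the pieces in c[1]..c[n] and returns n, the element count.
result=$(echo '3 2 6 4' | awk '{ n = split($0, c); print n, c[1], c[2], c[3], c[4] }')
printf '%s\n' "$result"    # 4 3 2 6 4

# With the default FS (a single space), runs of blanks and leading or
# trailing blanks are ignored, so this still yields 4 elements.
spaced=$(echo '  3   2 6  4 ' | awk '{ print split($0, c) }')
printf '%s\n' "$spaced"    # 4
```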

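The block {for(i=1;i<=n;i++){printf "%s%s", $c[i], OFS};print ""} has no condition, so it would run for every line, but next keeps it from running on the line of cols; in effect it runs on every line of text. The for loop iterates through the array c and prints the corresponding fields, each followed by the output field separator OFS. Finally, print "" ends the line with the output record separator ORS (a newline by default).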
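The same program can be laid out over several lines with comments, which may make the two blocks easier to follow. This is a sketch of the answer's logic with the loop bound i<=n; the scratch directory and the names dir and result are assumptions for the demo, and $(c[i]) is written with explicit parentheses, which awk treats the same as $c[i].

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory.
dir=$(mktemp -d)
printf '3 2 6 4\n' > "$dir/cols"
printf 'a b c d e f g\nh i j k l m n\n' > "$dir/text"

result=$(awk '
    # First file (cols): remember the wanted column numbers in c[1..n].
    NR == FNR { n = split($0, c); next }
    # Every later record: print field number c[i] followed by OFS.
    {
        for (i = 1; i <= n; i++)
            printf "%s%s", $(c[i]), OFS
        print ""    # terminate the line with ORS
    }
' "$dir/cols" "$dir/text")
printf '%s\n' "$result"
```

Each output line ends with one trailing space, the final OFS, as noted in the comments below.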

Output:

c b f d
j i m k

Never do a printf with input data in the format field (printf $c[i]); use the full printf synopsis instead, printf "%s", $c[i]. Imagine the difference if $c[i] contained a printf formatting string, e.g. %s. You're also not printing OFS between fields. The loop should end at <=n rather than <n+1, for clarity and efficiency. Also, use print "" instead of printf "\n": it's briefer and uses whatever value ORS is set to instead of hard-coding the same value.
@EdMorton Thanks for the advice. Much appreciated! I've edited it. The solution now adds an additional OFS at the end of every line, but it should be good enough since OFS is a space.
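The two printf points in the comment above can be checked directly. This sketch passes input containing %s through printf "%s" (it is printed literally) and sets ORS to ";" to show that print "" follows ORS. The unsafe form printf $0 is deliberately not run here: with %s in the data and no matching argument, its behavior depends on the awk implementation. The names safe and custom are illustrative.

```shell
#!/bin/sh
# Input data that happens to look like a printf format string.
# Passing it as the argument of printf "%s" prints it literally.
safe=$(echo '100%s done' | awk '{ printf "%s", $0; print "" }')
printf '%s\n' "$safe"      # 100%s done

# print "" appends ORS, so changing ORS changes the line terminator too.
custom=$(printf 'a\nb\n' | awk 'BEGIN { ORS = ";" } { printf "%s", $1; print "" }')
printf '%s\n' "$custom"    # a;b;
```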

Ed Morton · Accepted Answer · 2015-06-06 14:40:17Z


$ awk 'NR==FNR{n=split($0,f);next} {for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS)}' cols text
c b f d
j i m k
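To get the exact awk_script cols text invocation from the question, the one-liner can be wrapped in a shell function. This is a sketch: awk_script is the question's hypothetical name, the function assumes cols holds a single line (as in the question), and the scratch files exist only for the demo. Using (i<n?OFS:ORS) means no trailing OFS is printed.

```shell
#!/bin/sh
# Hypothetical wrapper named after the question's "awk_script": the
# column-list file comes first, followed by the data file(s).
awk_script() {
    awk 'NR==FNR { n = split($0, f); next }
         { for (i = 1; i <= n; i++) printf "%s%s", $(f[i]), (i < n ? OFS : ORS) }' "$@"
}

# Recreate the question's sample files in a scratch directory.
dir=$(mktemp -d)
printf '3 2 6 4\n' > "$dir/cols"
printf 'a b c d e f g\nh i j k l m n\n' > "$dir/text"
awk_script "$dir/cols" "$dir/text"
```

This prints "c b f d" and "j i m k", matching the output requested in the question.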
