I have generated a set of files that each contain a single number. Each filename also encodes some information about the file. I want to collect the contents of all the files as one column in a new file, and extract parts of each filename as additional columns in that same file.

The filenames look like this: traj-num1-iter-num2-states-num3.gradient, where num1, num2, and num3 are just different numbers. An example of what I want:

$ cat traj-10-iter-220-states-01.gradient
-0.0014868599999999788

$ cat newfile
traj    iter     states    gradient
10      220      01        -0.0014868599999999788

I suspect this can be achieved, but I don't know how.

1 Answer

Using AWK’s FILENAME variable:

awk 'BEGIN { OFS = "\t"; print "traj", "iter", "states", "gradient"; FS="-|\\." } { gradient=$0; $0=FILENAME; print $2, $4, $6, gradient }' traj-*-iter-*-states-*.gradient

will output the requested header line, then process each traj-*-iter-*-states-*.gradient file, outputting the values extracted from its filename, and its contents.
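The field separator in the command above is a regular expression that matches either a dash or a literal dot. As a minimal sketch (the sample filename is the one from the question, and the variable names are only illustrative), the following checks that the alternation form and the bracket-expression form split the filename the same way:

```shell
name='traj-10-iter-220-states-01.gradient'

# Alternation form: a dash, or an escaped (literal) dot.
a=$(printf '%s\n' "$name" | awk 'BEGIN { FS = "-|\\." } { print NF, $2, $4, $6 }')

# Bracket-expression form: inside [...] the dot is already literal.
b=$(printf '%s\n' "$name" | awk 'BEGIN { FS = "[-.]" } { print NF, $2, $4, $6 }')

# Print the field count and the three numbers for each form.
printf '%s\n%s\n' "$a" "$b"
```

Both forms should report seven fields, with the three numbers in fields 2, 4 and 6.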

The following variant, based on a suggestion by Olivier Dulac, extracts the header line from the filename and uses a simpler version of FS:

awk 'BEGIN { OFS = "\t"; FS="[-.]" } { contents=$0; $0=FILENAME; if (!header) { print $1, $3, $5, $7; header=1 }; print $2, $4, $6, contents }' traj-*-iter-*-states-*.gradient

You can change the glob at the end to match whichever files you’re interested in, and the header will adapt (to the first file that’s processed).
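To try this out without touching real data, here is a small sketch that runs the second command in a scratch directory. The second sample file and its value are made up for illustration, and the output name newfile is taken from the question:

```shell
# Work in a throwaway directory so no existing files are affected.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Sample inputs: the first is from the question, the second is invented.
printf '%s\n' '-0.0014868599999999788' > traj-10-iter-220-states-01.gradient
printf '%s\n' '0.0025' > traj-11-iter-230-states-02.gradient

# Same logic as the one-liner, split over several lines for readability.
awk '
    BEGIN { OFS = "\t"; FS = "[-.]" }
    {
        contents = $0      # save the number read from the file
        $0 = FILENAME      # re-split the filename into fields
        if (!header) { print $1, $3, $5, $7; header = 1 }
        print $2, $4, $6, contents
    }
' traj-*-iter-*-states-*.gradient > newfile

cat newfile
```

This should print a tab-separated header line followed by one row per input file. Note that the approach assumes each file holds exactly one line; a file with several lines would produce several rows.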

  • I did not know about the FILENAME variable; that can come in handy, thank you! Commented Apr 24, 2017 at 12:12
  • An answer that handles this based only on FILENAME and its content (i.e., without assuming that FILENAME contains "iter", "states" or "traj", but deriving those from FILENAME itself): awk 'BEGIN { OFS = "\t"; FS="[.-]" } { lastvalue=$0 ; $0=FILENAME; print $1, $3, $5, $7 ; print $2, $4, $6, lastvalue }' traj*.gradient (note: [.] matches only a literal . and is more readable than \\., in my opinion). Commented Apr 24, 2017 at 12:41
  • I don't fully understand the syntax of "FS="-|\\."". You set the field separator to be what, exactly? Both a "dash" and a "dot"? What does the "\\" mean? @OlivierDulac does the order of "[.-]" matter? Would "[-.]" be the same? Commented Apr 24, 2017 at 13:02
  • 1
    Yes, FS is set to match either a - or an actual .; -|\\. is one way of writing that as a regex for AWK (the . needs to be escaped, because . in a regex means “any character”). @Olivier’s [.-] form is a more readable variant. Commented Apr 24, 2017 at 13:10
  • 1
    @Olivier thanks for the suggestion, I’ve adapted it and added it to my answer (with a filter to avoid printing the header for every file processed). Commented Apr 24, 2017 at 13:15