I want to build a single, organized file from a number of files by selecting some of their columns. I have more than 10 files, and I want to copy the second, third, and fourth columns of each file and paste them into a single file.
2 Answers
This can also be done quite easily with awk.
$ awk '{print $2,$3,$4}' *.txt > collapsed_output.txt
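One caveat with the glob: if the output file also matches *.txt and already exists from an earlier run, the shell passes it to awk as one of the inputs. A minimal sketch that sidesteps this by giving the output a different extension. The file names one.txt, two.txt and collapsed_output.dat are made up for illustration, as is the scratch directory.

```shell
# Work in a scratch directory so nothing real is touched (hypothetical setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a b c d\n' > one.txt
printf 'e f g h\n' > two.txt

# The output name does not match *.txt, so a rerun never reads its own output.
awk '{print $2,$3,$4}' *.txt > collapsed_output.dat
awk '{print $2,$3,$4}' *.txt > collapsed_output.dat   # second run, same result
cat collapsed_output.dat
```

Writing the output to a separate directory would work just as well.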
Example
Here's some sample data.
$ seq 20 | paste - - - - - > sample.txt
Here's what the lines look like:
$ head sample.txt
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
Now let's make 10 copies:
$ seq 10 | xargs -I{} cp sample.txt sample{}.txt
We now have the following files:
$ tree
.
|-- sample10.txt
|-- sample1.txt
|-- sample2.txt
|-- sample3.txt
|-- sample4.txt
|-- sample5.txt
|-- sample6.txt
|-- sample7.txt
|-- sample8.txt
|-- sample9.txt
`-- sample.txt
Now if we run our awk command:
$ awk '{print $2, $3, $4}' sample{1..10}.txt | column -t
2 3 4
7 8 9
12 13 14
17 18 19
2 3 4
7 8 9
12 13 14
17 18 19
2 3 4
7 8 9
12 13 14
17 18 19
...
Here I'm showing the output for the first 3 files (sample1.txt ... sample3.txt). I'm also formatting the output with the column -t command, but this is only for display purposes, to make the output easier to read here on U&L.
Additional formatting could just as easily have been done within the awk command, but that seemed to be beyond the scope of the question.
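For what it's worth, a minimal sketch of doing the alignment inside awk with printf instead of piping to column -t. It assumes a field width of 4 characters is enough for the data. The scratch directory and the output name formatted.txt are only for illustration.

```shell
# Recreate the sample data from above in a scratch directory (hypothetical setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 20 | paste - - - - - > sample.txt

# Right-align columns 2, 3 and 4, each in a 4-character field.
awk '{ printf "%4s %4s %4s\n", $2, $3, $4 }' sample.txt > formatted.txt
cat formatted.txt
```

Unlike column -t, this fixes the widths up front, so wider values would need a larger number in the format string.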
Have a look at the command line utility named cut. It can extract columns if they are separated by a single delimiter character. To recombine the parts you can use paste.
If you have, for example, a typical comma-separated format
$ cat debts.csv
Name,Age,Debt
Alice,20,1337
Bob,30,42
$ cat pets.csv
Name,Pet
Alice,Dog
Bob,Cat
you could extract names and debts with
$ cut -d, -f1,3 debts.csv
Name,Debt
Alice,1337
Bob,42
and combine debts with pets using
$ cut -d, -f2 pets.csv | paste -d, debts.csv -
Name,Age,Debt,Pet
Alice,20,1337,Dog
Bob,30,42,Cat
With cut and paste, -d sets the field delimiter, -f selects the columns to extract (for cut), and a lone - tells the command to read standard input (i.e. in the paste case above, from the pipe) instead of a file.
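Applied to the original question, the same two tools can place columns 2 to 4 of every file side by side. This is only a sketch: it assumes the files are tab-separated (the default delimiter for both cut and paste) and have the same number of lines. The sample file names and the .cols suffix are made up for illustration.

```shell
# Build three identical tab-separated sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 20 | paste - - - - - > sample.txt
for i in 1 2 3; do cp sample.txt "sample$i.txt"; done

# Extract columns 2-4 of each numbered file into a temporary piece.
for f in sample[0-9]*.txt; do
    cut -f2-4 "$f" > "$f.cols"
done

# Join the pieces line by line, so each input file contributes three columns.
paste sample[0-9]*.txt.cols > combined.txt
cat combined.txt
```

With three input files, each line of combined.txt holds nine tab-separated fields. For comma-separated input, -d, would be added to both the cut and the paste call.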