I have several files (*_data.txt) and I am trying to split each of them into multiple files based on the content of column 1. I have managed to split them, but I do not know how to name the output files using both the input file name and column 1 ($1) in print. At the moment, print in the following command writes to $1 ".txt", giving, for example, ENSG00000108094.txt and ENSG00000115232.txt instead of file1_ENSG00000108094.txt and file1_ENSG00000115232.txt. That is not suitable, because I need separate outputs for each input file.
Here is my command; I am not sure where I should use "$b" to get the expected outcome.
for filename in *_data.txt
do
    b=${filename%%_data.txt}
    cat $filename | awk 'NR==1 {header = $0; next} !header_printed[$1]++ {print header > $1".txt"} {print > $1".txt"}'
done
Thanks.
b from the internal FILENAME variable. For only "several" files you could likely omit the loop and pass all the *_data.txt files to a single invocation of awk, and use FNR==1 in place of NR==1 to set the header and new filename; or, in GNU awk, use the special BEGINFILE rule.
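A minimal sketch of the single-invocation idea, not necessarily the only way to do it. The scratch directory, the two sample files and their "gene value" header are made up for illustration, and the variable names prefix, out and seen are my own. Note that FILENAME is only useful when the files are given to awk as arguments, not piped in through cat.

```shell
# Hypothetical sample data in a scratch directory (for illustration only).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'gene value' 'ENSG00000108094 1' 'ENSG00000115232 2' 'ENSG00000108094 3' > file1_data.txt
printf '%s\n' 'gene value' 'ENSG00000108094 4' > file2_data.txt

# One awk call over all inputs; the output prefix is derived from FILENAME.
awk '
FNR == 1 {                          # first line of each input file
    header = $0                     # remember this file header
    prefix = FILENAME
    sub(/_data\.txt$/, "", prefix)  # file1_data.txt -> file1
    next
}
{
    out = prefix "_" $1 ".txt"      # e.g. file1_ENSG00000108094.txt
    if (!(out in seen)) {           # first row for this output file
        print header > out          # write the header once
        seen[out] = 1
    }
    print > out                     # append the data row
}
' *_data.txt
```

If you would rather keep the original loop, the shell variable can be handed to awk with -v b="$b" and used as b "_" $1 ".txt" for the output name. With a very large number of distinct values in column 1, some awk implementations may run into a limit on simultaneously open output files.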