I have a file that maps filenames to their transcripts. The filename and the transcript are separated by a TAB character.
The transcript may contain one or more words separated by single spaces. The layout of each line is
[filename] [TAB] [transcription]
In some lines, the transcript column is empty. Such lines will be of the form
[filename]
i.e. there is no transcript available for that filename.
Now, my job is to select only those lines that have both a filename and a transcription (that is, the lines whose transcript column is not empty).
I tried the following commands:
(1) awk 'NF>2' filename
(2) awk 'NF==2' filename
(3) awk 'NF>1' filename
but none of them gave the expected result.
In addition, when I used the command
(4) awk ' NF==2 {print $0} ' myfile > newfile
I was also getting lines that had only one column, namely the "filename" field.
When I write NF<1 there is no output (as expected).
When I write NF<2 there is again no output (strange, it should have displayed the lines with only one column).
When I write NF==3 I get the lines that have exactly two columns (again confusing).
What's the catch? It's really confusing.
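For reference, a minimal sketch of how awk counts fields, which may help when reading the NF results above. It assumes a made-up two-line file (a.wav and b.wav are hypothetical names, not from my data): by default awk splits on any run of spaces or tabs, while -F'\t' splits on tabs only.

```shell
# Hypothetical sample: one line with a tab-separated transcript, one without.
sample=$(mktemp)
printf 'a.wav\tgaajara <bn>\nb.wav\n' > "$sample"

# Default splitting: any run of blanks separates fields, so line 1 has 3 fields.
default_counts=$(awk '{ print NF }' "$sample" | paste -sd' ' -)

# Tab-only splitting: the whole transcript is a single field, so line 1 has 2.
tab_counts=$(awk -F'\t' '{ print NF }' "$sample" | paste -sd' ' -)

echo "default: $default_counts"   # default: 3 1
echo "tab:     $tab_counts"       # tab:     2 1
rm -f "$sample"
```

So under default splitting NF reflects the number of words on the line, not the number of TAB-separated columns.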
Here is a sample of the input (the file cll):
M07UP36A0821I40.wav
M07UP36A0821I41.wav
M07UP36A0821I410.wav gaajara <bn>
M07UP36A0821I411.wav tiina sau <pau> taintaaliisa
M07UP36A0821I412.wav geehuun anya <bn>
M07UP36A0821I413.wav geehuun daraa <babble>
On this file I ran the command
grep '^[^[:blank:]]\+[[:blank:]]\+[^[:blank:]]\+$' cll
This command gives no output (neither on the terminal nor in a redirected file).
There is an INTERESTING thing to note, though.
When the input file (foo) contains
M07UP36A0822I413.wav <bn> geehuun daraa <horn> <babble>
M07UP36A0822I414.wav
M07UP36A0822I415.wav gudxqa piilaa <horn> <babble>
M07UP36A0822I416.wav <vn> gudxqa
M07UP36A0822I417.wav gudxqa
M07UP36A0822I418.wav gudxqa anya <babble>
M07UP36A0822I419.wav harii matxara <bn> <babble>
and I run the same command again,
grep '^[^[:blank:]]\+[[:blank:]]\+[^[:blank:]]\+$' foo
the terminal DOES show output. The output obtained was
M07UP36A0822I417.wav gudxqa
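As far as I can tell, this pattern can only match lines with exactly one word after the filename, because the final [^[:blank:]]\+$ has to run to the end of the line without crossing a blank. Below is a small sketch on three lines taken from foo, written with grep -E and + (which should be equivalent to the \+ form above). It compares the original pattern with a relaxed one; the relaxed pattern is only my assumption about what was intended.

```shell
# Three lines copied from foo: no transcript, two words, one word.
foo_sample=$(mktemp)
printf '%s\n' \
  'M07UP36A0822I414.wav' \
  'M07UP36A0822I416.wav <vn> gudxqa' \
  'M07UP36A0822I417.wav gudxqa' > "$foo_sample"

# Original pattern: exactly one token after the blanks, anchored at end of line.
strict=$(grep -Ec '^[^[:blank:]]+[[:blank:]]+[^[:blank:]]+$' "$foo_sample")

# Relaxed pattern: only require that some non-blank text follows the blanks.
relaxed=$(grep -Ec '^[^[:blank:]]+[[:blank:]]+[^[:blank:]]' "$foo_sample")

echo "strict matches:  $strict"    # strict matches:  1
echo "relaxed matches: $relaxed"   # relaxed matches: 2
rm -f "$foo_sample"
```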
My desired output for the file foo is every complete line (both the first and the second column must be present). Here is the required output:
M07UP36A0822I413.wav <bn> geehuun daraa <horn> <babble>
M07UP36A0822I415.wav gudxqa piilaa <horn> <babble>
M07UP36A0822I416.wav <vn> gudxqa
M07UP36A0822I417.wav gudxqa
M07UP36A0822I418.wav gudxqa anya <babble>
M07UP36A0822I419.wav harii matxara <bn> <babble>
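A minimal sketch of one way to get this kind of output, assuming the columns are separated by spaces or tabs and the file has no stray invisible characters: with awk's default splitting, any line with more than one field has a non-empty transcript. It uses three lines copied from foo.

```shell
# Three lines copied from foo, one of them without a transcript.
foo_copy=$(mktemp)
printf '%s\n' \
  'M07UP36A0822I413.wav <bn> geehuun daraa <horn> <babble>' \
  'M07UP36A0822I414.wav' \
  'M07UP36A0822I417.wav gudxqa' > "$foo_copy"

# Keep lines with a filename plus at least one transcript token.
kept=$(awk 'NF > 1' "$foo_copy")
printf '%s\n' "$kept"

# Number of lines that survived the filter.
kept_count=$(printf '%s\n' "$kept" | wc -l | tr -d ' ')
rm -f "$foo_copy"
```

If the file really is TAB-separated, awk -F'\t' '$2 != ""' would be a comparable filter that treats the whole transcript as one column.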
I also ran the following command on the file cll (the first sample in my question):
awk -F'\t' '(NF !=2) { print "line: " NR " does not have 2 columns: " $0 ;}' cll
It printed the following on the terminal:
line: 1 does not have 2 columns: M07UP36A0821I40.wav
line: 2 does not have 2 columns: M07UP36A0821I41.wav
line: 3 does not have 2 columns: M07UP36A0821I410.wav gaajara <bn>
line: 4 does not have 2 columns: M07UP36A0821I411.wav tiina sau <pau> taintaaliisa
line: 5 does not have 2 columns: M07UP36A0821I412.wav geehuun anya <bn>
line: 6 does not have 2 columns: M07UP36A0821I413.wav geehuun daraa <babble>
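This output is what one would see if the file contained no TAB characters at all, since with -F'\t' every such line is a single field. A rough way to check is to count the tabs and carriage returns and dump the bytes of the first line. The sketch below does this on a made-up two-line file; the names and the carriage returns in it are assumptions for illustration, not taken from cll.

```shell
# Hypothetical file: space-separated, with Windows-style line endings.
probe=$(mktemp)
printf 'a.wav gudxqa\r\nb.wav\r\n' > "$probe"

# Count tab characters and carriage returns in the whole file.
tabs=$(tr -cd '\t' < "$probe" | wc -c | tr -d ' ')
crs=$(tr -cd '\r' < "$probe" | wc -c | tr -d ' ')
echo "tabs=$tabs carriage_returns=$crs"   # tabs=0 carriage_returns=2

# Show every byte of the first line so invisible characters become visible.
head -n 1 "$probe" | od -c
rm -f "$probe"
```

Running the same two counts on the real file would show whether the separator is actually a TAB and whether anything invisible is attached to each line.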
awk 'NF==2' ought to be a correct solution to your problem as described, so something else is going on here.