Revisions to Remove lines from tab-delimited file with missing values

added 52 characters in body

edited Dec 27, 2015 at 10:02

If your fields can never contain whitespace, an empty field means either a tab as the first character (^\t), a tab as the last character (\t$) or two consecutive tabs (\t\t). You could therefore filter out lines containing any of those:

grep -Ev $'^\t|\t\t|\t$' file
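The $'...' quoting above is only understood by shells such as bash, ksh and zsh. As a rough sketch for a plain POSIX sh (the function name and the sample data are made up for illustration), the same three patterns can be built from a literal tab held in a variable:

```shell
# drop_empty_fields: read tab-separated lines on stdin and print only
# those with no empty field (hypothetical helper, not from the answer).
drop_empty_fields() {
    tab=$(printf '\t')   # a literal tab, without relying on $'...'
    # Reject a leading tab, two consecutive tabs, or a trailing tab.
    grep -Ev "^${tab}|${tab}${tab}|${tab}\$"
}

# Made-up sample: only the first line has all three fields filled in.
printf 'a\tb\tc\n\tb\tc\na\t\tc\na\tb\t\n' | drop_empty_fields
```

With this sample, only the a, b, c line is printed; the other three lines each contain one of the rejected patterns.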

If your fields can contain whitespace, things get more complex. If your fields can begin with spaces, use this instead (it treats a field containing only whitespace as empty, and it needs a grep that supports -P, such as GNU grep):

grep -Pv '\t\s*(\t|$)|\t$|^\t' file

The change filters out lines matching a tab followed by zero or more whitespace characters (\s*) and then either another tab or the end of the line.

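That will also drop lines in which the last field (or any field after the first) contains nothing but spaces, which is a problem if such a field should count as non-empty. To avoid that too, use perl with the -F and -a options to split input into the @F array, telling it to print unless one of the fields is empty (/^$/). Be aware that the implicit split discards trailing empty fields, so this command alone does not catch a line that ends with a tab: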

perl -F'\t' -lane 'print unless grep{/^$/} @F' file
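The same field-by-field test can be sketched in awk, which keeps leading and trailing empty fields when the separator is a single tab. This is not taken from the answer: the function name and the ws switch are assumptions made for illustration, and the sample data is made up.

```shell
# no_empty_fields [ws]: read tab-separated lines on stdin and print those
# with no empty field. With ws=1, a field made only of spaces also counts
# as empty. (Hypothetical helper, shown as a sketch.)
no_empty_fields() {
    awk -F'\t' -v ws="${1:-0}" '
        {
            # Skip the line as soon as one field is empty (or, with ws=1, spaces only).
            for (i = 1; i <= NF; i++)
                if ($i == "" || (ws == 1 && $i ~ /^ +$/)) next
            print
        }'
}

# Lines: all filled / empty first field / empty last field / spaces-only middle field.
printf 'a\tb\tc\n\tb\tc\na\tb\t\na\t  \tc\n' | no_empty_fields
printf 'a\tb\tc\n\tb\tc\na\tb\t\na\t  \tc\n' | no_empty_fields 1
```

The first call keeps the first and fourth lines; the second call, which treats a spaces-only field as empty, keeps only the first.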

deleted 387 characters in body

edited Dec 23, 2015 at 10:09

terdon ♦

If your fields can never contain whitespace, an empty field means either a tab as a first character (^\t), a tab as the last character (\t$) or two consecutive tabs (\t\t). You could therefore filter out lines containing any of those:

grep -Ev $'^\t|\t\t|\t$' file

If you can have whitespace, things get more complex. If your fields can begin with spaces, use this instead:

grep -Pv '\t\s*(\t|$)|\t$|^\t' file

The change filters out lines matching a tab followed by zero or more whitespace characters (\s*) and then either another tab or the end of the line.

That will also drop lines whose last field contains nothing but spaces. To avoid that too, use perl with the -F and -a options to split input into the @F array, telling it to print unless one of the fields is empty (/^$/):

perl -F'\t' -lane 'print unless grep{/^$/} @F' file

added 255 characters in body

edited Dec 22, 2015 at 11:38

terdon ♦

An empty field means either a tab at the beginning of the line (^\t), or a tab followed by whitespace (\t\s) or a tab at the end of the line (\t$). You could therefore filter out lines containing any of those:

grep -Pv '\t\s|\t$|^\t' file

The -P flag turns on Perl Compatible Regular Expressions (available in GNU grep, which is the default on Linux, but missing from other implementations) and the -v flag inverts the match, telling grep to print lines that do not match the pattern. The | means OR, letting us combine different patterns in one.

This will fail if fields can begin with spaces. If that's an issue, use this instead:

grep -Pv '\t\s*(\t|$)|\t$|^\t' file

The change filters out lines matching a tab followed by zero or more whitespace characters (\s*) and then either another tab or the end of the line.

That will also drop lines whose last field contains nothing but spaces. To avoid that too, use perl with the -F and -a options to split input into the @F array, telling it to print unless one of the fields is empty (/^$/):

perl -F'\t' -lane 'print unless grep{/^$/} @F' file

Personally, I would not use any of these and would just go for Chaos's awk solution, which is simple, robust and portable.

added 255 characters in body

edited Dec 22, 2015 at 11:31

terdon ♦

added 255 characters in body

edited Dec 22, 2015 at 11:21

terdon ♦

answered Dec 22, 2015 at 10:53

terdon ♦
