2

I'd like to filter through a text file and only print the lines where each column is a valid floating point number. For example:

3 6 2 -4.2 21.2 
3 x 4.2 21.2 
3 2 2.2.2

Only the first line would pass, as neither x nor 2.2.2 is a valid float. I can write a Python script that simply calls .split() on each line and runs a try/except block over each part, but this is slow for larger files. The input file has an unknown, variable number of columns, and no scientific notation will be used. Is there an awk solution?

3 Answers

4
awk '
    # skip any obvious stuff
    /[^0-9. -]/ {next}
    {
        # test each field for a number
        for (i=1; i<=NF; i++) 
            if ($i + 0 != $i)
                next
        print
    }
'

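This will reject valid numbers written in scientific notation (1.2e1 == 12), because the first pattern skips any line containing a character other than a digit, dot, space or hyphen.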
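To see why the `$i + 0 != $i` test works, here is a rough sketch that labels each field the way that comparison would. The function name `classify_fields` is made up for illustration. The idea, as far as I understand awk's rules: a field that looks like a number is compared numerically, while anything else is compared as a string, so `2.2.2 + 0` becomes `2.2`, which no longer equals the original text. Some awk implementations may treat strings such as `inf` or `nan` as numeric, which is presumably why the answer filters on allowed characters first.

```shell
# Hypothetical helper: label each whitespace-separated field using the
# same "$i + 0 == $i" comparison that the answer relies on.
classify_fields() {
    awk '{
        for (i = 1; i <= NF; i++)
            printf "%s %s\n", $i, ($i + 0 == $i ? "number" : "not-a-number")
    }'
}

# Example run on a few fields from the question, plus 4.20
printf '%s\n' '3 -4.2 x 2.2.2 4.20' | classify_fields
```

Fields such as `4.20` still count as numbers because the comparison is numeric, not textual.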

  • One can easily add e to the regular expression: [^0-9. -e]. The test will then only fail when there are only e's in the line. Commented Jun 23, 2014 at 7:16
  • Be careful with bracket expressions: [0-9. -e] will match any character from space (ASCII 32) to e (ASCII 101). You want [^0-9. e-]: to match a literal hyphen, it needs to be either the first or the last character, otherwise it defines a range of chars. (gnu.org/software/gnulib/manual/html_node/…) Commented Jun 23, 2014 at 13:05
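The difference between the two bracket expressions in these comments can be checked with a small sketch. The helper name `check` is hypothetical, and `LC_ALL=C` is set on the assumption that ranges should follow plain byte order. It reports whether a line would be skipped ("rejected") or passed on ("kept") by a pattern used the way the answer's first rule uses it.

```shell
# Hypothetical helper: report whether a line contains a character
# matched by the given bracket expression.
check() {   # usage: check 'regex' 'text'
    printf '%s\n' "$2" |
        LC_ALL=C awk -v re="$1" '{ print ($0 ~ re ? "rejected" : "kept") }'
}

# " -e" is read as a range from space to e, which includes uppercase letters,
# so the stray A is not caught.
check '[^0-9. -e]' '3 A 4.2'

# With the hyphen last it is a literal, so A is caught.
check '[^0-9. e-]' '3 A 4.2'
```

With the hyphen in the middle, the line containing `A` slips through. Moving the hyphen to the end catches it.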
2

Based on the conditions you state, a regex might be a possibility. I was able to get the following GNU awk script to work on RHEL.

 awk '{ for (i = 1; i <= NF; ++i) { if ($i !~ /^[-]?[[:digit:]]+(\.[[:digit:]]+)?$/) break; if (i == NF) print($0) } }' file.txt
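For readability, the same per-field regex idea can be spread over several lines. This is only a sketch: the function name `only_float_lines` is made up, and it uses `next` to abandon a line at the first bad field, where the one-liner uses `break` and then checks `i == NF`. Like the one-liner, it drops blank lines and rejects forms such as `.5` or `5.`, since the pattern requires digits on both sides of the dot.

```shell
# Hypothetical wrapper: print only lines whose every field is an
# optionally signed decimal number (no exponent).
only_float_lines() {
    awk '
    NF == 0 { next }                    # blank lines carry no numbers
    {
        for (i = 1; i <= NF; i++)
            if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/)
                next                    # first bad field: skip the line
        print
    }' "$@"
}

# Example run on the data from the question
printf '%s\n' '3 6 2 -4.2 21.2' '3 x 4.2 21.2' '3 2 2.2.2' | only_float_lines
```

It reads standard input when called without arguments, or the named files otherwise, for example `only_float_lines file.txt`.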
2

Try something like this:

$ cat data.txt 
3 6 2 -4.2 21.2 
3 x 4.2 21.2 
3 2 2.2.2

$ awk '/^\s*-?[0-9]+(\.[0-9]*)?(\s+-?[0-9]+(\.[0-9]*)?)*\s*$/ { print }' < data.txt 
3 6 2 -4.2 21.2 
  • PS: you asked for awk. Should be using grep instead... Commented Oct 22, 2012 at 20:40
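Following up on the grep suggestion, a whole-line match with `grep -E` might look like the sketch below. The function name `float_lines_grep` is hypothetical. The pattern is my own assumption of what "valid float" means here (optional sign, digits, optional fractional part with at least one digit), written with POSIX character classes instead of `\s`.

```shell
# Hypothetical wrapper: keep lines made up solely of whitespace-separated
# decimal numbers, allowing leading and trailing whitespace.
float_lines_grep() {
    grep -E '^[[:space:]]*-?[0-9]+(\.[0-9]+)?([[:space:]]+-?[0-9]+(\.[0-9]+)?)*[[:space:]]*$' "$@"
}

# Example run on the data from the question
printf '%s\n' '3 6 2 -4.2 21.2' '3 x 4.2 21.2' '3 2 2.2.2' | float_lines_grep
```

Because the pattern is anchored at both ends and needs no per-field loop, it can also be applied directly to a file, for example `float_lines_grep data.txt`.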
