Parsing only lines that have 9 periods

Question

I have 90 gig of data culled from 13.5 Terabytes.

I have tried sort -u | uniq on data that has been awk'd from the 13.5T of syslog data.

Some malformed data was apparent so I reran the parse with awk and 'seen' like so:

 awk -F, '!seen[$1]++' inputfile > outputfile

This turned out to be the most time-efficient approach, but the output still included some malformed data. Maybe there are malformed log entries, or some lines got munged during the sorting, uniq'ing and awk'ing. I do not care whether there is a better way of parsing the original data: I have a large enough sample size, so losing a little data out of 13.5T is OK.

There are 3 IP addresses per valid line.

Since there are 3 periods in an IP address, I need something that will parse out only lines that have 9 "."'s.

Do you care how valid the IPs appear? In other words, are you OK selecting a line like: 20.20.20.20 foo.bar.baz 300.300.300.300 ?
– Jeff Schaller ♦, Commented Nov 29, 2017 at 20:06

John1024 · Accepted Answer · 2017-11-29 20:16:45Z

Let's take this as a test file:

$ cat testfile
1.2.3.4 5.6.7.8 9.10.11.12  Keep
1.2.3.4 5.6.7.8 9.10.11     Bad: Missing 1
1.2.3.4 5.6.7.8 9.10.11.12. Bad: Extra period

Using grep

To select lines with exactly nine periods:

$ grep -E '^([^.]*\.){9}[^.]*$' testfile
1.2.3.4 5.6.7.8 9.10.11.12  Keep

[^.]*\. matches any number of non-period characters followed by a period. ([^.]*\.){9} matches exactly nine such sequences of zero or more non-period characters followed by a period. The ^ at the beginning requires that the regex match starting at the beginning of the line. The [^.]*$ means that, between the end of the nine sequences and the end of the line, only non-period characters are allowed.
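As a quick check of the explanation above, here is a minimal sketch that recreates the three-line test file and counts how many lines the pattern keeps and how many it drops. The temporary file from mktemp and the variable names (tmp, re, kept, dropped) are illustrative and not part of the original answer.

```shell
# Illustrative only: recreate the answer's test file in a temporary location.
tmp=$(mktemp)
printf '%s\n' \
  '1.2.3.4 5.6.7.8 9.10.11.12  Keep' \
  '1.2.3.4 5.6.7.8 9.10.11     Bad: Missing 1' \
  '1.2.3.4 5.6.7.8 9.10.11.12. Bad: Extra period' > "$tmp"

# Same pattern as in the answer: exactly nine periods on the line.
re='^([^.]*\.){9}[^.]*$'

# -c counts matching lines; adding -v counts the lines that do NOT match.
kept=$(grep -Ec "$re" "$tmp")
dropped=$(grep -Evc "$re" "$tmp")

echo "kept=$kept dropped=$dropped"
rm -f "$tmp"
```

Running grep -Ev with the same pattern on the real data should be a cheap way to inspect what would be thrown away before committing to the filter.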

Using sed

$ sed -En '/^([^.]*\.){9}[^.]*$/p' testfile
1.2.3.4 5.6.7.8 9.10.11.12  Keep

The -n option tells sed not to print unless we explicitly ask it to. The p following the regex explicitly asks sed to print those lines which match the regex.
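To illustrate why -n matters, the sketch below runs the same sed expression with and without it on a copy of the test file. The temporary file and the variable names (tmp, re, with_n, without_n) are hypothetical helpers for this illustration.

```shell
# Illustrative only: recreate the answer's test file in a temporary location.
tmp=$(mktemp)
printf '%s\n' \
  '1.2.3.4 5.6.7.8 9.10.11.12  Keep' \
  '1.2.3.4 5.6.7.8 9.10.11     Bad: Missing 1' \
  '1.2.3.4 5.6.7.8 9.10.11.12. Bad: Extra period' > "$tmp"

re='^([^.]*\.){9}[^.]*$'

# With -n: only lines explicitly printed by p are output.
with_n=$(sed -En "/$re/p" "$tmp")

# Without -n: sed also auto-prints every line, so the matching line
# appears twice and the non-matching lines still get through.
without_n=$(sed -E "/$re/p" "$tmp")

printf 'with -n:\n%s\n\n' "$with_n"
printf 'without -n:\n%s\n' "$without_n"
rm -f "$tmp"
```

Without -n the command is useless as a filter, which is why the answer pairs -n with p.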

Using awk

$ awk '/^([^.]*\.){9}[^.]*$/' testfile
1.2.3.4 5.6.7.8 9.10.11.12  Keep

Or, using awk's ability to define a character to separate fields (hat tip: Jeff Schaller):

$ awk -F. 'NF==10' testfile
1.2.3.4 5.6.7.8 9.10.11.12  Keep
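The NF==10 test works because splitting a line on nine separators yields ten fields. A small sketch, assuming the same test file, prints the period count (NF - 1) for each line so the relationship is visible; the temporary file and the variable names tmp and counts are illustrative.

```shell
# Illustrative only: recreate the answer's test file in a temporary location.
tmp=$(mktemp)
printf '%s\n' \
  '1.2.3.4 5.6.7.8 9.10.11.12  Keep' \
  '1.2.3.4 5.6.7.8 9.10.11     Bad: Missing 1' \
  '1.2.3.4 5.6.7.8 9.10.11.12. Bad: Extra period' > "$tmp"

# With "." as the field separator, a line with N periods has N + 1 fields,
# so NF - 1 is the number of periods on that line.
counts=$(awk -F. '{ print NF - 1 }' "$tmp")

echo "$counts"
# prints:
# 9
# 8
# 10
rm -f "$tmp"
```

Only the first line has nine periods, which matches the single line that NF==10 selects.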

@JeffSchaller Yes, excellent. (I had to look up in the docs to verify that FS is not treated as a regex when it is a single character.)
– John1024, Commented Nov 29, 2017 at 20:06
