
deleted 22 characters in body; edited tags; edited title


edited Nov 29, 2017 at 19:55

Jeff Schaller ♦


Parsing only lines that have 9 periods

I have 90 gig of data culled from 13.5 Terabytes.

I have tried sort -u | uniq on data that has been awk'd from the 13.5T of syslog data.

Some malformed data was apparent so I reran the parse with awk and 'seen' like so:

 awk -F, '!seen[$1]++' inputfile > outputfile

This turned out to be the most time-efficient approach, but the output still included some malformed data. Perhaps there are malformed log entries, or some lines got munged during the sort, uniq, and awk passes. I am not looking for a better way to parse the original data: the sample is large enough that losing a little data out of 13.5T is OK.
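
For reference, here is a minimal sketch of what the `!seen[$1]++` idiom does. The file names and sample lines are hypothetical and only stand in for the real data:

```shell
# Hypothetical sample data: comma-separated lines, two sharing the same first field.
printf '%s\n' \
  'a,1,x' \
  'b,2,y' \
  'a,3,z' > sample.csv

# seen[$1]++ evaluates to 0 (false) the first time a given $1 appears,
# so !seen[$1]++ is true only for the first line with each distinct first field.
awk -F, '!seen[$1]++' sample.csv > deduped.csv

cat deduped.csv
```

Note that this deduplicates on the first comma-separated field only, so a later line with the same first field is dropped even if the rest of the line differs.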

There are 3 IP addresses per valid line.

Since there are 3 periods in an IP address, I need something that will parse out only lines that have 9 "."'s.
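
To make the requirement concrete, here is one possible way to express "exactly 9 periods" in awk, as a sketch only. The sample lines and file names are assumptions, not the real log format:

```shell
# Hypothetical sample lines; only the first has exactly 9 periods.
printf '%s\n' \
  '10.0.0.1,10.0.0.2,10.0.0.3' \
  '10.0.0.1,10.0.0.2' \
  '10.0.0.1,10.0.0.2,10.0.0.3,host.example' > sample.log

# With "." as the field separator, a line containing exactly 9 periods
# splits into exactly 10 fields, so NF == 10 selects those lines.
awk -F. 'NF == 10' sample.log > nine_periods.log

cat nine_periods.log
```

This only counts periods. It does not check that the periods actually belong to three well-formed IP addresses.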


asked Nov 29, 2017 at 19:46

0xffffff

Parsing only lines that have 'N' specific characters

I have 90 gig of data culled from 13.5 Terabytes.

I have tried sort -u | uniq on data that has been awk'd from the 13.5T of syslog data.

Some malformed data was apparent so I reran the parse with awk and 'seen' like so:

awk -F, '!seen[$1]++' inputfile > outputfile

There are 3 IP addresses per valid line.

Since there are 3 periods in an IP address, I need something that will parse out only lines that have 9 "."'s...

Help is appreciated =)

awk sed grep