Finding a missing sequential number in a data file

Question

How can I search a large file with 'sequential numbers' (that reset after 16) in a particular column to find a missing line?

I have a data file:

col1 col2 col3 col4 col5 1
col1 col2 col3 col4 col5 2
.
.
.
col1 col2 col3 col4 col5 15
col1 col2 col3 col4 col5 16
col1 col2 col3 col4 col5+1 1

where the last column counts from 1 to 16 and then resets to 1. At that point, 1 is added to column 5.

A clean file simply continues this pattern until the end of the file. How can I find missing data, e.g.

col1 col2 col3 col4 col5 1
col1 col2 col3 col4 col5 3

where a row has been skipped or lost, as shown by the last column jumping from 1 to 3?

The desired output is the line number (location) of the line before or after the missing data.

This answer on Stack Overflow gave me the idea to use awk. So what I've come up with is:

awk '$6!=p+1{print NR}{p=$6}'

This tries to print the current line number whenever column 6 of the current line is not equal to column 6 of the previous line plus 1. It fails because the counter wraps from 16 back to 1, so every reset is reported as a gap.

rudimeier · Accepted Answer · 2016-10-25 16:50:52Z


$ cat -n file
 1  col1 col2 col3 col4 col5 14
 2  col1 col2 col3 col4 col5 15
 3  col1 col2 col3 col4 col5 16
 4  col1 col2 col3 col4 col5 1
 5  col1 col2 col3 col4 col5 2
 6  col1 col2 col3 col4 col5 15
 7  col1 col2 col3 col4 col5 16
 8  col1 col2 col3 col4 col5 4
 9  col1 col2 col3 col4 col5 5

$ awk '{if (p % 16 + 1 != $6) printf("line %d is bad: %s\n", NR, $0); p=$6}' file
line 1 is bad: col1 col2 col3 col4 col5 14
line 6 is bad: col1 col2 col3 col4 col5 15
line 8 is bad: col1 col2 col3 col4 col5 4
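
Line 1 is reported here only because p is still empty (treated as 0) on the first line, so the check expects a 1 there. A minimal sketch of a variant, assuming the file may legitimately start mid-cycle and that only the line numbers are wanted. The function name find_gaps is made up for illustration:

```shell
# Hypothetical helper: print the line number of every line whose column 6
# does not follow the previous line's value (1..16, wrapping back to 1).
# Reads the files given as arguments, or stdin if there are none.
find_gaps() {
    awk 'NR > 1 && $6 != p % 16 + 1 { print NR }  # no check on the first line
         { p = $6 }' "$@"
}

# The same nine values as in the sample file above.
printf 'col1 col2 col3 col4 col5 %s\n' 14 15 16 1 2 15 16 4 5 | find_gaps
```

With that sample this prints 6 and 8, i.e. the line after each gap.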

To see what the modulo operator "%" (division remainder) does, you can play around with this awk snippet:

$ yes | head -n 40 | awk '{x=NR-1; print x, "->", x % 16}'
0 -> 0
1 -> 1
2 -> 2
[...]
14 -> 14
15 -> 15
16 -> 0
17 -> 1
18 -> 2
[...]
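
In the check itself, p is the previous value of column 6 and p % 16 + 1 is the value expected on the current line. For p from 1 to 15 the remainder is p itself, so the expected value is p + 1. For p = 16 the remainder is 0, so the expected value wraps to 1. A small sketch that prints this mapping (the function name next_expected is made up for illustration):

```shell
# Hypothetical helper: for each previous value p read from stdin,
# print the value that the expression "p % 16 + 1" expects next.
next_expected() {
    awk '{ p = $1; print p, "->", p % 16 + 1 }'
}

printf '%s\n' 1 2 14 15 16 | next_expected
```

This prints 1 -> 2, 2 -> 3, 14 -> 15, 15 -> 16 and 16 -> 1, one per line.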

edited Oct 25, 2016 at 16:50

answered Oct 25, 2016 at 15:32


That works like a charm. How does it work? (Specifically the p % 16 bit)

– Christopher, 2016-10-25 15:34:52 +00:00

Regarding "%", search for "modulo operator"; it is not specific to awk: en.wikipedia.org/wiki/Modulo_operation

– rudimeier, 2016-10-25 15:38:02 +00:00

@Christopher If you worry about % use ++p != $6 ... p=$6<16?$6:0}'

– Costas, 2016-10-25 16:02:48 +00:00

@Costas This could be wrong if you have numbers >=17. It would say 1 after 17 is correct.

– rudimeier, 2016-10-25 16:13:20 +00:00

Firstly, 17 after any number (even after 16) is incorrect. And which number, in your view, is correct after 17? 2?

– Costas, 2016-10-25 16:24:54 +00:00
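
The disagreement in these comments only arises for values outside 1..16, which the question does not expect to occur. A sketch of one way to sidestep it, assuming such values should be reported as errors in their own right and that the sequence check should restart on the following line. The function name check_seq, the variable have_prev and the message wording are all made up for illustration:

```shell
# Hypothetical helper: report out-of-range values and gaps in column 6.
# Reads the files given as arguments, or stdin if there are none.
check_seq() {
    awk '
        $6 < 1 || $6 > 16 {                 # value outside the valid range
            printf "line %d: value %s out of range\n", NR, $6
            have_prev = 0                   # restart the sequence check
            next
        }
        have_prev && $6 != p % 16 + 1 {     # gap relative to the previous line
            printf "line %d: expected %d, found %s\n", NR, p % 16 + 1, $6
        }
        { p = $6; have_prev = 1 }
    ' "$@"
}

printf 'c1 c2 c3 c4 c5 %s\n' 15 16 17 1 3 | check_seq
```

For this input it reports line 3 as out of range and line 5 as a gap (expected 2, found 3). The line right after an out-of-range value is not checked, which is a design choice rather than the only sensible behaviour.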
