How can I search a large file with 'sequential numbers' (that reset after 16) in a particular column to find a missing line?

I have a data file:

col1 col2 col3 col4 col5 1
col1 col2 col3 col4 col5 2
.
.
.
col1 col2 col3 col4 col5 15
col1 col2 col3 col4 col5 16
col1 col2 col3 col4 col5+1 1

where the last column counts from 1 to 16 and then resets to 1. When it resets, 1 is added to column 5.

A clean output would just iterate up until the end of the file. How can I find missing data, e.g.

col1 col2 col3 col4 col5 1
col1 col2 col3 col4 col5 3

where a row has been skipped/lost as can be seen from the last column that has skipped the value of 2?

I'd like the line number/location of the line before or after the missing data as the desired output.

This answer on Stack Overflow gave me the idea to use awk. So what I've come up with is:

awk '$6!=p+1{print NR}{p=$6}'

This tries to print the current line number whenever column 6 of the current line is not equal to column 6 of the previous line plus 1. It fails because the counter wraps around: after 16 the next value is 1, not 17, so every reset is reported as an error.

1 Answer

$ cat -n file
 1  col1 col2 col3 col4 col5 14
 2  col1 col2 col3 col4 col5 15
 3  col1 col2 col3 col4 col5 16
 4  col1 col2 col3 col4 col5 1
 5  col1 col2 col3 col4 col5 2
 6  col1 col2 col3 col4 col5 15
 7  col1 col2 col3 col4 col5 16
 8  col1 col2 col3 col4 col5 4
 9  col1 col2 col3 col4 col5 5

$ awk '{if (p % 16 + 1 != $6) printf("line %d is bad: %s\n", NR, $0); p=$6}' file
line 1 is bad: col1 col2 col3 col4 col5 14
line 6 is bad: col1 col2 col3 col4 col5 15
line 8 is bad: col1 col2 col3 col4 col5 4
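
The check above also reports line 1, because p is still unset when the first row is read, so the expected value is 0 % 16 + 1 = 1. As a minimal sketch (a variant, not part of the command above), the following skips the comparison on the first row and prints the value it expected. The temporary file and the `result` variable are only there for illustration; the sample rows are the same as in the `cat -n` listing.

```shell
# Illustrative sample data: same nine rows as the listing above.
tmp=$(mktemp)
printf 'col1 col2 col3 col4 col5 %s\n' 14 15 16 1 2 15 16 4 5 > "$tmp"

# NR > 1 skips the first row, which has no predecessor to compare against.
# p % 16 + 1 is the value expected after p (16 wraps around to 1).
result=$(awk 'NR > 1 && $6 != p % 16 + 1 {
    printf "line %d: expected %d, found %d\n", NR, p % 16 + 1, $6
}
{ p = $6 }' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With this sample only lines 6 and 8 are reported. This assumes a file that starts in the middle of a cycle is acceptable; if the first row must always be 1, the original command is the better fit.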

To understand the value of the modulo operator "%" (division remainder), you may play around with this awk snippet:

$ yes | head -n 40 | awk '{x=NR-1; print x, "->", x % 16}'
0 -> 0
1 -> 1
2 -> 2
[...]
14 -> 14
15 -> 15
16 -> 0
17 -> 1
18 -> 2
[...]
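
In the check itself, `p % 16 + 1` is the value that should follow p. A small sketch with a few hand-picked values of p (the list and the `mapping` variable are only illustrative) shows the wrap from 16 back to 1, and why an unset p, which awk treats as 0, expects 1:

```shell
# For each previous value p, print the value expected on the next row.
mapping=$(awk 'BEGIN {
    n = split("0 1 15 16", v, " ")
    for (i = 1; i <= n; i++)
        print v[i], "->", v[i] % 16 + 1
}')

printf '%s\n' "$mapping"
```

The row for 16 prints `16 -> 1`, which is exactly the reset that a plain `p + 1` comparison gets wrong.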
  • That works like a charm. How does it work? (Specifically the p % 16 bit) Commented Oct 25, 2016 at 15:34
  • regarding "%" google "modulo operator", it's not only awk specific: en.wikipedia.org/wiki/Modulo_operation Commented Oct 25, 2016 at 15:38
  • @Christopher If you worry about % use ++p != $6 ... p=$6<16?$6:0}' Commented Oct 25, 2016 at 16:02
  • @Costas This could be wrong if you have numbers >=17. It would say 1 after 17 is correct. Commented Oct 25, 2016 at 16:13
  • Firstly 17 after any number (even after 16) is incorrect. And up to you which number after 17 is correct? 2? Commented Oct 25, 2016 at 16:24
