0

I have a data gathering device which records per second. Sometimes, randomly it drops a second or more. For example

166,1.09424,240,76,132
168,1.10088,215,76,132
169,1.10765,213,78,131
170,1.11458,198,79,131

It can gather 1M data points in a session, and might miss >100

Looking around I found this AWK command

awk '{ while (NR + shift < $1) { print (NR + shift) " NA"; shift++ }; print } END { shift++; while (NR + shift < 13) { print (NR + shift) " NA"; shift++ } }' /tmp/test1

from this answer

but that gives new lines when they aren't needed:

166 NA
167 NA
168 NA
169 NA
170 NA

What am I doing wrong?

0

1 Answer 1

1

A few issues with your command line:

  1. Your CSV uses comma delimiters, but your awk command assumes the default space delimiter. This is the primary reason you get unnecessary fills.
  2. The END case in your command exists only to fill the end result to a total minimum of 12 lines. This should therefore be removed, or the test should be changed to a higher minimum, perhaps user-specified.
  3. Most tools used to process CSV don't like lines with varying numbers of fields. Your padding lines should therefore contain the same number of fields as normal lines.

Taken all together, and assuming your first record starts with second 1, we have this:

awk -F, -v min_secs=2000 '
{ 
  while (NR + shift < $1) {
    print (NR + shift) ",NA,NA,NA,NA"
    shift++ 
  }
  print
}
END {
  while (NR + shift <= min_secs) {
    print (NR + shift) ",NA,NA,NA,NA"
    shift++ 
  }
}
' /tmp/test1
8
  • I tried to make it into one line awk -F, -v min_secs=2000 ' {while (NR + shift < $1) {print (NR + shift) ",NA,NA,NA,NA" shift++}print} END {while (NR + shift <= min_secs) {print (NR + shift) ",NA,NA,NA,NA" shift++}} ' wrong.txt > fixed.txt and then ran it, however it didn't add the new line 166,1.09424,240,76,132 168,1.10088,215,76,132 169,1.10765,213,78,131 170,1.11458,198,79,131 but it also added a bunch at the end - as a test I used a small sample file that stops at 300 lines 300,2.08307,331,58,111 301,2.08754,361,60,112 301,NA,NA,NA,NA1 302,NA,NA,NA,NA2 Commented Feb 16, 2019 at 2:57
  • Oh that doesn't format very well - you can see I'm a noob! Commented Feb 16, 2019 at 2:58
  • @SteveShiny If you're squeezing it all into a one-liner (which I don't recommend for clarity reasons), you have to put the semicolons between the print and shift++ statements back in. That's why you're seeing increasing serial numbers at the end of each dummy line: print thinks shift++ is another argument to it, instead of an entirely separate statement. Commented Feb 16, 2019 at 3:02
  • I'm doing something wrong. I copied and pasted the command, and got the same result. I'm on Mac if that makes a difference? Commented Feb 16, 2019 at 3:39
  • Actually I partially take that back... it found one missing point and added a line for that, but not another 223,1.59792,373,68,110 224,1.60427,373,66,109 225,NA,NA,NA,NA 226,1.61923,373,63,110 227,1.62166,373,60,110 165,1.08769,223,77,131 166,1.09424,240,76,132 168,1.10088,215,76,132 169,1.10765,213,78,131<br/>grrr how to do new lines? Commented Feb 16, 2019 at 3:43

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.