
I have a text file with two columns: the first holds dates (DD/MM/YYYY) and the second holds numbers. It looks like this:

15/01/1945 105.0
16/01/1945   4.2
17/01/1945   3.0
31/01/1945  12.0
01/02/1945   3.0
02/02/1945 125.0
05/02/1945   0.3

I need to fill in the file according to these conditions:

  1. First date 01/01/1945
  2. Last date 31/12/2021
  3. Dates must be consecutive, with a difference of one day between lines
  4. If there is a missing date, we have to complete the line with the correct date and the number -99.0

So, the final file should look like this:

01/01/1945 -99.0
02/01/1945 -99.0
03/01/1945 -99.0
04/01/1945 -99.0
05/01/1945 -99.0
06/01/1945 -99.0
07/01/1945 -99.0
08/01/1945 -99.0
09/01/1945 -99.0
10/01/1945 -99.0
11/01/1945 -99.0
12/01/1945 -99.0
13/01/1945 -99.0
14/01/1945 -99.0
15/01/1945 105.0
16/01/1945   4.2
17/01/1945   3.0
18/01/1945 -99.0
19/01/1945 -99.0
20/01/1945 -99.0
21/01/1945 -99.0
22/01/1945 -99.0
23/01/1945 -99.0
24/01/1945 -99.0
25/01/1945 -99.0
26/01/1945 -99.0
27/01/1945 -99.0
28/01/1945 -99.0
29/01/1945 -99.0
30/01/1945 -99.0
31/01/1945  12.0
01/02/1945   3.0
02/02/1945 125.0
03/02/1945 -99.0
04/02/1945 -99.0
05/02/1945   0.3
06/02/1945 -99.0
07/02/1945 -99.0
...
30/12/2021 -99.0
31/12/2021 -99.0

I tried writing a Fortran program, but it doesn't work. I think awk, sed, or both might be a better fit.

This is what I get when I read Ed's script:

meteo@poniente:/datos$ cat awk.script
#!/bin/bash
cat tst.awk
awk { dates2vals[$1] = $2 }
END {
    begDate = "01/01/1945"
    endDate = "31/12/2000"
    begSecs = mktime(gensub("(.*)/(.*)/(.*)","\\3 \\2 \\1 12 00 00",1,begDate))
    daySecs = 24 * 60 * 60
    for (curSecs=begSecs; curDate!=endDate; curSecs+=daySecs) {
        curDate = strftime("%d/%m/%Y",curSecs)
        print curDate, (curDate in dates2vals ? dates2vals[curDate] : "-99.0")
    }
}

And this is what I get when I run Ed's script:

meteo@poniente:/datos$ ./tst.awk
01/01/1946   3.0
02/01/1946  14.2
...
14/11/2021   0.0
15/11/2021   0.0
16/11/2021   0.0
17/11/2021   0.0
18/11/2021   0.0
19/11/2021   0.0
20/11/2021   0.0
21/11/2021   0.0
22/11/2021  54.1
23/11/2021 -99.0
24/11/2021  27.4
25/11/2021   0.0
29/11/2021   0.0
30/11/2021   0.0
awk: line ord.:1: {
awk: line ord.:1:  ^ unexpected newline or end of string
./awk.script: line 4: END: command not found
./awk.script: line 5: begDate: command not found
./awk.script: line 6: endDate: command not found
./awk.script: line 7: syntax error near unexpected element `('
./awk.script: line 7: `    begSecs = mktime(gensub("(.*)/(.*)/(.*)","\\3 \\2 \\1 12 00 00",1,begDate))'
meteo@poniente:/datos$
  • Could you possibly fix your question and use code formatting, so we can more easily copy-paste your examples? Commented Feb 11, 2022 at 11:57
  • Are the dates in the input always consecutive? Commented Feb 11, 2022 at 11:59
  • Yes, they are always consecutive, but there are missing dates. Files are always in order and the oldest date is always the first. Commented Feb 11, 2022 at 12:42

2 Answers

3

Try generating the full list of dates with seq (in epoch seconds: start, step of one day, end) and GNU date's -f option, giving every line the default value -99.0, then use awk to substitute the real values where they exist:

seq -f"@%.0f" -- -788878800 86400 1640991600 | date -uf- +"%d/%m/%Y -99.0" | awk 'FNR==NR {A[$1] = $2; next} $1 in A {$2 = A[$1]} 1' file -
01/01/1945 -99.0
02/01/1945 -99.0
.
.
.
14/01/1945 -99.0
15/01/1945 105.0
16/01/1945 4.2
17/01/1945 3.0
18/01/1945 -99.0
19/01/1945 -99.0
20/01/1945 -99.0
21/01/1945 -99.0
22/01/1945 -99.0
23/01/1945 -99.0
24/01/1945 -99.0
25/01/1945 -99.0
26/01/1945 -99.0
27/01/1945 -99.0
28/01/1945 -99.0
29/01/1945 -99.0
30/01/1945 -99.0
31/01/1945 12.0
01/02/1945 3.0
02/02/1945 125.0
03/02/1945 -99.0
04/02/1945 -99.0
05/02/1945 0.3
06/02/1945 -99.0
07/02/1945 -99.0
08/02/1945 -99.0
09/02/1945 -99.0
10/02/1945 -99.0
.
.
.
28/12/2021 -99.0
29/12/2021 -99.0
30/12/2021 -99.0
31/12/2021 -99.0
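
The epoch values passed to seq do not have to be worked out by hand. As a sketch, assuming GNU date and GNU seq (the -d, -f and @seconds forms are GNU extensions), the bounds can be derived from readable timestamps. The file names sample.txt and filled.txt and the sample rows are made up for illustration. Noon is used as the start so every step lands mid-day, and the end bound is the last second of 31/12/2021 so the final day is included:

```shell
# Work in a scratch directory so no real data is overwritten.
cd "$(mktemp -d)"

# Hypothetical sample input in the format shown in the question.
printf '%s\n' '15/01/1945 105.0' '16/01/1945   4.2' '05/02/1945   0.3' > sample.txt

# Derive the bounds as UTC epoch seconds (GNU date assumed).
start=$(date -u -d '1945-01-01 12:00:00' +%s)
end=$(date -u -d '2021-12-31 23:59:59' +%s)

# One line per day with the default value, then overlay the known values.
seq -f '@%.0f' -- "$start" 86400 "$end" |
  date -u -f - +'%d/%m/%Y -99.0' |
  awk 'FNR==NR {A[$1] = $2; next} $1 in A {$2 = A[$1]} 1' sample.txt - > filled.txt

wc -l < filled.txt     # one line per day in the range
tail -n 1 filled.txt   # last date written
```

For the range 01/01/1945 to 31/12/2021 the line count should be 28124, and the last line should be dated 31/12/2021.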
  • Not sure what you mean. How does RudiC's command not work? Commented Feb 11, 2022 at 12:43
  • To edit your file in place, you need to add ` > file.tmp && mv file.tmp file`. Commented Feb 11, 2022 at 12:44
  • Hi, thanks!! I already have a file with the long list, LIST.DAT. It contains, line by line, the dates between 01/01/1945 and 31/12/2021. How can I use the awk command, supposing that I have these 2 files (LIST.DAT and INPUT.txt), to write -99.0 wherever a date is missing from the input file? Commented Feb 11, 2022 at 12:58
  • Note that -f (which accepts a file with a list of input dates; some other date implementations have a -f option for something else) is a GNU-specific date extension. Commented Feb 11, 2022 at 12:59
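
For the two-file case raised in the comments (a ready-made date list plus the sparse input), a minimal sketch needs only awk and no date arithmetic. It assumes LIST.DAT holds one DD/MM/YYYY date per line and INPUT.txt has the two-column format from the question. The sample contents and the output name OUTPUT.txt are made up for illustration:

```shell
# Work in a scratch directory so no real data is overwritten.
cd "$(mktemp -d)"

# Hypothetical sample files, named as in the comment.
printf '%s\n' '14/01/1945' '15/01/1945' '16/01/1945' '17/01/1945' '18/01/1945' > LIST.DAT
printf '%s\n' '15/01/1945 105.0' '16/01/1945   4.2' '17/01/1945   3.0' > INPUT.txt

# First file (INPUT.txt): remember the value for each date.
# Second file (LIST.DAT): print every date with its value, or -99.0 if none.
awk 'FNR == NR { val[$1] = $2; next }
     { print $1, ($1 in val ? val[$1] : "-99.0") }' INPUT.txt LIST.DAT > OUTPUT.txt

cat OUTPUT.txt
```

Note that the `FNR == NR` idiom assumes INPUT.txt is not empty; with an empty first file, awk would treat LIST.DAT as the first file instead.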
1

Using GNU awk for time functions:

$ cat tst.awk
{ dates2vals[$1] = $2 }
END {
    begDate = "01/01/1945"
    endDate = "31/12/2021"
    begSecs = mktime(gensub("(.*)/(.*)/(.*)","\\3 \\2 \\1 12 00 00",1,begDate))
    daySecs = 24 * 60 * 60
    for (curSecs=begSecs; curDate!=endDate; curSecs+=daySecs) {
        curDate = strftime("%d/%m/%Y",curSecs)
        print curDate, (curDate in dates2vals ? dates2vals[curDate] : "-99.0")
    }
}

$ awk -f tst.awk file | wc -l
28124
$ awk -f tst.awk file | head -5
01/01/1945 -99.0
02/01/1945 -99.0
03/01/1945 -99.0
04/01/1945 -99.0
05/01/1945 -99.0
$ awk -f tst.awk file | tail -5
27/12/2021 -99.0
28/12/2021 -99.0
29/12/2021 -99.0
30/12/2021 -99.0
31/12/2021 -99.0
$ awk -f tst.awk file | grep -v '99.0'
15/01/1945 105.0
16/01/1945 4.2
17/01/1945 3.0
31/01/1945 12.0
01/02/1945 3.0
02/02/1945 125.0
05/02/1945 0.3
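
Where GNU awk's time functions are unavailable, the calendar can be walked by hand. The following is a sketch of that alternative, not the script above: it uses only POSIX awk features, hard-codes the 1945 to 2021 range, and pads the value with `%5s` to reproduce the right-aligned column from the question. The file names sample.txt and filled_posix.txt and the helper name dim are made up for illustration:

```shell
# Work in a scratch directory so no real data is overwritten.
cd "$(mktemp -d)"

# Hypothetical sample input in the format shown in the question.
printf '%s\n' '15/01/1945 105.0' '16/01/1945   4.2' '05/02/1945   0.3' > sample.txt

awk '
# Days in month m of year y (Gregorian leap-year rule).
function dim(m, y) {
    if (m == 2)
        return (y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)) ? 29 : 28
    return (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31
}
{ val[$1] = $2 }
END {
    for (y = 1945; y <= 2021; y++)
        for (m = 1; m <= 12; m++) {
            last = dim(m, y)
            for (d = 1; d <= last; d++) {
                date = sprintf("%02d/%02d/%04d", d, m, y)
                # %5s right-aligns the value as in the question.
                printf "%s %5s\n", date, (date in val ? val[date] : "-99.0")
            }
        }
}' sample.txt > filled_posix.txt

wc -l < filled_posix.txt
```

The line count should again be 28124, the same figure `wc -l` reports for the gawk script.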
  • Hi, thanks! I have tried your awk command, but I get an error: Syntax error: `(' unexpected in line begSecs = mktime(gensub("(.*)/(.*)/(.*)","\\3 \\2 \\1 12 00 00",1,begDate),1) Commented Feb 11, 2022 at 14:40
  • If you don't tell me what the error is then I can't help you debug it. Please copy/paste the error message into a comment and also tell me the output of awk --version. My best guess is you aren't using GNU awk as I said is required. Commented Feb 11, 2022 at 14:41
  • Sorry. My awk version is GNU Awk 3.1.6. It is an old computer with no Internet connection Commented Feb 11, 2022 at 14:50
  • I have checked again, and RudiC's command works perfectly!! Thanks all of you!! Commented Feb 11, 2022 at 15:21
  • @David You're welcome. Yea, your version of gawk is 13 years out of date, we're now on version 5.1.1. My script should still work though as mktime(), strftime(), and gensub() (the 3 gawk extension functions I used) were all introduced in 3.1 but maybe you'd just have to remove the final ,1 utc-flag from the calls to mktime() and/or strftime() that I always add out of habit but you don't need for your case. I edited my answer to remove those flags, if that doesn't fix it then idk what the problem could be unless you copy/pasted wrong. Commented Feb 11, 2022 at 18:17
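
The messages quoted in the question, such as `END: command not found`, are printed by bash rather than by awk, which suggests the awk program was pasted unquoted into a shell script. Below is a sketch of the two usual ways to run an awk program from a file. It uses a short stand-in program (not the answer's script) so that it runs with any awk, and the file contents are illustrative:

```shell
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)"
printf '%s\n' '15/01/1945 105.0' '16/01/1945   4.2' > file

# Option 1: tst.awk contains awk code only, and awk reads it with -f.
cat > tst.awk <<'EOF'
{ dates2vals[$1] = $2 }
END { n = 0; for (d in dates2vals) n++; print n, "dates read" }
EOF
awk -f tst.awk file

# Option 2: a shell wrapper. The whole awk program is one single-quoted
# argument, so the shell never tries to run END or begDate as commands.
cat > awk.script <<'EOF'
#!/bin/sh
awk '
{ dates2vals[$1] = $2 }
END { n = 0; for (d in dates2vals) n++; print n, "dates read" }
' "$@"
EOF
chmod +x awk.script
./awk.script file
```

Both invocations print `2 dates read` for the two-line sample. With the real script, the body of tst.awk would go in place of the stand-in program.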
