
I have file1.csv

"word 1"
""
"word 3"
""
"word 5"
"word 6"

and file2.csv

"replacement text 1"
"replacement text 2"
"replacement text 3"
"replacement text 4"
"replacement text 5"
"replacement text 6"

I'm looking for a command that finds the empty lines (or lines containing just "") in file1 and replaces each of them with the line at the same position in file2.

The output.csv should be

"word 1"
"replacement text 2"
"word 3"
"replacement text 4"
"word 5"
"word 6"
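To try the answers below, the question's sample files can be recreated with printf. This is a convenience sketch that writes into the current directory; the file names are the ones used in the question.

```shell
# Recreate the question's sample files in the current directory.
# Note: this overwrites any existing file1.csv and file2.csv.
printf '%s\n' '"word 1"' '""' '"word 3"' '""' '"word 5"' '"word 6"' > file1.csv

# printf reuses its format string for each remaining argument,
# producing "replacement text 1" through "replacement text 6".
printf '"replacement text %s"\n' 1 2 3 4 5 6 > file2.csv
```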

5 Answers


This assumes that the files have the same number of lines. Use paste to create a CSV record stream with the fields from the first file as the first header-less column and the fields from the second file as the second header-less column:

$ paste -d , file1.csv file2.csv
"word 1","replacement text 1"
"","replacement text 2"
"word 3","replacement text 3"
"","replacement text 4"
"word 5","replacement text 5"
"word 6","replacement text 6"

We may then use Miller to update the first field with the values from the second field if the first field is empty:

$ paste -d , file1.csv file2.csv| mlr --csv -N put 'is_empty($1) { $1 = $2 }'
word 1,replacement text 1
replacement text 2,replacement text 2
word 3,replacement text 3
replacement text 4,replacement text 4
word 5,replacement text 5
word 6,replacement text 6

The is_empty() test will be true for any field that is empty, regardless of whether it was quoted or not in the input.

We may then cut (extract) the first field:

$ paste -d , file1.csv file2.csv| mlr --csv -N put 'is_empty($1) { $1 = $2 }' then cut -f 1
word 1
replacement text 2
word 3
replacement text 4
word 5
word 6

Miller will only quote fields that actually need quoting. To force Miller to quote all output fields, use --quote-all:

$ paste -d , file1.csv file2.csv| mlr --csv -N --quote-all put 'is_empty($1) { $1 = $2 }' then cut -f 1
"word 1"
"replacement text 2"
"word 3"
"replacement text 4"
"word 5"
"word 6"

You can certainly do something similar with awk, but remember that awk is not CSV-aware: it treats the double quotes as literal text and blindly treats every comma as a field delimiter, even commas embedded in properly quoted fields. It also doesn't understand fields with embedded newlines, but our initial assumption already rules these out.

$ paste -d , file1.csv file2.csv| awk -F , '$1 == "\"\"" { $1 = $2 } { print $1 }'
"word 1"
"replacement text 2"
"word 3"
"replacement text 4"
"word 5"
"word 6"
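The comma caveat is easy to see with a field that contains an embedded comma. The record below is made up for illustration and is not from the question:

```shell
# A made-up CSV record whose first field legitimately contains a comma.
record='"Smith, John","replacement"'

# With -F , awk splits on every comma, quoted or not, so it sees 3 fields
# and $1 is only the fragment "Smith (including the opening quote).
printf '%s\n' "$record" | awk -F , '{ print NF; print $1 }'
```

A CSV-aware tool such as Miller would see two fields here instead.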
  • What is ‘Miller’? (Whatever it is, it doesn't seem to be installed on any of the Unix boxes I have access to…) Commented Jan 28, 2023 at 18:38
  • This is a very good tool. See miller.readthedocs.io/en/latest Commented Jan 28, 2023 at 18:57
  • @gidds It's a tool for processing various types of structured data in different ways. I added a link in the answer, and Prabhjot mentioned it in a previous comment too. I would be surprised to see it installed by default, yes, but it's common to install the tools you need, just like any other application that is necessary for completing a task. Commented Jan 28, 2023 at 19:33

Another awk alternative:

awk '{getline other < "file2.csv"}
     $0 == "\"\"" {$0 = other}
     {print}' file1.csv > output.csv

Or paste + sed:

paste -d '\n' file1.csv file2.csv| sed 'N;s/^""\n//;s/\n.*//' > output.csv

If file2.csv doesn't have enough lines to cover every "" line of file1.csv, the awk version reuses the last line of file2.csv, while the paste+sed version outputs empty lines. The paste+sed version also outputs extra empty lines if file2.csv has more lines than file1.csv.
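Both behaviours can be checked with a short file2.csv. The inputs below are hypothetical and are written to a scratch directory created with mktemp -d:

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical inputs: file1.csv has three "" lines, file2.csv only two lines.
printf '%s\n' '""' '"word 2"' '""' '""' > file1.csv
printf '%s\n' '"rep A"' '"rep B"' > file2.csv

# awk version: once getline reaches the end of file2.csv it returns 0 and
# leaves `other` unchanged, so the last line read ("rep B") is reused.
awk '{getline other < "file2.csv"}
     $0 == "\"\"" {$0 = other}
     {print}' file1.csv > out-awk.csv

# paste+sed version: paste pairs the extra file1.csv lines with empty
# strings, so those "" lines come out as empty lines.
paste -d '\n' file1.csv file2.csv | sed 'N;s/^""\n//;s/\n.*//' > out-sed.csv

cat out-awk.csv out-sed.csv
```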

$ awk -F, 'NR == FNR    { a[FNR] = $1; next };
           $1 == "\"\"" { $1 = a[FNR] };
           1' file2.csv file1.csv 
"word 1"
"replacement text 2"
"word 3"
"replacement text 4"
"word 5"
"word 6"

This reads in file2.csv and stores column 1 of each line in an array.

Then it reads in file1.csv and, if column 1 is just a pair of double quotes (i.e. "empty"), replaces column 1 with the array element for the current line number (FNR, the record number within the current file). The final 1 is an always-true pattern whose default action prints the current line, whether it has been changed or not.
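The awk command above only matches "" lines, while the question also mentions empty lines. A hedged variant that treats a truly empty line the same way could look like this; the NF == 0 condition is an addition, not part of the original answer, and the inputs are hypothetical:

```shell
# Scratch directory with hypothetical inputs: line 2 of file1.csv is truly
# empty and line 4 is "".
cd "$(mktemp -d)" || exit 1
printf '%s\n' '"word 1"' '' '"word 3"' '""' > file1.csv
printf '"replacement text %s"\n' 1 2 3 4 > file2.csv

# NR == FNR holds only while the first file (file2.csv) is read: store its
# lines by line number, then skip to the next record.
# For file1.csv, NF == 0 matches a truly empty line (it has no fields);
# assigning $1 then gives it one field and rebuilds $0.
awk -F, 'NR == FNR { a[FNR] = $1; next }
         $1 == "\"\"" || NF == 0 { $1 = a[FNR] }
         1' file2.csv file1.csv > output.csv

cat output.csv
```

One known pitfall of the NR == FNR idiom: if file2.csv is empty, the condition stays true while file1.csv is read, so every line is stored instead of printed and the output is empty.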


Using csvsql:

$ paste -d, file1.csv file2.csv |\
csvsql -H --query 'SELECT CASE WHEN a IS NOT NULL THEN a ELSE b END as output from STDIN' |\
csvformat -K 1 -U 1
"word 1"
"replacement text 2"
"word 3"
"replacement text 4"
"word 5"
"word 6"

First, paste creates the CSV stream.

csvsql uses the SQLite dialect. -H is used because the files have no header row; it makes csvsql name the first and second fields a and b respectively. CASE WHEN a IS NOT NULL THEN a ELSE b END then returns the first field when it is not empty (the empty "" field is read as NULL) and the second field otherwise. The result column is named output, as specified, and that name is written as a header line.

csvformat formats the output: -K 1 skips the first line (the output header added by the previous command), and -U 1 sets the quoting style to quote all fields.


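The following Perl script assumes the original input (file1.csv) is read from stdin and the replacement file is passed as a command-line argument. It reads one line of the replacement file for every input line, so each empty or "" line is replaced by the line with the same number in file2.csv (this assumes file2.csv has at least as many lines as file1.csv).

#!/usr/bin/perl
use strict;
use warnings;

while (<STDIN>) {
  my $other = <>;             # same-numbered line of file2.csv
  $_ = $other if /^("")?$/;   # replace an empty line or ""
  print;
}

For example, any of these:

$ cat file1.csv | perl this-script.pl file2.csv
$ perl this-script.pl file2.csv <file1.csv
$ ./this-script.pl file2.csv <file1.csv

Or, as a one-liner:

$ perl -e 'while (<STDIN>) { my $o = <>; $_ = $o if /^("")?$/; print }' <file1.csv file2.csv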
