I want to copy all text from one file into another file using sed on AlmaLinux 8. For example, the first file, old.txt, is:

192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4
192.168.0.5 
...

And the other file, new.txt is:

192.168.0.1
192.168.0.2
192.168.0.3

I want to copy entries from old.txt into new.txt but without any duplicates. Expected output in new.txt:

192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4
192.168.0.5 

How can I do this?

  • Why do you want to use sed in particular? Is there something wrong with the other standard tools? Commented Oct 8, 2023 at 18:28

5 Answers

10

No need for anything except a straightforward sort with a uniqueness constraint.

sort -u old.txt new.txt >new.txt.tmp &&
     mv -f new.txt.tmp new.txt
rm -f new.txt.tmp

I see that POSIX does define the ability for sort to write directly to an input file, so you could also do this, but I haven't tested to see how robust it is in the event of failure (the previous version guarantees either to keep the original or replace it with the new, without loss):

sort -o new.txt -u old.txt new.txt
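For the sample data above, the whole round trip looks like this (the scratch directory and printf setup are added only to make the snippet self-contained):

```shell
# Scratch directory so no real files are touched
dir=$(mktemp -d) && cd "$dir" || exit 1

# Recreate the example files from the question
printf '192.168.0.%s\n' 1 2 3 > new.txt
printf '192.168.0.%s\n' 1 2 3 4 5 > old.txt

# Sort both files together, dropping duplicates (-u),
# and write the result back over new.txt (-o)
sort -o new.txt -u old.txt new.txt

cat new.txt    # 192.168.0.1 through 192.168.0.5, one per line
```

Note that sort orders lexically, which happens to coincide with numeric order for this particular data; for general IPv4 lists, GNU sort's -V option orders the octets numerically.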

Alternatively you could use awk. This version keeps the order of the lines in the files intact, starting with the first file and adding only new lines from subsequent files:

awk '
    FNR<NR && h[$0] { next }    # Skip seen lines in secondary files
    { h[$0]=1; print }          # Record the line and output it
' new.txt old.txt

I've split this over several lines so I can add comments. Remove those and you could collapse it into a single line, but it's generally better to write readable code. It intentionally does not remove duplicate lines already present in the first file, new.txt. If you want that too:

awk '! h[$0]++' new.txt old.txt

This increments an associative array value for each line seen, but prints it only if the value was zero (unset).
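A quick way to convince yourself of the behaviour (the scratch directory and sample files are only for illustration):

```shell
# Scratch directory so no real files are touched
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '192.168.0.%s\n' 1 2 3 > new.txt
printf '192.168.0.%s\n' 1 2 3 4 5 > old.txt

# Print each line only the first time it is seen across both files,
# then replace new.txt with the result
awk '! h[$0]++' new.txt old.txt > new.txt.tmp &&
    mv new.txt.tmp new.txt

cat new.txt    # five lines, in first-seen order, no duplicates
```

Unlike the sort-based approaches, the original order of new.txt is preserved, and new entries from old.txt are appended in their own order.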

  • Thanks, very helpful Commented Oct 8, 2023 at 8:25
  • I'm pretty sure that the robustness of overwriting input is a trade-off - once it's truncated, you lose the original content. But on the other hand, it's less likely to fail with ENOSPC than the version where input and output have to exist concurrently. Commented Oct 8, 2023 at 18:36
  • @TobySpeight if my first solution fails with ENOSPC then the mv (rename) will be aborted and the original file left in place. This works because the rename(2) syscall is atomic and guarantees either the original or the replacement will be delivered. Given that sort uses temporaries in (IIRC) /var/tmp I don't think the same can be said if it creates the output file itself from its sorted input Commented Oct 8, 2023 at 19:38
  • Yes, exactly. It's (slightly) more likely to fail, but if it does, it fails in a more recoverable way. Commented Oct 9, 2023 at 6:48
  • @Abdullah if one of these answers works best for you please remember to accept it ✓ Commented Oct 10, 2023 at 23:17
3

Building on the previous answer, you can use the sort command but explicitly name the output file, so the sort is done in place (without an intermediate file):

sort -u -o new.txt old.txt new.txt  

One possible awk approach:

awk '{z[$0]=1} END{for (i in z) print i}' new.txt old.txt > new.txt.tmp &&
    mv new.txt.tmp new.txt

Be warned that this can "eat" a lot of memory if one or both input files are large, and for (i in z) emits the lines in no particular order. Note that GNU awk's -i inplace extension is no help here: it only redirects output produced while each input file is being read, not output from the END block, so the merged result has to go through a temporary file.

  • See also the -V option (GNU or compatible) or sort -nt . -k1,1 -k2,2 -k3,3 -k4,4 (standard) to sort those quad-decimal IPv4 addresses numerically. Commented Oct 8, 2023 at 11:56
3

That's not really a job for sed, but if you have to use sed and your sed happens to be GNU sed, you could do something like:

sed -E '
  :1; $!{N;b1}
  :2; s/^((.*)$(.|\n)*)\n\2$/\1/m; t2
  ' new.txt old.txt

Where:

  • :1; $!{N;b1} loads the whole input (both files in the pattern space) in a loop.
  • :2; s/^((.*)$(.|\n)*)\n\2$/\1/m; t2 removes duplicated lines in that pattern space in a loop.

To modify new.txt in-place, you could do:

sed -nE '
  :1; $!{N;b1}
  :2; s/^((.*)$(.|\n)*)\n\2$/\1/m; t2
  w new.txt
  ' new.txt old.txt
1

Here's another way:

grep -vFxf new.txt old.txt >> new.txt 

The -v flag tells grep to print lines that do NOT match the pattern given, and the -f new.txt tells grep to use the lines in the file new.txt as the patterns to search for. With -F, the patterns are Fixed strings instead of basic regular expressions and with -x, they have to match the full lines. So grep -vFxf new.txt old.txt means "show me all lines in old.txt that are missing from new.txt".

We append those lines to new.txt using the >> redirection operator. Note that new.txt is used here both as input and as output, but grep will read the full initial contents of new.txt to build the list of strings to search and stop reading it after that, so extra lines printed afterwards to new.txt won't be read again.
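Putting the pieces together on the question's sample data (the file setup is added only so the snippet runs standalone):

```shell
# Scratch directory so no real files are touched
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '192.168.0.%s\n' 1 2 3 > new.txt
printf '192.168.0.%s\n' 1 2 3 4 5 > old.txt

# Append to new.txt only those lines of old.txt it doesn't already contain
grep -vFxf new.txt old.txt >> new.txt

cat new.txt    # the original three lines plus 192.168.0.4 and 192.168.0.5
```

One caveat: grep exits with status 1 when old.txt contains nothing new (no lines selected), so under set -e or in an && chain a no-op merge will look like a failure.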

  • @stéphane thanks. Um. I have absolutely no idea why I used tee -a instead of the obvious >>. The only thing I can think of is that I have only had one coffee today... Commented Oct 8, 2023 at 11:57
0

Using Raku (formerly known as Perl_6)

~$ raku -e '.put for lines.unique;'  old.txt new.txt  > tmp.txt

Raku will read files off the command line using the workhorse lines routine. So two files can be read in, and > redirected into a tmp.txt file. Then, once you verify your result, you can copy the tmp.txt file back over new.txt.

Raku does not do "in-place" editing (i.e. there's no Raku equivalent of Perl's -i command line flag).

Sample Input old.txt:

192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4
192.168.0.5 

Sample Input new.txt:

192.168.0.1
192.168.0.2
192.168.0.3

Sample Output tmp.txt:

192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4
192.168.0.5 

A nice aspect of Raku's unique routine is that it takes an :as parameter, meaning that only a portion of the input line gets used in the uniqueness comparison. Such an option allows (for example) recovering unique file names from complete paths, discarding duplicate names in different directories. See the first link below for details.

https://unix.stackexchange.com/a/720574/227738
https://docs.raku.org/routine/unique
https://raku.org

  • I'll edit my answer. Commented Oct 9, 2023 at 10:21
