I want to copy all text from one file into another file using sed on AlmaLinux 8. For example, the first file, old.txt, is:

192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4
192.168.0.5 
...

And the other file, new.txt is:

192.168.0.1
192.168.0.2
192.168.0.3

I want to copy entries from old.txt into new.txt but without any duplicates. Expected output in new.txt:

192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4
192.168.0.5 

How can I do this?

  • Why do you want to use sed in particular? Is there something wrong with the other standard tools? Commented Oct 8, 2023 at 18:28

5 Answers

10

No need for anything except a straightforward sort with a uniqueness constraint.

sort -u old.txt new.txt >new.txt.tmp &&
     mv -f new.txt.tmp new.txt
rm -f new.txt.tmp

I see that POSIX does define the ability for sort to write directly to an input file, so you could also do this, but I haven't tested to see how robust it is in the event of failure (the previous version guarantees either to keep the original or replace it with the new, without loss):

sort -o new.txt -u old.txt new.txt
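For the sample data above, the whole round trip looks like this (the scratch directory and printf setup are added only to make the snippet self-contained):

```shell
# Scratch directory so no real files are touched
dir=$(mktemp -d) && cd "$dir" || exit 1

# Recreate the example files from the question
printf '192.168.0.%s\n' 1 2 3 > new.txt
printf '192.168.0.%s\n' 1 2 3 4 5 > old.txt

# Sort both files together, dropping duplicates (-u),
# and write the result back over new.txt (-o)
sort -o new.txt -u old.txt new.txt

cat new.txt    # 192.168.0.1 through 192.168.0.5, one per line
```

Note that sort orders lexically, which happens to coincide with numeric order for this particular data; for general IPv4 lists, GNU sort's -V option orders the octets numerically.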

Alternatively you could use awk. This version keeps the order of the lines in the files intact, starting with the first file and adding only new lines from subsequent files:

awk '
    FNR<NR && h[$0] { next }    # Skip seen lines in secondary files
    { h[$0]=1; print }          # Record the line and output it
' new.txt old.txt

I've split this over several lines so I can add comments. Remove those and you could collapse it into a single line, but it's generally better to write readable code. It intentionally does not remove duplicate lines already present in the first file, new.txt. If you want that too:

awk '! h[$0]++' new.txt old.txt

This increments an associative array value for each line seen, but prints it only if the value was zero (unset).
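A quick way to convince yourself of the behaviour (the scratch directory and sample files are only for illustration):

```shell
# Scratch directory so no real files are touched
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '192.168.0.%s\n' 1 2 3 > new.txt
printf '192.168.0.%s\n' 1 2 3 4 5 > old.txt

# Print each line only the first time it is seen across both files,
# then replace new.txt with the result
awk '! h[$0]++' new.txt old.txt > new.txt.tmp &&
    mv new.txt.tmp new.txt

cat new.txt    # five lines, in first-seen order, no duplicates
```

Unlike the sort-based approaches, the original order of new.txt is preserved, and new entries from old.txt are appended in their own order.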

  • Thanks, very helpful Commented Oct 8, 2023 at 8:25
  • I'm pretty sure that the robustness of overwriting input is a trade-off - once it's truncated, you lose the original content. But on the other hand, it's less likely to fail with ENOSPC than the version where input and output have to exist concurrently. Commented Oct 8, 2023 at 18:36
  • @TobySpeight if my first solution fails with ENOSPC then the mv (rename) will be aborted and the original file left in place. This works because the rename(2) syscall is atomic and guarantees either the original or the replacement will be delivered. Given that sort uses temporaries in (IIRC) /var/tmp I don't think the same can be said if it creates the output file itself from its sorted input Commented Oct 8, 2023 at 19:38
  • Yes, exactly. It's (slightly) more likely to fail, but if it does, it fails in a more recoverable way. Commented Oct 9, 2023 at 6:48
  • @Abdullah if one of these answers works best for you please remember to accept it ✓ Commented Oct 10, 2023 at 23:17
3

Building on the previous answer, you can use the sort command but explicitly name the output file, so the sort is done in place (without an intermediate file):

sort -u -o new.txt old.txt new.txt  

One possible awk approach:

awk '{z[$0]=1} END{for (i in z) print i}' new.txt old.txt > new.txt.tmp &&
    mv new.txt.tmp new.txt

Be warned that this can "eat" a lot of memory if one or both input files are large, and for (i in z) emits the lines in no particular order. Note that GNU awk's -i inplace extension is no help here: it only redirects output produced while each input file is being read, not output from the END block, so the merged result has to go through a temporary file.

  • See also the -V option (GNU or compatible) or sort -nt . -k1,1 -k2,2 -k3,3 -k4,4 (standard) to sort those quad-decimal IPv4 addresses numerically. Commented Oct 8, 2023 at 11:56
3

That's not really a job for sed, but if you have to use sed and your sed happens to be GNU sed, you could do something like:

sed -E '
  :1; $!{N;b1}
  :2; s/^((.*)$(.|\n)*)\n\2$/\1/m; t2
  ' new.txt old.txt

Where:

  • :1; $!{N;b1} loads the whole input (both files in the pattern space) in a loop.
  • :2; s/^((.*)$(.|\n)*)\n\2$/\1/m; t2 removes duplicated lines in that pattern space in a loop.

To modify new.txt in-place, you could do:

sed -nE '
  :1; $!{N;b1}
  :2; s/^((.*)$(.|\n)*)\n\2$/\1/m; t2
  w new.txt
  ' new.txt old.txt
1

Here's another way:

grep -vFxf new.txt old.txt >> new.txt 

The -v flag tells grep to print lines that do NOT match the pattern given, and the -f new.txt tells grep to use the lines in the file new.txt as the patterns to search for. With -F, the patterns are Fixed strings instead of basic regular expressions and with -x, they have to match the full lines. So grep -vFxf new.txt old.txt means "show me all lines in old.txt that are missing from new.txt".

We append those lines to new.txt using the >> redirection operator. Note that new.txt is used here both as input and as output, but grep will read the full initial contents of new.txt to build the list of strings to search and stop reading it after that, so extra lines printed afterwards to new.txt won't be read again.
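Putting the pieces together on the question's sample data (the file setup is added only so the snippet runs standalone):

```shell
# Scratch directory so no real files are touched
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '192.168.0.%s\n' 1 2 3 > new.txt
printf '192.168.0.%s\n' 1 2 3 4 5 > old.txt

# Append to new.txt only those lines of old.txt it doesn't already contain
grep -vFxf new.txt old.txt >> new.txt

cat new.txt    # the original three lines plus 192.168.0.4 and 192.168.0.5
```

One caveat: grep exits with status 1 when old.txt contains nothing new (no lines selected), so under set -e or in an && chain a no-op merge will look like a failure.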

  • @stéphane thanks. Um. I have absolutely no idea why I used tee -a instead of the obvious >>. The only thing I can think of is that I have only had one coffee today... Commented Oct 8, 2023 at 11:57
0

Using Raku (formerly known as Perl_6)

~$ raku -e '.put for lines.unique;'  old.txt new.txt  > tmp.txt

Raku will read files off the command line using the workhorse lines routine. So two files can be read in, and > redirected into a tmp.txt file. Then, once you verify your result, you can copy the tmp.txt file back over new.txt.

Raku does not do "in-place" editing (i.e. there's no Raku equivalent of Perl's -i command line flag).

Sample Input old.txt:

192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4
192.168.0.5 

Sample Input new.txt:

192.168.0.1
192.168.0.2
192.168.0.3

Sample Output tmp.txt:

192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4
192.168.0.5 

A nice aspect of Raku's unique routine is that it takes an :as parameter, meaning that only a portion of the input line gets used in the uniqueness comparison. Such an option allows (for example) recovering unique file names from complete paths, discarding duplicate names in different directories. See the first link below for details.

https://unix.stackexchange.com/a/720574/227738
https://docs.raku.org/routine/unique
https://raku.org

  • I'll edit my answer. Commented Oct 9, 2023 at 10:21
