
I would like to remove duplicate lines from a file (lines whose second column is a duplicate), keeping the complete first line for each duplicated value.

Example input:

10.4.14.1,201s-1-S
10.4.16.1,201s-1-S
10.4.17.1,40-MDF-S
10.4.18.1,201s-1-S
10.4.19.1,201s-1-S
10.4.20.1,201s-1-S
10.4.21.1,201s-1-S
10.4.22.1,201s-1-S
10.4.23.1,201s-1-S
10.4.24.1,MDF-S

Desired result:

10.4.14.1,201s-1-S
10.4.17.1,40-MDF-S
10.4.24.1,MDF-S

So far I have tried

awk '!k[$5]++' file

and

awk '!_[$5]++' file

but this does not yield my desired output.

4 Answers


using a perl one-liner

perl -aF, -lne 'print if ! $seen{$F[1]}++' data.txt

Outputs:

10.4.14.1,201s-1-S
10.4.17.1,40-MDF-S
10.4.24.1,MDF-S

Explanation:

Switches:

  • -a: autosplit mode; splits each input line into the array @F (on whitespace by default, or on the pattern given with -F, here a comma)
  • -F/pattern/: split() pattern for the -a switch (the //'s are optional)
  • -l: enables automatic line-ending processing (chomps input, appends a newline to print)
  • -n: wraps the code in a while (<>) { ... } loop, one iteration per line of your input file
  • -e: tells perl to execute the code given on the command line

1 Comment

Unless you hate using unless (which would rule out perl -F, -lane 'print unless $seen{$F[1]}++' data.txt), you can also do perl -F, -lane '$seen{$F[1]}++||print' data.txt. :P

You need to set the delimiter to , (the default delimiter is whitespace) and use the correct column ($2) for the "seen" array.

$ awk -F, '!seen[$2]++' file
10.4.14.1,201s-1-S
10.4.17.1,40-MDF-S
10.4.24.1,MDF-S
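For readers who want the logic spelled out, the first-seen bookkeeping that !seen[$2]++ performs can be sketched in plain Python (an illustrative sketch, not part of the original answer):

```python
def dedup_first_seen(lines):
    """Keep the first line for each distinct second comma-field,
    preserving input order (the same idea as awk -F, '!seen[$2]++')."""
    seen = set()
    out = []
    for line in lines:
        key = line.split(",")[1]
        if key not in seen:     # first time this key appears
            seen.add(key)
            out.append(line)
    return out

lines = [
    "10.4.14.1,201s-1-S",
    "10.4.16.1,201s-1-S",
    "10.4.17.1,40-MDF-S",
    "10.4.18.1,201s-1-S",
    "10.4.24.1,MDF-S",
]
print(dedup_first_seen(lines))
```

The awk expression works the same way: the postfix ++ means the counter is still zero (so ! makes the pattern true, printing the line) only on a key's first appearance.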

Comments


You could also use sort for this:

$ sort -t, -k2 -u file
10.4.14.1,201s-1-S
10.4.17.1,40-MDF-S
10.4.24.1,MDF-S
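One caveat not stated in the answer: sort reorders the output by the key, and with -u it does not promise that the line kept for each key is the first one in the file (GNU sort is not stable unless you add -s); it merely coincides here. A rough Python sketch of the sort-based approach (Python's sort is stable, so which duplicate survives can differ from sort(1)):

```python
def dedup_sorted(rows):
    """Mirror `sort -t, -k2 -u`: order lines by the second comma-field
    and keep one line per distinct key.  Output is in key order,
    not input order."""
    out, prev = [], None
    for line in sorted(rows, key=lambda l: l.split(",")[1]):
        key = line.split(",")[1]
        if key != prev:         # first line of this key group
            out.append(line)
            prev = key
    return out

sample = [
    "10.4.14.1,201s-1-S",
    "10.4.16.1,201s-1-S",
    "10.4.17.1,40-MDF-S",
    "10.4.24.1,MDF-S",
]
print(dedup_sorted(sample))
```

If you need both deduplication on the key and the original line order guaranteed, the awk or perl solutions are the safer choice.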

Comments


This might work for you (GNU sed):

sed -rn '1!G;/^[^,]*(,[^\n]*)\n.*\1/!P;h' file

The hold space accumulates every line read so far; 1!G appends those lines after the current one, the regex checks whether the current line's second field already occurs in an earlier line, P prints the current line (up to the first newline) only when it does not, and h saves the enlarged pattern space back into the hold space.
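To unpack the mechanics, here is a Python simulation of what the sed script does (a sketch of the logic, not the answer author's code). Note that, like the sed back-reference, the test is a plain substring match, so a key that happens to appear comma-prefixed inside a longer held line would be treated as already seen:

```python
def dedup_sed_style(data):
    """Simulate the sed one-liner: the hold space accumulates every
    line seen so far; a line is printed only if ','+field2 does not
    already occur among the held lines."""
    hold = ""
    out = []
    for line in data:
        pattern = line if not hold else line + "\n" + hold  # 1!G
        key = "," + line.split(",")[1]
        _, _, earlier = pattern.partition("\n")             # lines after the first
        if key not in earlier:   # /^[^,]*(,[^\n]*)\n.*\1/!
            out.append(line)     # P
        hold = pattern           # h
    return out

data = [
    "10.4.14.1,201s-1-S",
    "10.4.16.1,201s-1-S",
    "10.4.17.1,40-MDF-S",
    "10.4.24.1,MDF-S",
]
print(dedup_sed_style(data))
```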

Comments
