
I would like to remove duplicate lines from a file (lines whose second column is a duplicate), keeping the complete first line for each duplicated value.

Example input:

10.4.14.1,201s-1-S
10.4.16.1,201s-1-S
10.4.17.1,40-MDF-S
10.4.18.1,201s-1-S
10.4.19.1,201s-1-S
10.4.20.1,201s-1-S
10.4.21.1,201s-1-S
10.4.22.1,201s-1-S
10.4.23.1,201s-1-S
10.4.24.1,MDF-S

Desired result:

10.4.14.1,201s-1-S
10.4.17.1,40-MDF-S
10.4.24.1,MDF-S

So far I have tried

awk '!k[$5]++' file

and

awk '!_[$5]++' file

but this does not yield my desired output.

4 Answers


using a perl one-liner

perl -aF, -lne 'print if ! $seen{$F[1]}++' data.txt

Outputs:

10.4.14.1,201s-1-S
10.4.17.1,40-MDF-S
10.4.24.1,MDF-S

Explanation:

Switches:

  • -a: autosplit mode; splits each input line into the array @F (on whitespace by default, or on the pattern given with -F, here a comma)
  • -F/pattern/: split() pattern for the -a switch (the //'s are optional)
  • -l: enables automatic line-ending processing (chomps input, appends a newline to print)
  • -n: wraps the code in a while (<>) { ... } loop, one iteration per line of your input file
  • -e: tells perl to execute the code given on the command line

1 Comment

Unless you hate using unless (which would rule out perl -F, -lane 'print unless $seen{$F[1]}++' data.txt), you can also do perl -F, -lane '$seen{$F[1]}++||print' data.txt. :P

You need to set the delimiter to , (the default delimiter is whitespace) and use the correct column ($2) for the "seen" array.

$ awk -F, '!seen[$2]++' file
10.4.14.1,201s-1-S
10.4.17.1,40-MDF-S
10.4.24.1,MDF-S
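For readers who want the logic spelled out, the first-seen bookkeeping that !seen[$2]++ performs can be sketched in plain Python (an illustrative sketch, not part of the original answer):

```python
def dedup_first_seen(lines):
    """Keep the first line for each distinct second comma-field,
    preserving input order (the same idea as awk -F, '!seen[$2]++')."""
    seen = set()
    out = []
    for line in lines:
        key = line.split(",")[1]
        if key not in seen:     # first time this key appears
            seen.add(key)
            out.append(line)
    return out

lines = [
    "10.4.14.1,201s-1-S",
    "10.4.16.1,201s-1-S",
    "10.4.17.1,40-MDF-S",
    "10.4.18.1,201s-1-S",
    "10.4.24.1,MDF-S",
]
print(dedup_first_seen(lines))
```

The awk expression works the same way: the postfix ++ means the counter is still zero (so ! makes the pattern true, printing the line) only on a key's first appearance.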

Comments


You could also use sort for this:

$ sort -t, -k2 -u file
10.4.14.1,201s-1-S
10.4.17.1,40-MDF-S
10.4.24.1,MDF-S
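One caveat not stated in the answer: sort reorders the output by the key, and with -u it does not promise that the line kept for each key is the first one in the file (GNU sort is not stable unless you add -s); it merely coincides here. A rough Python sketch of the sort-based approach (Python's sort is stable, so which duplicate survives can differ from sort(1)):

```python
def dedup_sorted(rows):
    """Mirror `sort -t, -k2 -u`: order lines by the second comma-field
    and keep one line per distinct key.  Output is in key order,
    not input order."""
    out, prev = [], None
    for line in sorted(rows, key=lambda l: l.split(",")[1]):
        key = line.split(",")[1]
        if key != prev:         # first line of this key group
            out.append(line)
            prev = key
    return out

sample = [
    "10.4.14.1,201s-1-S",
    "10.4.16.1,201s-1-S",
    "10.4.17.1,40-MDF-S",
    "10.4.24.1,MDF-S",
]
print(dedup_sorted(sample))
```

If you need both deduplication on the key and the original line order guaranteed, the awk or perl solutions are the safer choice.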

Comments


This might work for you (GNU sed):

sed -rn '1!G;/^[^,]*(,[^\n]*)\n.*\1/!P;h' file

The hold space accumulates every line read so far; 1!G appends those lines after the current one, the regex checks whether the current line's second field already occurs in an earlier line, P prints the current line (up to the first newline) only when it does not, and h saves the enlarged pattern space back into the hold space.
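To unpack the mechanics, here is a Python simulation of what the sed script does (a sketch of the logic, not the answer author's code). Note that, like the sed back-reference, the test is a plain substring match, so a key that happens to appear comma-prefixed inside a longer held line would be treated as already seen:

```python
def dedup_sed_style(data):
    """Simulate the sed one-liner: the hold space accumulates every
    line seen so far; a line is printed only if ','+field2 does not
    already occur among the held lines."""
    hold = ""
    out = []
    for line in data:
        pattern = line if not hold else line + "\n" + hold  # 1!G
        key = "," + line.split(",")[1]
        _, _, earlier = pattern.partition("\n")             # lines after the first
        if key not in earlier:   # /^[^,]*(,[^\n]*)\n.*\1/!
            out.append(line)     # P
        hold = pattern           # h
    return out

data = [
    "10.4.14.1,201s-1-S",
    "10.4.16.1,201s-1-S",
    "10.4.17.1,40-MDF-S",
    "10.4.24.1,MDF-S",
]
print(dedup_sed_style(data))
```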

Comments
