Edited for correct expected output
kris

I have a file, tmp.log, with fields like this:

description ID  valueA valueB valueC
xxx         x    1       1     1
yyy         y    3       100    23
zzz         z    0       0      0
aaa         a    4       4      4

I would like to remove the rows that have the same value in all of the 'value' columns, leaving:

description ID  valueA valueB valueC
yyy         y    3       100    23

I am using:

cat tmp.log | tail -n+2 | awk '!a[$3$4$5]++'

But it still prints the redundant rows. Why is this wrong, and how can I correct it?
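
For reference, here is a minimal sketch of the filter described above, not a definitive answer. The `!a[$3$4$5]++` idiom removes rows whose concatenated key has already been seen in an earlier row. It never compares the three columns of a single row with each other, so with the sample data (keys 111, 310023, 000 and 444, all different) every row is printed. The sketch below compares the columns within each row instead. It assumes whitespace-separated fields, that the value columns are always fields 3 to 5, and that the header line should be kept; the scratch file from `mktemp` and the `result` variable exist only for this illustration.

```shell
# Recreate the sample data from the question in a scratch file
# (the mktemp path is only for this illustration).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
description ID  valueA valueB valueC
xxx         x    1       1     1
yyy         y    3       100    23
zzz         z    0       0      0
aaa         a    4       4      4
EOF

# Keep the header (NR == 1) and every row whose three value columns
# are not all equal to each other.
result=$(awk 'NR == 1 || !($3 == $4 && $4 == $5)' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

On the real file this would be `awk 'NR == 1 || !($3 == $4 && $4 == $5)' tmp.log`, with no need for `cat` or `tail`. Dropping the `NR == 1 ||` part would also filter the header line through the same test.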

edited tags
Jeff Schaller
Became Hot Network Question
kris

How to remove duplicate values based on multiple columns

I have a file, tmp.log, with fields like this:

description ID  valueA valueB valueC
xxx         x    1       1     1
yyy         y    3       100    23
zzz         z    0       0      0
aaa         a    4       4      4

I would like to remove the rows that have the same value in all of the 'value' columns, leaving:

description ID  valueA valueB valueC
yyy         y    3       100    23
aaa         a    4       4      4

I am using

cat tmp.log | tail -n+2 | awk '!a[$3$4$5]++'

But it still prints the redundant rows. Why is this wrong, and how can I correct it?