
I have a file that looks like this:

1254543534523233434
3453453454323233434
2342342343223233535
0909909092324243535

Is there a way / command in bash to remove duplicates from the file above, based on a specific substring, without changing their order in the output?

i.e., with the substring ${line:11:8} as the key:

1254543534523233434
2342342343223233535
0909909092324243535
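
For example, bash's substring expansion is zero-based, so on the first input line ${line:11:8} extracts the 8 characters used as the key:

line=1254543534523233434
echo "${line:11:8}"    # prints 23233434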

I know that:

sort -u : sorts the lines, then removes duplicate lines
sort -kx,x -u : the same, keyed on a field
cat filein | uniq : requires the input to be sorted already, because uniq only removes adjacent duplicate lines
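
For example, on the sample file above (assuming it is saved as file), sort -u keeps all four lines, because the whole lines are distinct, but it destroys the original order:

sort -u file

0909909092324243535
1254543534523233434
2342342343223233535
3453453454323233434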

I'm trying to figure out whether there's a native Linux solution, without having to resort to Perl for it. Thank you in advance.

3 Comments
  • This is not an exact duplicate. It has the additional constraint of comparing lines based only on a substring, but printing the complete line. However, the answer should be easily extensible to awk '!seen[substr($0, 11, 8)]++' file.txt. Commented Aug 22, 2016 at 9:56
  • This isn't a duplicate; the referenced answers avoid sorting, but they don't preserve order. Commented Nov 18, 2023 at 18:07
  • Should not be closed; not a duplicate in any way; order must be preserved. Commented Nov 18, 2023 at 18:13

1 Answer


You can use awk without any need for sorting:

awk '!uniq[substr($0, 12, 8)]++' file

1254543534523233434
2342342343223233535
0909909092324243535
  • Since awk indexes strings from 1, you need substr($0, 12, 8) to get the desired 8-character substring starting at the 12th position (the same characters as ${line:11:8} in bash).
  • uniq is an associative array keyed by the substring extracted with substr.
  • !uniq[...]++ tests the count before incrementing it, so it is true (and the line is printed) only the first time a given substring is seen.
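
If you want a shell-only alternative, a minimal sketch using a bash 4+ associative array (assuming the input is in a file named file) would be the following; the awk one-liner above is simpler and usually faster:

declare -A seen
while IFS= read -r line; do
    key=${line:11:8}                  # same 8-character substring as in the question
    if [[ -z ${seen[$key]} ]]; then   # print only the first line carrying this key
        seen[$key]=1
        printf '%s\n' "$line"
    fi
done < file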

1 Comment

This worked perfectly, thank you.
