
How can I find two consecutive repeated lines in a file?

For example, this file contains exactly one pair of consecutive repeated lines:

 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.ear
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter <--
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter <--
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.xml

3 Answers

Uniq should be enough:

$ cat c.txt
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.ear
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.xml

$ uniq -D c.txt
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter

$ uniq c.txt
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.ear
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.xml

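By default, uniq compares only adjacent lines of its input, so on an unsorted file like yours it does exactly what you want. The -D option (available in GNU uniq, not specified by POSIX) prints every line of each duplicated group.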

You might also be interested in the -d and -u options: -d prints one copy of each group of adjacent duplicate lines, and -u prints only the lines that are not repeated (every copy of a duplicated line is dropped). See the man page for more details.
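To compare these options side by side, here is a minimal sketch that runs them on the sample from the question. It uses only the POSIX options -d, -u and -c; the temporary file and the variable names are illustrative assumptions, and the `awk '$1 == 2'` filter is an addition for picking out runs of exactly two lines.

```shell
# Recreate the sample from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.ear
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.xml
EOF

# -d: one copy of each run of adjacent duplicates.
dups=$(uniq -d "$sample")

# -u: only the lines that have no adjacent duplicate.
uniques=$(uniq -u "$sample")

# -c: prefix every run with its length; keep the runs of exactly two lines.
pairs=$(uniq -c "$sample" | awk '$1 == 2')

printf 'duplicated:\n%s\n\nnot repeated:\n%s\n\nruns of two:\n%s\n' \
    "$dups" "$uniques" "$pairs"
rm -f "$sample"
```

The -c variant is handy when the exact run length matters, since -d and -D report a run of three identical lines the same way as a run of two.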


Another option:

grep -zPo '\n(.+)\n\1\n'

This way we can tune the match further (for example, to tolerate extra spaces).

Update: as @Thor pointed out, this does not capture repetitions at the beginning of the file. To cover that case, use:

grep -zPo '(?<!.)(.+\n)\1' 
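The difference between the two patterns shows up on an input whose duplicate pair sits on the first two lines. The sketch below assumes GNU grep built with PCRE support (-P); the three-line input and the variable names are made up for illustration. With -z the whole input is one record and each match is printed NUL-terminated, hence the `tr -d '\0'`.

```shell
# The first pattern needs a newline before the pair, so it finds nothing
# when the pair opens the input (grep then exits with status 1).
first=$(printf 'a\na\nb\n' | grep -zPo '\n(.+)\n\1\n' | tr -d '\0' || true)

# The second pattern only requires that no character precedes the pair
# on the same line, which also holds at the very start of the input.
second=$(printf 'a\na\nb\n' | grep -zPo '(?<!.)(.+\n)\1' | tr -d '\0' || true)

printf 'first:  [%s]\nsecond: [%s]\n' "$first" "$second"
```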
  • Interesting solution, but it does not work when the duplicate lines are at the beginning of the file. Commented Jan 24, 2017 at 1:23
  • @Thor, you are right, thank you. We can write grep -zPo '(?<!.)(.+\n)\1' but it is getting a bit cryptic... Commented Jan 24, 2017 at 9:31

Yet another option with AWK:

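awk 'NR == 1 || $0 != x; {x = $0}'

This gives the same behavior as uniq, but it can also compare a single column. (The comparison uses != rather than !~ because !~ treats the line as a regular expression, so characters such as "." and substring matches would give wrong results.)

awk -F/ 'NR == 1 || $2 != x; {x = $2}'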

-F sets the field separator.

This drops every line whose second field equals the second field of the line before it.

$ cat c.txt
 line/one
 line/two
 otherline/two
 yetanotherline/two
 line/three

$ awk -F/ 'NR == 1 || $2 != x; {x = $2}' c.txt
 line/one
 line/two
 line/three
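The same previous-line comparison can be turned around to report the adjacent duplicates instead of dropping them, which is closer to what the question asks. This is only a sketch: `report_adjacent_dups` is a hypothetical helper name, and the output format (line numbers, then the text) is an arbitrary choice.

```shell
# report_adjacent_dups is a hypothetical helper name, not part of any tool.
report_adjacent_dups() {
    # prev holds the previous line; appending "" makes it a string, so the
    # comparison is textual ("1" and "1.0" are not treated as equal numbers).
    awk 'NR > 1 && $0 == prev { printf "%d-%d: %s\n", NR - 1, NR, $0 }
         { prev = $0 "" }' "$@"
}

# Lines 3 and 4 are the only adjacent duplicates in this input.
result=$(printf 'a\nb\na\na\nc\n' | report_adjacent_dups)
printf '%s\n' "$result"
```

A run of three identical lines is reported as two overlapping pairs (for example 1-2 and 2-3). The function reads standard input when called without arguments and accepts file names otherwise.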
