Find two consecutive repeated lines

Question

How can I find two consecutive repeated lines in a file?

For example, this file contains only one pair of consecutive repeated lines:

 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.ear
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter <--
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter <--
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.xml

George Vasiliou · Accepted Answer · 2017-01-24 00:03:41Z

Uniq should be enough:

$ cat c.txt
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.ear
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.xml

$ uniq -D c.txt
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter

$ uniq c.txt
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.ear
 OQ-63/ECC/Global/MES/CZ/adWerum-CZ-Adapter
 OQ-63/ECC/Global/MES/54/ECC-MRP-S05.xml

By default uniq only compares adjacent lines of the input file. So for an unsorted file (like yours) uniq does exactly the job you want: it reports consecutive repeats and ignores repeats that are further apart.

You might also be interested in the -d and -u options of uniq (see the man page for details): -d prints only one line of each duplicate pair, and -u prints only the unique lines, removing both duplicate entries.
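
As a rough illustration of those options, here is a small sketch. The sample data and the variable names (tmp, dups, singles, counts) are made up for this example; -c, which prefixes each line with its run length, is not mentioned above but is a standard uniq option.

```shell
# Sample data (made up for this sketch): "c" is the only adjacent repeat.
# The two "a" lines are not adjacent, so uniq treats them as distinct.
tmp=$(mktemp)
printf '%s\n' a b a c c d > "$tmp"

# -d: print one copy of each line that is repeated on adjacent lines
dups=$(uniq -d "$tmp")

# -u: print only the lines that are not repeated on adjacent lines
singles=$(uniq -u "$tmp")

# -c: prefix every output line with the length of its run
counts=$(uniq -c "$tmp")

printf 'repeated:\n%s\n\nnot repeated:\n%s\n\nrun lengths:\n%s\n' \
    "$dups" "$singles" "$counts"
rm -f "$tmp"
```

An empty result from uniq -d means the file has no consecutive repeated lines at all.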

JJoao · Accepted Answer · 2017-01-24 09:39:18Z

Another option:

grep -zPo '\n(.+)\n\1\n'

This approach leaves room for extra tuning (for example, tolerating extra spaces).

Update: as @Thor pointed out, this does not capture repetitions at the beginning of the file. To cover that case, use:

grep -zPo '(?<!.)(.+\n)\1'
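
To see the difference between the two patterns, here is a sketch that assumes GNU grep built with PCRE support (-P). The sample data and variable names are invented for the illustration, and the third pattern is only one possible reading of the "extra spaces" tuning, not something taken from the answer. With -z, grep terminates each match with a NUL byte, so the output is piped through tr to drop it.

```shell
# Sample data (made up): a repeat on the first two lines, a repeat in the
# middle, and a pair that differs only by a trailing space.
tmp=$(mktemp)
printf '%s\n' dup dup x y y z 'w ' w > "$tmp"

# First pattern: needs a newline before the pair, so it misses "dup".
first=$(grep -zPo '\n(.+)\n\1\n' "$tmp" | tr -d '\0')

# Second pattern: (?<!.) holds at the start of the file and after a newline.
anchored=$(grep -zPo '(?<!.)(.+\n)\1' "$tmp" | tr -d '\0')

# Hypothetical tuning: ignore trailing spaces and tabs when comparing.
loose=$(grep -zPo '(?<!.)(.+?)[ \t]*\n\1[ \t]*\n' "$tmp" | tr -d '\0')

printf 'first:\n%s\n\nanchored:\n%s\n\nloose:\n%s\n' "$first" "$anchored" "$loose"
rm -f "$tmp"
```

Both patterns report a run of three identical lines as a single pair, because each match consumes the two lines it found.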

edited Jan 24, 2017 at 9:39

answered Jan 24, 2017 at 0:02

Interesting solution, but it does not work when the duplicate lines are at the beginning of the file.

– Thor, 2017-01-24 01:23:48 +00:00

@Thor, you are right, thank you. We can write grep -zPo '(?<!.)(.+\n)\1', but it is getting a bit cryptic...

– JJoao, 2017-01-24 09:31:26 +00:00

Marcos Oliveira · Accepted Answer · 2017-05-23 17:50:58Z

Yet another option with AWK:

awk 'NR == 1 || $0 != x; {x = $0}'

This gives the same behavior as uniq, and it can also compare a single column. The test uses != (string inequality) rather than !~, so characters such as . and + in the lines are taken literally instead of being read as a regular expression.

awk -F/ 'NR == 1 || $2 != x; {x = $2}'

-F sets the field separator.

This removes every line whose second field is equal to the second field of the previous line.

$ cat c.txt
 line/one
 line/two
 otherline/two
 yetanotherline/two
 line/three

$ awk -F/ 'NR == 1 || $2 != x; {x = $2}' c.txt
 line/one
 line/two
 line/three
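
The commands in this answer drop the repeated lines. To locate them instead, which is what the question asks for, the same idea can be turned around. The following is only a sketch: the variable names (prev, pairs, by_field) and the output format are choices made for this example.

```shell
# Whole-line comparison: report each consecutive pair with its line numbers.
# Appending "" when saving the line forces a string comparison, so lines
# such as "1" and "1.0" are not treated as equal numbers.
tmp=$(mktemp)
printf '%s\n' a b b c > "$tmp"
pairs=$(awk 'NR > 1 && $0 == prev { print (NR - 1) "," NR ": " $0 }
             { prev = $0 "" }' "$tmp")
echo "$pairs"

# Per-column comparison: report lines whose second "/"-separated field
# repeats the second field of the line before.
printf '%s\n' line/one line/two otherline/two yetanotherline/two line/three > "$tmp"
by_field=$(awk -F/ 'NR > 1 && $2 == prev { print NR ": " $0 }
                    { prev = $2 "" }' "$tmp")
echo "$by_field"
rm -f "$tmp"
```

Because the line numbers are part of the output, the result can be used to jump straight to the repeats in an editor.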
