How can I print lines that contain a duplicated (x2) value?

E.g.:

01 02 03
01 01 03
01 01 01 03

Of these three lines, only line two qualifies.

Now let's say I want to find lines where a value occurs x3.

In that case only line 3 qualifies.

  • When you’re looking for duplicates, does it matter which value is duplicated? For example, would 01 03 03 be valid? What about 01 01 03 03? Commented May 12, 2018 at 17:57
  • Thank you for your response. No, it does not matter which value is duplicated. And yes, 01 03 03 is valid, and so is 01 01 03 03. Commented May 12, 2018 at 18:01
  • @αғsнιη Exactly: by duplicate I mean x2. In 01 01 01 03, 01 occurs x3. The first answer is, by the way, the easiest approach, and the 3rd answer is a great way to print more info about the results. I haven't tried the 2nd method yet. Thanks a lot for all the help, guys. Commented May 13, 2018 at 11:58

3 Answers

With awk

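awk -v nb=3 '{delete a; for(i=1;i<=NF;i++)if(++a[$i]>nb){print;next}}' infile

delete a clears the counts left over from the previous line, so each line is counted on its own.
for(i=1;i<=NF;i++) loops over each field of the line.
++a[$i] uses the field as a key in the associative array a and increments its count each time the same value is seen.
if(++a[$i]>nb) tests whether that count is now greater than nb, so a line matches when some value occurs more than nb times (nb=1 catches any duplicate).
{print;next} prints the line and jumps to the next line.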
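
To check the threshold behaviour against the sample lines from the question, here is a small sketch. The wrapper name lines_with_more_than is made up for illustration, and the array is cleared at the start of every line with delete a (an assumption that your awk accepts it; gawk and mawk do) so that counts from earlier lines cannot leak into later ones.

```shell
# Hypothetical helper: print lines in which some field occurs more than $1 times.
lines_with_more_than() {
    awk -v nb="$1" '{
        delete a                          # reset the per-line counts
        for (i = 1; i <= NF; i++)
            if (++a[$i] > nb) { print; next }
    }'
}

# Any duplicate (a value seen more than once):
printf '%s\n' '01 02 03' '01 01 03' '01 01 01 03' | lines_with_more_than 1
# prints:
# 01 01 03
# 01 01 01 03

# A value seen more than twice:
printf '%s\n' '01 02 03' '01 01 03' '01 01 01 03' | lines_with_more_than 2
# prints:
# 01 01 01 03
```

Because the comparison is strict, the value passed in is one less than the number of occurrences you want to require.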


To print only the lines whose most frequent value occurs exactly nb times:

awk -v nb=3 '
{
    max = 0
    delete a
    for ( i=1 ; i<=NF ; i++ )
        ++a[$i]
    for( j in a )
        max = a[j]>max ? a[j] : max
    if ( max == nb )
        print
}' infile
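
If your awk lacks delete a for whole arrays, one alternative is to do the counting inside a function whose array is a local variable, so it starts out empty on every call. This is only a sketch: the names lines_with_max_count and maxcount are invented for the example.

```shell
# Hypothetical helper: print lines whose most frequent field occurs exactly $1 times.
lines_with_max_count() {
    awk -v nb="$1" '
    # Returns the highest per-field count on the current line.
    # seen, i and max are extra parameters, which makes them local;
    # seen is therefore a fresh, empty array on every call.
    function maxcount(seen, i, max) {
        max = 0
        for (i = 1; i <= NF; i++)
            if (++seen[$i] > max)
                max = seen[$i]
        return max
    }
    maxcount() == nb      # a pattern with no action prints the line
    '
}

printf '%s\n' '01 02 03' '01 01 03' '01 01 01 03' | lines_with_max_count 2
# prints:
# 01 01 03

printf '%s\n' '01 02 03' '01 01 03' '01 01 01 03' | lines_with_max_count 3
# prints:
# 01 01 01 03
```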
  • Been playing around with the coefficient. Thank you, it does work. Very clear explanation. Commented May 12, 2018 at 18:15
  • When I use nb=1, I get all the doubles and even the triples, because of the (++a[$i]>nb) test, which means doubles, triples and anything higher all match. Is there a way to limit it to exactly one repetition count? I've been trying to modify (++a[$i]>nb), but it's not working. Commented May 12, 2018 at 18:30
  • Updated the answer to add that limit. Commented May 12, 2018 at 20:04
  • See also split("", a) for a standard/portable equivalent of delete a. Commented May 13, 2018 at 10:48
  • @StéphaneChazelas Thanks for the tip; only the mawk man page mentions that mawk supports delete array as an extension. Another way is to define a function getmax() with the array as a local variable. Commented May 13, 2018 at 14:16

With AWK:

awk -v t=2 '{for (i=1; i<=NF; i++) c[$i]++; for (v in c) if (c[v] == t) {print; next}}'

This processes each line, and within each line, counts the occurrences of each value (each field) in the associative array c; then it goes over all the values v it has seen, and if one of the values was seen the required number of times (as specified by the target, t), it prints the line, and skips to the next line to avoid printing the line multiple times (e.g. for 01 01 03 03).
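
As written, c is never emptied, so counts carry over from one line to the next and later lines can match on totals accumulated across the file. A minimal sketch with the array cleared at the start of each record follows; it uses split("", c), which should work in any POSIX awk, and the wrapper name lines_with_exact_count is only for illustration.

```shell
# Hypothetical helper: print lines in which some field occurs exactly $1 times.
lines_with_exact_count() {
    awk -v t="$1" '{
        split("", c)                      # empty the array for this record
        for (i = 1; i <= NF; i++) c[$i]++
        for (v in c) if (c[v] == t) { print; next }
    }'
}

printf '%s\n' '01 02 03' '01 01 03' '01 01 03 03' '01 01 01 03' | lines_with_exact_count 2
# prints:
# 01 01 03
# 01 01 03 03
```

Note that this matches when any value occurs exactly t times, so a line such as 01 01 03 03 03 is printed for t=2 even though another value on it occurs three times.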

  • You'd need to empty the array (like with split("", c) or delete c with some implementations) between each record. Commented May 13, 2018 at 10:48

This will only print lines with duplicate space-delimited words:

while IFS= read -r line; do if [[ -n "$(printf '%s\n' "$line" | tr ' ' '\n' | sort | uniq -d)" ]]; then printf '%s\n' "$line"; fi; done < YOURFILE

For your example the output will be:

01 01 03
01 01 01 03

Lines 2 and 3 are printed because "01" occurs more than once in each of them.

If you want to check for a specific number of repetitions instead (the words are sorted first because uniq only counts adjacent duplicates):

NO=3; lnr=1; while IFS= read -r line; do echo "for line $lnr"; printf '%s\n' "$line" | tr ' ' '\n' | sort | uniq -c | grep -E "^ *$NO "; ((lnr++)); done < YOURFILE

For your example the output will be:

for line 1
for line 2
for line 3
 3 01

The first number is the count, which matches the value you set in the variable NO.
The second is the word that occurs NO times on that line.
Replace YOURFILE with the name of your file, of course.
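
If you only want the matching lines rather than a per-line report, the same tr | sort | uniq -c pipeline can be used as the test of an if. The sketch below is one way to do that in plain sh; the function name lines_with_word_count and its argument order are assumptions made for the example.

```shell
# Hypothetical helper: print lines of file $2 in which some word occurs exactly $1 times.
lines_with_word_count() {
    while IFS= read -r line; do
        # uniq -c prints "count word"; awk exits 0 only if some count equals n.
        if printf '%s\n' "$line" | tr -s ' ' '\n' | sort | uniq -c |
            awk -v n="$1" 'NF > 1 && $1 == n { found = 1 } END { exit !found }'
        then
            printf '%s\n' "$line"
        fi
    done < "$2"
}

# Usage, with YOURFILE holding the three sample lines:
# lines_with_word_count 3 YOURFILE
# prints:
# 01 01 01 03
```

Spawning a pipeline per line is slow on large files, so this is best kept for small inputs.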
