3

The basic code to search for a match on one string:

cat fileA | grep -Fwf include.txt

How can we match lines that contain at least two patterns from include.txt?

file A 
data1 khc001 khc002 vp005
data1 fbc001 cs004 khc001

include.txt
khc001
khc002

correct output line 1: data1 khc001 khc002 vp005

In this example only 2 patterns are listed, but the real list contains many more, which is why awk '/pattern1/ && /pattern2/' is not appropriate.

5 Answers

4

It would be rather easy to do this with awk, counting the number of fields on each line in fileA that are equal to the strings in the include.txt file:

awk 'NR == FNR { p[$1]; next }
     {
         c = 0
         for (i = 1; i <= NF; ++i) if ($i in p) c++
         if ( c >= 2 ) print
     }' include.txt fileA

This first reads the include.txt file and uses the words as keys in an associative array. It then reads the second file and for each row, it iterates over the fields and tests each one to see whether it matches any of the keys.

For each match, we increment a counter, and if the counter is equal to or greater than two at the end, we print the line.


Alternative formulation of the code for people who like "one-liners":

awk 'NR==FNR {p[$1];next} {c=0;for (i=1;i<=NF;++i) if ($i in p) c++} c>=2' include.txt fileA
5
  • Thank you. Very informative, using a counter is the best approach. Commented May 18, 2018 at 19:56
  • @EnrikS Never say something is the "best" approach! ;-) Commented May 18, 2018 at 19:58
  • Noted :-p As a newbie, it's very informative to learn how those who've been in the field for a long time approach this. Commented May 18, 2018 at 20:07
  • @EnrikS Note that it will also print the following line: data1 khc001 khc001 vp005, that is, a line where one pattern occurs more than once. Commented May 19, 2018 at 22:09
  • @Kusalananda The execution speed is just remarkable. Commented May 21, 2018 at 14:55
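
If a repeated word should count only once, one possible variation is to remember which listed words have already been seen on the current line. This is only a sketch, not part of the original answer: the `seen` array, the scratch directory and the extra third input line are assumptions added for illustration.

```shell
# Sample data from the question plus a line that repeats one word
# (hypothetical files in a scratch directory).
dir=$(mktemp -d)
printf '%s\n' 'data1 khc001 khc002 vp005' \
              'data1 fbc001 cs004 khc001' \
              'data1 khc001 khc001 vp005' > "$dir/fileA"
printf '%s\n' khc001 khc002 > "$dir/include.txt"

# seen[] records which listed words were already counted on this line,
# so a repeated word adds to the counter only once.
distinct=$(awk 'NR == FNR { p[$1]; next }
    {
        c = 0
        split("", seen)                 # reset the per-line bookkeeping
        for (i = 1; i <= NF; ++i)
            if (($i in p) && !($i in seen)) { seen[$i]; c++ }
        if (c >= 2) print
    }' "$dir/include.txt" "$dir/fileA")

printf '%s\n' "$distinct"
rm -r "$dir"
```

With this input only the first line is printed, because the third line contains khc001 twice but no second distinct word from the list.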
1

This should work, assuming the patterns appear on the line consecutively and in the same order as in the patterns file, not in any other order:

grep -F " $(tr '\n' ' ' <patterns)" infile
5
  • I got stuck for nearly 3 hours with similar code earlier. Am I doing something wrong? The output is not correct; it gives me all lines containing at least 1 pattern. Commented May 18, 2018 at 20:29
  • I nailed it at last. So basically, as you said, each line needs to be sorted, else it won't work. Commented May 18, 2018 at 20:52
  • Yes, that means if include.txt contains x\ny\nz, it will match a line containing x y z and not x z y or other orderings. If your fileA is not in the same order as your include.txt, I will delete my answer, as it doesn't resolve your question. Commented May 18, 2018 at 20:54
  • It worked just fine. My file is sorted; I was actually running a test on a b c 1 2 3. Thanks for your time and help. Appreciate it. Commented May 18, 2018 at 21:05
  • Just to note again: I did not mean sorted. By order I mean that an include.txt containing v\nd\nz\na (\n is the actual newline) will only match lines in fileA containing v d z a. Commented May 18, 2018 at 21:10
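
To make the order dependence concrete, the sketch below prints the fixed string that the command substitution builds and then searches two lines that differ only in word order. The file names `patterns` and `infile` follow the answer; the scratch directory and the sample contents are assumptions for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' khc001 khc002 > "$dir/patterns"
printf '%s\n' 'data1 khc001 khc002 vp005' \
              'data1 khc002 khc001 vp005' > "$dir/infile"

# The list is joined into one fixed string; the final newline of the
# pattern file becomes a trailing space.
needle=" $(tr '\n' ' ' < "$dir/patterns")"
printf '[%s]\n' "$needle"

# Only a line containing that exact run of words can match.
ordered=$(grep -F "$needle" "$dir/infile")
printf '%s\n' "$ordered"
rm -r "$dir"
```

The search string is a single literal, so the second line, which has the same two words swapped, is not matched. Because of the trailing space, the last pattern also has to be followed by a space on the line.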
0

I was able to accomplish this with the following grepception:

grep -Fwf <(grep -v $(grep -oFwf include.txt fileA | head -1) include.txt) fileA

This will remove one of the matching patterns from include.txt and ensure there is at least one other match.

1
  • grep is very useful, but because of false positive results I prefer awk for this kind of task. I love the idea behind it, though, and will tweak that code and use it on something else. Thank you. Commented May 18, 2018 at 20:02
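
One way a false positive can arise with the nested grep is sketched below by running its three stages separately. This is an illustration under assumptions: the third input line is invented, a temporary file `rest.txt` stands in for the process substitution, and `-o` is assumed to be available (it is in GNU and BSD grep but is not required by POSIX).

```shell
dir=$(mktemp -d)
printf '%s\n' 'data1 khc001 khc002 vp005' \
              'data1 fbc001 cs004 khc001' \
              'data1 khc002 vp005 cs004' > "$dir/fileA"
printf '%s\n' khc001 khc002 > "$dir/include.txt"

# Stage 1: the first listed word found anywhere in the file.
first=$(grep -oFwf "$dir/include.txt" "$dir/fileA" | head -1)

# Stage 2: the list with that word removed.
grep -v "$first" "$dir/include.txt" > "$dir/rest.txt"

# Stage 3: lines containing any of the remaining words.
result=$(grep -Fwf "$dir/rest.txt" "$dir/fileA")
printf '%s\n' "$result"
rm -r "$dir"
```

The word removed in stage 2 is chosen once for the whole file, not per line, so the third line is printed even though it contains only one word from the list.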
0

Another awk solution:

awk '
  NR==FNR {
    a[NR]=$0
    next }
  !b { b=NR }
  {
    c=$0
    for(i=1;i<b;i++)
        if(!sub("\\<"a[i]"\\>","",c))
            next
  }1
' include.txt file\ A

This tries to remove each word listed in include.txt from a copy of each line.

If any word cannot be removed, the line is not printed.

3
  • Could you please explain the purpose of c=$0? Commented May 18, 2018 at 21:19
  • Sorry, I'm not sure what you are asking. The entire line is kept in c. In the for loop, each word from include.txt is removed from c. At the end, c is no longer the entire line, but $0 still is. If every word was found, $0 is printed by the 1 (default action = print). Commented May 18, 2018 at 21:29
  • Thank you, I learned something: 1 is evaluated as a condition that is always true, so }1 triggers the default action, which prints $0, the current line. Commented May 20, 2018 at 10:14
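
The two points discussed in these comments, working on a copy of the line and printing with a bare 1, can be seen in isolation in the small sketch below. The input lines and the single hard-coded word are assumptions chosen for illustration.

```shell
kept=$(printf '%s\n' 'data1 khc001 khc002' 'data1 fbc001 khc001' |
    awk '{
        c = $0                        # work on a copy of the line
        if (!sub(/khc002/, "", c))    # sub() returns 0 when nothing was replaced
            next                      # skip lines lacking the word
    } 1')                             # always-true pattern: print $0
printf '%s\n' "$kept"
```

The surviving line is printed in full, because sub() edited only the copy in c and left $0 untouched.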
0
grep -Fwonf include.txt file_A | 
uniq | 
cut -d: -f1 | 
printf '%dp\n' $(uniq -d) | 
ed -s file_A 

Testing

The content of files (file_A more complicated for testing):

$ tail -n +1 -- file_A include.txt 
==> file_A <==
data1 khc001 khc002 vp005
data1 fbc001 cs004 khc001
data1 khc001 khc001 vp005
data1 khc002 khc001 vp005

==> include.txt <==
khc001
khc002

Output

data1 khc001 khc002 vp005
data1 khc002 khc001 vp005
2
  • With regard to grep, I must say that this code is remarkably fast. Could you please explain how -Fwonf works? Nothing came up in a Google search. Commented May 20, 2018 at 10:26
  • @EnrikS You need to look in man grep for information about the -F, -w, -o, -n and -f options. Commented May 22, 2018 at 11:04
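
As a rough illustration of what the bundled options do, the sketch below spells them out one by one and shows the intermediate results on the test files from this answer. It assumes a grep that supports `-o` (GNU and BSD grep do; POSIX does not require it) and stops before the ed step; the scratch directory and the variable names are hypothetical.

```shell
dir=$(mktemp -d)
printf '%s\n' 'data1 khc001 khc002 vp005' \
              'data1 fbc001 cs004 khc001' \
              'data1 khc001 khc001 vp005' \
              'data1 khc002 khc001 vp005' > "$dir/file_A"
printf '%s\n' khc001 khc002 > "$dir/include.txt"

# -F fixed strings, -w whole words, -o print only the matched part,
# -n prefix the line number, -f read the patterns from a file.
matches=$(grep -F -w -o -n -f "$dir/include.txt" "$dir/file_A")
printf '%s\n' "$matches"

# Drop repeated hits of the same word on a line, keep only the line
# number, and list the numbers that still occur more than once.
lines=$(printf '%s\n' "$matches" | uniq | cut -d: -f1 | uniq -d)
printf '%s\n' "$lines"
rm -r "$dir"
```

Each match is reported as number:word, so a line number that survives `uniq -d` belongs to a line holding at least two different listed words. Those numbers are what the answer turns into ed print commands.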
