grepping multiple strings

Question

I am using the grep command to extract the required information from a file, with two grep statements like the ones below:

XXXX=`grep XXXX FILE A|sort|uniq|wc -l`
grep YYYY FILE A|uniq| > FILE B

The file is now traversed twice. I would like to know whether both steps can be done in a single traversal of the file, i.e. whether I can use something similar to egrep to search for both strings at once, storing the result for one string in a variable and writing the output for the other string to a file.

mostar · Accepted Answer · 2012-07-10 22:17:31Z

1

You can use the following code. It searches the whole file only once for lines containing XXXX or YYYY and stores the matching lines in a variable. The contents of that variable are then filtered again to select the lines containing XXXX and the lines containing YYYY.

# Read the file once, keeping only the lines that match either pattern.
filtered=$(grep -E 'XXXX|YYYY' FILE A)
# Count the distinct XXXX lines; the quotes keep each line intact.
XXXX=$(printf '%s\n' "$filtered" | grep XXXX | sort | uniq | wc -l)
# Write the YYYY lines to the output file.
printf '%s\n' "$filtered" | grep YYYY | uniq > FILE B

So the file is not traversed twice!
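
One caveat when matched lines are kept in a shell variable, sketched here on made-up data: an unquoted expansion in a `for` loop is split on whitespace, so a line that contains spaces is broken into separate words, whereas expanding the variable inside double quotes keeps the newlines and passes whole lines on. The sample strings and variable names below are assumptions for illustration only.

```shell
#!/bin/sh
# Made-up sample data: two matching lines, each containing a space.
filtered=$(printf '%s\n' 'foo XXXX' 'bar YYYY')

# Unquoted: the shell splits on whitespace, so the loop sees four words.
words=0
for w in $filtered; do words=$((words + 1)); done

# Quoted: the newline survives, so line-oriented tools see two lines.
lines=$(printf '%s\n' "$filtered" | wc -l)
lines=$((lines))   # normalise any padding that wc may add

echo "words=$words lines=$lines"
```

With this input the loop counts 4 words while `wc -l` counts 2 lines, which is why piping the quoted variable is the safer way to hand it to a second grep.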

2 Comments

mavam Over a year ago

This method will quickly blow up if the input size becomes larger than the available memory and only makes sense for small data batches.

mostar Over a year ago

If the purpose is to store data in a variable (that is the case in this question) large input can always fill up memory.

mavam · Accepted Answer · 2012-07-10 21:25:02Z

0

Or use egrep with a disjunction:

egrep '(XXXX|YYYY)' FILE A | sort | uniq | ...

Or awk:

awk '/XXXX|YYYY/' FILE A | sort | uniq | ...

4 Comments

User Over a year ago

Thank you for your answer. I understand your point, but how can I store the results of the two grep statements in two variables?

mavam Over a year ago

How big is your input data? This only makes sense for small data volumes. Have a look at associative arrays in awk.

User Over a year ago

The input data is in the range of 200 MB. It's a large file.

mavam Over a year ago

Most machines nowadays have more than 200 MB of RAM, so you may be fine. If the input data outgrows your available memory, you need to resort to the pipes-and-filters processing as above.

William Pursell · Accepted Answer · 2012-07-11 13:39:42Z

There is a trailing '|' symbol in your question, and perhaps you intended the YYYY lines to also be piped to sort (or use sort -u!), in which case you could simply do:

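awk '/XXXX/ { if( !x[$0]++ ) xcount += 1 }   # count each distinct XXXX line once
     /YYYY/ { if( !y[$0]++ ) ycount += 1 }   # likewise for YYYY lines
  END { print "XXXX:", xcount + 0            # + 0 prints 0 instead of an empty string
        print "YYYY:", ycount + 0
        for( i in y ) print i | "sort > FILEB"   # distinct YYYY lines, sorted
  }' FILE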

This scans the file once, incrementing the counter whenever a unique line containing the appropriate pattern is seen. Note that the order of iteration over the array of YYYY lines is not well defined, so the sort is necessary. Some versions of awk can sort the array without relying on an external utility, but not all do. Use perl if you want to do that.
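
For the exact shape asked about in the question (a count held in a shell variable, the other lines written to a file), a minimal sketch along the same lines might look like this. It assumes a POSIX awk; the scratch directory, `in.txt`, `yyyy.txt`, and the awk variable `out` are hypothetical names chosen for the example. Unlike a bare `uniq`, which only drops adjacent repeats, this drops every repeated YYYY line and keeps first-seen order.

```shell
#!/bin/sh
# Made-up sample input in a scratch directory (names are illustrative).
tmp=$(mktemp -d)
printf '%s\n' 'a XXXX' 'b YYYY' 'a XXXX' 'c XXXX' 'b YYYY' 'd YYYY' > "$tmp/in.txt"

# Single pass over the input:
#  - each distinct YYYY line is written to the file named by "out"
#  - the number of distinct XXXX lines goes to stdout, captured by $(...)
xcount=$(awk -v out="$tmp/yyyy.txt" '
    BEGIN  { printf "" > out }              # create the file even with no matches
    /XXXX/ { if (!x[$0]++) n++ }
    /YYYY/ { if (!y[$0]++) print > out }
    END    { print n + 0 }
' "$tmp/in.txt")

echo "XXXX count: $xcount"
cat "$tmp/yyyy.txt"
```

On this sample the count is 2 and the output file holds `b YYYY` and `d YYYY`. The arrays still hold every distinct matching line in memory, so the memory concern raised in the comments applies only to the distinct lines, not to the whole file.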
