0

I have a script that is trying to find the presence of a given string inside a file of arbitrary text.

I've settled on something like:

#!/bin/bash
file="myfile.txt"
for j in `cat blacklist.txt`; do
  echo Searching for $j...
  unset match
  match=`grep -i -m1 -o "$j" $file`
  if [ $match ]; then
    echo "Match: $match"
  fi
done

Blacklist.txt contains lines of potential matches, like so:

matchthis
"match this too"
thisisasingleword
"This is multiple words"

myfile.txt could be something like:

I would matchthis if I could match things with grep.  I really wish I could. 
When I ask it to match this too, it fails to matchthis.  It should match this too - right?

If I run this at a bash prompt, like so:

j="match this too"
grep -i -m1 -o "$j" myfile.txt

...I get "match this too".

However, when the script runs, despite the variables being set correctly (verified via echo lines), grep never matches and returns nothing.

Where am I going wrong?

4 Answers

2

Wouldn't

grep -owF -f blacklist.txt myfile.txt 

instead of writing an inefficient loop, do what you want?
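A minimal sketch of the single-pass idea, assuming the blacklist still contains the double-quoted lines shown in the question. The temporary directory, the patterns.txt file name, and the found variable are illustrative only; -w is left out and -i is added to mirror the question's flags, so adjust to taste.

```shell
#!/bin/bash
# Sample data copied from the question, written to a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' 'matchthis' '"match this too"' 'thisisasingleword' \
  '"This is multiple words"' > "$tmp/blacklist.txt"
printf '%s\n' \
  'I would matchthis if I could match things with grep.  I really wish I could.' \
  'When I ask it to match this too, it fails to matchthis.  It should match this too - right?' \
  > "$tmp/myfile.txt"

# Strip one leading and one trailing double quote from every pattern line.
sed -e 's/^"//' -e 's/"$//' "$tmp/blacklist.txt" > "$tmp/patterns.txt"

# One grep call scans the file once for all fixed-string patterns.
# LC_ALL=C makes the sort order independent of the user's locale.
found=$(grep -i -o -F -f "$tmp/patterns.txt" "$tmp/myfile.txt" | LC_ALL=C sort -u)
printf '%s\n' "$found"

rm -rf "$tmp"
```

With the sample data, this should print each phrase that occurs in the file once: match this too and matchthis.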


3 Comments

No; that returned a whole lot of stuff that didn't match.
Right. This also matches substrings inside larger strings. I have modified my solution. However, my solution would still match the word bar, if your input file contains foo;bar;baz. Is this what you want, or should there be no match in this case?
This still seems like a superior solution conceptually. If something doesn't work exactly as you expect, experiment with the grep flags, or clarify what exactly is wrong, perhaps in a new question specifically about this problem with a proper minimal reproducible example.
0

Would you please try:

#!/bin/bash

file="myfile.txt"

while IFS= read -r j; do
    j=${j#\"}; j=${j%\"}                        # remove surrounding double quotes
    echo "Searching for $j..."

    match=$(grep -i -m1 -o "$j" "$file")
    if (( $? == 0 )); then                      # if match
        echo "Match: $match"                    # then print it
    fi
done < blacklist.txt

Output:

Searching for matchthis...
Match: matchthis
Searching for match this too...
Match: match this too
match this too
Searching for thisisasingleword...
Searching for This is multiple words...

Comments

0

I wound up abandoning grep entirely and using sed instead.

match=`sed -n "s/.*\($j\).*/\1/p" "$file"`

Works well, and I was able to use unquoted multiple word phrases in the blacklist file.

1 Comment

Reading one search phrase at a time and repeatedly scanning the entire input file is a horrible antipattern. You can probably refactor this to a single sed script which searches for all your search phrases in one go. I have several answers demonstrating how to do this in more detail; maybe search for sed -f in conjunction with my user name.
0

With this:

if [ $match ]; then

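you leave $match unquoted, so its value is split into words before test sees it, and a multi-word match passes several arbitrary arguments to test. This is not a reliable way to check that a variable is not empty. Use test -n with the variable quoted: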

if [ -n "$match" ]; then

You might also check grep's exit status instead, immediately after the assignment:

if [ "$?" -eq 0 ]; then
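The difference between the two tests can be seen in a small sketch. The value assigned to match is an assumed example of what grep -o could return, and the unquoted and quoted variables exist only to record the outcome.

```shell
#!/bin/bash
match="match this too"   # assumed multi-word result, as grep -o might return

# Unquoted: the value is split into three words, so test receives
# "match", "this" and "too", reports a syntax error and returns non-zero.
if [ $match ] 2>/dev/null; then unquoted=true; else unquoted=false; fi

# Quoted with -n: test receives one non-empty string and succeeds.
if [ -n "$match" ]; then quoted=true; else quoted=false; fi

echo "unquoted: $unquoted, quoted: $quoted"
```

Run under bash, this prints "unquoted: false, quoted: true", which is why a multi-word match is never reported by the unquoted form.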

for ... in X splits X on whitespace (spaces, tabs, and newlines) by default, but you are expecting the loop to process whole lines.
Set IFS to a newline so that only line breaks separate the items:

IFS='
'
for j in `cat blacklist.txt`; do
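To illustrate the effect of IFS on the loop, here is a small sketch that counts loop iterations. The two-line list variable stands in for the file contents, and count_default, count_newline, and old_ifs are names made up for this example.

```shell
#!/bin/bash
# Hypothetical two-line blacklist held in a variable.
list='matchthis
match this too'

# Default IFS: the unquoted expansion splits on spaces as well as newlines.
count_default=0
for j in $list; do count_default=$((count_default + 1)); done

# Newline-only IFS: each line stays in one piece. Restore IFS afterward.
old_ifs=$IFS
IFS='
'
count_newline=0
for j in $list; do count_newline=$((count_newline + 1)); done
IFS=$old_ifs

echo "default IFS: $count_default items, newline IFS: $count_newline items"
```

This prints "default IFS: 4 items, newline IFS: 2 items": with the default setting, the phrase is handed to grep one word at a time.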

blacklist.txt contains "match this too" with the quotes included; the for loop reads the quotes as part of the string, and grep then searches for them literally.

j="match this too" does not cause j variable to contain quotes.
j='"match this too"' does, and then it will not match.

Since whole lines are read properly from the blacklist.txt file now, you can probably remove quotes from that file.
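The quoting point can be checked directly. In this sketch the text variable is an assumed one-line sample, and without and with hold the number of matching lines that grep -c reports for each form of j.

```shell
#!/bin/bash
text='When I ask it to match this too, it fails.'   # assumed sample line

j="match this too"       # the quotes are shell syntax; j holds: match this too
without=$(printf '%s\n' "$text" | grep -c -F -- "$j")

j='"match this too"'     # here the quote characters are part of the value
with=$(printf '%s\n' "$text" | grep -c -F -- "$j")

echo "without quotes: $without line(s), with quotes: $with line(s)"
```

This prints "without quotes: 1 line(s), with quotes: 0 line(s)", matching the behavior described above: a pattern read from the file with its quotes intact finds nothing.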

Script:

#!/bin/bash
file="myfile.txt"
IFS='
'
for j in `cat blacklist.txt`; do
  echo Searching for $j...
  unset match
  match=`grep -i -m1 -o "$j" "$file"`
  if [ -n "$match" ]; then
    echo "Match: $match"
  fi
done

Alternative to the for ... in ... loop (no IFS= needed):

while read -r; do    # -r keeps backslashes in the line literal
    j="$REPLY"
    ...
done < 'blacklist.txt'
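For reference, a self-contained sketch of a read-based loop, assuming the quotes have already been removed from the blacklist. The sample file contents, the temporary directory, and the hits counter are illustrative and not part of the original script.

```shell
#!/bin/bash
# Throwaway sample files; the blacklist has no surrounding quotes.
tmp=$(mktemp -d)
printf '%s\n' 'matchthis' 'match this too' 'thisisasingleword' > "$tmp/blacklist.txt"
printf '%s\n' 'I would matchthis if I could.' \
  'It should match this too - right?' > "$tmp/myfile.txt"

hits=0
# IFS= and -r make read return each line unchanged.
while IFS= read -r j; do
    [ -n "$j" ] || continue          # skip empty lines in the blacklist
    # Branch on grep's exit status directly; -- guards patterns starting with -.
    if match=$(grep -i -m1 -o -- "$j" "$tmp/myfile.txt"); then
        echo "Match: $match"
        hits=$((hits + 1))
    fi
done < "$tmp/blacklist.txt"
echo "$hits of 3 phrases found"

rm -rf "$tmp"
```

With these sample files the loop reports matchthis and match this too, then prints "2 of 3 phrases found". Because nothing is word-split or glob-expanded, IFS does not need to be changed globally or restored.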

2 Comments

I'd avoid using for ... in to read files; even with a modified IFS, it's still not respecting quotes or escapes, and it'll try to expand anything that looks like a filename wildcard (which can have completely silly results). Also, if you change IFS, you should always set it back afterward to avoid weird problems later.
@GordonDavisson I tried to only fix the original code, not to modify it. Still this IFS=...;for ... in ... is a working solution and to my knowledge it is trustworthy. The alternative is to use while read loop with the file contents passed to its input. I will modify the answer to include that.
