Inian’s answer works perfectly when file2 is only one line long,
and is a good start on a more general answer.
But I believe that
awk 'FNR == NR { neg[$1]; next } { ok=1; for (i in neg) if ($2 ~ i) ok=0; if (ok) print }' file2 FS="," file1
will do what you want in general.
Like your answer, it starts by reading file2 and storing its contents
(the patterns that you want to remove from file1) in an array.
Like Inian’s answer, it then reads file1.
For each line in file1, it loops through the patterns from file2.
We assume that the line is OK; if it matches any pattern, then it’s not OK.
If it is still OK after checking all the patterns, we print it.
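Here is a small self-contained run of that first command. The file contents are made up for illustration (I only guessed at the format from the question), and the script works in a scratch directory created with mktemp -d:

```shell
#!/bin/sh
# Sketch only: file1 and file2 below are hypothetical sample data.
cd "$(mktemp -d)" || exit 1

cat > file2 <<'EOF'
sb288.co
badsite
EOF

cat > file1 <<'EOF'
111,"http://good.example","example1"
222,"http://badsite.example","example2"
333,"http://sb288.com","example3"
EOF

# file2 is read with the default FS; FS="," takes effect before file1.
# A line of file1 is dropped if any pattern from file2 matches its
# second comma-separated field as a regular expression.
out=$(awk 'FNR == NR { neg[$1]; next } { ok=1; for (i in neg) if ($2 ~ i) ok=0; if (ok) print }' file2 FS="," file1)
printf '%s\n' "$out"
# Only the example1 line survives.
```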
But I put FS="," as an argument between file2 and file1
just because that’s the way Inian did it.
It doesn’t matter what field separator we use when we read file2,
as long as that separator doesn’t appear in file2, and file2 contains no commas.
So we could simplify the above a little
by specifying the field separator the ‘normal’ way,
with a -F option at the beginning of the command:
awk -F, 'FNR == NR { neg[$1]; next } { ok=1; for (i in neg) if ($2 ~ i) ok=0; if (ok) print }' file2 file1
You can use -F"," if you prefer; they’re equivalent.
The test FNR == NR is so popular and pervasive
that we use it without thinking.
FNR is the line number (a.k.a. record number) within the current file,
and NR is the line number across all input.
So, for example,
$ cat cats
Felix
Garfield
Heathcliff
$ cat dogs
Lassie
Marmaduke
Snoopy
$ awk '{ print FNR, NR, $0 }' cats dogs
1 1 Felix
2 2 Garfield
3 3 Heathcliff
1 4 Lassie
2 5 Marmaduke
3 6 Snoopy
… and so FNR and NR are equal
for each line of the first file to be processed,
but not for any line of the subsequent file(s).
And so we use FNR == NR to test whether we are processing the first file.
But this is actually a bad practice.
What if the first file is empty?
$ cat unicorns
$ wc unicorns
0 0 0 unicorns
$ awk '{ print FNR, NR, $0 }' unicorns dogs
1 1 Lassie
2 2 Marmaduke
3 3 Snoopy
FNR == NR is true for the first file that actually has data.
If your file2 will never ever ever be empty,
you may be able to get away with ignoring this issue.
But, based on the definition of your problem,
if file2 is empty, the output should be all of file1,
because we aren’t removing anything.
But, if you run the above command with an empty file2,
you will get no output,
because awk thinks it’s reading the first file (file2)
when it’s actually reading the second file (file1).
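To see the failure concretely, this sketch runs the FNR == NR version with an empty file2 and a made-up two-line file1:

```shell
#!/bin/sh
# Sketch: reproduce the empty-file2 problem. Sample data is hypothetical.
cd "$(mktemp -d)" || exit 1

: > file2            # an empty list of patterns
cat > file1 <<'EOF'
111,"http://good.example","example1"
333,"http://sb288.com","example3"
EOF

# file1 is now the first file that has data, so FNR == NR is true for
# every one of its lines: they all go into neg and none is printed.
out=$(awk -F, 'FNR == NR { neg[$1]; next } { ok=1; for (i in neg) if ($2 ~ i) ok=0; if (ok) print }' file2 file1)
printf 'output: [%s]\n' "$out"
# Prints "output: []"
```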
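A safer way to do this is to put an assignment between the file arguments.
Because awk performs a command-line assignment such as FILE=2 only when it reaches that argument,
FILE is still unset while file2 is read, and it equals 2 while file1 is read: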
awk -F, 'FILE != 2 { neg[$1]; next } { ok=1; for (i in neg) if ($2 ~ i) ok=0; if (ok) print }' file2 FILE=2 file1
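This sketch (again with made-up files) shows when the FILE=2 assignment takes effect, by printing FILE and FILENAME for each line, and then checks that the FILE != 2 version prints all of file1 when file2 is empty:

```shell
#!/bin/sh
# Sketch: timing of a command-line assignment. Sample data is hypothetical.
cd "$(mktemp -d)" || exit 1

printf 'Felix\n' > cats
printf 'Lassie\n' > dogs
# FILE is uninitialized (empty) while cats is read; FILE=2 is applied
# when awk reaches that argument, just before it opens dogs.
trace=$(awk '{ print "[" FILE "]", FILENAME, $0 }' cats FILE=2 dogs)
printf '%s\n' "$trace"
# [] cats Felix
# [2] dogs Lassie

# With an empty file2, nothing goes into neg, so every line of file1 is kept.
: > file2
cat > file1 <<'EOF'
111,"http://good.example","example1"
333,"http://sb288.com","example3"
EOF
out=$(awk -F, 'FILE != 2 { neg[$1]; next } { ok=1; for (i in neg) if ($2 ~ i) ok=0; if (ok) print }' file2 FILE=2 file1)
printf '%s\n' "$out"
```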
The question is a little ambiguous.
What does “partial match” mean, exactly?
Inian chose to interpret it in the sense that the question suggests,
that is, like grep.
If any value from file2
matches the value from the second column of file1
as a regular expression,
then remove that line of file1.
But there are two problems with this.
- The surprise factor.
I took the files in the question and added a
154376352,"http://sb288eco.tm","example4"
line to file1, and ran my first command.
That "example4" line was not output,
because sb288.co (from file2), taken as a regular expression
(in which . means “match any character”), matched sb288eco.
If that’s what you want and expect to happen,
you might as well stop reading this now.
- Regular expression processing is computationally expensive.
Regular expressions have to be parsed and processed.
This will likely take more time than simple string comparison.
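A one-line check of the surprise described in the first point, testing the pattern from file2 against the second field of the example4 line:

```shell
#!/bin/sh
# Sketch: as a regular expression, "." in sb288.co matches the "e" in sb288eco.
regex=$(printf '%s\n' '154376352,"http://sb288eco.tm","example4"' |
    awk -F, '{ if ($2 ~ "sb288.co") print "match"; else print "no match" }')
printf 'regex: %s\n' "$regex"
# Prints "regex: match"
```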
We can solve both of the above issues
by testing whether the string from file2
is present in the value from file1 with awk’s index function:
awk -F, 'FILE != 2 { neg[$1]; next } { ok=1; for (i in neg) if (index($2,i) > 0) ok=0; if (ok) print }' file2 FILE=2 file1
With the above, a . in file2 matches only a . in file1,
and not any other character.
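A side-by-side sketch of the two versions, using the example4 line plus one hypothetical line that really does contain sb288.co:

```shell
#!/bin/sh
# Sketch: regex (~) vs. literal index() matching. The first line of file1
# is hypothetical; the second is the example4 line described above.
cd "$(mktemp -d)" || exit 1

printf 'sb288.co\n' > file2
cat > file1 <<'EOF'
100000001,"http://sb288.co","example3"
154376352,"http://sb288eco.tm","example4"
EOF

rx=$(awk -F, 'FILE != 2 { neg[$1]; next } { ok=1; for (i in neg) if ($2 ~ i) ok=0; if (ok) print }' file2 FILE=2 file1)
ix=$(awk -F, 'FILE != 2 { neg[$1]; next } { ok=1; for (i in neg) if (index($2,i) > 0) ok=0; if (ok) print }' file2 FILE=2 file1)
printf 'regex version:\n%s\n' "$rx"    # empty: both lines were removed
printf 'index version:\n%s\n' "$ix"    # only the example4 line survives
```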
I invite you to test the above on your data and see whether it is any faster.
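If you want a quick way to compare them, something like this rough timing sketch might help. It assumes bash (for its time keyword) and uses synthetic data, not yours; the sizes (50 patterns, 2000 lines) are arbitrary, so scale them up to resemble your files. The patterns contain no regex metacharacters, so both versions should print the same lines, which cmp checks:

```shell
#!/usr/bin/env bash
# Rough timing sketch on synthetic, hypothetical data.
cd "$(mktemp -d)" || exit 1

# Patterns host1y .. host50y; each matches exactly one line of file1.
awk 'BEGIN { for (n = 1; n <= 50; n++) print "host" n "y" }' > file2
awk 'BEGIN { for (n = 1; n <= 2000; n++) printf "%d,\"http://host%dy.example\",\"example%d\"\n", n, n, n }' > file1

time awk -F, 'FILE != 2 { neg[$1]; next } { ok=1; for (i in neg) if ($2 ~ i) ok=0; if (ok) print }' file2 FILE=2 file1 > out.regex
time awk -F, 'FILE != 2 { neg[$1]; next } { ok=1; for (i in neg) if (index($2,i) > 0) ok=0; if (ok) print }' file2 FILE=2 file1 > out.index

# Same output expected, since no pattern contains a metacharacter.
cmp -s out.regex out.index && echo "same output"
```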
P.S. I just noticed that you changed the file format
since I posted my answer.
Originally you wanted to test the values from file2
against values from the second column of file1.
Now you seem to want to test
against values from the first column of file1.
To accommodate this change,
you should take the part of any of the above commands
that compares $2 to i, and change it to use $1 instead.
Or, if you really want to test the entire line from file1, use $0.
So, bottom line, you might want to use
awk -F, 'FILE != 2 { neg[$1]; next } { ok=1; for (i in neg) if (index($1,i) > 0) ok=0; if (ok) print }' file2 FILE=2 file1
as your command.
With line breaks for readability, that’s
awk -F, 'FILE != 2 { neg[$1]; next }
         {
             ok=1
             for (i in neg)
                 if (index($1,i) > 0) ok=0
             if (ok) print
         }' \
    file2 FILE=2 file1
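And a final sketch of the bottom-line command, assuming (hypothetically) that the URL is now the first column of file1. remove_matches is just a name I made up for a shell function that wraps the command, so it can be run once with a pattern and once with an empty file2:

```shell
#!/bin/sh
# Sketch: the final command on hypothetical data with the URL in column 1.
cd "$(mktemp -d)" || exit 1

printf 'sb288.co\n' > file2
cat > file1 <<'EOF'
"http://sb288.co",100000001,"example3"
"http://sb288eco.tm",154376352,"example4"
EOF

# Hypothetical helper: drop lines of file1 whose first field contains
# any line of file2 as a literal substring.
remove_matches() {
    awk -F, 'FILE != 2 { neg[$1]; next }
             {
                 ok=1
                 for (i in neg)
                     if (index($1,i) > 0) ok=0
                 if (ok) print
             }' \
        file2 FILE=2 file1
}

kept=$(remove_matches)   # only the example4 line is kept
: > file2                # empty file2: nothing should be removed
all=$(remove_matches)
printf '%s\n---\n%s\n' "$kept" "$all"
```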
The grep command works for me. I see no reason why you have to use the -w and -F parameters.