I have 250 strings, and I need to count the number of times each one appears on every line of each of my 400 files (each up to 20,000 lines). Example of strings:
journal
moon pig
owls
Example of one file:
This text has journal and moon pig
This text has owls and owls
Example output:
1 0
1 0
0 2
EDIT: each row corresponds to one search string; column one holds the counts for the first line of the file, and column two the counts for the second line.
I have working code, but it's obviously very slow. I'm sure awk could speed it up, but I'm not good enough at awk to write it.
for file in folder/*
do
    name=$(basename "$file" .txt)
    linenum=1
    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line
    do
        while IFS= read -r searches
        do
            # count every time the string appears on this line and save
            count=$(printf '%s\n' "$line" | grep -oi "$searches" | wc -l)
            echo "$count" >> "out/${name}_${linenum}.txt"
        done < strings.txt
        linenum=$((linenum+1))
    done < "$file"
done
EDIT: I do 400 pastes like this, where x is the number of lines in the original file.
paste out/file1_{1..x}.txt > out/file1_all.txt
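One caveat with that paste: `{1..x}` only brace-expands when both ends are literal numbers, so it does not work if x is a shell variable holding the line count. A sketch of one workaround using `seq` to build the file list (the two-line sample data here is made up to mirror the question's out/ layout):

```shell
#!/bin/sh
# Hypothetical per-line count files mirroring the question's layout
mkdir -p folder out
printf '%s\n' 'line one' 'line two' > folder/file1.txt
printf '%s\n' 1 1 0 > out/file1_1.txt   # counts for line 1
printf '%s\n' 0 0 2 > out/file1_2.txt   # counts for line 2

# {1..x} does not expand when x is a variable, so build the list with seq
x=$(wc -l < folder/file1.txt)
paste $(seq -f 'out/file1_%g.txt' 1 "$x") > out/file1_all.txt
```

Note that paste joins columns with tabs, not spaces, by default.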
Does anyone know how to speed this up?
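One way this could be restructured in awk (a sketch, assuming the 250 strings are meant as literal text rather than regular expressions, which is how `index()` matches below): read strings.txt once, then make a single pass over each file, appending one count per line to one row per string. That replaces both the inner grep-per-string loop and the later paste step. The file layout (`strings.txt`, `folder/`, `out/`) follows the question; the sample data is the question's own example so the sketch runs as-is:

```shell
#!/bin/sh
# Sample layout from the question, so the sketch is runnable as-is:
mkdir -p folder out
printf '%s\n' 'journal' 'moon pig' 'owls' > strings.txt
printf '%s\n' 'This text has journal and moon pig' \
              'This text has owls and owls'        > folder/file1.txt

# One awk pass per file instead of one grep/wc fork per string per line.
for file in folder/*.txt; do
    name=$(basename "$file" .txt)
    awk '
        # first input file (strings.txt): remember each string, lowercased
        NR == FNR { pat[++n] = tolower($0); next }
        {
            line = tolower($0)              # case-insensitive, like grep -i
            for (i = 1; i <= n; i++) {
                c = 0; s = line
                # count non-overlapping occurrences of pat[i] in this line
                while ((pos = index(s, pat[i])) > 0) {
                    c++
                    s = substr(s, pos + length(pat[i]))
                }
                row[i] = (row[i] == "" ? "" : row[i] " ") c
            }
        }
        END { for (i = 1; i <= n; i++) print row[i] }   # one row per string
    ' strings.txt "$file" > "out/${name}_all.txt"
done

cat out/file1_all.txt
# prints:
# 1 0
# 1 0
# 0 2
```

The work is still O(lines x strings) per file, like the original, but it avoids forking grep and wc roughly 20,000 x 250 times per file, which is where most of the time goes.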
Comment: it's unclear to me what the columns in the example output are meant to correlate to. Are the two sample lines meant to come from two files, rather than one?