
I have 250 strings, and I need to count how many times each one appears on every line of my 400 files (each up to 20,000 lines long). Example strings:

journal
moon pig
owls

Example of one file:

This text has journal and moon pig
This text has owls and owls

Example output:

1   0
1   0
0   2

EDIT: each row corresponds to one search string; column one holds the counts for the first line of the file, and column two holds the counts for the second line.

I have working code, but it's obviously very slow. I'm sure awk could speed it up, but I'm not good enough to write it.

for file in folder/*
do
    name=$(basename "$file" .txt)
    linenum=1
    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line
    do
        while IFS= read -r searches
        do
            # count every time the string appears on the line and save it
            count=$(printf '%s\n' "$line" | grep -oi -- "$searches" | wc -l)
            echo "$count" >> "out/${name}_${linenum}.txt"
        done < strings.txt
        linenum=$((linenum+1))
    done < "$file"
done

EDIT: I do 400 pastes like this, where x is the number of lines in the original file.

paste out/file1_{1..x}.txt > out/file1_all.txt

Does anyone know how to speed this up?
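
For reference, here is a rough sketch of what a single awk pass per file might look like, instead of one grep per string per line. It is only an illustration under some assumptions: `count_matrix` is a made-up helper name, the search strings are treated as literal text rather than regular expressions (unlike `grep -oi`, which interprets them as patterns), matching is case-insensitive and non-overlapping, and `strings.txt` is not empty. The output is tab-separated, with one row per string and one column per line, like the pasted result.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original code): count_matrix STRINGS_FILE INPUT_FILE
# Prints one row per search string and one tab-separated column per input line.
count_matrix() {
    awk '
        # First file: remember the search strings, lower-cased (mirrors grep -i).
        NR == FNR { if ($0 != "") pat[++n] = tolower($0); next }
        # Second file: count non-overlapping literal occurrences on each line.
        {
            text = tolower($0)
            for (i = 1; i <= n; i++) {
                c = 0
                rest = text
                while ((p = index(rest, pat[i])) > 0) {
                    c++
                    rest = substr(rest, p + length(pat[i]))
                }
                # Append this line count as a new column of row i.
                row[i] = (FNR == 1 ? "" : row[i] "\t") c
            }
        }
        END { for (i = 1; i <= n; i++) print row[i] }
    ' "$1" "$2"
}

# Possible usage, assuming the folder/ and out/ layout from the loop above:
# for file in folder/*; do
#     count_matrix strings.txt "$file" > "out/$(basename "$file" .txt)_all.txt"
# done
```

Because each file is read once and all counts for a string are kept in memory as one row, this would also make the per-line output files and the 400 `paste` calls unnecessary.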


Counting a list of strings on every line of multiple files
