working solution
choroba

Can you run the following?

grep -Ff FILE_A FILE_B > FILE_C

Now you can run your script on files A and C only.
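To see what the pre-filter does, here is a minimal sketch on made-up sample data (the file contents and the scratch directory are assumptions for illustration only). `-F` treats each pattern as a fixed string and `-f FILE_A` reads one pattern per line, so FILE_C ends up with only those lines of FILE_B that contain at least one string from FILE_A, in the order they appear in FILE_B.

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'AB' 'AC' > "$dir/FILE_A"
printf '%s\n' 'AC,XY,Z' 'QQ,RR,Z' 'AB,AC,Z' > "$dir/FILE_B"

# Keep only the lines of FILE_B that contain any string listed in FILE_A.
grep -Ff "$dir/FILE_A" "$dir/FILE_B" > "$dir/FILE_C"

# Prints:
#   AC,XY,Z
#   AB,AC,Z
cat "$dir/FILE_C"
```

The line `QQ,RR,Z` is dropped because it contains neither `AB` nor `AC`, so any later per-line search has less data to scan.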

Update: Wait... Does it preserve the order?

Another update: Some more processing is needed to keep the order. The script below gives me the same results as your original script. Tested on 300K lines in FILE_A and only 300K lines in FILE_B: 125 minutes vs. 14 seconds.

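#!/bin/bash
# Keep only the lines of FILE_B that contain at least one string from FILE_A.
grep -Ff FILE_A FILE_B > FILE_B_TMP
# Extract the matched strings themselves (they come out in FILE_B order).
grep -oFf FILE_A FILE_B_TMP > FILE_A_SHUFF
# Restore FILE_A order, dropping strings that never matched.
grep -Ff FILE_A_SHUFF FILE_A > FILE_A_TMP

# For each remaining string, print the first line that contains it.
while IFS= read -r line; do
   grep -F -m1 -e "$line" FILE_B_TMP
done < FILE_A_TMP > result.txt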
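As a quick check of the steps above, here is a sketch that wraps the same pipeline in a function and runs it on small made-up data. The function name `first_matches`, the sample contents, and the use of a scratch directory for the intermediate files are all assumptions for illustration, not part of the original script.

```shell
#!/bin/bash
# first_matches A B (hypothetical helper): for each string in A, in A's order,
# print the first line of B that contains it; strings with no match are skipped.
first_matches() {
    local a=$1 b=$2 tmp line
    tmp=$(mktemp -d) || return 1
    grep -Ff "$a" "$b" > "$tmp/b_tmp"               # lines of B with any match
    grep -oFf "$a" "$tmp/b_tmp" > "$tmp/a_shuff"    # matched strings, in B order
    grep -Ff "$tmp/a_shuff" "$a" > "$tmp/a_tmp"     # matched strings, in A order
    while IFS= read -r line; do
        grep -F -m1 -e "$line" "$tmp/b_tmp"
    done < "$tmp/a_tmp"
    rm -rf "$tmp"
}

# Hypothetical sample data: ZZ occurs nowhere in FILE_B.
dir=$(mktemp -d)
printf '%s\n' 'AB' 'AC' 'ZZ' > "$dir/FILE_A"
printf '%s\n' 'AC,XY,Z' 'QQ,RR,Z' 'AB,AC,Z' > "$dir/FILE_B"

# Prints:
#   AB,AC,Z
#   AC,XY,Z
first_matches "$dir/FILE_A" "$dir/FILE_B"
```

The output follows the order of FILE_A (`AB` first, then `AC`), and `ZZ` never reaches the loop because the third `grep` filters it out before any per-line search happens.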

added 233 characters in body

Can you run the following?

grep -Ff FILE_A FILE_B > FILE_C

Now you can run your script on files A and C only.

Update: Oh wait, it breaks the order. Imagine

AB
AC

in FILE_A and

AC,XY,Z
AB,AC,Z

in FILE_B. This solution returns

AB,AC,Z
AB,AC,Z

while yours returns

AB,AC,Z
AC,XY,Z
