I have an operation that works through originallist (dowork.sh originallist) and records what it has finished in cleaned1. cleaned1 is sorted differently from originallist. I need to generate a list of what is left for dowork.sh to process. Essentially: list cleanedR - list cleaned1 = list cleaned2. It is a set-minus operation. I found that I can do it with the following grep options:
- -F to treat each pattern as a fixed string instead of a regular expression (we don't want grep interpreting filename characters as regex syntax); note that this alone does not force a full-line match,
- -v to invert the match, i.e. exclude matching lines (which is the minus operation),
- -f to read the patterns from the file cleaned1 instead of taking a single pattern on the command line ("obtain PATTERN from FILE").
# wc -l cleaned*
9157094 cleaned1
14283591 cleanedR
# du -sh cleaned*
1.3G cleaned1
2.0G cleanedR
# grep -Fvf cleaned1 originallist > cleaned2
This runs for about 5 minutes, uses as much as 42G of RAM, then exits with a failure; cleaned2 is 0 bytes long.
At the end, cleaned2 should be 14283591 - 9157094 = 5126497 lines long.
This is the correct syntax for such an operation (I tested it with a 10-line cleanedR and a 3-line cleaned1; the resulting cleaned2 had 7 lines), but it uses a lot of RAM. Is there a way to make grep use less RAM? I know it will take a while, and I am okay with that.
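The small-scale test described above can be reproduced roughly as follows. This is only a sketch: the scratch directory and the sample names (file01 to file10) are made up for illustration; only the grep invocation is the one from the question.

```shell
# Hypothetical scratch directory and sample data, for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A 10-line full list and a 3-line "already done" list in a different order.
printf 'file%02d\n' 1 2 3 4 5 6 7 8 9 10 > cleanedR
printf '%s\n' file07 file02 file09 > cleaned1

# -F: fixed strings, -v: invert the match, -f: read patterns from cleaned1.
grep -Fvf cleaned1 cleanedR > cleaned2

# Count the lines that are left to process.
wc -l < cleaned2
```

With this sample data the count should be 7, matching the 10 - 3 result of the small test.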
I am looking for something like sort's -T option, which lets you avoid /tmp (RAM-backed in my case) and use another directory for temporary files:
# sort --help
-T, --temporary-directory=DIR use DIR for temporaries, not $TMPDIR or /tmp;
multiple options specify multiple directories
Would comm, or grep's -x option, help?
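For what it is worth, the sort-plus-comm idea could be sketched like this. It is an assumption that it fits this case, not a tested fix at full size: it assumes GNU sort and comm, that the order of the output does not matter (the result comes out sorted), and that lines are compared whole. The scratch directory, the sorttmp directory, and the sample lines are hypothetical.

```shell
# Hypothetical scratch directory and sample data, for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sorttmp   # stand-in for an on-disk directory to hold sort's temporaries

printf '%s\n' a b c d e > cleanedR
printf '%s\n' d b > cleaned1

# Use the same byte-wise collation for sort and comm so they agree on order.
export LC_ALL=C

# sort spills to the -T directory instead of /tmp when the data is large.
sort -T sorttmp cleanedR > cleanedR.sorted
sort -T sorttmp cleaned1 > cleaned1.sorted

# -2 drops lines found only in the second file, -3 drops lines common to both,
# leaving the lines that appear only in cleanedR.sorted.
comm -23 cleanedR.sorted cleaned1.sorted > cleaned2

cat cleaned2
```

Unlike grep -f, comm reads both sorted files as streams rather than loading every pattern into memory, so the heavy lifting falls to sort, which can use disk via -T. One difference to keep in mind is that comm pairs duplicate lines one-to-one, whereas grep -v removes every line that matches.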