I have a grep command
grep -Fvf cleaned1 cleanedR > cleaned2
that kills my PC because it uses too much RAM.
- cleanedR is a list of files (14 million of them) that I need to run some operation through (dowork.sh cleanedR); everything that has been completed is printed into cleaned1 (in a different sort order, so diff won't work)
- cleaned1 is a list of files (10 million)
- I had to cancel the dowork.sh operation to do something else, but I can resume it later through another list (dowork.sh cleaned2). cleaned2 doesn't exist yet.
- cleaned2 will be a list of the 4 million files I have yet to run dowork.sh through.
- Essentially I need to do this mathematical operation (it's a subtraction): list of files cleanedR - list of files cleaned1 = list of files cleaned2
cleaned1 and cleanedR contain absolute file paths, one per line; with millions of entries these are big files: cleaned1 is 1.3G and cleanedR is 1.5G.
I have about 30 G of RAM available, but grep used all of it and crashed.
I was wondering why grep needs so much RAM for this, and whether I can point it at a temp directory on disk instead. sort has that option with -T, so I was looking for something similar for grep.
I am open to other ideas.
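For example, since both files are just lists of whole lines, the same subtraction could be done with sort and comm instead of grep; sort -T keeps memory bounded by spilling to disk. This is only a sketch, and /mnt/tmp is a placeholder for whatever directory has enough free disk space:

export LC_ALL=C                                      # byte-wise ordering; comm needs both inputs sorted the same way
sort -T /mnt/tmp cleanedR > cleanedR.sorted
sort -T /mnt/tmp cleaned1 > cleaned1.sorted
comm -23 cleanedR.sorted cleaned1.sorted > cleaned2  # keep only lines that are in cleanedR but not in cleaned1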
-f reads the patterns from cleaned1, which contains millions of file names (1 per line) rather than a single regular expression. -F makes the match literal: filenames can be complex and grep could mistake some characters for regular-expression metacharacters, and we don't want that, so each line is matched as a fixed string. -v is the subtract / exclude operation.
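If keeping cleanedR's original order matters, the same exact-line exclusion can also be sketched with awk, which loads only cleaned1 into an associative array. It still holds cleaned1 in memory, so it may or may not fit comfortably in 30 G of RAM; treat it as an untested sketch:

# read cleaned1 into an array, then print only the lines of cleanedR that are not in it
awk 'NR==FNR { seen[$0]; next } !($0 in seen)' cleaned1 cleanedR > cleaned2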
Or would grep simply take hours or days to finish (processing a 1.3 GB pattern file 14 million times -- what do you expect!)?