I’m writing something that deals with file matches, and I need an inversion operation. I have a list of files (e.g. from find . -type f -print0 | sort -z >lst) and a list of matches (e.g. from grep -z foo lst >matches; note that this is only an example: matches can be any arbitrary subset of lst, including empty or full), and now I want to invert this list, i.e. get every entry of lst that is not in matches.
Background: I’m sorta implementing something like find(1), except on file lists (although the files do exist in the filesystem at the point of calling, the list may have been pre-filtered). If the list of files weren’t potentially so large, I could use find "${files[@]}" -maxdepth 0 -somecondition -print0, but even moderate use of what I’m writing would exceed the Linux or BSD argv size limit.
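For the argv limit itself, one workaround is to let xargs -0 split the list into batches that fit and run find once per batch. This is a minimal sketch, assuming the names in lst start with ./ (as find . emits them, so none can be mistaken for a find option); -size +0c merely stands in for the placeholder -somecondition, and the scratch tree and file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: apply a find(1) test to every name in a NUL-separated list without
# hitting the argv size limit, by letting xargs split the list into batches.
# Assumption: every name starts with "./", so find never parses one as an option.
set -eu
work=$(mktemp -d)
mkdir "$work/tree"
cd "$work/tree"
touch empty
printf 'some data\n' >nonempty
# Lists are written outside the tree so find never picks them up.
find . -type f -print0 | sort -z >"$work/lst"

# find wants its paths before the expression, but xargs appends arguments at
# the end, hence the sh -c wrapper that puts "$@" first.
# -r (--no-run-if-empty) keeps xargs from running find with no paths at all,
# which would make find default to ".".
# "-size +0c" is only a stand-in for the real -somecondition.
xargs -0 -r sh -c 'exec find "$@" -maxdepth 0 -size +0c -print0' sh \
	<"$work/lst" >"$work/matches"
tr '\0' '\n' <"$work/matches"
```

This prints ./nonempty. Each batch's output is appended in order, so matches keeps the order of lst.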
If the entries were not NUL-separated, I could use comm -23 lst matches >inverted. If only the matches were not NUL-separated, I could use grep -Fvxzf matches lst. But with the generators I mentioned in the first paragraph, both are.
Assume GNU tools are installed, so this need not be portable beyond e.g. Debian, as I’m using find -print0, sort -z and friends already (although some BSDs have them, so if it can be done more portably, I won’t complain).
I’m trying to reuse code here; besides, comm -23 is basically the perfect tool for this already, except that it doesn’t support changing the input line separator (yet), and comm is an underrated, not well-enough-known tool anyway. If the Unix/Linux toolbox doesn’t offer anything sensible, I’m likely to reimplement a form of comm -23 (reduced to just this one use case) in shell, as the script already (for other reasons) requires a shell that supports read -d '' for NUL-delimited input; but that’s going to be slow (and effort… I posted this at the end of the workday in the hope that someone has an idea by the time I pick this up tomorrow or on the 28th).
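A minimal sketch of that reduced comm -23 in bash, using read -d ''. It assumes matches is an order-preserving subsequence of lst, which holds for grep -z and any other filter that keeps input order (it does not hold for an arbitrarily reordered subset). Under that assumption a single pass with equality tests is enough, so the result does not depend on how the shell's string comparison collates relative to sort -z. The function name invert_list is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: invert_list LIST MATCHES
# Writes, NUL-terminated, every record of LIST that does not appear in MATCHES.
# Assumption: MATCHES is an order-preserving subsequence of LIST (e.g. the
# output of grep -z on LIST), so only equality checks are needed.
invert_list() {
	local rec m have_m=0
	{
		# Prime the first match from fd 3; have_m=0 means matches are exhausted.
		IFS= read -r -d '' m <&3 && have_m=1
		while IFS= read -r -d '' rec; do
			if ((have_m)) && [[ $rec == "$m" ]]; then
				# Matched: drop this record and advance to the next match.
				have_m=0
				IFS= read -r -d '' m <&3 && have_m=1
			else
				printf '%s\0' "$rec"
			fi
		done
	} <"$1" 3<"$2"
}

# Demo with made-up names, one of which contains a newline.
work=$(mktemp -d)
printf '%s\0' ./a ./b $'./c\nfoo' ./d >"$work/lst"
printf '%s\0' ./b $'./c\nfoo' >"$work/matches"
invert_list "$work/lst" "$work/matches" | tr '\0' '\n'
```

The demo prints ./a and ./d. As expected, it runs one read builtin per record, so it will be noticeably slower than comm on large lists.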
Use comm with two inversions, swapping NUL and newline with tr before and after: comm -23 <(tr '\n\0' '\0\n' <lst) <(tr '\n\0' '\0\n' <matches) | tr '\n\0' '\0\n'
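The tr round-trip turns each NUL-terminated record into one newline-terminated line (newlines inside names become NULs, which comm treats as ordinary bytes), lets comm -23 do the work, and swaps back. Below is a self-contained sketch on a made-up tree; swap is a helper name chosen here. The swap can in principle change how names containing control characters sort relative to each other, so as an assumption the sketch pins LC_ALL=C for both sort and comm so they at least agree on byte order. Newer GNU coreutils also document a -z (--zero-terminated) option for comm; if comm --help on your system lists it, comm -z -23 lst matches should make the round-trip unnecessary.

```shell
#!/usr/bin/env bash
# Sketch of the tr round-trip around comm -23 on NUL-separated lists.
# Assumption: sort and comm both run under LC_ALL=C so their orders agree.
set -eu
export LC_ALL=C
work=$(mktemp -d)
mkdir "$work/tree"
cd "$work/tree"
touch 'a foo' b $'c\nfoo' d
# Lists are written outside the tree so find never picks them up.
find . -type f -print0 | sort -z >"$work/lst"
grep -z foo "$work/lst" >"$work/matches" || true # grep exits 1 on no match

# swap: exchange NUL and newline bytes (helper introduced for this sketch).
swap() { tr '\n\0' '\0\n'; }
comm -23 <(swap <"$work/lst") <(swap <"$work/matches") | swap >"$work/inverted"
tr '\0' '\n' <"$work/inverted"
```

This prints ./b and ./d; the name containing a newline is correctly excluded because it was in matches.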