I have a directory with 1000+ files. In a text file, I have about 50 filenames, one per line. I'd like to delete all the files in the directory whose filenames don't correspond to an entry on the list. What's the best way to do this? I started a shell script, but couldn't find the proper command to determine if a filename is on the list. Thanks.
8 Answers
I realize that any question asking how to delete files must be handled with great care. My first answer was too hasty: I didn't consider that the file list could be malformed for use with egrep. I edited the answer to reduce that risk.
This should work for files that have no spaces in their names:
First rebuild your filelist to be sure to match the exact file name:
sed -e 's,^,^,' -e 's,$,$,'  filelist  > newfilelist 
Build the rm commands:
cd your_directory
ls | egrep -vf newfilelist   | xargs -n 1 echo rm  >  rmscript
Check that the rm script suits you (you can inspect it with "vim" or "less").
Then perform the action:
sh -x rmscript
If the file names contain spaces (this will not work if a name contains a " character):
ls | egrep -vf newfilelist  | sed 's,^\(.*\)$,rm "\1",' > rmscript
Of course, the filelist should not be in the same directory!
EDITED:
Nathan's file list contained names that matched all the files in the directory (for example, "html" matches "bob.html"), so nothing was deleted because egrep -vf filtered out the whole stream. I added a command to put a "^" before and a "$" after each file name. I was lucky that Nathan's file list was well formed. Had it been DOS-formatted with CR-LF line endings, or had it contained extra spaces, egrep would have preserved no files and everything would have been deleted.
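The anchoring problem described above can also be sidestepped by asking grep for fixed-string, whole-line matches. The following is a minimal sketch, not part of the original answer: it runs in a scratch directory created with mktemp, all file names are made up for the demo, and it strips carriage returns first on the assumption that the list may be DOS-formatted.

```shell
#!/bin/sh
# Sketch: pick the files NOT named in a keep list, using fixed strings (-F)
# that must match a whole line (-x), so no "^" and "$" anchors are needed.
# The scratch directory and file names are invented for this demo.
demo=$(mktemp -d) || exit 1
list=$(mktemp) || exit 1            # the list lives outside the directory
clean=$(mktemp) || exit 1

cd "$demo" || exit 1
touch bob.html html index.html notes.txt

# A DOS-formatted keep list (CR-LF line endings) naming two files.
printf 'html\r\nnotes.txt\r\n' > "$list"

# Remove the carriage returns so the names can match exactly.
tr -d '\r' < "$list" > "$clean"

# -v inverts the match: what remains are the names that are not on the list.
to_delete=$(ls | grep -vxFf "$clean")

# Preview only: print the rm commands instead of running them.
printf '%s\n' "$to_delete" | sed 's/^/rm -- /'
```

As with the egrep version, an empty list matches nothing, so every file would be selected; the preview should be checked before anything is removed.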
- When I run the preview command, I get one line with "rm". When I run the actual command, I get an error message about missing arguments for rm. Do I need special syntax to use the results from ls | egrep in the xargs input? – Nathan, Apr 30, 2014 at 14:16
- @Nathan You must cd to your directory first. No special syntax is needed: ls provides the directory's file names, and egrep -vf filelist filters out your 50 file names. I'm afraid you deleted all your files. – Emmanuel, Apr 30, 2014 at 14:27
- @Emmanuel I'm running the command from the directory that contains the files to be deleted. – Nathan, Apr 30, 2014 at 14:31
- @Nathan Are all your files deleted? – Emmanuel, Apr 30, 2014 at 14:31
 
Pre-construct the arguments to find:
{
  read -r
  keep=( -name "$REPLY" ) # no `-o` before the first one.
  while read -r; do
    keep+=( -o -name "$REPLY" )
  done
} < file_list.txt
find . -type f ! \( "${keep[@]}" \) -exec echo rm {} +
The echo shows the rm command that would be run. Remove the echo to actually delete the files.
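For shells without arrays, the same idea can be sketched with the positional parameters. This is an illustrative variant rather than the answer's own code: the scratch directory and list file are made up, empty lines in the list are skipped, and the script refuses to continue if the list yields no names.

```shell
#!/bin/sh
# Sketch: build the find arguments in "$@" instead of a bash array.
# The scratch directory and list file are invented for this demo.
demo=$(mktemp -d) || exit 1
list=$(mktemp) || exit 1
cd "$demo" || exit 1
touch ' op' abc def 'k l'
printf '%s\n' abc '' 'k l' > "$list"     # note the empty line in the list

# Build: -name abc -o -name 'k l'
set --
while IFS= read -r name; do
  [ -n "$name" ] || continue             # skip empty lines
  if [ "$#" -eq 0 ]; then
    set -- -name "$name"                 # no -o before the first one
  else
    set -- "$@" -o -name "$name"
  fi
done < "$list"

# An empty parenthesised expression is not valid for find, so stop here.
if [ "$#" -eq 0 ]; then
  echo "keep list is empty; nothing done" >&2
  exit 1
fi

# Dry run: print the files that are not on the list.
doomed=$(find . -type f ! \( "$@" \) -print | LC_ALL=C sort)
printf '%s\n' "$doomed"
```

Note that -name takes a shell pattern, so a list entry containing *, ? or [ is treated as a pattern rather than a literal name.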
Update: Demonstration:
##
# Demonstrate what files exist for testing.
# Show their whitespace:
~/foo $ printf '"%s"\n' *
" op"
" qr"
"abc"
"def"
"gh "
"ij "
"k l"
"keep"
"m n"
##
# Show the contents of the "keep" file,
# Including its whitespace:
~/foo $ cat -e keep
keep$
abc$
gh $
k l$
 op$
##
# Execute the script:
~/foo $ { read -r; keep=( -name "$REPLY" ); while read -r ; do keep+=( -o -name "$REPLY" ); done } < keep
~/foo $ find . -type f ! \( "${keep[@]}" \) -exec rm {} +
##
# Show what files remain:
~/foo $ printf '"%s"\n' *
" op"
"abc"
"gh "
"k l"
"keep"
- I like this one best, as it removes the need for filelist. – eyoung100, Apr 30, 2014 at 14:56
- +1 from me, although it doesn't deal very well with spaces. Perhaps some single quotes (') should be added, i.e. keep=( -name \'"$REPLY"\' ) and keep+=( -o -name \'"$REPLY"\' ). – Cristian Ciupitu, Aug 26, 2014 at 0:36
- The above is dangerous, because you can accidentally delete files. – davidva, Aug 26, 2014 at 1:28
- @CristianCiupitu Doesn't it? I added a demo showing that it deals very well with whitespace. – kojiro, Aug 26, 2014 at 13:13
- @davidva Under what circumstances? Any time you automate deleting things you run the risk of making a mistake, but within the parameters of the question I think my demo proves this approach is sound. – kojiro, Aug 26, 2014 at 13:15
 
With zsh:
mylist=(${(f)"$(<filelist)"})
print -rl -- *(.^e_'(($mylist[(Ie)$REPLY]))'_)
It reads the lines of filelist in an array and then uses glob qualifiers/estring to glob/select only the file names not present in the array: the . selects only regular files (add D if your list contains dotfiles) and the negated ^e_'expression'_ further selects only those for which the expression returns false, i.e. if their name ($REPLY) is not an element of the array.
If you're happy with the result replace print -rl with rm to actually remove the files:
rm -- *(.^e_'(($mylist[(Ie)$REPLY]))'_)
To select and remove files recursively, use the **/* glob with the ${REPLY:t} modifier, which reduces each path to its file name:
rm -- **/*(.^e_'(($mylist[(Ie)${REPLY:t}]))'_)
    If you put the contents of the directory into a file like so:
cd <somedirectory>
ls >> filelist
Open filelist in a text editor and remove every entry except the files YOU WANT TO DELETE. That is emphasized because it is the opposite of the approach in the answer above.
Try this:
while read -r p || [[ -n $p ]]; do
  echo "$p"
done < filelist
If you see your list of files printed to the screen, replace echo with rm -v, like so:
while read -r p || [[ -n $p ]]; do
  rm -v -- "$p"
done < filelist
Run the script below.
- First, I find all the files present inside the directory and store the output in another file, all_files.
- We have a file that lists the files that should NOT be deleted (not_to_be_deleted_files).
- I append the file names not_to_be_deleted_files and files_to_be_deleted to the end of not_to_be_deleted_files, as we need these 2 files.
- Next, I find the files that need to be deleted using the Linux join command and redirect the output to the files_to_be_deleted file.
- Finally, in the while loop, I read all the file names in files_to_be_deleted and remove each file named there.
The script is as below.
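# Assumes this is run from inside /home/username/directory, since rm uses bare file names.
find /home/username/directory -type f | sed 's/.*\///' > all_files
echo all_files >> not_to_be_deleted_files
echo not_to_be_deleted_files >> not_to_be_deleted_files
echo files_to_be_deleted >> not_to_be_deleted_files
# Keep the lines of all_files that have no match in the keep list (needs bash for <( )).
join -v 1 <(sort all_files) <(sort not_to_be_deleted_files) > files_to_be_deleted
while IFS= read -r file; do
  rm "$file"
done < files_to_be_deleted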
P.S: Probably, if you wish this to be saved as a script and run it, you can add the script name also using echo scriptname >> not_to_be_deleted_files.
Though it is not required, I prefer to do it because there will be no regrets later. I tested for a small set of files and it worked in my system. However, if you want to be sure, try in a test directory first and then remove the files in the original directory. 
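By default join splits each line into whitespace-separated fields and compares only the first one, so names containing spaces may be paired incorrectly. A hedged alternative, swapping in comm (which compares whole lines) for join, could look like the sketch below. The scratch directories and file names are made up for the demo, and the lists are kept outside the target directory so they need not be added to the keep list.

```shell
#!/bin/sh
# Sketch: whole-line set difference with comm instead of join.
# Directories and file names are invented for this demo.
demo=$(mktemp -d) || exit 1       # directory to clean up
lists=$(mktemp -d) || exit 1      # helper files live elsewhere
cd "$demo" || exit 1
touch 'a b.txt' keep.txt old.log

# The keep list, one name per line.
printf '%s\n' keep.txt 'a b.txt' > "$lists/keep"

# comm needs both inputs sorted in the same collation order.
ls | LC_ALL=C sort > "$lists/all"
LC_ALL=C sort "$lists/keep" > "$lists/keep.sorted"

# -23 drops lines unique to the keep list and lines common to both,
# leaving only the names that exist but are not on the keep list.
LC_ALL=C comm -23 "$lists/all" "$lists/keep.sorted" > "$lists/delete"

# Preview; replace "echo rm" with "rm" once the output looks right.
while IFS= read -r file; do
  echo rm -- "$file"
done < "$lists/delete"
```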
- Use the list as a source, to move all files in the list to a fresh, new and empty save-dir.
 - Compare the number of files in the list and the number of saved files.
 - If both match, delete all unsaved files with your favorite method.
 - Move the saved files back.
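The four steps above can be sketched as follows. This is only an illustration under stated assumptions: the directory names work and save and the list keep.txt are hypothetical, everything happens in a scratch directory, and only non-hidden regular files at the top level are handled.

```shell
#!/bin/sh
# Sketch of the move-aside approach; all names are invented for this demo.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir work save                   # work: directory to clean, save: empty save-dir
touch work/a.txt work/b.txt work/c.txt work/d.txt
printf '%s\n' a.txt c.txt > keep.txt

# 1. Move every listed file into the save-dir.
while IFS= read -r name; do
  [ -n "$name" ] && mv -- "work/$name" save/
done < keep.txt

# 2. Compare the number of list entries with the number of saved files.
listed=$(grep -c . keep.txt)
saved=$(ls -A save | wc -l | tr -d ' ')

if [ "$listed" -eq "$saved" ]; then
  rm -- work/*                    # 3. delete everything that was not saved
  mv -- save/* work/              # 4. move the saved files back
  rmdir save
else
  echo "count mismatch ($listed listed, $saved saved); nothing deleted" >&2
fi

remaining=$(ls work)
printf '%s\n' "$remaining"
```

If a listed file is missing, mv reports an error, the counts differ, and nothing is deleted, which is the safety check the second step is meant to provide.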
 
I went for a safer and much, much faster approach because I had 18,000 files in the list! I needed to clean up images in a large Drupal installation.
Deleting all the files that are not in the list is the same as keeping only those that are in the list. So I decided to actually copy the files from the list to another location, but copying 20 GB of files would take up too much space and be very slow as well. So the trick is to copy the files as hardlinks instead, using the -l option of cp. This takes up almost no space and is very fast. Additionally, since I needed to preserve the directory structure, I used the --parents option.
Here is an excerpt from my file list:
1px.png
misc/feed.png
modules/file/icons/x-office-presentation.png
modules/file/icons/x-office-spreadsheet.png
newsletter.png
sites/all/libraries/ckeditor/plugins/smiley/images/devil_smile.png
sites/all/libraries/ckeditor/plugins/smiley/images/regular_smile.png
sites/default/files/009313_PwC_banner_CBS_Observer_180x246px.jpg
So an example line would be, with temp being the destination:
cp -l --parents 'misc/feed.png' temp
This will create this structure:
temp
  misc
    feed.png
Note that the destination must be on the same file system as the source for hardlinks to work.
The next step is to construct the script:
sed -e "s,^,cp -l --parents '," -e "s,$,' /some/where/temp," filelist > newfilelist
Now, presuming you already created the empty dir /some/where/temp, you can copy the files like this:
sh newfilelist 2> missing_files
Note how errors end up in missing_files. The added bonus of this approach is that you will get a list of files from the original list that actually don't exist!
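A generated script like this breaks if a file name contains a single quote. As a hedged variant, the same hardlink copy can be done in a loop that passes each name directly to cp. The sketch below assumes GNU cp (for -l and --parents), uses a scratch directory, and invents the src directory and sample files purely for the demo.

```shell
#!/bin/sh
# Sketch: hardlink-copy the listed files in a loop (assumes GNU cp).
# The scratch tree and file names are invented for this demo.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir -p src/misc temp
touch src/1px.png src/misc/feed.png src/misc/unused.png
printf '%s\n' 1px.png misc/feed.png gone.png > filelist

# Run from inside the source tree so --parents recreates the relative paths.
cd src || exit 1
while IFS= read -r f; do
  cp -l --parents -- "$f" ../temp
done < ../filelist 2> ../missing_files
cd ..

# temp now holds hardlinks to the listed files only.
linked=$(cd temp && find . -type f | LC_ALL=C sort)
printf '%s\n' "$linked"
```

Errors for list entries that do not exist (gone.png here) still end up in missing_files, as in the generated-script version.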
After running the script, temp will contain only those files that are in the file list, without anything having been deleted and without taking up additional space. If you are satisfied with the result, you can delete all the original files, including the subfolders.
Finally, move the files and folders from temp back to the original location.
For the 18,000 files it took only a few seconds.
Safe, simple.
cd to the directory.
Create a temp directory.
mv *.yourExclusionSelector.* ./temp
rm *
mv ./temp/* ./
rm -rf ./temp
Done.
- Welcome to the site. While your approach will work if the names on the list mentioned by the OP are the result of a simple pattern matching - which may very well be the case - please note that the OP stated that the filenames to exclude are stored in a specific file; you may want to expand your answer so as to read the exclusion patterns from that file instead of relying on one static pattern, or having to type-copy potentially multiple patterns to the console. – AdminBee, Feb 28, 2020 at 11:39