Delete all files in a directory whose names do not match a line in a file list

Question

I have a directory with 1000+ files. In a text file, I have about 50 filenames, one per line. I'd like to delete all the files in the directory whose filenames don't correspond to an entry on the list. What's the best way to do this? I started a shell script, but couldn't work out the proper command to determine whether a filename is on the list. Thanks.

Jeff Schaller · Accepted Answer · 2018-03-08 12:04:55Z

8

I realize that any question about deleting files must be handled with great care. My first answer was too hasty: I did not consider that the filelist could be malformed for use with egrep. I edited the answer to reduce that risk.

This should work for files that have no spaces in their names.

First, rebuild your filelist so that each entry matches only the exact file name:

sed -e 's,^,^,' -e 's,$,$,'  filelist  > newfilelist

Build the rm commands:

cd your_directory
ls | egrep -vf newfilelist   | xargs -n 1 echo rm  >  rmscript

Check that the rm script suits you (you can view it with "vim" or "less"), then perform the action:

sh -x rmscript

If the file names contain spaces (this will not work if a name contains a " character):

ls | egrep -vf newfilelist  | sed 's,^\(.*\)$,rm "\1",' > rmscript

Of course, the filelist should not be in the same directory!

EDITED:

Nathan's file list contained names that matched every file in the directory as substrings (for example, "html" matches "bob.html"), so nothing was deleted because egrep -vf filtered out the entire stream. I added a command to put a "^" before and a "$" after each file name. I was lucky that Nathan's file list was well-formed: had it been DOS-formatted with CR-LF line endings, or had it contained extra spaces, egrep would have preserved no files and all of them would have been deleted.
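
The anchoring step and the CR-LF risk described above can be reduced by matching fixed strings against whole lines instead of regular expressions. The following is a minimal sketch, not part of the original answer: it assumes a grep that supports the POSIX options -F (fixed strings), -x (whole-line match), -v and -f, and every path in it (demo, dir, filelist.clean) is a hypothetical name inside a throwaway directory. It only prints the names that would be deleted.

```shell
#!/bin/sh
# Sandbox: work in a throwaway directory (all names here are hypothetical).
demo=$(mktemp -d)
mkdir "$demo/dir"
touch "$demo/dir/bob.html" "$demo/dir/html" "$demo/dir/a.b" "$demo/dir/axb"

# A DOS-formatted keep list: every line ends in CR-LF.
printf 'html\r\na.b\r\n' > "$demo/filelist"

# Strip the carriage returns so the entries can match the real names.
tr -d '\r' < "$demo/filelist" > "$demo/filelist.clean"

# -F: treat entries as fixed strings ("a.b" no longer matches "axb").
# -x: an entry must match the whole name ("html" no longer matches "bob.html").
# -v: print only the names that match no entry, i.e. the deletion candidates.
to_delete=$(cd "$demo/dir" && ls | grep -Fxv -f "$demo/filelist.clean")

# Preview only; nothing is removed here.
printf '%s\n' "$to_delete"
```

Like the original commands, this parses ls output, so it still breaks on file names that contain newlines.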

edited Mar 8, 2018 at 12:04

Jeff Schaller♦


answered Apr 30, 2014 at 13:27

Emmanuel


When I run the preview command, I get one line with "rm". When I run the actual command, I get an error message about missing arguments for rm. Do I need special syntax to use the results from ls | egrep in the xargs input?

– Nathan
Commented Apr 30, 2014 at 14:16
@Nathan you must cd to your directory first. No special syntax is needed: ls provides the directory's file names, and egrep -vf filelist filters out your 50 file names. I'm afraid you deleted all your files.

– Emmanuel
Commented Apr 30, 2014 at 14:27
@Emmanuel I'm running the command from the directory that contains the files to be deleted.

– Nathan
Commented Apr 30, 2014 at 14:31
@Nathan are all your files deleted?

– Emmanuel
Commented Apr 30, 2014 at 14:31
no, they're still there.

– Nathan
Commented Apr 30, 2014 at 14:32


kojiro · Accepted Answer · 2014-08-26 15:07:50Z

1

Pre-construct the arguments to find:

{
  read -r
  keep=( -name "$REPLY" ) # no `-o` before the first one.
  while read -r; do
    keep+=( -o -name "$REPLY" )
  done
} < file_list.txt
find . -type f ! \( "${keep[@]}" \) -exec echo rm {} +

Use the echo parts to see what would be constructed. Remove the echo parts to actually run it.

Update: Demonstration:

##
# Demonstrate what files exist for testing.
# Show their whitespace:
~/foo $ printf '"%s"\n' *
" op"
" qr"
"abc"
"def"
"gh "
"ij "
"k l"
"keep"
"m n"

##
# Show the contents of the "keep" file,
# Including its whitespace:
~/foo $ cat -e keep
keep$
abc$
gh $
k l$
 op$

##
# Execute the script:
~/foo $ { read -r; keep=( -name "$REPLY" ); while read -r ; do keep+=( -o -name "$REPLY" ); done } < keep
~/foo $ find . -type f ! \( "${keep[@]}" \) -exec rm {} +

##
# Show what files remain:
~/foo $ printf '"%s"\n' *
" op"
"abc"
"gh "
"k l"
"keep"

edited Aug 26, 2014 at 15:07

answered Apr 30, 2014 at 14:55

kojiro


I like this one best, as it removes the need for the filelist.

– eyoung100
Commented Apr 30, 2014 at 14:56
+1 from me, although it doesn't deal very well with spaces. Perhaps some single quotes (') should be added i.e. keep=( -name \'"$REPLY"\' ) and keep+=( -o -name \'"$REPLY"\' ).

– Cristian Ciupitu
Commented Aug 26, 2014 at 0:36
The above is dangerous, because you can accidentally delete files.

– davidva
Commented Aug 26, 2014 at 1:28
@CristianCiupitu doesn't it? I added a demo showing that it deals very well with whitespace.

– kojiro
Commented Aug 26, 2014 at 13:13
@davidva Under what circumstances? Any time you automate deleting things you run the risk of making a mistake, but within the parameters of the question I think my demo proves this approach is sound.

– kojiro
Commented Aug 26, 2014 at 13:15


2 revs · Accepted Answer · 2016-01-28 02:27:19Z

With zsh:

mylist=(${(f)"$(<filelist)"})
print -rl -- *(.^e_'(($mylist[(Ie)$REPLY]))'_)

It reads the lines of filelist in an array and then uses glob qualifiers/estring to glob/select only the file names not present in the array: the . selects only regular files (add D if your list contains dotfiles) and the negated ^e_'expression'_ further selects only those for which the expression returns false, i.e. if their name ($REPLY) is not an element of the array.
If you're happy with the result replace print -rl with rm to actually remove the files:

rm -- *(.^e_'(($mylist[(Ie)$REPLY]))'_)

To select and remove files recursively, use the **/* glob with the ${REPLY:t} modifier, which reduces each path to its final component:

rm -- **/*(.^e_'(($mylist[(Ie)${REPLY:t}]))'_)

eyoung100 · Accepted Answer · 2014-04-30 14:51:04Z

0

If you put the contents of the directory into a file like so:

cd <somedirectory>
ls >> filelist

Open filelist with a text editor, and remove all the entries except the ones YOU WANT TO DELETE. That is emphasized because it is the opposite of the approach in the answer above.

Try this:

while IFS= read -r p || [[ -n $p ]]; do
  echo "$p"
done < filelist

If you see your list of files output to the screen replace echo with rm -v, like so:

while IFS= read -r p || [[ -n $p ]]; do
  rm -v -- "$p"
done < filelist

edited Apr 30, 2014 at 14:51

answered Apr 30, 2014 at 14:41

eyoung100


Ramesh · Accepted Answer · 2014-04-30 15:06:15Z

Run the script below.

First, I find all the files inside the directory and store the output in another file, all_files.
We have a file listing the files that should NOT be deleted (not_to_be_deleted_files).
I append the names all_files, not_to_be_deleted_files and files_to_be_deleted to the end of not_to_be_deleted_files, as we need these files.
Next, I find the files that need to be deleted using the join command and redirect the output to files_to_be_deleted.
Finally, the while loop reads each file name in files_to_be_deleted and removes that file.

The script is as below.

find /home/username/directory -type f | sed 's/.*\///' > all_files
echo all_files >> not_to_be_deleted_files
echo not_to_be_deleted_files >> not_to_be_deleted_files
echo files_to_be_deleted >> not_to_be_deleted_files
join -v 1 <(sort all_files) <(sort not_to_be_deleted_files) > files_to_be_deleted
while read -r file; do
  rm "$file"
done < files_to_be_deleted

P.S: Probably, if you wish this to be saved as a script and run it, you can add the script name also using echo scriptname >> not_to_be_deleted_files.

Though it is not required, I prefer to do it because there will be no regrets later. I tested for a small set of files and it worked in my system. However, if you want to be sure, try in a test directory first and then remove the files in the original directory.
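
One caveat with join is that by default it compares only the first blank-separated field of each line, so two names that share a first word are treated as the same key. A possible variation, not the author's method, swaps join for comm, which compares whole lines. The sketch below assumes a throwaway directory and hypothetical file names (keep_list, all.sorted, keep.sorted, to_delete), and it only lists the deletion candidates.

```shell
#!/bin/sh
# Sandbox with names that share a first word (all names are hypothetical).
demo=$(mktemp -d)
mkdir "$demo/dir"
touch "$demo/dir/keep me.txt" "$demo/dir/keep other.txt" "$demo/dir/old.log"
printf '%s\n' 'keep me.txt' > "$demo/keep_list"

# comm needs both inputs sorted in the same order; force the C locale for both.
( cd "$demo/dir" && ls ) | LC_ALL=C sort > "$demo/all.sorted"
LC_ALL=C sort "$demo/keep_list" > "$demo/keep.sorted"

# -2 drops lines found only in the keep list, -3 drops lines found in both,
# so what remains is the set of names present only in the directory listing.
LC_ALL=C comm -23 "$demo/all.sorted" "$demo/keep.sorted" > "$demo/to_delete"

# Preview only; nothing is removed here.
cat "$demo/to_delete"
```

Because the list files live outside the target directory in this sketch, there is no need to add their own names to the keep list.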

user unknown · Accepted Answer · 2018-03-08 16:06:24Z

0

Use the list as a source, to move all files in the list to a fresh, new and empty save-dir.
Compare the number of files in the list and the number of saved files.
If both match, delete all unsaved files with your favorite method.
Move the saved files back.
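
The four steps above could be sketched roughly as follows. This is one possible reading of them, run in a throwaway directory; the names demo, dir, save and filelist are assumptions made for the example, and the deletion step uses a plain rm on regular, non-hidden files.

```shell
#!/bin/sh
# Sandbox: two files to keep, two to discard (all names are hypothetical).
demo=$(mktemp -d)
mkdir "$demo/dir" "$demo/save"
touch "$demo/dir/a" "$demo/dir/b c" "$demo/dir/junk1" "$demo/dir/junk2"
printf '%s\n' 'a' 'b c' > "$demo/filelist"

# Step 1: move every listed file into the fresh, empty save directory.
while IFS= read -r name || [ -n "$name" ]; do
    mv -- "$demo/dir/$name" "$demo/save/"
done < "$demo/filelist"

# Step 2: compare the number of list entries with the number of saved files.
listed=$(grep -c '' "$demo/filelist")
saved=$(ls -A "$demo/save" | wc -l | tr -d ' ')

# Steps 3 and 4: delete the unsaved files and move the saved ones back,
# but only when the two counts agree.
if [ "$listed" -eq "$saved" ]; then
    rm -- "$demo/dir"/*
    mv -- "$demo/save"/* "$demo/dir/"
else
    echo "count mismatch: $listed listed, $saved saved" >&2
fi

ls "$demo/dir"
```

If the counts differ, for example because a listed file does not exist, nothing is deleted and the saved files stay in the save directory for inspection.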

answered Mar 8, 2018 at 16:06

user unknown


marlar · Accepted Answer · 2018-11-10 13:32:51Z

I went for a safer and much, much faster approach because I had 18,000 files in the list! I needed to clean up images in a large Drupal installation.

Deleting all the files that are not in the list is the same as keeping only those that are in the list. So I decided to actually copy the files from the list to another location, but copying 20 GB of files would take up too much space and be very slow as well. So the trick is to copy the files as hardlinks instead, using the -l option of cp. This takes up almost no space and is very fast. Additionally, since I needed to preserve the directory structure, I used the --parents option.

Here is an excerpt from my file list:

1px.png
misc/feed.png
modules/file/icons/x-office-presentation.png
modules/file/icons/x-office-spreadsheet.png
newsletter.png
sites/all/libraries/ckeditor/plugins/smiley/images/devil_smile.png
sites/all/libraries/ckeditor/plugins/smiley/images/regular_smile.png
sites/default/files/009313_PwC_banner_CBS_Observer_180x246px.jpg

So an example line would be, with temp being the destination:

cp -l --parents 'misc/feed.png' temp

This will create this structure:

temp
  misc
    feed.png

Note that the destination must be on the same file system as the source for hardlinks to work.

The next step is to construct the script:

sed -e "s,^,cp -l --parents '," -e "s,$,' /some/where/temp," filelist > newfilelist

Now, presuming you already created the empty dir /some/where/temp, you can copy the files like this:

sh newfilelist 2> missing_files

Note how errors end up in missing_files. The added bonus of this approach is that you will get a list of files from the original list that actually don't exist!

After running the script, temp will contain only those files that are in the file list, without anything having been deleted and without taking up additional space. If you are satisfied with the result, you can delete all the original files, including the subfolders.

Finally, move the files and folders from temp back to the original location.

For the 18,000 files it took only a few seconds.
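
As a variation on the generated script above, the list can also be read in a loop, which avoids the quoting problem that arises when a file name contains a single quote. This is a sketch under assumptions: it relies on the -l and --parents options of GNU cp, and the directory layout (demo, src, temp) and list contents are made up for the example.

```shell
#!/bin/sh
# Sandbox: a small tree plus a list with one missing entry (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/src/misc" "$demo/src/modules" "$demo/temp"
echo feed > "$demo/src/misc/feed.png"
echo junk > "$demo/src/modules/unused.png"
printf '%s\n' 'misc/feed.png' 'misc/gone.png' > "$demo/filelist"

# Paths in the list are relative, so run cp from the source root.
cd "$demo/src" || exit 1

# Hardlink each listed file into temp, recreating its parent directories.
# cp's error messages for entries that do not exist end up in missing_files.
while IFS= read -r path || [ -n "$path" ]; do
    cp -l --parents -- "$path" "$demo/temp"
done < "$demo/filelist" 2> "$demo/missing_files"

# Show what was linked and which entries could not be found.
find "$demo/temp" -type f
cat "$demo/missing_files"
```

Running cp once per line is slower than a single generated script for very large lists, so this trades some speed for robustness against unusual names.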

Paulo Tomé · Accepted Answer · 2020-02-28 11:25:07Z

0

Safe, simple.

cd to directory.

Create a temp directory.

mv *.yourExclusionSelector.* ./temp
rm *
mv ./temp/* ./
rm -rf ./temp

done.

edited Feb 28, 2020 at 11:25

Paulo Tomé


answered Feb 28, 2020 at 10:44

paradisaeidae


Welcome to the site. Your approach will work if the names on the OP's list can be described by a simple pattern, which may very well be the case. Note, however, that the OP stated that the filenames to exclude are stored in a specific file. You may want to expand your answer to read the exclusion patterns from that file instead of relying on one static pattern or having to type or copy multiple patterns into the console.

– AdminBee
Commented Feb 28, 2020 at 11:39
