3

I want to find files in a directory and identify them by their MIME type, not by their file extension.

I'm using this command to determine the MIME type:

% find . -type f -print0 | xargs -0 -I{} file --mime-type {}
./foo
bar.png: image/png
./OWoHp.png: image/png
./J7ZwV.png: image/png
./foo.txt: inode/x-empty
./bar: inode/x-empty
./hUXnc.png: image/png

The first file has a newline in the filename:

% ls foo$'\n'bar.png
foo?bar.png

That's OK; the file should not be renamed.

With the next command I want to filter out all files that are not images:

% find . -type f -print0 | xargs -0 -I{} file --mime-type {} | awk -F$"\0" -F": " '/image/ {print $1}'
bar.png
./OWoHp.png
./J7ZwV.png
./hUXnc.png

and identify their sizes:

% find . -type f -print0 | xargs -0 -I{} file --mime-type {} | awk -F$"\0" -F":" '/image/ {print $1}' | xargs -I{} identify -format "%[fx:w*h] %i\n" {}
identify: unable to open image `bar.png': No such file or directory @ error/blob.c/OpenBlob/2709.
identify: unable to open file `bar.png' @ error/png.c/ReadPNGImage/3922.
26696 ./OWoHp.png
47275 ./J7ZwV.png
37975 ./hUXnc.png

But that does not work because there is no file with the name bar.png. The correct name is

./foo
bar.png

with a newline in the name.

3
  • 2
    Your main issue here, I think, is that file does not preserve the null terminators. It has a --print0 (or -0) option of its own; however, that places a null character after the name only, rather than at the end of each "name: mimetype" output line. So whatever you do, awk will see newline-separated records. Commented Jun 28, 2015 at 13:09
  • 1
    @don_crissti I disagree with renaming the question; I think it's about the quoting issue, and handling images is just an example. Commented Jun 29, 2015 at 3:18
  • @don_crissti Hmm... focussing not on the question title, but the first line, I do understand it is an XY problem indeed. So I do not agree to disagree ;) I rather plead neutral and claim the question is inconsistent. As far as I can see, editing to make one of first line and title match the other would both break something, so we leave it at that? Commented Jun 29, 2015 at 11:38

3 Answers

4

I think your best bet will be to use a shell loop instead of xargs; then you can control how the filename argument is passed to each command.

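find . -type f -print0 |
while IFS= read -rd "" filename; do
    # --mime-type yields e.g. "image/png" instead of a free-text description,
    # so descriptions that merely mention "image" (disk images etc.) are skipped
    type=$( file --brief --mime-type "$filename" )
    if [[ $type == image/* ]]; then
        identify -format "%[fx:w*h] %i\n" "$filename"
    fi
done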
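To see that the loop really receives the whole name, here is a small self-contained sketch that needs neither file nor identify. The temporary directory and the two file names are made up for the demonstration, and it assumes bash (for read -d and process substitution) plus a find that supports -print0.

```shell
#!/usr/bin/env bash
# Sketch only: the directory and file names are made up for the demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/foo"$'\n'"bar.png" "$demo_dir/plain.png"

# NUL-delimited read: each iteration receives one complete file name.
loop_count=0
while IFS= read -r -d '' filename; do
    loop_count=$((loop_count + 1))
    printf 'name: %q\n' "$filename"   # %q makes the embedded newline visible
done < <(find "$demo_dir" -type f -print0)

# Newline-delimited counting is fooled by the name that spans two lines.
line_count=$(( $(find "$demo_dir" -type f | wc -l) ))

printf 'loop iterations: %d, lines: %d\n' "$loop_count" "$line_count"
rm -rf "$demo_dir"
```

The loop runs twice for the two files, while the line-based count reports three, which is the same mismatch that breaks the awk pipeline in the question.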
2

You could use the -exec sh -c '...' construct with find:

find . -type f -exec sh -c 'file --brief --mime-type "$0" | \
grep -q ^image/ && identify -format "%[fx:w*h] %i\n" "$0"' {} \;

or with exiftool:

exiftool -q -if '$mimetype =~ /image/' -p '$megapixels $directory/$filename' -r .
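As a possible variation on the -exec sh -c construct above, the files can be handed over in batches with {} + so that one sh is started per batch rather than per file. This is only a sketch: image_sizes is a hypothetical wrapper name introduced for the example, and it assumes the same file and identify commands used elsewhere in this thread are installed.

```shell
# Sketch: one sh per batch of files instead of one sh per file.
# image_sizes is a hypothetical wrapper name used only in this example.
image_sizes() {
    find "${1:-.}" -type f -exec sh -c '
        # "for f do" iterates over the positional parameters, i.e. the
        # file names that find appended in place of {}.
        for f do
            case $(file --brief --mime-type "$f") in
                image/*) identify -format "%[fx:w*h] %i\n" "$f" ;;
            esac
        done
    ' sh {} +
}
```

Calling image_sizes . should behave like the \; version. The names arrive as separate arguments and are always quoted as "$f", so a newline inside a name is not a problem.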
2
  • The exiftool command prints Warning: [Minor] Tag 'megapixels' not defined - ./hUXnc.png ;) Commented Jun 28, 2015 at 16:51
  • @A.B. - that means megapixels tag is missing (odd, as it's a composite tag derived from imagesize, does it print the size if you run exiftool -p '$imagesize' hUXnc.png ?); you could add -f to the command so that instead of that message a dash is printed for any missing tag but that's not the expected result... Commented Jun 28, 2015 at 20:45
2

As steeldriver pointed out, your problem is not awk, it's file. There is no NUL in the input you are giving to awk because file ate it. I would do this whole thing in the shell instead:

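find . -type f -print0 | while IFS= read -r -d '' file; do
    # --brief drops the file name from the output, so a path that happens
    # to contain "image/" cannot produce a false match
    file --brief --mime-type "$file" | grep -q '^image/' &&
        printf '%s %s\0' "$(identify -format '%[fx:w*h]' "$file")" "$file"
done | sort -gz | tr '\0' '\n'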
256 ./file 10
256 ./file 15
484 ./file 16
576 ./file 11
576 ./file 17
1024 ./file 12
1024 ./file 19
2304 ./file 13
5625 ./file 14
15190 ./file 2
15680 ./file 1
16384 ./file 9
65536 ./file 18
145200 ./file 0
183531 ./file 6
364807 ./file
3
364807 ./file 4
364807 ./file 5
388245 ./file 8
550560 ./file 7

I included sort since I assume you're trying to improve your answer here. The example above was run on file names containing spaces and on one containing a newline (file\n3). For some reason, identify won't print \0-terminated lines, so I used printf instead.
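The point that the NUL never reaches awk can be checked without file or identify. In the rough sketch below, printf '%s\n' stands in for any tool that writes newline-terminated output, and the two names are made up. It assumes an xargs that supports -0.

```shell
#!/usr/bin/env bash
# Sketch: two NUL-terminated names go in, newline-terminated text comes out.
# The names are made up; printf '%s\n' stands in for a line-oriented tool.

# NUL terminators in the input stream.
input_records=$(( $(printf 'foo\nbar.png\0plain.png\0' | tr -cd '\0' | wc -c) ))

# NUL bytes left after xargs hands each name to the line-oriented command.
output_nuls=$(( $(printf 'foo\nbar.png\0plain.png\0' |
    xargs -0 -I{} printf '%s\n' {} | tr -cd '\0' | wc -c) ))

# Lines that a downstream awk or grep would treat as records.
output_lines=$(( $(printf 'foo\nbar.png\0plain.png\0' |
    xargs -0 -I{} printf '%s\n' {} | wc -l) ))

printf 'records in: %d, NULs out: %d, lines out: %d\n' \
    "$input_records" "$output_nuls" "$output_lines"
```

Two records go in, no NUL comes out, and the downstream tool sees three lines, so no field separator given to awk can recover the original names.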
