I ran a script which acts on multiple "people" and creates output and error files for each. Let's say something like this:

output_alice.txt
error_alice.txt
output_bob.txt
error_bob.txt
.
.
.

I want a command that will scan all the error files (error_<name>.txt) and echo the ones that have had something written to them (vs being empty), as a quick way to identify which "people" the script exited with an error for. Is there an easy way to do this? I know how to use grep to do this for a string, e.g. grep -r <substring> ., but not how to check if there is anything at all.

5 Answers

Note that bash is not a terminal, it's one of many shells, which are interpreters for some kinds of programming languages specialised in running commands. Like most applications it can work with its input/output connected to a terminal device or any other type of file.

To list the files named error_anything.txt in the current working directory that contain at least one line, in the language of bash and most other Unix shells, you can do:

grep -l '^' error_*.txt

Where ^ is a regular expression that matches at the start of the subject, the subject being each line of the file for grep.
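A quick way to see the behaviour in a throwaway directory (the file names here are made up for the demo):

```shell
# Set up a scratch directory with three sample error files.
dir=$(mktemp -d) && cd "$dir"
: > error_alice.txt                # truly empty
printf '\n' > error_bob.txt        # a single empty line
printf 'oops\n' > error_carol.txt  # one non-empty line
grep -l '^' error_*.txt            # lists bob and carol, but not alice
```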

For those with at least one non-empty text line:

grep -l . error_*.txt

Where . matches any single character. Beware that for files encoded in a charmap other than the locale's, it could fail to match non-empty lines if their contents cannot be decoded as text.

Also note that not all grep implementations will report files containing only one unterminated line (one missing the final line delimiter, as in the output of printf with no trailing newline).
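The difference between the two patterns can be sketched like this (scratch files, names invented for the demo):

```shell
dir=$(mktemp -d) && cd "$dir"
printf '\n\n' > error_blank.txt   # only empty lines
printf 'oops\n' > error_real.txt  # actual text
grep -l '^' error_*.txt           # lists both files
grep -l .   error_*.txt           # lists only error_real.txt
```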

Another approach is to look for files that contain at least one byte:

find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c

Which also has the benefit of ignoring files that are not regular files (such as directories, sockets...).
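For instance, a sketch of what the size-based test catches and skips (hypothetical file names):

```shell
dir=$(mktemp -d) && cd "$dir"
: > error_empty.txt             # zero bytes: skipped
printf 'x' > error_partial.txt  # one byte, not even a full line: reported
mkdir error_dir.txt             # a directory matching the pattern: skipped
find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c
# only ./error_partial.txt is printed
```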

Or with the zsh shell:

print -rC1 -- error_*.txt(N-.L+0)

Where - acts like -L so that for symlinks the size and type of their targets are considered, . is the equivalent of -type f, and L+0 of -size +0c (and N for nullglob, so as not to report an error if there's no matching file).

That has the benefit of not including the ./ prefix, of working even if the user name cannot be decoded as text in the locale, and of giving you a (lexically, by default) sorted list.

That one you can extend to only print the user name (the part of the root name of the file after the first _) with:

() { print -rC1 -- ${@#*_}; } error_*.txt(N-.L+0:r)
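In shells without zsh's glob qualifiers, a rough POSIX-sh sketch of the same idea (print the name after the first _, for non-empty regular files only) might look like this; the file names are invented for the demo:

```shell
dir=$(mktemp -d) && cd "$dir"
: > error_alice.txt              # empty: no error for alice
printf 'boom\n' > error_bob.txt  # bob's run failed
for f in error_*.txt; do
  [ -f "$f" ] && [ -s "$f" ] || continue  # regular and non-empty only
  name=${f#*_}                   # strip up to the first _
  printf '%s\n' "${name%.txt}"   # strip the .txt suffix
done
# prints: bob
```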

To list error files that have been modified since you ran a command, you can use the -newer predicate of find and compare with a file that has been touched just before running your command:

touch .before
my-command-that-may-write-to-error-files
find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c -newer .before
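Sketched end to end, with sleeps standing in for real work so the timestamps reliably differ even on coarse-grained filesystems:

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'old\n' > error_old.txt  # an error file from a previous run
sleep 1
touch .before                   # timestamp reference
sleep 1
printf 'new\n' > error_new.txt  # stands in for the command writing an error
find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c -newer .before
# only ./error_new.txt is printed
```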

In zsh, you can replace the find command with:

print -rC1 -- error_*.txt(N-.L+0e['[[ $REPLY -nt .before ]]'])

With some find implementations, you can replace ! -name . -prune with -mindepth 1 -maxdepth 1 though -maxdepth 1 would also work here as the file at depth 0 (.) doesn't match the other criteria (it matches neither -name 'error_*.txt' nor -type f) anyway.

With the GNU implementation of date and find (that's also the find implementation that introduced the -maxdepth predicate), you can avoid having to create that .before file by doing:

before=$(date +'@%s.%N')
my-command-that-may-write-to-error-files
find -L . -maxdepth 1 -name 'error_*.txt' -type f -size +0c -newermt "$before"
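A sketch of that variant (GNU date and find assumed; the sleeps stand in for real work so the timestamps reliably differ):

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'old\n' > error_old.txt
sleep 1
before=$(date +'@%s.%N')        # GNU date: fractional epoch timestamp
sleep 1
printf 'new\n' > error_new.txt  # stands in for the command writing an error
find -L . -maxdepth 1 -name 'error_*.txt' -type f -size +0c -newermt "$before"
# only ./error_new.txt is printed
```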

With zsh, you can replace the before=$(date +'@%s.%N') with print -Pv before '@%D{%s.%N}' or before=${(%):-@%D{%s.%N}} or before=@$EPOCHREALTIME (after zmodload zsh/datetime); you could again avoid the call to find by using glob qualifiers, and even the temporary variable by using an anonymous function again, but that becomes significantly involved:

zmodload zsh/stat
zmodload zsh/datetime
() {
  my-command-that-may-write-to-error-files
  print -rC1 error_*.txt(N-.L+0e['
    stat -F %s.%N -A2 +mtime -- $REPLY && (( $2 > $1 )) '])
} $EPOCHREALTIME

Beware though that on Linux at least, even though the system and filesystems support nanosecond precision, the effective granularity is much coarser. You can even find that the modification time is set upon modification to a value that predates the initial call to date or reference to $EPOCHREALTIME, so those approaches may not work for commands that take less than a centisecond to run. Dropping nanoseconds and replacing > with >=, or -newer with ! -older (if your find implementation supports it, which is unlikely), may be a better approach.

  • thank you! In your answer you mention "Note that some grep implementations will also report files that have a non-line." What is a non-line? Commented Aug 11, 2023 at 12:21
  • @abra see edit. Lines are sequences of 0 or more characters delimited by a newline character and that don't have a length greater than the LINE_MAX limit in bytes. So bytes that don't form characters, or overlong lines, or the bytes if any after the last newline character in the file would form non-lines. Commented Aug 11, 2023 at 12:26
  • thanks for the clarification. I marked your answer as accepted. The command "grep -l . error_*.txt" is what I needed. Thanks again! Commented Aug 11, 2023 at 12:28
  • What a wonderful tutorial for what could have been a brief answer. A lot of fixes for potential issues to watch out for. Commented Aug 16, 2023 at 3:57

GNU find offers the non-POSIX -empty test for empty files; simply negate that test:

find /path/to/dir -type f -name 'error_*.txt' ! -empty

To avoid searching subdirectories, add -maxdepth 1 after the path.

In POSIX find, checking for a file size of 0 works:

find /path/to/dir -type f -name 'error_*.txt' ! -size 0
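Either variant can be checked quickly in a scratch directory (file names invented for the demo):

```shell
dir=$(mktemp -d)
: > "$dir/error_empty.txt"               # zero bytes: skipped
printf 'fail\n' > "$dir/error_full.txt"  # non-empty: reported
find "$dir" -type f -name 'error_*.txt' ! -size 0
# prints only .../error_full.txt
```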

Just grep for ., which means any character. Empty files have no characters, so searching for . will show non-empty files. For example:

$ touch empty1 empty2 empty3
$ echo "not empty!" > non_empty
$ ls -l 
total 4
-rw-r--r-- 1 terdon terdon  0 Aug 11 13:13 empty1
-rw-r--r-- 1 terdon terdon  0 Aug 11 13:13 empty2
-rw-r--r-- 1 terdon terdon  0 Aug 11 13:13 empty3
-rw-r--r-- 1 terdon terdon 11 Aug 11 13:13 non_empty

Now, we grep:

$ grep -- . ./*
non_empty:not empty!

And, to get names only:

$ grep -l -- . ./*
non_empty

Note that grep . will not find a file that contains nothing but empty lines (one or more \n characters). For that, you should use grep '^' as suggested in Stéphane's answer.


Another one-line method for searching for non-empty files.

$ for f in `ls error_*.txt`; do [ -s "${f}" ] && echo ${f} ; done

Where

-s FILE: true if FILE exists and has a size greater than zero.

Explanation

Loop through all files in the current directory that match error_*.txt. The -s test checks whether the file exists and contains something; if so, the file name is displayed.
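As the comments below point out, parsing the output of ls is fragile; the same -s test works with the glob used directly, e.g.:

```shell
dir=$(mktemp -d) && cd "$dir"
: > error_empty.txt
printf 'boom\n' > error_full.txt
for f in error_*.txt; do
  [ -s "$f" ] && printf '%s\n' "$f"  # print only non-empty files
done
# prints: error_full.txt
```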

  • You should never do for f in `ls error_*.txt`. Just use for f in error_*.txt directly. Commented Aug 12, 2023 at 4:21
  • I'd recommend reading Why is printf better than echo?, When is double-quoting necessary? and Why you shouldn't parse the output of ls(1). Commented Aug 12, 2023 at 5:46
  • @muru So I'm curious, what is the background on why no quote character ? Commented Aug 12, 2023 at 6:56
  • @Iain4D see the last link from Stéphane's comment above. Commented Aug 12, 2023 at 8:41
  • @muru Hmm, thanks for the link reference. However, it looks like it is about double-quotes. One of the comments there linked to "always use quotes", so I wonder if that implies it is more applicable to for loops? Commented Aug 16, 2023 at 21:35

GNU sed only, as an alternative to the grep command:

sed -sn 1F error_*.txt

I have not found the F command in the man pages, but it works. In particular, I insert file names into the first line of non-empty files with sed -i 1F *
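A quick check of the sed variant (GNU sed assumed, since F and -s are GNU extensions):

```shell
dir=$(mktemp -d) && cd "$dir"
: > error_empty.txt     # sed reads no lines from it, so 1F never fires
printf 'boom\n' > error_full.txt
sed -sn 1F error_*.txt  # F prints the current input file name on line 1
# prints: error_full.txt
```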
