I ran a script which acts on multiple "people" and creates output and error files for each. Let's say something like this:

output_alice.txt
error_alice.txt
output_bob.txt
error_bob.txt
.
.
.

I want a command that will scan all the error files (error_<name>.txt) and echo the ones that have had something written to them (vs being empty), as a quick way to identify which "people" the script exited with an error for. Is there an easy way to do this? I know how to use grep to do this for a string, e.g. grep -r <substring> ., but not how to check if there is anything at all.

5 Answers

Note that bash is not a terminal, it's one of many shells, which are interpreters for some kinds of programming languages specialised in running commands. Like most applications it can work with its input/output connected to a terminal device or any other type of file.

To list the files named error_anything.txt in the current working directory that contain at least one line, in the language of bash and most other Unix shells, you can do:

grep -l '^' error_*.txt

Where ^ is a regular expression that matches at the start of the subject, the subject being each line of the file for grep.
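A quick way to see the behaviour in a throwaway directory (the file names here are made up for the demo):

```shell
# Set up a scratch directory with three sample error files.
dir=$(mktemp -d) && cd "$dir"
: > error_alice.txt                # truly empty
printf '\n' > error_bob.txt        # a single empty line
printf 'oops\n' > error_carol.txt  # one non-empty line
grep -l '^' error_*.txt            # lists bob and carol, but not alice
```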

For those with at least one non-empty text line:

grep -l . error_*.txt

Where . matches any single character. Beware that for files encoded in a charmap other than the locale's, it could fail to match non-empty lines if their contents cannot be decoded as text.

Also note that not all grep implementations will report files containing only one unterminated line (one missing the final line delimiter, as in the output of printf with no trailing newline).
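The difference between the two patterns can be sketched like this (scratch files, names invented for the demo):

```shell
dir=$(mktemp -d) && cd "$dir"
printf '\n\n' > error_blank.txt   # only empty lines
printf 'oops\n' > error_real.txt  # actual text
grep -l '^' error_*.txt           # lists both files
grep -l .   error_*.txt           # lists only error_real.txt
```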

Another approach is to look for files that contain at least one byte:

find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c

Which also has the benefit of ignoring files that are not regular files (such as directories, sockets...).
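For instance, a sketch of what the size-based test catches and skips (hypothetical file names):

```shell
dir=$(mktemp -d) && cd "$dir"
: > error_empty.txt             # zero bytes: skipped
printf 'x' > error_partial.txt  # one byte, not even a full line: reported
mkdir error_dir.txt             # a directory matching the pattern: skipped
find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c
# only ./error_partial.txt is printed
```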

Or with the zsh shell:

print -rC1 -- error_*.txt(N-.L+0)

Where - acts like -L so that for symlinks the size and type of their targets are considered, . is the equivalent of -type f, and L+0 of -size +0c (and N for nullglob, so as not to report an error if there's no matching file).

That has the benefit of not including the ./ prefix, of working even if the user name cannot be decoded as text in the locale, and of giving you a (lexically, by default) sorted list.

That one you can extend to only print the user name (the part of the root name of the file after the first _) with:

() { print -rC1 -- ${@#*_}; } error_*.txt(N-.L+0:r)
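In shells without zsh's glob qualifiers, a rough POSIX-sh sketch of the same idea (print the name after the first _, for non-empty regular files only) might look like this; the file names are invented for the demo:

```shell
dir=$(mktemp -d) && cd "$dir"
: > error_alice.txt              # empty: no error for alice
printf 'boom\n' > error_bob.txt  # bob's run failed
for f in error_*.txt; do
  [ -f "$f" ] && [ -s "$f" ] || continue  # regular and non-empty only
  name=${f#*_}                   # strip up to the first _
  printf '%s\n' "${name%.txt}"   # strip the .txt suffix
done
# prints: bob
```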

To list error files that have been modified since you ran a command, you can use the -newer predicate of find and compare with a file that has been touched just before running your command:

touch .before
my-command-that-may-write-to-error-files
find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c -newer .before
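Sketched end to end, with sleeps standing in for real work so the timestamps reliably differ even on coarse-grained filesystems:

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'old\n' > error_old.txt  # an error file from a previous run
sleep 1
touch .before                   # timestamp reference
sleep 1
printf 'new\n' > error_new.txt  # stands in for the command writing an error
find -L . ! -name . -prune -name 'error_*.txt' -type f -size +0c -newer .before
# only ./error_new.txt is printed
```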

In zsh, you can replace the find command with:

print -rC1 -- error_*.txt(N-.L+0e['[[ $REPLY -nt .before ]]'])

With some find implementations, you can replace ! -name . -prune with -mindepth 1 -maxdepth 1 though -maxdepth 1 would also work here as the file at depth 0 (.) doesn't match the other criteria (it matches neither -name 'error_*.txt' nor -type f) anyway.

With the GNU implementation of date and find (that's also the find implementation that introduced the -maxdepth predicate), you can avoid having to create that .before file by doing:

before=$(date +'@%s.%N')
my-command-that-may-write-to-error-files
find -L . -maxdepth 1 -name 'error_*.txt' -type f -size +0c -newermt "$before"
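A sketch of that variant (GNU date and find assumed; the sleeps stand in for real work so the timestamps reliably differ):

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'old\n' > error_old.txt
sleep 1
before=$(date +'@%s.%N')        # GNU date: fractional epoch timestamp
sleep 1
printf 'new\n' > error_new.txt  # stands in for the command writing an error
find -L . -maxdepth 1 -name 'error_*.txt' -type f -size +0c -newermt "$before"
# only ./error_new.txt is printed
```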

With zsh, you can replace the before=$(date +'@%s.%N') with print -Pv before '@%D{%s.%N}' or before=${(%):-@%D{%s.%N}} or before=@$EPOCHREALTIME (after zmodload zsh/datetime); you could again avoid the call to find by using glob qualifiers, and even the temporary variable by using an anonymous function again, but that becomes significantly involved:

zmodload zsh/stat
zmodload zsh/datetime
() {
  my-command-that-may-write-to-error-files
  print -rC1 error_*.txt(N-.L+0e['
    stat -F %s.%N -A2 +mtime -- $REPLY && (( $2 > $1 )) '])
} $EPOCHREALTIME

Beware though that on Linux at least, even though the system and filesystems support nanosecond precision, the effective granularity is much coarser. You can even find that the modification time is set upon modification to a value that predates the initial call to date or reference to $EPOCHREALTIME, so those approaches may not work for commands that take less than a centisecond to run. Dropping nanoseconds and replacing > with >=, or -newer with ! -older (if your find implementation supports it, which is unlikely), may be a better approach.

  • thank you! In your answer you mention "Note that some grep implementations will also report files that have a non-line." What is a non-line? Commented Aug 11, 2023 at 12:21
  • @abra see edit. Lines are sequences of 0 or more characters delimited by a newline character and that don't have a length greater than the LINE_MAX limit in bytes. So bytes that don't form characters, or overlong lines, or the bytes if any after the last newline character in the file would form non-lines. Commented Aug 11, 2023 at 12:26
  • thanks for the clarification. I marked your answer as accepted. The command "grep -l . error_*.txt" is what I needed. Thanks again! Commented Aug 11, 2023 at 12:28
  • What a wonderful tutorial for what could have been a brief answer. A lot of fixes for potential issues to watch out for. Commented Aug 16, 2023 at 3:57

GNU find offers the non-POSIX -empty test for empty files; simply negate that test:

find /path/to/dir -type f -name 'error_*.txt' ! -empty

To avoid searching subdirectories, add -maxdepth 1 after the path.

In POSIX find, checking for a file size of 0 works:

find /path/to/dir -type f -name 'error_*.txt' ! -size 0
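Either variant can be checked quickly in a scratch directory (file names invented for the demo):

```shell
dir=$(mktemp -d)
: > "$dir/error_empty.txt"               # zero bytes: skipped
printf 'fail\n' > "$dir/error_full.txt"  # non-empty: reported
find "$dir" -type f -name 'error_*.txt' ! -size 0
# prints only .../error_full.txt
```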

Just grep for ., which means any character. Empty files have no characters, so searching for . will show non-empty files. For example:

$ touch empty1 empty2 empty3
$ echo "not empty!" > non_empty
$ ls -l 
total 4
-rw-r--r-- 1 terdon terdon  0 Aug 11 13:13 empty1
-rw-r--r-- 1 terdon terdon  0 Aug 11 13:13 empty2
-rw-r--r-- 1 terdon terdon  0 Aug 11 13:13 empty3
-rw-r--r-- 1 terdon terdon 11 Aug 11 13:13 non_empty

Now, we grep:

$ grep -- . ./*
non_empty:not empty!

And, to get names only:

$ grep -l -- . ./*
non_empty

Note that grep . will not find a file that contains nothing but empty lines (one or more \n characters). For that, you should use grep '^' as suggested in Stéphane's answer.


Another one-line method for searching for non-empty files.

$ for f in `ls error_*.txt`; do [ -s "${f}" ] && echo ${f} ; done

Where

-s FILE: true if FILE exists and has a size greater than zero.

Explanation

Loop through all files in the current directory that match error_*.txt. The -s test checks whether the file exists and contains something; if so, the file name is displayed.
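As the comments below point out, parsing the output of ls is fragile; the same -s test works with the glob used directly, e.g.:

```shell
dir=$(mktemp -d) && cd "$dir"
: > error_empty.txt
printf 'boom\n' > error_full.txt
for f in error_*.txt; do
  [ -s "$f" ] && printf '%s\n' "$f"  # print only non-empty files
done
# prints: error_full.txt
```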

  • You should never do for f in `ls error_*.txt`. Just use for f in error_*.txt directly. Commented Aug 12, 2023 at 4:21
  • I'd recommend reading Why is printf better than echo?, When is double-quoting necessary? and Why you shouldn't parse the output of ls(1). Commented Aug 12, 2023 at 5:46
  • @muru So I'm curious, what is the background on why no quote character ? Commented Aug 12, 2023 at 6:56
  • @Iain4D see the last link from Stéphane's comment above. Commented Aug 12, 2023 at 8:41
  • @muru Hmm, thanks for the link reference. However, it looks like it is about double-quotes. One of the comments there linked to "always use quotes", so I wonder if that implies it is more applicable to for loops? Commented Aug 16, 2023 at 21:35

GNU sed only, as an alternative to the grep command:

sed -sn 1F error_*.txt

I have not found the F command in the man pages, but it works. In particular, I insert file names into the first line of non-empty files with sed -i 1F *
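A quick check of the sed variant (GNU sed assumed, since F and -s are GNU extensions):

```shell
dir=$(mktemp -d) && cd "$dir"
: > error_empty.txt     # sed reads no lines from it, so 1F never fires
printf 'boom\n' > error_full.txt
sed -sn 1F error_*.txt  # F prints the current input file name on line 1
# prints: error_full.txt
```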
