
Can using bash's globstar (**) operator cause an out-of-memory error? Consider something like:

for f in /**/*; do printf '%s\n' "$f"; done

When ** is used to generate an enormous list of files, and the list is too large to fit in memory, will bash crash, or does it have a mechanism to handle this?

I know I've run ** on humongous numbers of files and haven't noticed a problem, so I am assuming that bash will use something like temporary files to store some of the list as it is being generated. Is that correct? Can bash's ** handle an arbitrary number of files or will it fail if the file list exceeds what can fit in memory? If it won't fail, what mechanism does it use for this? Something similar to the temp files generated by sort?

  • There is a tag for "globstar"?? :) Commented Mar 15, 2021 at 12:41
  • I just created it, @AdminBee. It seemed useful since there are various **-specific questions that can be asked. Do you think it isn't helpful? Commented Mar 15, 2021 at 12:45
  • Possibly related: unix.stackexchange.com/a/171347/237982 Commented Mar 15, 2021 at 12:55
  • globstar is the (very weirdly named) option David Korn picked for enabling the recursive-globbing feature ksh93 copied from zsh over 10 years later, and which bash eventually copied as well another decade later. Several shells have added zsh-style recursive-globbing support, not all with that misnamed globstar option. Can we make the tag recursive-glob instead (and maybe a globstar alias to it for the ksh93/bash/tcsh users)? See also The result of ls *, ls ** and ls *** Commented Mar 15, 2021 at 14:06
  • @jrw32982 no, that isn't what I mean. I know it generates all the file names first; my question was whether it also has a mechanism (such as writing partial file lists to temp files) to avoid running out of memory. Commented Mar 17, 2021 at 19:42

1 Answer


Yes, it can, and this case is explicitly accounted for in bash's globbing library (lib/glob/glob.c):

  /* Have we run out of memory?  */
  if (lose)
    {
      tmplink = 0;

      /* Here free the strings we have got.  */
      while (lastlink)
        {
          /* Since we build the list in reverse order, the first N entries
             will be allocated with malloc, if firstmalloc is set, from
             lastlink to firstmalloc. */
          if (firstmalloc)
            {
              if (lastlink == firstmalloc)
                firstmalloc = 0;
              tmplink = lastlink;
            }
          else
            tmplink = 0;
          free (lastlink->name);
          lastlink = lastlink->next;
          FREE (tmplink);
        }

      /* Don't call QUIT; here; let higher layers deal with it. */

      return ((char **)NULL);
    }

Every memory allocation attempt is checked for failure, and lose is set to 1 if any allocation fails. If the shell runs out of memory, it ends up exiting (see QUIT). There is no special handling such as spilling to disk or returning the matches that have already been found.
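
As a rough sketch of that pattern (the helper below is hypothetical, not code from bash; only the lose convention is taken from glob.c):

  #include <stdlib.h>
  #include <string.h>

  /* Copy one matched name, following the glob library's convention:
     test the allocation, and on failure set *lose instead of aborting,
     so the caller's cleanup branch (shown above) can free everything
     collected so far and return NULL.  */
  static char *
  copy_name (const char *entry_name, int *lose)
  {
    size_t len = strlen (entry_name);
    char *name_copy = malloc (len + 1);

    if (name_copy == NULL)
      {
        *lose = 1;
        return NULL;
      }

    memcpy (name_copy, entry_name, len + 1);
    return name_copy;
  }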

The memory requirements in themselves are small: only directory names are preserved, in globval structures that form a linked list, each storing only a pointer to the next entry and a pointer to the name string.
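
For reference, that node type is tiny; per the bash source (lib/glob/glob.c) it is essentially:

  /* One list node per match: a pointer to the next entry and a pointer
     to the matched name.  The name strings themselves account for
     almost all of the memory used.  */
  struct globval
    {
      struct globval *next;
      char *name;
    };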

  • Ah, OK. This was prompted by the comment thread under this SO answer, where the OP was using for f in /home/**/* and the answer suggested splitting that into for d in /home/*; do for f in $d/**; do... under the assumption that building separate, smaller lists will get around bash failing because it tries to build one single list. If I understand correctly, you are saying that this is a valid assumption, and we can have a case where a single ** will fail but multiple, smaller ** globs will not. Commented Mar 15, 2021 at 13:36
  • Yes, that would be one way of reducing the memory requirement. Commented Mar 15, 2021 at 13:53
  • In any case, it's not specific to recursive globbing. Any glob can exhaust memory if it generates enough file names. See the /*/*/*/*/../../../../*/*/*/*/../../../../*/*/*/* given as an example at Security implications of forgetting to quote a variable in bash/POSIX shells, for instance. Commented Mar 15, 2021 at 14:09
  • @StéphaneChazelas yes, my question was prompted by a (mistaken) idea I had that bash's ** behaved differently and would use temp files to store intermediate lists to avoid running out of memory. Commented Mar 15, 2021 at 14:41
  • In fact, malloc tends not to fail on Linux systems. Your shell is more likely to be killed by the OOM killer should it run out of memory. Commented Mar 15, 2021 at 22:22
