2

I have a directory with lots of json and pdf files that are named in a pattern. I am trying to filter the files by name with the pattern \d{11}-\d\.(?:json|pdf) in the command below. For some reason it is not working. I believe this is because xargs passes the arguments as one long string, or because splitting the input leaves some whitespace, \n or null character behind.

ls | xargs -d '\n' -n 1 grep '\d{11}-\d\.(?:json|pdf)'

If I try just ls | xargs -d '\n' -n 1 grep '\d', it selects file names with digits in them, but as soon as I add the {11} quantifier to the regex, nothing matches.

7
  • 1
    Are you planning to filter the list of filenames, or the contents of the files? Running ... | xargs grep $pattern would run grep $pattern file1 file2 ..., and look at the contents of the files. Commented Aug 28, 2021 at 19:22
  • 1
    It's unclear what you want to achieve. Do you just want to list the filenames? What are some examples of filename that you want to list and that you don't want to list? Commented Aug 28, 2021 at 19:22
  • 2
    You also don't want to parse the output of ls. You haven't clarified what the objective is, but if you are starting with wanting to find files that match certain patterns, you are better off using something along the lines of find /path/to/directory -type f \( -name '*.json' -o -name '*.pdf' \) Commented Aug 28, 2021 at 19:27
  • @ilkkachu Yes. That would work as well. More clarity is needed on what is expected though. Commented Aug 28, 2021 at 19:40
  • @ilkkachu No, I am not looking inside the files, but rather on the filename. I am trying to apply the pattern on the filenames and filtering it. Commented Aug 28, 2021 at 22:30

2 Answers

6

First, ls | xargs grep 'pattern' makes grep search the contents of the files listed by ls, not the list of filenames. To filter the filenames themselves, it should be enough to do:

ls | grep 'pattern'
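
To illustrate the difference, here is a minimal sketch that assumes a throwaway directory created with mktemp; the file name and its contents are made up for the demonstration. The name contains digits but the contents do not, so the two pipelines give different counts.

```shell
# Hypothetical scratch directory holding one file whose NAME has digits
# but whose CONTENTS do not.
dir=$(mktemp -d)
printf 'no digits here\n' > "$dir/12345678901-1.json"

# grep reads the file names from the pipe: one name contains a digit.
by_name=$(ls "$dir" | grep -c '[0-9]')

# xargs turns each name into an argument, so grep opens the file and
# searches its contents instead: zero matching lines.
by_content=$(cd "$dir" && ls | xargs grep -c '[0-9]' || true)

printf 'matched by name: %s, matched by content: %s\n' "$by_name" "$by_content"
rm -rf "$dir"
```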

Second, grep '\d{11}-\d\.(?:json|pdf)' works only with GNU grep and its -P option, because \d and (?:...) are Perl-compatible (PCRE) syntax. Use the following syntax instead; it works with the GNU, busybox and FreeBSD implementations of grep:

ls | grep -E '[[:digit:]]{11}-[[:digit:]]\.(json|pdf)'
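
As a quick check of that pattern, the sketch below feeds a few made-up names to grep -E through printf instead of ls. The sample names are assumptions, and the ^ and $ anchors are an addition of this sketch (not part of the command above) so that names with extra leading or trailing characters are rejected.

```shell
# Hypothetical sample names, used only for illustration.
names='12345678901-1.json
12345678901-2.pdf
1234-1.json
12345678901-1.txt
notes.pdf'

# -E selects extended regular expressions; ^ and $ anchor the whole name.
matched=$(printf '%s\n' "$names" |
    grep -E '^[[:digit:]]{11}-[[:digit:]]\.(json|pdf)$')

printf '%s\n' "$matched"
```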

Third, parsing ls is not a good idea. Use GNU find:

find . -maxdepth 1 -regextype egrep -regex '.*/[[:digit:]]{11}-[[:digit:]]\.(json|pdf)'

or FreeBSD find:

find -E . -maxdepth 1 -regex '.*/[[:digit:]]{11}-[[:digit:]]\.(json|pdf)'
1
  • Thanks, that worked. The one thing I was missing was the -P option. Commented Aug 28, 2021 at 22:39
1

You don't need any of that complexity. Just use a shell glob. This one is for shells such as bash that understand {x,y} braced alternatives:

ls *[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9].{json,pdf}

If you want to do something with the matched files, don't take the output of ls but just use the glob to iterate across the files directly.
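
Iterating over the glob directly might look like the sketch below. It assumes a scratch directory with made-up file names, builds the eleven-digit pattern from a helper variable to keep the line short, and lists the two extensions as separate patterns rather than using {json,pdf}, so it does not depend on brace expansion. Unlike the ls command above, it has no leading *, so exactly eleven digits must precede the hyphen.

```shell
# Hypothetical scratch directory and file names for the demonstration.
dir=$(mktemp -d)
touch "$dir/12345678901-1.json" "$dir/12345678901-2.pdf" \
      "$dir/1234-1.json" "$dir/notes.pdf"

d='[0-9]'
d11="$d$d$d$d$d$d$d$d$d$d$d"   # eleven digit classes in a row

matched=""
# $d11 and $d are left unquoted on purpose so the shell expands them as globs.
for f in "$dir"/$d11-$d.json "$dir"/$d11-$d.pdf; do
    [ -e "$f" ] || continue     # skip a pattern that matched nothing
    matched="$matched${f##*/} " # act on the file here; this only records its name
done

printf '%s\n' "$matched"
rm -rf "$dir"
```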

2
  • 1
    That is a lot of digit regex :). I was looking for a more consistent regex-based solution, as the directory has a lot of files. Thanks for your reply. Commented Aug 28, 2021 at 22:47
  • 1
    It's not a regex; it's a glob used directly by the shell. Try it Commented Aug 28, 2021 at 23:20
