
I need to search for a string inside a gzipped tar archive (.tar.gz) that contains several files.

Sample archive:

PROD_009_010919_0110.tar.gz

Files inside: PROD_009_010919.log01, PROD_009_010919.log02, PROD_009_010919.log03, etc.

The command zgrep -ia *123456* PROD_* does not return the expected results.

Expected output: the string '123456' should be searched for in all the files inside the archives, and the names of the files that contain it should be displayed.

  • What do you want to do? Do you want to extract a piece of text from one of the files in the archive, or do you want to test whether a particular file is part of the archive? In short, what is the expected result? Commented Jan 11, 2019 at 12:30
  • gzip does not compress folders, only single files, so I suppose you are using another archiver in addition to gzip, right? Have you checked the content of your .gz with gzip -l? Commented Jan 11, 2019 at 12:37
  • @Dasel, based on the filename it seems to be tar-ed before gzip-ed Commented Jan 11, 2019 at 12:43
  • Thanks @RomeoNinov; I did not see the name. In that case we first need to know what is expected, as otherwise it is difficult to debug. Anyway, it would be much easier to run tar -ztvf PROD_009_010919_0110.tar.gz and then use a normal grep to search for the wanted result, but of course the ideal would be to have a sample of the expected output. Commented Jan 11, 2019 at 12:54
  • The expected result: I should get the list of files containing the string 1234536. zgrep -ia 1234536 PROD_* Commented Jan 11, 2019 at 13:28

1 Answer

This prints the names of the files inside the tarball whose contents contain the given string:

tar --ignore-command-error -xvf PROD_009_010919_0110.tar.gz --to-command="grep -FH 1234536 -" | grep -B1 --no-group-separator '(standard input)' | grep -v '(standard input)'

The --to-command option pipes the contents of each extracted file to the standard input of the grep command instead of writing the file to disk. The -v option prints each file name as it is processed.

The --ignore-command-error option makes tar ignore the non-zero exit status that grep returns when it cannot find a match. Because grep reads from standard input, the -H option (print the file name) prefixes each matching line with the string '(standard input)'.

This results in output of the following kind from the tar command:

file1
file2
(standard input): <matched lines from file2>
file3
(standard input): <matched lines from file3>

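The two grep commands in the pipeline then keep only the file names that are immediately followed by a '(standard input)' line. grep -B1 keeps each '(standard input)' line together with the line before it (--no-group-separator suppresses the '--' separators between groups), and grep -v then drops the '(standard input)' lines themselves. This could probably be improved by using a single regex match instead of the two sequential grep commands I have used here.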

The resulting output in this case will be:

file2
file3
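
As a possible simplification of the post-processing, here is a minimal sketch that lets grep print the member name itself. It assumes GNU tar, which sets the TAR_FILENAME environment variable for the --to-command child and runs that command through /bin/sh, and GNU grep, which supports --label. The function name list_matching_members and the NEEDLE variable are made up for illustration and are not part of the answer above. It also avoids matching on the literal '(standard input)' text, which GNU grep may translate under non-English locales.

```shell
# list_matching_members STRING ARCHIVE...
# Hypothetical helper: for each .tar.gz archive, print "archive: member"
# for every member whose contents contain STRING (fixed-string match).
# Assumes GNU tar (--to-command, TAR_FILENAME) and GNU grep (--label).
list_matching_members() {
    needle=$1
    shift
    for archive in "$@"; do
        # NEEDLE is exported only to tar, which passes it on to grep.
        # grep -l prints the label once if the member contains a match;
        # --label replaces the default "(standard input)" name.
        NEEDLE=$needle tar --ignore-command-error -xzf "$archive" \
            --to-command='grep -Fl --label="$TAR_FILENAME" -e "$NEEDLE" -' |
        while IFS= read -r member; do
            printf '%s: %s\n' "$archive" "$member"
        done
    done
}
```

It could be called as list_matching_members 1234536 PROD_*.tar.gz to cover every archive matched by the glob; add -i to the grep options for a case-insensitive search as in the original zgrep attempt.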
  • We still don't actually know whether the user wants to grep filenames or file contents. Commented Jan 11, 2019 at 15:01
  • File content Kusal. Commented Jan 11, 2019 at 15:30
  • @Kusalananda I was going by the OP's earlier comment, which said they should get the list of files containing the string 1234536. In any case, the OP has just clarified with another comment that they want to grep through the contents and then list the files containing matches. Commented Jan 11, 2019 at 15:53
  • @PraveenKP To reply to a specific user, you should use the '@username' notation so that they get a notification for your comment. In any case, I believe my answer should work as per your requirements. Let me know if you see any problems. Commented Jan 11, 2019 at 15:56
