
Here is my problem statement:

There is a folder with many HTML and text files. I need to go through each of them recursively and find every file extension referenced in these HTML/text files, such as .jpg, .tif, .png, etc.

The problem is I don't have a defined list of the extensions I want to search for.

What would be the best way to achieve this using a shell script?

Would it make sense to come up with a regex that matches every occurrence of a dot followed by 3 or 4 letters, and then pick out the matches that end with a space, a comma, a quote, etc.?

Any suggestions would be helpful.

1 Answer


Keeping in mind that HTML is not a regular language, you could probably at least narrow it down with:

grep -ERo '[a-zA-Z0-9]+\.[a-zA-Z0-9]{1,4}' *
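To turn the raw matches into a list of distinct extensions, something along these lines might work. It is only a sketch: the function name `list_extensions` is made up for illustration, and it assumes a grep that supports `-R` and `-o` (GNU and BSD grep do). Since the pattern is a heuristic, expect false positives such as domain names or version numbers.

```shell
#!/bin/sh
# list_extensions DIR (hypothetical helper name)
# Prints each distinct "extension" found in the files under DIR,
# prefixed by how many times it occurs, most frequent first.
list_extensions() {
    # -R recurse, -h omit file names, -o print only the match,
    # -E extended regex so that {1,4} is a repetition count.
    grep -RhoE '[A-Za-z0-9]\.[A-Za-z0-9]{1,4}' "${1:-.}" 2>/dev/null |
        sed 's/.*\.//' |               # keep only the part after the dot
        tr '[:upper:]' '[:lower:]' |   # count .JPG and .jpg together
        sort | uniq -c | sort -rn      # tally, then order by count
}
```

Calling `list_extensions /path/to/folder` prints one line per extension with its count, which makes it easier to spot the real ones (jpg, png, tif) among the noise.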
