I have downloaded an HTML file autogenerated by a script on a webpage. The file contains multiple links, including links to images. I am trying to extract the full names of the images. For example:
<a href="000000.jpg" title="image name.jpg" target="_blank">Image name.jpg</a>
From the line above I want to get "Image name.jpg" stored in a file. Since there are hundreds of these, I parse the file and store each name as it comes up with the following command:
grep -i -E -o "target=\"_blank\">([[:graph:]]*)\.(jpg|png|gif|webm)" "$thread" | cut -f 2 -d '>' | sed 's/ /_/g' - > "$names"
where "$thread" is the name of the HTML file and "$names" is the output file that receives the list of filenames. I use "cut" to remove the 'target="_blank">' portion, then "sed" to convert the spaces to underscores.
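A minimal sketch for reproducing this, assuming a hypothetical sample file (sample.html, with made-up link names and one link per line; the real autogenerated file may be laid out differently). The pipeline itself is the same as above:

```shell
# Hypothetical sample input; file names and link names are illustrative only.
thread=sample.html
names=names.txt
cat > "$thread" <<'EOF'
<a href="000000.jpg" title="image name.jpg" target="_blank">Image name.jpg</a>
<a href="000001.jpg" title="plain.jpg" target="_blank">plain.jpg</a>
<a href="000002.png" title="with(parenthesis).png" target="_blank">with(parenthesis).png</a>
EOF

# Same pipeline as in the question, unchanged.
grep -i -E -o "target=\"_blank\">([[:graph:]]*)\.(jpg|png|gif|webm)" "$thread" | cut -f 2 -d '>' | sed 's/ /_/g' - > "$names"

# Show which names were captured.
cat "$names"
```

On this sample the output lists plain.jpg and with(parenthesis).png, but not the name that contains a space.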
Since there are several other links in the file, I specify the extensions to grab (images and webm); everything else should be ignored. I got it to the point where it grabs only these links, but it still misses some of them.
Some file names contain spaces and non-alphanumeric characters. If I use [[:print:]], which should cover all of these cases, I get nothing, or I get a bit of the <head> portion of the HTML and nothing else. If I use [[:graph:][:space:]], I also get nothing. If I just use [[:graph:]] (as above) or [[:alnum:][:punct:]], I get names with alphanumeric and other characters (like "filenamewith(parenthesis).jpg") but not names with spaces. The reverse is also true: [[:alnum:][:space:]] works for "file name with spaces.jpg" but omits names with other printable characters, such as "with(parenthesis,comma or other.jpg".
Supposedly [[:print:]] covers all cases, but with it I don't get what I need. If I'm understanding correctly, grep -E -o should only match (per the case above):
*.jpg *.png *.gif or *.webm
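The character-class variants described above can be compared side by side with a small harness. This is only a sketch: classes_sample.html and class_counts.txt are hypothetical names, the sample has one link per line, and results on the real autogenerated file may differ.

```shell
# Hypothetical sample input with one link per line.
sample=classes_sample.html
cat > "$sample" <<'EOF'
<a href="000000.jpg" title="image name.jpg" target="_blank">Image name.jpg</a>
<a href="000001.jpg" title="plain.jpg" target="_blank">plain.jpg</a>
<a href="000002.png" title="with(parenthesis).png" target="_blank">with(parenthesis).png</a>
EOF

# Try each bracket expression in the same regex as the original command.
for class in '[[:graph:]]' '[[:print:]]' '[[:alnum:][:space:]]'; do
    # -c counts matching lines; that equals the number of matched links
    # here only because the sample has exactly one link per line.
    count=$(grep -c -i -E "target=\"_blank\">(${class}*)\.(jpg|png|gif|webm)" "$sample")
    printf '%s matched %s of 3 lines\n' "$class" "$count"
done > class_counts.txt

cat class_counts.txt
```

Running the same loop against "$thread" instead of the sample shows how each class behaves on the actual file.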
I have tried grep with and without -E/-o/-e in different variations.
Any ideas? I am using Arch Linux, grep 2.20, bash 4.3.18
