Thanks for the tips, everyone.
I ended up doing this:
w3m -dump -T text/html "$thread" | grep -i -E -o 'File\:+([[:print:]]*)\.(jpg|png|webm|gif)'
w3m cleans up the markup, and then I can grep for the file names. (I need the literal "File:" part to distinguish a linked file from its title.) I do need [[:print:]] because it matches spaces, Unicode characters and other printable characters that can show up in a file name.
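To show what the pattern picks up, here is a small sketch run against made-up text instead of real w3m output. The two sample lines are an assumption about what a dump might look like, the backslash before the colon is dropped because a colon is not special in an extended regex, and the sed step that strips the "File:" prefix is an optional extra, not part of the command above.

```shell
# Made-up sample standing in for the output of `w3m -dump` (assumption).
sample='File: cat picture.jpg (123 KB)
cat picture.jpg'

# Same pattern as above, minus the backslash before the colon.
# Only the line with the literal "File:" prefix matches; -o prints just the match.
matches=$(printf '%s\n' "$sample" | grep -i -E -o 'File:+([[:print:]]*)\.(jpg|png|webm|gif)')
printf '%s\n' "$matches"
# prints: File: cat picture.jpg

# Optional: drop everything up to the colon(s) and the spaces after them,
# leaving only the file name.
names=$(printf '%s\n' "$matches" | sed 's/^[^:]*:* *//')
printf '%s\n' "$names"
# prints: cat picture.jpg
```

Because [[:print:]]* is greedy, the match runs up to the last extension on the line, which is worth keeping in mind if a line can hold more than one file name.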
This works as I intended, though I still have to figure out how to prevent overwriting files with the same name. But that's another day's battle.
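For the overwriting problem, one possible approach (only a sketch, not something I have settled on) is to pick a free name before saving. The helper unique_name below is hypothetical, written in plain POSIX sh, and it assumes every name has an extension, which holds for anything the grep pattern matches.

```shell
# unique_name NAME
# Hypothetical helper: prints NAME if no such file exists in the current
# directory, otherwise NAME with _1, _2, ... inserted before the extension.
unique_name() {
    name=$1
    base=${name%.*}     # everything before the last dot
    ext=${name##*.}     # everything after the last dot
    candidate=$name
    n=1
    while [ -e "$candidate" ]; do
        candidate="${base}_${n}.${ext}"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}

# Example use (the variable $name is an assumption):
#   target=$(unique_name "$name")
```

The check and the later write are two separate steps, so this is not safe if several downloads run at the same time in the same directory. For a single sequential loop it should be enough.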