Pymeta searches the web for files hosted on a domain, downloads them, and extracts their metadata. This technique can be used to identify domains, usernames, software and version numbers, and naming conventions.
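To illustrate how extracted metadata can reveal usernames and software, here is a minimal sketch. It is not Pymeta's own code: the function name `summarize_metadata` and the record layout (dictionaries keyed by the common document fields `Author`, `Creator`, and `Producer`) are assumptions made for this example.

```python
from collections import Counter


def summarize_metadata(records):
    """Tally author names and software strings across metadata records.

    `records` is assumed to be a list of dicts, one per downloaded file,
    keyed by common document metadata fields (hypothetical layout).
    """
    users = Counter()
    software = Counter()
    for record in records:
        # The Author field often holds a login name or a full name.
        author = (record.get("Author") or "").strip()
        if author:
            users[author] += 1
        # Creator and Producer usually name the authoring tool and version.
        for key in ("Creator", "Producer"):
            value = (record.get(key) or "").strip()
            if value:
                software[value] += 1
    return users, software
```

Comparing the most frequent author values side by side is one way to spot a naming convention such as first initial plus surname.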
Azure Function and supporting framework that takes PDF files, extracts metadata using regular expressions, and stores the results in DocumentDB, where they are indexed and made searchable by Azure Search.
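The regular-expression step might look roughly like the sketch below. It is an illustration only, not this project's implementation: the pattern `INFO_FIELD` and the function `extract_pdf_info` are hypothetical names, and the approach assumes the PDF's Info dictionary is stored uncompressed with literal (parenthesised) strings. Hex strings, UTF-16 values, nested unescaped parentheses, and compressed object streams are not handled.

```python
import re

# Matches entries such as "/Author (Jane Doe)" in raw PDF bytes.
# The value may contain backslash-escaped characters, e.g. "\(" or "\)".
INFO_FIELD = re.compile(
    rb"/(Author|Creator|Producer|Title)\s*\(((?:\\.|[^\\()])*)\)"
)


def extract_pdf_info(data: bytes) -> dict:
    """Return the first value found for each Info field in the raw bytes."""
    fields = {}
    for name, raw in INFO_FIELD.findall(data):
        # Undo escaping of parentheses and backslashes only.
        value = re.sub(rb"\\([()\\])", rb"\1", raw)
        # Keep the first occurrence of each field name.
        fields.setdefault(name.decode("ascii"), value.decode("latin-1"))
    return fields
```

The returned dictionary is the kind of flat record that could then be written to a document store and indexed for search; the storage and indexing calls are left out because they depend on the service SDK in use.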