This sounds similar in principle to a broken-link checker, of which there are many on the internet. I would suggest running a few of the free ones to see how they approach progress tracking.
Although it is nearly impossible to track the % complete accurately, because the number of links and keywords is not known in advance, it is possible to show a rough status by depth. For example, the first depth would be the URL(s) processed from the top level. You can easily show the status for that depth as the % of pages grabbed out of the total found from the parent.

In the diagram I have placed the % complete of the depth for each page grabbed. Reading your example, you would start with more than one page. This is the same in principle: just divide 100 by the number of submitted URLs to get the % complete for each page processed. With four submitted URLs, for instance, each processed page adds 25%.
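As a rough illustration of the per-depth idea, here is a minimal Python sketch. The class and method names (`DepthProgress`, `add_discovered`, `mark_fetched`, `percent`) are made up for this example and not taken from any crawler library; it only counts pages and leaves the actual fetching to your own code.

```python
from collections import defaultdict


class DepthProgress:
    """Counts discovered and fetched pages per crawl depth (hypothetical helper)."""

    def __init__(self):
        self.discovered = defaultdict(int)  # depth -> pages known at that depth
        self.fetched = defaultdict(int)     # depth -> pages already grabbed

    def add_discovered(self, depth, count=1):
        # Call this when a parent page yields `count` links for `depth`.
        self.discovered[depth] += count

    def mark_fetched(self, depth):
        # Call this each time a page at `depth` has been grabbed.
        self.fetched[depth] += 1

    def percent(self, depth):
        # % of grabbed pages out of the total discovered for this depth.
        total = self.discovered.get(depth, 0)
        if total == 0:
            return 0.0
        return 100.0 * self.fetched.get(depth, 0) / total
```

With four submitted URLs registered at depth 1, each call to `mark_fetched(1)` moves that depth forward by 25%. The percentage for a deeper level can drop while its parents are still being processed, because new links keep being added to its total.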
I would also consider specifying a maximum depth, as you would be surprised at how many websites inadvertently generate an almost exponentially growing number of pages because of bad code.
With a maximum depth and an assumed average depth, you could calculate a rough total percentage complete.
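One possible way to turn that into a number is sketched below. It is only one interpretation: it assumes every depth level counts equally and that a typical crawl goes `assumed_avg_depth` levels deep, and both the function name and its parameters are invented for this example.

```python
def rough_total_percent(depth_percents, max_depth, assumed_avg_depth):
    """Rough overall % complete from per-depth percentages.

    depth_percents: list of % complete per depth, index 0 = first depth.
    max_depth: hard limit on how deep the crawl may go.
    assumed_avg_depth: a guess at how many levels a typical crawl has.
    """
    if max_depth < 1 or assumed_avg_depth <= 0:
        raise ValueError("max_depth must be >= 1 and assumed_avg_depth > 0")

    # Never expect more levels than the maximum depth allows.
    expected_levels = min(max_depth, assumed_avg_depth)

    # Ignore anything reported beyond the maximum depth.
    considered = depth_percents[:max_depth]

    # Each fully completed level counts as 1.0; partial levels count pro rata.
    completed_levels = sum(p / 100.0 for p in considered)

    # Cap at 100 in case the crawl goes deeper than the assumed average.
    return min(100.0, 100.0 * completed_levels / expected_levels)
```

For example, `rough_total_percent([100.0, 50.0], max_depth=5, assumed_avg_depth=3)` gives 50.0: one and a half levels done out of an expected three. The figure is only as good as the average depth guess, so it is best shown as an approximate status rather than an exact value.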