Updated based on discussion
afxdesign
  • 133
  • 1
  • 6

Although it would be nearly impossible to track % complete accurately, because the number of links and keywords is not known in advance, it is possible to show a rough status by depth. For example, the first depth would be the URL(s) processed from the top level.

(100/Total Pages) * Pages Processed = % current status

Total Pages = SELECT COUNT(*) FROM master_links
Pages Processed = SELECT COUNT(*) FROM master_links WHERE processed = true

When you have processed a page, simply set the flag in the db.

(This could similarly be done by populating an array with your db values and using the index value as your pages processed)
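As a rough sketch of the formula above, here is how the two counts could be combined in Python. It assumes an SQLite database and a `master_links` table with a `url` column and an integer `processed` flag; the schema, the example URLs and the `level_progress` helper are made up for illustration, so adapt them to whatever your crawler actually stores.

```python
import sqlite3

# Only tables we created ourselves may be interpolated into the SQL text.
KNOWN_TABLES = {"master_links", "sub_links"}

def level_progress(conn, table):
    """Return the percentage of rows in `table` flagged as processed."""
    if table not in KNOWN_TABLES:
        raise ValueError(f"unknown table: {table}")
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    done = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE processed = 1"
    ).fetchone()[0]
    if total == 0:
        return 0.0  # nothing queued at this level yet
    return (100 / total) * done

# Hypothetical schema and data, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE master_links ("
    "url TEXT PRIMARY KEY, processed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO master_links (url) VALUES (?)",
    [(f"http://example.com/page{i}",) for i in range(5)],
)

# Set the flag once a page has been crawled.
conn.execute(
    "UPDATE master_links SET processed = 1 WHERE url IN (?, ?)",
    ("http://example.com/page0", "http://example.com/page1"),
)

progress = level_progress(conn, "master_links")
print(f"Master Links {progress:g}% complete")  # Master Links 40% complete
```

With 5 pages queued and 2 flagged, the result is (100/5) * 2 = 40.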

Note: You can only get the status for each level. Do not start crawling your sub_links until all the master_links are crawled - this will also allow you to avoid duplicate url crawls and should have a minimal impact on the total time.
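The level-by-level rule in the note could look something like the sketch below. It keeps everything in memory rather than in the db, and `crawl_by_level`, `get_links` and `report` are hypothetical names: `get_links` stands in for whatever fetches a page and extracts its links. The next level is only started once the current one is finished, and a `seen` set keeps the same URL from being queued twice.

```python
def crawl_by_level(start_urls, get_links, report):
    """Process one depth at a time, reporting % complete within each level."""
    current = list(dict.fromkeys(start_urls))  # de-duplicate, keep order
    seen = set(current)
    depth = 0
    while current:
        next_level = []
        for index, url in enumerate(current, start=1):
            for link in get_links(url):
                if link not in seen:  # skip URLs already queued or crawled
                    seen.add(link)
                    next_level.append(link)
            # Same formula as above: (100 / Total Pages) * Pages Processed
            report(depth, (100 / len(current)) * index)
        # Only now move on to the links found at this level.
        current = next_level
        depth += 1
    return seen

# Tiny made-up site: each page maps to the links found on it.
site = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}

events = []
crawled = crawl_by_level(
    ["a"],
    lambda url: site.get(url, []),
    lambda depth, pct: events.append((depth, pct)),
)

for depth, pct in events:
    print(f"Level {depth}: {pct:g}% complete")
```

The total for a level is only known once the previous level has finished, which is why the percentage is reported per level rather than for the whole crawl.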

The squares in the diagram below represent the pages that need to be processed. Inside each box is the percentage complete if you were processing them left to right. This is for illustrative purposes; the percentage would be based on this:

Interactive image

Your output would show percentage complete of that level:
e.g. Master Links 40% complete
or
e.g. Master Links 100%
Sub Links 49.8%

This should still give you enough info to indicate the progress; after all, you cannot guess the actual density of keywords and links.
