Updated based on discussion
afxdesign

This sounds similar in principle to broken link checkers, of which there are many around the internet. I would suggest running a few of the free ones to see how they approach tracking progress.

Although it would be near impossible to track % complete accurately, because the number of links and keywords is not known in advance, it is possible to show a rough status by depth. For example, the first depth would be the URL(s) processed from the top level. You can easily show status for that depth by showing the percentage of pages grabbed out of the total found from the parent.


(100 / Total Pages) * Pages Processed = % current status

Total Pages = Select count(*) from master_links
Pages Processed = Select count(*) from master_links where processed=true. Once you have processed a page, simply set its flag in the db.

(This could similarly be done by populating an array with your db values and using the index value as your pages processed.)
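As a rough illustration of the formula and the two count queries, here is a minimal Python sketch using an in-memory SQLite database. The table layout (a `url` column plus an integer `processed` flag), the `percent_complete` helper and the example URLs are assumptions made for the example, not something taken from the question.

```python
import sqlite3

ALLOWED_TABLES = {"master_links", "sub_links"}  # assumed table names


def percent_complete(conn: sqlite3.Connection, table: str = "master_links") -> float:
    """Return (100 / total pages) * pages processed for one level's table."""
    # The table name is interpolated into SQL, so only accept known names.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table}")
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if total == 0:
        return 0.0  # nothing queued yet, avoid dividing by zero
    done = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE processed = 1"
    ).fetchone()[0]
    return 100.0 * done / total


# Demo with five hypothetical pages, two of them already processed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE master_links ("
    "url TEXT PRIMARY KEY, processed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO master_links (url) VALUES (?)",
    [(f"http://example.com/{i}",) for i in range(5)],
)
# Set the flag once a page has been processed.
conn.execute(
    "UPDATE master_links SET processed = 1 WHERE url IN (?, ?)",
    ("http://example.com/0", "http://example.com/1"),
)
print(f"Master Links {percent_complete(conn):.1f}% complete")
```

With two of five rows flagged, this prints `Master Links 40.0% complete`.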

Note: You can only get the status for each level. Do not start crawling your sub_links until all the master_links are crawled; this also lets you avoid duplicate URL crawls and should have a minimal impact on the total time.
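The level-by-level rule can be sketched roughly as follows. This is only an illustration under assumptions: `crawl_by_level`, `fetch_links` and `report` are hypothetical names, and the `SITE` dictionary stands in for real page fetches so the example runs without network access.

```python
from typing import Callable, Iterable, List, Set, Tuple


def crawl_by_level(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    max_depth: int,
    report: Callable[[int, float], None],
) -> Set[str]:
    """Crawl one depth at a time and report percent complete within each depth.

    Returns every URL that was discovered.
    """
    level: List[str] = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    seen: Set[str] = set(level)
    for depth in range(max_depth):
        if not level:
            break
        next_level: List[str] = []
        for done, url in enumerate(level, start=1):
            for link in fetch_links(url):
                if link not in seen:  # skip URLs already queued or crawled
                    seen.add(link)
                    next_level.append(link)
            report(depth, 100.0 * done / len(level))
        # Only now, with the whole level finished, move on to its sub-links.
        level = next_level
    return seen


# Hypothetical in-memory "site" standing in for real HTTP fetches.
SITE = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
reports: List[Tuple[int, float]] = []
crawled = crawl_by_level(
    ["a"],
    lambda url: SITE.get(url, []),
    3,
    lambda depth, pct: reports.append((depth, pct)),
)
for depth, pct in reports:
    print(f"Depth {depth}: {pct:.1f}%")
```

Because the total for a level is fixed before that level starts, the percentage within a level never moves backwards, and the `seen` set is what prevents a URL from being crawled twice.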

The squares in the diagram below represent the pages that need to be processed. Inside each box is the percentage complete if you were processing them left to right. This is for illustrative purposes; the percentages are based on the formula above.

[Image: iterative diagram of page squares, each showing its percentage complete]

Your output would show the percentage complete for that level, e.g.
Master Links 40% complete
or, with a second level in progress:
Master Links 100%
Sub Links 49.8%
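One possible way to produce that kind of per-level output is sketched below. The `status_lines` helper and the page counts (500 master links, 249 of 500 sub links) are made up for illustration.

```python
from typing import Iterable, List, Tuple


def status_lines(levels: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Build one status line per level from (name, processed, total) tuples."""
    lines = []
    for name, processed, total in levels:
        pct = 100.0 * processed / total if total else 0.0
        # Round to one decimal place and drop a trailing ".0".
        lines.append(f"{name} {round(pct, 1):g}%")
    return lines


for line in status_lines([("Master Links", 500, 500), ("Sub Links", 249, 500)]):
    print(line)
```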

This should still give you enough info to indicate progress; after all, you cannot guess the actual density of keywords and links.

