  • Why don't you use a hashmap to check for duplicate links? Commented May 8, 2019 at 9:40
  • It will run out of memory, as there could be a billion links. Commented May 8, 2019 at 9:43
  • You need to run a profiler to see why your webcrawler is so slow. Commented May 8, 2019 at 9:53
  • The suggestions of a hashmap of the addresses and of combining the database queries into a single call are good ones (see the sketch after this list). If you're finding only a few thousand links and your starting source is something big like stackexchange, then there is probably some other error. You are ensuring that the link is new before going through its own links, right? And you're not pruning too much from the address? Commented Jun 7, 2019 at 13:48
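
Below is a minimal sketch, not the original poster's code, of how the two suggestions in the comments could look in Python: an in-memory set of URL fingerprints for duplicate checks, and database writes batched into a single call. The names `url_fingerprint`, `seen_hashes`, `store_links`, and the `links` table are assumptions for illustration; at a billion links even hashed entries may not fit in RAM, so a disk-backed index or a Bloom filter might be needed instead.

```python
import hashlib
import sqlite3

def url_fingerprint(url):
    # Store a short, fixed-size digest instead of the full URL string
    # so each remembered link costs a predictable, small amount of memory.
    return hashlib.blake2b(url.encode("utf-8"), digest_size=16).digest()

seen_hashes = set()

def is_new(url):
    # True only the first time a URL (by fingerprint) is encountered.
    fp = url_fingerprint(url)
    if fp in seen_hashes:
        return False
    seen_hashes.add(fp)
    return True

def store_links(conn, urls):
    # Batch all inserts into one executemany() call and one commit,
    # instead of issuing a separate query (and round trip) per link.
    conn.executemany(
        "INSERT OR IGNORE INTO links (url) VALUES (?)",
        [(u,) for u in urls],
    )
    conn.commit()

# Example usage with a throwaway in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT PRIMARY KEY)")
batch = [u for u in ["https://example.com/a",
                     "https://example.com/a",
                     "https://example.com/b"] if is_new(u)]
store_links(conn, batch)
print(conn.execute("SELECT COUNT(*) FROM links").fetchone()[0])  # -> 2
```

The design choice being illustrated: checking the fingerprint set before crawling a page avoids revisiting links, and `executemany` with a single commit replaces one database call per link with one call per batch, which is where much of the slowdown in a naive crawler typically comes from.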