2

I am trying to download multiple webpages using the WebClient class. When I try to download a website's html, a TargetInvocationException is thrown, and I do not know why it happens. Here is my code:

    public HashSet<string> DownloadWebpages(HashSet<string> urls)
    {
        HashSet<string> HTML = new HashSet<string>();

        for (int i = 0; i < urls.Count; i++)
        {
            WebClient client = new WebClient();
            client.DownloadStringCompleted += (s, e) =>
            {
                try
                {
                    lock (HTML)
                    {
                        HTML.Add(e.Result); //The exception happens on this line  
                    }
                }
                catch { }
            };
            client.DownloadStringAsync(new Uri(urls.ElementAt(i)));
        }
        return HTML;
    }

Is there any way to fix this? All I'm trying to do is download multiple webpages asynchronously, as fast as possible.

6
  • You're not holding a lock when adding to the hash set from multiple threads Commented Nov 24, 2022 at 22:31
  • @CodesInChaos I have tried using the lock, but the exception still happens. Do you know why an exception is being thrown? Thanks. Commented Nov 25, 2022 at 16:11
  • WebClient is an obsolete class and this shows why. If you want to make multiple calls use async/await and DownloadStringTaskAsync at least. Even better, use HttpClient instead Commented Nov 25, 2022 at 16:16
  • catch { } doesn't bode well. Why are you swallowing exceptions without any kind of logging? What is the detail of the exception? Please show the complete stack trace, including inner exceptions. Commented Nov 25, 2022 at 16:17
  • You're not waiting until the downloads are complete before you return from the function. Commented Nov 25, 2022 at 17:53

2 Answers

2

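The TargetInvocationException is thrown because the WebClient was not able to download the page. When a download fails, reading e.Result rethrows the failure wrapped in a TargetInvocationException; the real cause is in e.Error (and in the exception's InnerException). Here is a test: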

string html = new WebClient().DownloadString("https://www.siteth@tw!llcause an error!/randompage/");

This throws an exception. If your code tries to download the same page, reading e.Result in the completed handler throws a TargetInvocationException.
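To see the underlying error instead of swallowing it, the handler can check e.Cancelled and e.Error before touching e.Result. The following is only a minimal sketch, assuming a .NET 6+ console project with top-level statements; the URL is a placeholder on the reserved .invalid domain (chosen so the download fails), and the `done` and `outcome` names are made up for the illustration:

```csharp
using System;
using System.Net;
using System.Reflection;
using System.Threading.Tasks;

#pragma warning disable SYSLIB0014 // WebClient is marked obsolete on .NET 6+

// Placeholder URL (assumption): the .invalid TLD is reserved and should never resolve.
var url = new Uri("http://nonexistent.invalid/");

// Lets the main flow wait until the event handler has run.
var done = new TaskCompletionSource<string>();

using var client = new WebClient();
client.DownloadStringCompleted += (s, e) =>
{
    if (e.Cancelled)
    {
        done.TrySetResult("cancelled");
        return;
    }

    if (e.Error != null)
    {
        // The real failure (typically a WebException) is available here.
        Console.WriteLine($"Download failed: {e.Error.GetType().Name}: {e.Error.Message}");

        // Reading e.Result now rethrows that failure wrapped in TargetInvocationException.
        try
        {
            _ = e.Result;
        }
        catch (TargetInvocationException ex)
        {
            Console.WriteLine($"e.Result threw {ex.GetType().Name}; inner: {ex.InnerException?.GetType().Name}");
        }

        done.TrySetResult("failed");
        return;
    }

    // Only reached on success, so e.Result is safe to read.
    Console.WriteLine($"Downloaded {e.Result.Length} characters");
    done.TrySetResult("ok");
};

client.DownloadStringAsync(url);
var outcome = await done.Task;
Console.WriteLine($"Outcome: {outcome}");
```

Logging e.Error this way (rather than an empty catch) shows which URL failed and why.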

2

WebClient is an obsolete class replaced since 2012 by HttpClient. It was never built with HTTP APIs or thread safety in mind. It's easier to do what you want with HttpClient's GetStringAsync:

public async Task<HashSet<string>> DownloadWebpages(IEnumerable<string> urls)
{
    HashSet<string> HTML = new HashSet<string>();

    var client=new HttpClient();
    foreach (var url in urls)
    {
        var source=await client.GetStringAsync(url);
        HTML.Add(source);
    }
    return HTML;
}

Since .NET 6 you can even retrieve the URLs concurrently with Parallel.ForEachAsync. In this case you'd need a thread-safe collection to store the results, e.g. a ConcurrentDictionary:

HttpClient _client = new HttpClient();

public async Task<ConcurrentDictionary<string, string>> DownloadWebpages(IEnumerable<string> urls)
{
    var HTML = new ConcurrentDictionary<string, string>();

    // The body receives each URL plus a CancellationToken
    await Parallel.ForEachAsync(urls, async (url, ct) =>
    {
        var source = await _client.GetStringAsync(url, ct);
        // The indexer is thread-safe; ConcurrentDictionary has no public Add
        HTML[url] = source;
    });
    return HTML;
}

HttpClient is thread-safe and meant to be reused.
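If Parallel.ForEachAsync isn't available (before .NET 6), roughly the same concurrency can be had by starting all the requests and awaiting them together with Task.WhenAll. This is only a sketch under assumptions: `DownloadAllAsync` and its `fetch` parameter are hypothetical names, and the fake fetch below stands in for a real call such as `url => client.GetStringAsync(url)` on a shared HttpClient so the example runs without a network:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Offline stand-in for a real download (assumption, for illustration only).
Func<string, Task<string>> fakeFetch = async url =>
{
    await Task.Delay(50);
    return $"<html>{url}</html>";
};

var pages = await DownloadAllAsync(
    new[] { "https://example.com/a", "https://example.com/b" },
    fakeFetch);

Console.WriteLine($"Downloaded {pages.Count} pages");

// Starts every download first, then awaits them all together.
static async Task<Dictionary<string, string>> DownloadAllAsync(
    IEnumerable<string> urls, Func<string, Task<string>> fetch)
{
    var distinct = urls.Distinct().ToList();

    // ToList forces every request to start now instead of lazily, one at a time.
    var tasks = distinct.Select(fetch).ToList();

    // Results come back in the same order as the tasks.
    var bodies = await Task.WhenAll(tasks);

    var result = new Dictionary<string, string>();
    for (int i = 0; i < distinct.Count; i++)
    {
        result[distinct[i]] = bodies[i];
    }
    return result;
}
```

Because the dictionary is filled only after all tasks have finished, no lock or concurrent collection is needed here. Note that awaiting Task.WhenAll throws if any download fails, so wrap it in a try/catch if partial results are acceptable.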

If you absolutely have to use WebClient (why???) you can use DownloadStringTaskAsync. You won't be able to make concurrent calls on the same instance though, because a WebClient instance doesn't support concurrent operations and isn't thread-safe.

public async Task<HashSet<string>> DownloadWebpages(IEnumerable<string> urls)
{
    HashSet<string> HTML = new HashSet<string>();

    var client=new WebClient();
    foreach (var url in urls)
    {
        var source=await client.DownloadStringTaskAsync(url);
        HTML.Add(source);
    }
    return HTML;
}

1 Comment

The OP created the client inside the loop, which should allow concurrent calls if done correctly.
