
I'm setting up a scraper using NodeJS, and I'm having a hard time figuring out the right way to pass data around when using async.parallel.

Here's the batch function, which receives the list of zip codes as an array inside the zip_results object. I'm trying to set up asyncTasks as an array of functions to be run by async. The function I want called for each zip code is Scraper.batchOne, and I want to pass it a zip code and the job version. Right now, the function is called immediately inside the loop, so its return value, rather than a function, is pushed onto the array. I tried wrapping the call to Scraper.batchOne in an anonymous function, but that lost the scope of the index variable i and always sent in undefined values.

How can I make it so that the function, along with its parameters, is added to the array and only runs when async calls it?

// zip_results: {job_version: int, zip_codes: []}
Scraper.batch = function (zip_results) {
    //tasks - An array or object containing functions to run, each function
    //is passed a callback(err, result) it must call on completion with an
    //error err (which can be null) and an optional result value.
    var asyncTasks = [], job_version = zip_results.job_version;
    for (var i=0; i < zip_results['zip_codes'].length; i++) {
        asyncTasks.push(Scraper.batchOne(zip_results['zip_codes'][i], job_version));
    }
    // Call async to run these tasks in parallel, with a max of 2 at a time
    async.parallelLimit(asyncTasks, 2, function(err, data) { console.log(data); });

};
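The undefined values described above are the classic closure-in-a-loop problem: `var i` is function-scoped, so every wrapper function reads the same `i`, which already equals `zip_codes.length` by the time async runs the tasks. A minimal sketch, using a made-up `zipCodes` array in place of the real scraper data, shows the symptom and the block-scoped `let` fix available in ES2015 and later:

```javascript
// Hypothetical sample data standing in for zip_results.zip_codes.
const zipCodes = ['10001', '94105', '60601'];

// Buggy version: `var i` is a single binding shared by every closure.
function buildTasksWithVar() {
  const tasks = [];
  for (var i = 0; i < zipCodes.length; i++) {
    tasks.push(function () { return zipCodes[i]; });
  }
  return tasks;
}

// Fixed version: `let i` creates a fresh binding for each iteration.
function buildTasksWithLet() {
  const tasks = [];
  for (let i = 0; i < zipCodes.length; i++) {
    tasks.push(function () { return zipCodes[i]; });
  }
  return tasks;
}

// The tasks run only after the loop has finished, as they would under async.parallelLimit.
const fromVar = buildTasksWithVar().map(function (task) { return task(); });
const fromLet = buildTasksWithLet().map(function (task) { return task(); });

console.log(fromVar); // [ undefined, undefined, undefined ]
console.log(fromLet); // [ '10001', '94105', '60601' ]
```

With `var`, each closure reads `zipCodes[3]`, which does not exist; with `let`, each closure keeps the index from its own iteration.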

2 Answers


Why don't you use async.eachLimit instead? (With async.parallel you would need to use bind/apply techniques.)

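async.eachLimit(zip_results['zip_codes'], 2, function(zip, next) {
    // If batchOne is asynchronous, pass next into it instead and have
    // batchOne call next() once its work is finished.
    Scraper.batchOne(zip, zip_results.job_version);
    return next();
}, function(err) {
     // executed when all zips are done
});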
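For the async.parallel route the answer mentions, `Function.prototype.bind` can pre-fill the zip code and job version, leaving the callback as the last argument; async.parallel and async.parallelLimit call each task as `task(callback)`. The sketch below assumes a hypothetical `batchOne(zip, jobVersion, callback)` stand-in and invokes the tasks directly instead of through the async library, so it runs without npm dependencies:

```javascript
// Hypothetical stand-in for Scraper.batchOne(zip, jobVersion, callback).
// A real scraper would call back asynchronously; this one calls back at once.
function batchOne(zip, jobVersion, callback) {
  callback(null, zip + '@v' + jobVersion);
}

const zipResults = { job_version: 3, zip_codes: ['10001', '94105', '60601'] };

// bind fixes zip and jobVersion now; the callback is supplied later,
// matching the task(callback) shape that async.parallelLimit expects.
const asyncTasks = zipResults.zip_codes.map(function (zip) {
  return batchOne.bind(null, zip, zipResults.job_version);
});

// Invoke each task the way async would, collecting results by index.
const results = [];
asyncTasks.forEach(function (task, index) {
  task(function (err, result) {
    if (err) throw err;
    results[index] = result;
  });
});

console.log(results); // [ '10001@v3', '94105@v3', '60601@v3' ]
```

Building the tasks with `map` also sidesteps the shared-`i` problem, because each callback gets its own `zip` parameter. If the real `batchOne` relies on `this`, bind it to `Scraper` rather than `null`.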

2 Comments

It looks like this starts off fine, but after the first two results are processed, the process just hangs.
I edited the solution: of course, we need to call the callback (next() in the example above) when we are done with each item. If batchOne works asynchronously, you will need to pass next as a parameter into batchOne and call it when the work is done.
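To see why the process hangs when next is never called, here is a deliberately small limited-concurrency loop in the spirit of async.eachLimit. It is a hypothetical sketch, not the async library's implementation: a new item starts only when a running one calls its callback, so with a limit of 2, two tasks that never call back hold both slots forever. The `batchOne` here is likewise a hypothetical asynchronous stand-in that receives `next` and calls it when finished, as the comment above recommends.

```javascript
// Minimal sketch of an eachLimit-style helper (not the async library's code).
// Runs iterator(item, next) on at most `limit` items at once and calls
// done(err) after every item has called next(), or on the first error.
function eachLimitSketch(items, limit, iterator, done) {
  let started = 0;
  let finished = 0;
  let failed = false;
  if (items.length === 0) return done(null);

  function startNext() {
    if (failed || started >= items.length) return;
    const item = items[started++];
    iterator(item, function next(err) {
      if (failed) return;
      if (err) { failed = true; return done(err); }
      finished++;
      if (finished === items.length) return done(null);
      startNext(); // a slot frees up only once next() has been called
    });
  }

  for (let i = 0; i < limit && i < items.length; i++) startNext();
}

// Hypothetical async batchOne(zip, jobVersion, done): simulates work with a timer.
function batchOne(zip, jobVersion, done) {
  setTimeout(function () {
    processed.push(zip);
    done(null);
  }, 10);
}

const processed = [];
let allDone = false;
eachLimitSketch(['10001', '94105', '60601', '02108'], 2, function (zip, next) {
  // Pass next into batchOne so it is called only after the work finishes.
  batchOne(zip, 3, next);
}, function (err) {
  allDone = !err;
  console.log('all zips done:', processed);
});
```

If the iterator returned without ever calling `next`, `started` would stop at 2 and the final callback would never run, which matches the hang described in the first comment.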

You can use a self-invoking anonymous function inside the loop and pass in the parameters you want to retain, so their current values are captured when the loop runs rather than when the task is called later, like this:

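for (var i = 0; i < zip_results['zip_codes'].length; i++) {
    // The outer function runs immediately and captures this iteration's values;
    // the returned task runs later, when parallelLimit invokes it.
    asyncTasks.push((function (zipCode, jobVersion) {
        return function (callback) {
            // Assumes batchOne accepts a callback(err, result) as its last argument.
            Scraper.batchOne(zipCode, jobVersion, callback);
        };
    }(zip_results['zip_codes'][i], job_version)));
}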

Hope this helps.

3 Comments

But wouldn't this execute immediately on load? In this case, Scraper.batch is a callback to something else already.
The outer function, which sets the scope, will be called on load; the inner one will be called only when parallelLimit runs the tasks.
Ah, silly me, you are absolutely right. Now my issue is that when I call async.parallelLimit(tasks, number_of_runs), it stops after number_of_runs iterations. Any idea why that might be happening?
