In most cases using a parallel flow like this, you won't be printing a bunch of numbers in a for-loop (which happens to block execution). When you register your functions, they are registered in the same order in which you defined them in the array you're passing to parallel. In the case above, function a() is registered first and function b() second. Consequently, Node's event loop will call a() first, then b() at some undisclosed time later. Because we know those for-loops are blocking, and Node runs in a single thread, it must complete the entire for-loop within a() and return before Node's event loop takes control again, where b() is waiting in the queue to be processed similarly.
Why is a parallel flow-control construct useful? By design, you're not supposed to do blocking operations within Node (see your example). a() consumes the entire thread, then b() consumes the entire thread, before anything else gets to happen.
a() b()
|
|
|
|
RET
|
|
|
|
RET
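The diagram above can be sketched as runnable code. This is a minimal illustration (the function bodies are made up, matching your blocking for-loop example): because each loop hogs the single thread, a() always runs to completion before b() even starts.

```javascript
// Two blocking functions, as in the example: each one monopolizes the
// thread until its loop finishes, so execution is strictly sequential.
var order = [];

function a() {
  for (var i = 0; i < 1e6; i++) { /* busy work, blocks the thread */ }
  order.push('a');
}

function b() {
  for (var i = 0; i < 1e6; i++) { /* busy work, blocks the thread */ }
  order.push('b');
}

a();
b();
console.log(order); // a is always fully done before b begins
```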
Now, say you are making a web application where a user may register and, at the same time, upload a picture. Your user registration might have code like this:
var newUser = {
username: 'bob',
password: '...',
email: '[email protected]',
picture: '20140806-210743.jpg'
};
var file = path.join(img.IMG_STORE_DIR, newUser.picture);
flow.parallel([
function processImage(callback) {
img.process(function (err) {
if (err) return callback(err);
img.save(file, function (err) {
return callback(err); // err should be falsy if everything was good
})
});
},
function dbInsert(callback) {
db.doQuery('insert', newUser, function (err, id) {
return callback(err);
});
}
], function (err) {
// if err is set, one of the tasks failed; otherwise both are done,
// so send the results to the user now to let them know they're registered!
});
The inner functions here are non-blocking, and both call upon processing- or network-laden operations. They are, however, fairly independent of each other: you don't need one to finish for the other to begin. Within the functions whose code we can't see, more async calls with function callbacks are being made, each one enqueueing another item for Node to process. Node will attempt to clear out the queue, distributing the workload across CPU cycles as callbacks come due.
We hope that something like this is now happening:
a = processImage
b = dbInsert
a() b()
|
|
|
|
|
|
|
RET |
RET
If we ran them in series, i.e. you must wait for the image to be fully processed before the db insert begins, you would do a lot of waiting. If IO on your system is really slow, Node would be twiddling its thumbs waiting on the OS. By contrast, running them in parallel allows the slow operation to yield to the faster one, at least in theory.
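For contrast, here is a minimal sketch of what a series-style runner does (illustrative only, not nimble's actual source): each task only starts once the previous one has called back, so total time is roughly the sum of all task durations rather than the duration of the longest one.

```javascript
// Run tasks one after another; each task is a function taking a
// node-style callback. The next task starts only after the previous
// one calls back without an error.
function series(tasks, done) {
  var i = 0;
  (function next(err) {
    if (err) return done(err);          // abort the chain on first error
    if (i === tasks.length) return done(null); // all tasks finished
    tasks[i++](next);                   // start the next task
  })();
}
```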
If Node interleaves async work by itself, why do we really need a library for it? The key is the 2nd argument, which you've omitted.
nimble.parallel([a,b], function () {
// both functions have now returned and called-back.
});
You can now tell when both tasks are done. Node does not give you that by default, so it is a fairly useful thing to have.
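Internally, that completion tracking boils down to counting callbacks. Here is a rough sketch of what a parallel() helper does (an illustration of the technique, not nimble's actual source): start every task immediately, decrement a counter as each one calls back, and fire the final callback once the counter hits zero or an error arrives.

```javascript
// Run all tasks at once; invoke done(err) on the first error, or
// done(null) once every task has called back successfully.
function parallel(tasks, done) {
  var remaining = tasks.length;
  var failed = false;
  tasks.forEach(function (task) {
    task(function (err) {
      if (failed) return;                 // an error was already reported
      if (err) { failed = true; return done(err); }
      remaining -= 1;
      if (remaining === 0) done(null);    // every task has called back
    });
  });
}

// usage with two fake async tasks:
parallel([
  function (cb) { setTimeout(function () { cb(null); }, 20); },
  function (cb) { setTimeout(function () { cb(null); }, 10); }
], function (err) {
  console.log('all done, err =', err);
});
```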