In most cases using a parallel flow like this, you won't be printing a bunch of numbers in a for-loop (which happens to block execution). When you register your functions, they are registered in the same order in which you defined them in the array you're passing to parallel. In the case above, function a() is registered first and function b() second. Consequently, Node's event loop will call a() first, then b() at some undisclosed time later. Because we know those for-loops are blocking, and Node runs in a single thread, it must complete the entire for-loop within a() and return before Node's event loop takes control again, where b() is waiting in the queue to be processed similarly.
Why is a parallel flow-control construct useful? By design, you're not supposed to do blocking operations within Node (see your example). a() consumes the entire thread, then b() consumes the entire thread, before anything else gets to happen.
a() b()
|
|
|
|
RET
|
|
|
|
RET
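The diagram above can be sketched as runnable code. This is a minimal illustration (the function bodies are made up, matching your blocking for-loop example): because each loop hogs the single thread, a() always runs to completion before b() even starts.

```javascript
// Two blocking functions, as in the example: each one monopolizes the
// thread until its loop finishes, so execution is strictly sequential.
var order = [];

function a() {
  for (var i = 0; i < 1e6; i++) { /* busy work, blocks the thread */ }
  order.push('a');
}

function b() {
  for (var i = 0; i < 1e6; i++) { /* busy work, blocks the thread */ }
  order.push('b');
}

a();
b();
console.log(order); // a is always fully done before b begins
```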
Now, say you are making a web application where a user may register and, at the same time, upload a picture. Your user registration might have code like this:
var newUser = {
username: 'bob',
password: '...',
email: '[email protected]',
picture: '20140806-210743.jpg'
};
var file = path.join(img.IMG_STORE_DIR, newUser.picture);
flow.parallel([
function processImage(callback) {
img.process(function (err) {
if (err) return callback(err);
img.save(file, function (err) {
return callback(err); // err should be falsy if everything was good
})
});
},
function dbInsert(callback) {
db.doQuery('insert', newUser, function (err, id) {
return callback(err);
});
}
], function (err) {
// if err is set, one of the tasks failed; otherwise both are done,
// so send the results to the user now to let them know they're registered!
});
The inner functions here are non-blocking, and both call upon processing- or network-laden operations. They are, however, fairly independent of each other: you don't need one to finish for the other to begin. Within the functions whose code we can't see, more async calls with function callbacks are being made, each one enqueueing another item for Node to process. Node will attempt to clear out the queue, distributing the workload across CPU cycles as callbacks come due.
We hope that something like this is now happening:
a = processImage
b = dbInsert
a() b()
|
|
|
|
|
|
|
RET |
RET
If we ran them in series, i.e. you must wait for the image to be fully processed before the db insert begins, you would do a lot of waiting. If IO on your system is really slow, Node would be twiddling its thumbs waiting on the OS. By contrast, running them in parallel allows the slow operation to yield to the faster one, at least in theory.
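For contrast, here is a minimal sketch of what a series-style runner does (illustrative only, not nimble's actual source): each task only starts once the previous one has called back, so total time is roughly the sum of all task durations rather than the duration of the longest one.

```javascript
// Run tasks one after another; each task is a function taking a
// node-style callback. The next task starts only after the previous
// one calls back without an error.
function series(tasks, done) {
  var i = 0;
  (function next(err) {
    if (err) return done(err);          // abort the chain on first error
    if (i === tasks.length) return done(null); // all tasks finished
    tasks[i++](next);                   // start the next task
  })();
}
```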
If Node interleaves async work by itself, why do we really need a library for it? The key is the 2nd argument, which you've omitted.
nimble.parallel([a,b], function () {
// both functions have now returned and called-back.
});
You can now tell when both tasks are done. Node does not give you that by default, so it is a fairly useful thing to have.
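Internally, that completion tracking boils down to counting callbacks. Here is a rough sketch of what a parallel() helper does (an illustration of the technique, not nimble's actual source): start every task immediately, decrement a counter as each one calls back, and fire the final callback once the counter hits zero or an error arrives.

```javascript
// Run all tasks at once; invoke done(err) on the first error, or
// done(null) once every task has called back successfully.
function parallel(tasks, done) {
  var remaining = tasks.length;
  var failed = false;
  tasks.forEach(function (task) {
    task(function (err) {
      if (failed) return;                 // an error was already reported
      if (err) { failed = true; return done(err); }
      remaining -= 1;
      if (remaining === 0) done(null);    // every task has called back
    });
  });
}

// usage with two fake async tasks:
parallel([
  function (cb) { setTimeout(function () { cb(null); }, 20); },
  function (cb) { setTimeout(function () { cb(null); }, 10); }
], function (err) {
  console.log('all done, err =', err);
});
```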