
I had thought that asynchronous operations, such as reading a file, are processed on another thread, and that the main thread is notified when the read finishes there.

I tried the following.

const fs = require("fs")

console.time(1)
fs.readFile("largefile", x => console.timeEnd(1))

This shows 1500ms.

Next, I tried the following.

const fs = require("fs")

console.time(1)
fs.readFile("largefile", x => console.timeEnd(1))

// block main thread 1sec
const s = Date.now()
while(Date.now() - s < 1000);

If the asynchronous work were processed on another thread, this should still show 1500ms. However, I got 2500ms.

I tried one more.

const fs = require("fs")

console.time(1)
fs.readFile("largefile", x => console.timeEnd(1))

setInterval(() => {
    const s = Date.now()
    while(Date.now() - s < 100);
}, 100)

I waited several minutes, but no message appeared.

Does Node.js do heavy processing on the main thread?

Should I use child_process when I need to read and write many large files?

1 Answer

I/O is done using non-blocking operations under the covers (rather than occupying the main thread); I/O completions (e.g., callbacks), however, are done on the thread where the I/O operation was started (in your example, the one main JavaScript thread, because you're not using workers). If you saturate that thread, it won't have a chance to process the callbacks.

The main issues in your example are

  1. You're using a convenience function, readFile, to read a large file into memory all at once.

  2. Your test is synthetic, doing extremely CPU-intensive things that are unlikely to model your real application's characteristics.

Does Node.js do heavy processing on the main thread?

Some parts of convenience functions like readFile are implemented on the thread you called it on (the main thread in your example), yes. readFile is implemented in JavaScript using fs.read, and it doesn't request the next chunk of data until after it's processed the previous chunk; the default size of the chunks (as of this writing) is 8k (8,192) bytes.

You can see this in the source in:

  • lib/fs.js, which shows readFile using a ReadFileContext object.
  • internal/fs/read_file_context.js, which shows the implementation of the ReadFileContext object, where we can see it reading in chunks via fs.read.
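The chunk-by-chunk pattern described above can be sketched roughly as follows. This is a simplified illustration, not Node's actual implementation: the name readInChunks and its parameters are made up for this example, and it only relies on the callback forms of fs.open, fs.read, and fs.close. The thing to notice is that each chunk is requested from the callback of the previous one, so every chunk needs a turn on the main thread.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of Node.js): reads a whole file by issuing
// one fs.read per chunk, each from the callback of the previous read.
function readInChunks(path, chunkSize, done) {
  fs.open(path, "r", (openErr, fd) => {
    if (openErr) return done(openErr);

    const chunks = [];
    // Close the descriptor, then report the outcome exactly once.
    const finish = (err) =>
      fs.close(fd, (closeErr) => {
        if (err || closeErr) return done(err || closeErr);
        done(null, Buffer.concat(chunks));
      });

    const readNext = () => {
      const buffer = Buffer.alloc(chunkSize);
      // position = null means "continue from the current file position".
      fs.read(fd, buffer, 0, chunkSize, null, (readErr, bytesRead) => {
        if (readErr) return finish(readErr);
        if (bytesRead === 0) return finish(null); // end of file
        chunks.push(buffer.subarray(0, bytesRead));
        // The next chunk is only requested here, on the main thread.
        readNext();
      });
    };

    readNext();
  });
}
```

If the main thread is busy when a read completes, the callback (and with it the request for the next chunk) has to wait until the thread is free again.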

That means that if the main thread is blocked (your second code block) or under extremely heavy load (your third code block), it's very slow processing the once-per-8k read callbacks, and that dramatically impacts the performance of the convenience function:

  • Your second code block (with the 1,000ms busy-wait) prevents the handling of the first callback for the first 8k of data, so it holds up the read process for the full 1,000ms during which it's blocking.
  • Your third code block (with the setInterval call busy-waiting 100ms every 100ms) conspires with readFile's implementation to introduce a ~100ms delay between each 8k block read from the file. A file of any significant size is going to take a very long time to read at ~82KB/second (8,192 bytes per ~100ms).
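To get a feel for how long that takes, here is a back-of-the-envelope estimate. It assumes exactly one 8,192-byte chunk per ~100ms and a 100 MiB file; the file size and the function name estimateReadSeconds are assumptions for illustration, since the question doesn't say how large "largefile" is.

```javascript
// Rough estimate of how long a chunk-at-a-time read takes when every chunk
// callback has to wait for a busy main thread. All inputs are assumptions.
function estimateReadSeconds(fileBytes, chunkBytes, delayPerChunkMs) {
  const chunkCount = Math.ceil(fileBytes / chunkBytes);
  return (chunkCount * delayPerChunkMs) / 1000;
}

const assumedFileBytes = 100 * 1024 * 1024; // assumed 100 MiB file
const seconds = estimateReadSeconds(assumedFileBytes, 8192, 100);
console.log(`${seconds} s (about ${Math.round(seconds / 60)} min)`);
// prints "1280 s (about 21 min)"
```

Under those assumptions the read needs 12,800 callbacks, which is consistent with waiting several minutes and seeing nothing.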

Again, though, your test is synthetic. Blocking the main thread even for 100ms at a time is unusual.

Should I use child_process when I need to read and write many large files?

No. Just do it in reasonable-size chunks using the basic I/O operations (read, write, or streams) rather than using convenience functions like readFile. Using a child process for this would be at least as bad as, if not worse than, using a worker thread, and the Node.js dev team have this to say in the worker threads documentation:

Workers are useful for performing CPU-intensive JavaScript operations; do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already treat it more efficiently than Worker threads can.
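As one possible way to do the "reasonable-size chunks" approach with streams, here is a minimal sketch that copies a file without loading it into memory all at once. The helper name copyInChunks, the file paths, and the 64 KiB chunk size are assumptions for illustration, not recommendations from the Node.js docs; it uses fs.createReadStream, fs.createWriteStream, and stream.pipeline.

```javascript
const fs = require("fs");
const { pipeline } = require("stream");

// Hypothetical helper: copies a file chunk by chunk without holding the
// whole file in memory. The 64 KiB chunk size is an arbitrary choice.
function copyInChunks(srcPath, destPath, done) {
  pipeline(
    fs.createReadStream(srcPath, { highWaterMark: 64 * 1024 }),
    fs.createWriteStream(destPath),
    done // called once, with an error if any stage failed
  );
}
```

It could be called as copyInChunks("largefile", "largefile.copy", err => { if (err) console.error(err) }). The chunk callbacks still run on the main thread, so this helps most when that thread isn't being blocked for long stretches.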


4 Comments

The third example with the setInterval surprises me more than the second one without the setInterval. I would have assumed that setInterval would allow events to be processed in between, even if the while loop blocks for longer than 100ms. The second example, on the other hand, sounds more logical to me, as it would defer the start of the file read to the next tick.
@t.niese - setInterval on Node.js is a very strange beast (and doesn't work the same way it does on browsers). It's particularly strange when the work takes longer than the interval. Still, though...
@t.niese - Figured out why the setInterval example seems to lock things up.
Thank you for detailed explanation.
