We use clustering with our Express apps on multi-CPU boxes. It works well, and we get the maximum use out of our AWS Linux servers.

We inherited an app that we are fixing up. It's unusual in that it has two processes. It has an Express API portion that takes incoming requests. But the work that acts on those requests can run for several minutes, so it was built as a separate background process: Node calling Python and Maya.

Originally the two were tightly coupled, with the Python script called directly by the request that uploads the data. This was of course suboptimal, as it left the client waiting for a response for as long as the script took to run, so it was rewritten as a background process that runs in a loop, checking for new uploads and processing them sequentially.

So my question is this: if we have this separate Node process running in the background, and we run cluster, which starts up a process for each CPU, how is that going to work? Are we not going to get two Node processes competing for the same CPU? We were getting a bit of weird behaviour and crashing yesterday, without a lot of error messages (god I love Node), so it's a bit concerning. I'm assuming Linux will just swap the processes in and out as they are being used. But I wonder if it will be problematic, and I also wonder about someone getting their web session swapped out for several minutes while the longer-running process runs.

The smart thing to do would be to rewrite this to run on two different servers, but the files that Maya uses and creates are on the server's file system, and we were not given the budget to rebuild it the way we should. So we're stuck with this architecture for now.

Any thoughts on possible problems and how to avoid them would be appreciated.

1 Answer

From an overall architecture perspective, spawning one Node.js process per core is a great way to go. You have a lot of interdependencies, though: the Node.js processes are calling Maya, which may use multiple threads (keep that in mind).

The part that concerns me is your random crashes and your "process that runs in a loop". If that process is just checking the file system, you probably have a race condition where the Node.js processes are competing to work on the same input/output files.
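If the loop does poll the file system, one common way to avoid that kind of race is to have each process claim a file with an atomic rename before touching it. The sketch below is only an illustration under that assumption: the `tryClaim` helper, the `.claimed-by-` suffix, and the worker id are hypothetical names, not something from the app in question. It relies on `rename` being atomic when source and target are on the same file system.

```javascript
const fs = require('fs');
const path = require('path');

// Try to claim an input file for this worker by renaming it.
// Only one process can win the rename; the others see ENOENT
// because the original name no longer exists.
function tryClaim(inputPath, workerId) {
  const claimedPath = `${inputPath}.claimed-by-${workerId}`;
  try {
    fs.renameSync(inputPath, claimedPath);
    return claimedPath; // this worker now owns the file
  } catch (err) {
    if (err.code === 'ENOENT') {
      return null; // another worker claimed (or removed) it first
    }
    throw err; // anything else is a real error and should be visible
  }
}

// Hypothetical usage: process the file only if the claim succeeded.
// const claimed = tryClaim(path.join('/data/uploads', 'scene-1.json'), process.pid);
// if (claimed) { /* hand `claimed` to the Python/Maya step */ }
```

A worker that gets `null` back simply moves on to the next candidate file, so two processes never work on the same input.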

In theory, one Node.js process per core will work great and should help you use all of your CPU capacity. The Linux scheduler constantly switches processes on and off the cores, so that is not an issue. You could start multiple Node.js processes per core and still not have an issue.

One last note: be sure to keep an eye on your memory usage. Several Linux distributions on EC2 do not have a swap file enabled by default, and running out of memory can be another silent app killer. It's best to add a swap file in case you run into memory issues.
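As a rough way to keep an eye on memory from inside Node itself, something like the following could log a warning before the box runs out. This is a minimal sketch: the helper names, the 10% threshold, and the 30 second interval are assumptions chosen for illustration. `os.freemem()` reports system-wide memory, so it also reflects what the Python/Maya child processes are using, while `process.memoryUsage().rss` covers only the Node process.

```javascript
const os = require('os');

// Collect a small snapshot of system and process memory, in megabytes.
function memorySnapshot() {
  const toMb = (bytes) => Math.round(bytes / (1024 * 1024));
  return {
    freeMb: toMb(os.freemem()),
    totalMb: toMb(os.totalmem()),
    rssMb: toMb(process.memoryUsage().rss),
  };
}

// True when the free share of system memory drops below the threshold.
function isLowMemory(snapshot, minFreeRatio = 0.1) {
  return snapshot.freeMb / snapshot.totalMb < minFreeRatio;
}

// Check periodically and log loudly when memory is getting tight.
const memoryTimer = setInterval(() => {
  const snapshot = memorySnapshot();
  if (isLowMemory(snapshot)) {
    console.warn('low memory:', JSON.stringify(snapshot));
  }
}, 30000);
memoryTimer.unref(); // do not keep the process alive just for this timer
```

Logging like this does not replace a swap file, but it leaves a trace in the logs if the kernel later kills a process for running out of memory.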

3 Comments

Thanks. The loop is checking an RDS instance for new records, selecting one, and then processing it. So while there is no work, it checks every second (using setTimeout to wait, then starting itself again); while it's working, it doesn't poll. Doing more monitoring: when that Python process starts, it eats its CPU, but it's Maya, it's rendering complex graphics, and I'm not sure how to do anything about that. If we restrict its resources, it's just going to take longer.
Maya is not multithreaded, so you should be able to run one Node.js process per core, and when all jobs are processing, all the cores should be used.
A better setup in this situation, though, would be to run one Node.js process and set it up in such a way that it can spawn one Python process per core. Since Node.js isn't using any significant CPU, you're not going to have issues that require multiple instances per CPU. A single Node process could easily spawn and keep track of a few hundred processes (and you're likely not going to launch more than 4-8).
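
The single-process setup from the last comment could look roughly like the sketch below: one Node.js process that runs at most one Python job per core and queues the rest. The `JobPool` class, the `runPythonJob` helper, the `process_upload.py` script name, and the job shape are all hypothetical; only `os.cpus()` and `child_process.spawn` are real Node APIs here.

```javascript
const os = require('os');
const { spawn } = require('child_process');

// Runs at most `limit` jobs at once; extra jobs wait in a FIFO queue.
class JobPool {
  constructor(limit, runJob) {
    this.limit = limit;
    this.runJob = runJob; // (job, done) => void; must call done(err) when finished
    this.queue = [];
    this.active = 0;
  }

  submit(job) {
    this.queue.push(job);
    this.drain();
  }

  // Start queued jobs until the concurrency limit is reached.
  drain() {
    while (this.active < this.limit && this.queue.length > 0) {
      const job = this.queue.shift();
      this.active += 1;
      let settled = false;
      this.runJob(job, (err) => {
        if (settled) return; // 'error' and 'exit' can both fire for one child
        settled = true;
        this.active -= 1;
        if (err) console.error(`job ${job.id} failed:`, err.message);
        this.drain(); // a slot is free, so start the next queued job
      });
    }
  }
}

// Hypothetical runner: the script name and argument format are assumptions.
function runPythonJob(job, done) {
  const child = spawn('python', ['process_upload.py', String(job.id)], {
    stdio: 'inherit', // let the child's output and errors reach the logs
  });
  child.on('error', done); // for example, the python binary was not found
  child.on('exit', (code, signal) => {
    done(code === 0 ? null : new Error(`exited with code ${code}, signal ${signal}`));
  });
}

// One slot per core, as suggested above.
const pool = new JobPool(os.cpus().length, runPythonJob);
// pool.submit({ id: 42 });
```

Because every failed job is logged with its exit code or signal, this also gives a bit more visibility into the silent crashes mentioned in the question.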