I would like to know how to determine the status of my server using the top command, and whether I have to change the server or add more resources. Below is the output of the top command on my server.
Here are some other facts:
- The load average values vary from 28.XX to 77.XX.
- The CPU %id value is between 10.0 and 22.0 most of the time and occasionally reaches 30.0.
- The server is running on a virtual machine.
- The host that the virtual machine runs on has an Intel(R) Xeon(R) CPU E5-2403 0 @ 1.80GHz, with 4 cores.
- The web applications, database service, memcached, web server and other related server apps have been running for a week.
- The presence.py service does the most work and is currently checking the presence of 703 nodes.
I would say I do not need to add more RAM to the system but it certainly looks like the CPU is overwhelmed. We still have to add 100-200 more nodes so I think that the server is not going to be able to handle it. Am I right?
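To put the load average figures in perspective, here is a rough sketch that divides them by the core count. It assumes the virtual machine sees all 4 cores of the host, which may not be the case, and the function name `load_per_core` is just for illustration. On Linux the load average also counts tasks in uninterruptible sleep (usually waiting on disk I/O), so a high value does not prove by itself that the CPU is the bottleneck.

```python
def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


# Assumption: the VM is given all 4 cores of the host CPU.
CORES = 4

# Lowest and highest load averages observed (28.XX and 77.XX, rounded down).
for load in (28.0, 77.0):
    print(f"load {load:.2f} -> {load_per_core(load, CORES):.2f} runnable tasks per core")
```

A value persistently well above 1.0 per core means tasks are queuing for CPU time (or blocked on I/O), which matches the low %id readings.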
EDIT: presence.py background
The Presence service (presence.py) is a proprietary application that runs in twice as many processes as there are cores: one main process and (2 x cores) - 1 worker processes. For each registered node, a thread is created in one of the worker processes, so with 700 nodes each worker process has ~100 threads running. Each thread checks the status of its node via telnet or HTTP once per second, so you can imagine the load on each process. This means that most of the time each thread is sleeping or waiting for network I/O.
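The arithmetic above can be sketched as follows. This is only an illustration of the process and thread counts described, not code from presence.py; the function name `presence_layout` is made up, and it assumes nodes are spread evenly across the workers and that the service sees 4 cores.

```python
def presence_layout(cores: int, nodes: int) -> tuple[int, int, float]:
    """Return (total processes, worker processes, average threads per worker)."""
    total_processes = 2 * cores        # twice the number of cores
    workers = total_processes - 1      # one process is the main process
    threads_per_worker = nodes / workers  # assumes an even spread of nodes
    return total_processes, workers, threads_per_worker


# Current node count, then the count after adding 100 and 200 more nodes.
for nodes in (703, 803, 903):
    total, workers, threads = presence_layout(cores=4, nodes=nodes)
    print(f"{nodes} nodes: {total} processes, {workers} workers, "
          f"~{threads:.1f} threads per worker, {nodes} checks per second overall")
```

Since each thread polls once per second, the total number of checks per second equals the number of nodes, so going from 703 to 903 nodes adds roughly 28% more polling work.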
The Presence service started with a load of 350 nodes and had been working great, but for some time now, since we started increasing the load and passed 600 nodes, it has been running somewhat slowly. For instance, the command curl www.google.com takes many seconds to complete.
