
I'm writing a short bash script that involves logging into roughly 100 nodes and counting the number of processes running in top with my username, and I'm trying to combine the ssh-related processes all into one bundled command:

ssh server-1 top -bn1 | grep 'program' | grep -c 'username'

When I do this, I notice that it takes roughly 5 seconds to return the process count for a server, which is not long by itself. However, when I want to run this command about 100 times, it would be desirable to speed it up. I can edit my .bash_profile to comment out the following lines:

if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

From there, I can complete the process of entering the ssh, top/grep/grep, and exit commands by hand in about 3 seconds. Given that the ssh and exit processes only take about 1 second combined (with my .bash_profile commented out) and that the top/grep/grep is nearly instant, it should be significantly faster than 3 seconds if the computer does the commands. However, keeping the .bash_profile commented out does not improve the speed of the bundled ssh command and the bundled command still takes about 5 seconds vs. 3 seconds doing it by hand. Can anyone explain why the bundled command is slower than doing the commands individually by hand, even when the .bash_profile is commented out?

  • Study the section of the bash documentation that covers which startup scripts get sourced by default under which circumstances. Generally, profiles get sourced on login shells, and rc's get sourced on non-login shells. I'm guessing that since you're invoking a specific command via ssh, it's a non-login shell, so it skips the profile and sources the rc. Also, note that you're only running top on the remote machine; those greps are executing on the local machine. (It might not make much performance difference, but I thought I should point it out.) Commented Aug 14, 2023 at 22:16
  • "I can edit my .bash_profile" – On the server(s), right? Not on the client. Please state this explicitly, so we don't have to assume. Commented Aug 15, 2023 at 5:02
  • Apologies, I realized my terminology was inaccurate. It would be more accurate to say the processes are jobs on a cluster consisting of a login node and a bunch of compute nodes, and I am logging into a head/login node where I can submit jobs and check on the statuses of these jobs being worked on inside the compute nodes. Thus, the .bash_profile is initially activated on the head/login node but can be edited before I ssh to a compute node, where the .bash_profile is read again. Commented Aug 15, 2023 at 16:08
  • Please edit the question and improve it instead of clarifying in a comment. Commented Aug 16, 2023 at 3:46

1 Answer

I can make an educated guess: when you run a session manually, you already have another session logged in, and SSH is reusing that existing connection in the background. When you script this, SSH has to negotiate a whole new session every time.
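
One way to check this guess is to time the connection on its own. The following is a minimal sketch, not a definitive benchmark; the helper name `elapsed` is made up for illustration, and it relies on bash's `time` keyword and `TIMEFORMAT` variable.

```shell
#!/bin/bash
# Hypothetical helper: print the wall-clock seconds a command takes,
# discarding the command's own output.
elapsed() {
    local TIMEFORMAT=%R              # %R = elapsed (real) time in seconds
    { time "$@" > /dev/null 2>&1; } 2>&1
}

# Example calls (server-1 is the host name used in the question):
#   elapsed ssh server-1 true        # connection setup and teardown only
#   elapsed ssh server-1 top -bn1    # connection plus the remote command
```

If the first call already accounts for most of the 5 seconds, the time is going into establishing the connection rather than into top or grep.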

There are lots of things you can do to speed up SSH connections. Have a read of https://www.tecmint.com/speed-up-ssh-connections-in-linux/, especially section 3, which shows how to reuse SSH connections by putting the following in your ~/.ssh/config file:

Host *
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    ControlPersist 600
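
Note that ssh does not create the directory named in ControlPath; if it is missing, the master connection cannot bind its socket. A minimal sketch for preparing it, assuming the ~/.ssh/sockets path from the config above (the function name `prepare_socket_dir` is made up for illustration):

```shell
#!/bin/bash
# Hypothetical helper: create the directory that ControlPath points at
# and restrict it to the owner, since it will hold connection sockets.
prepare_socket_dir() {
    local dir="${1:-$HOME/.ssh/sockets}"
    mkdir -p "$dir" && chmod 700 "$dir"
}

# Example calls:
#   prepare_socket_dir               # uses ~/.ssh/sockets
#   ssh -O check server-1            # ask whether a master connection is running
#   ssh -O exit server-1             # close the master connection
```

With this in place, the first ssh to a host pays the full negotiation cost, and later connections to the same host within the ControlPersist window reuse it.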

Alternatively, look at "autossh", which worked wonders for me. (This is from memory, and it was 10 years ago now, but I used autossh -M on startup on the monitoring server.)

That said, allow me to suggest that if you step back, there may be a better way of doing this: use backgrounded processes to run multiple SSH sessions in parallel. Focusing on this will return a result a lot quicker (and quicker still if combined with the techniques above).

You should be able to background your SSH requests and dump the results to a file/multiple files, then wait for them all to complete, sort and present.

Here is a trivial piece of code to show the concept. (Of course, this assumes a trust relationship between the hosts.)

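  #!/bin/bash

  # Scratch file in shared memory; $$ (this script's PID) keeps the name unique.
  TEMPFILE=/dev/shm/temp.$$

  for each in host1.name host2.name host3.name host4.name
  do
        # -n redirects stdin from /dev/null, which ssh needs when backgrounded.
        ssh -n "$each" "ssh command(s) here" >> "$TEMPFILE" &
  done
  # Block until every backgrounded ssh has exited.
  wait

  cat "$TEMPFILE"
  rm "$TEMPFILE"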

The magic bits here are:

  • The & at the end of the SSH line backgrounds the request.
  • The wait command waits for all the backgrounded processes to exit.
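
The same idea can be sketched with one output file per host, so results stay attributable to the host that produced them. This is an illustration under stated assumptions, not a tested cluster tool: the function name `count_on_hosts` and the `REMOTE_RUN` variable are made up here, and the remote pipeline is the one from the question, quoted so that the greps also run on the remote side.

```shell
#!/bin/bash
# REMOTE_RUN is the command used to reach a host. It defaults to ssh and can
# be pointed at a stub to try the script without a cluster (hypothetical knob).
REMOTE_RUN=${REMOTE_RUN:-ssh}

# Usage: count_on_hosts PROGRAM USERNAME HOST...
count_on_hosts() {
    local program=$1 user=$2
    shift 2
    local outdir host count
    outdir=$(mktemp -d) || return 1

    for host in "$@"; do
        # The whole pipeline is one quoted argument, so it runs remotely.
        # stdin comes from /dev/null because the job is backgrounded.
        "$REMOTE_RUN" "$host" "top -bn1 | grep '$program' | grep -c '$user'" \
            < /dev/null > "$outdir/$host" 2> /dev/null &
    done
    wait    # block until every backgrounded job has exited

    for host in "$@"; do
        count=$(cat "$outdir/$host")
        # An empty file means the host returned nothing; print "?" for it.
        printf '%s %s\n' "$host" "${count:-?}"
    done
    rm -rf "$outdir"
}

# Example call:
#   count_on_hosts program username server-1 server-2 server-3
```

Printing the results in argument order after the wait keeps the output stable no matter which host answers first.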
