I have SSH access to a remote machine that I'd like to use for long-running jobs. Currently I simply run:

ssh user@remote command-to-run

This has several drawbacks:

  • I can't simply suspend my local machine: when I do, SIGHUP is sent to the remote process, effectively killing it. I could use nohup to prevent that.
  • The output may be long, so I'd rather have it redirected to files. I can do that manually, but it gets clumsy with a series of commands.
  • The process may run for a very long time. Ideally, the submitting program would only confirm that the command (script) was submitted successfully and then terminate.
  • I'd like to get a mail notification with the exit code when the process terminates. I could use a shell script and a terminal command to send it manually, but that is one more hack.
  • I want to be able to schedule multiple scripts at once safely. In particular, I want to be able to push multiple scripts with the same name without manual renaming, and without worrying about files that may already exist on the file system.

This is very similar to what SLURM does, but I don't have any administrative rights on the remote side. Besides, since I have access to all cores of the remote machine, it makes no sense to declare how many cores I need.

Is there anything I could use for this? What I described seems like a common use case.

  • A common solution for this is using tmux. Commented Jan 20, 2017 at 17:38
  • You could also look at nohup. This type of question has been asked over and over again. Commented Jan 20, 2017 at 17:40
  • Start using ansible :) Commented Jan 20, 2017 at 17:47
  • Edited the OP to mention nohup, which I already knew about (I just forgot to mention it), and to add the extra requirements I realized I have. Commented Jan 20, 2017 at 21:36
  • After a quick look, Ansible seems to be the solution! Thanks, @JacobEvans! Is there any way I could detach from ansible immediately? A brief search yielded nothing. Commented Jan 20, 2017 at 21:43

1 Answer

If you can put scripts that run these long-running jobs for you on the remote machine, this becomes very easy:

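#!/bin/bash
# Run a long-running job (unless it is already running)
# and send an email when it completes.
# The lock file lives in $HOME because /var/run is normally writable only by root.
lockfile="$HOME/long-job-1.lock"
logfile=$(mktemp)
errfile=$(mktemp)
if [[ -f "$lockfile" ]]; then
    echo "This job is already running." 1>&2
    rm -f "$logfile" "$errfile"
    exit 1
else
    echo $$ > "$lockfile"
    # Remove the lock and the temporary logs whenever the script exits.
    trap 'rm -f "$lockfile" "$logfile" "$errfile"' EXIT
fi

# Capture stdout and stderr so they can be mailed below.
/path/to/some/really/longrunning/job.sh > "$logfile" 2> "$errfile"
returncode=$?

# Note: -a attaches a file in Heirloom mailx and s-nail; other mail clients may use a different flag.
if [[ "$returncode" -ne 0 ]]; then
    mailx -s "Job failed with exit code $returncode" -a "$logfile" [email protected] < "$errfile"
else
    mailx -s "Job succeeded" [email protected] < "$logfile"
fi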

Put that script on the remote server in your home directory as longjob1.sh. Then, locally, you can:

ssh username@remotehost "screen -dmS LongJob1 ./longjob1.sh"

The script (and the job it invokes) will run in a screen session on the remote server, and email you when it is done. If it exits in error, you will be emailed the error log, with the standard log attached to the email.
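
One caveat about the lock check in the script: it tests for the lock file and then creates it in two separate steps, so two submissions arriving at almost the same moment could both pass the test. A minimal sketch of an atomic variant follows, assuming bash; the helper name acquire_lock and the demo lock path are hypothetical and only for illustration.

```shell
#!/bin/bash
# acquire_lock is a hypothetical helper name, not part of the script above.
# With noclobber set, the redirection fails if the file already exists,
# so the existence check and the creation happen in a single step.
acquire_lock() {
    local lockfile=$1
    ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null
}

# Assumed demo path; a real job would use a fixed, per-job lock path.
demo_lock="${TMPDIR:-/tmp}/long-job-demo.$$.lock"

if acquire_lock "$demo_lock"; then
    echo "first attempt: lock acquired"
fi
if ! acquire_lock "$demo_lock"; then
    echo "second attempt: refused, lock already held"
fi
rm -f "$demo_lock"
```

In the job script, the same idea would replace the `[[ -f "$lockfile" ]]` test: call the helper once, and exit with an error if it fails.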

  • I am curious as to why this was downvoted, as it provides a solution to the question posed. A comment explaining why this was perceived to be a bad answer (particularly in the absence of any others) would be gratefully received. Commented Jan 20, 2017 at 18:17
  • Sorry, it was because the question is not a good one; the asker apparently didn't want to use Google or something like that. But the answer is fine, so I removed the -1 as you asked. Commented Jan 20, 2017 at 18:59
  • Edited the OP. Two problems with this solution are mentioned there: I can't safely schedule multiple commands at once. What if LongJob1 already exists? Commented Jan 20, 2017 at 21:38
  • That's what the start of the script is for: it looks for a .lock file when it starts. If the file is there, it aborts and barks; otherwise it writes the file and sets a trap to remove it when the script exits. You will very briefly have two screen sessions with the name LongJob1 in that instance. Commented Jan 20, 2017 at 21:40
