
I have had rsync try to connect five times, and it keeps exiting with code 11, which indicates a file I/O error. When I use robocopy to copy a set of files to the same target, I get error 59.

The thing is that robocopy succeeds where rsync fails. In the robocopy logs I noticed that right after the error it retried 30 seconds later. I need to use rsync, so I am wondering how to set it up to retry after, say, a network error, 30 seconds later. The error only happens about 5 times during a transfer of over 200K files of varying sizes. In the rsync logs, immediately after code 11, rsync requests the list of files from the source and then exits with that same exit code of 11, which tells me it failed.

Command-line

rsync -rtlzv -e "ssh -i c:/RsyncKeys/wa-ecy-gov-test-rsync-key -o ConnectTimeout=140 -o ConnectionAttempts=18" --quiet --stats --exclude-from='rsyncfilter.txt' --force --delete [email protected]: //sdceco/Apps/RSTUDIO/RpackagesNew  --timeout=320 --log-file=c:/rsynclogs/rsync11-05-2020.log

Many thanks for any advice!

  • Could the I/O error be caused by an open file? Commented Nov 13, 2020 at 12:27
  • For those reading this question after the event, it turns out the underlying issue is a Windows network share failing the destination write Commented May 3, 2022 at 7:29

1 Answer

The way I do it is similar to this

while :
do
    rsync … && break
    sleep 60
done

rsync exits with status 0 when it completes successfully, and the && break then leaves the loop. On any other status the loop pauses for 60 seconds and runs rsync again. In my own implementation I tend to use a longer delay, plus a counter that exits the loop after k tries:

ss=127 k=4
while [[ $((k--)) -gt 0 ]]
do
    rsync … && ss=0 && break
    ss=$?
    [[ $k -gt 0 ]] && sleep 300
done
echo "completion status of last rsync is $ss"
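
Since the question is specifically about retrying after a transient error such as code 11, the counter loop above can be wrapped in a function that only retries on chosen exit statuses and gives up immediately on anything else. This is a minimal sketch, not a tested production script: the function name retry_on_codes and its argument order are hypothetical, and which statuses count as "transient" is an assumption you should adjust for your environment.

```shell
#!/usr/bin/env bash
# retry_on_codes MAX DELAY "CODES" -- COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to MAX times, sleeping DELAY seconds
# between attempts, but only while the exit status appears in the
# space-separated CODES list. Any other failure is returned at once.
retry_on_codes() {
    local max="$1" delay="$2" codes="$3" attempt=1 status=1
    shift 3
    if [ "$1" = "--" ]; then shift; fi

    while [ "$attempt" -le "$max" ]; do
        # Run the command; on failure capture its exit status.
        if "$@"; then return 0; else status=$?; fi

        case " $codes " in
            *" $status "*) ;;          # listed as transient: retry
            *) return "$status" ;;     # not listed: give up immediately
        esac

        echo "attempt $attempt/$max failed with status $status" >&2
        # No point sleeping after the final attempt.
        if [ "$attempt" -lt "$max" ]; then sleep "$delay"; fi
        attempt=$((attempt + 1))
    done
    return "$status"
}
```

It could then be called as retry_on_codes 4 30 "10 11 12 30 35" -- rsync …, where 10, 11, 12, 30 and 35 are the rsync exit codes for socket I/O errors, file I/O errors, protocol data stream errors, and the two timeout conditions. Bear in mind that each retry starts rsync from scratch, so it rescans the file list before skipping what has already been transferred.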

More recently I have encountered the retry command available in (at least) Debian. The loop code can thus be simplified considerably:

retry --delay 60 --times 4 rsync …
ss=$?
echo "completion status of last rsync is $ss"

Following your question's update, here is the rsync command I would use

rsync -rtlzq -e "ssh -i c:/RsyncKeys/wa-ecy-gov-test-rsync-key -o ConnectTimeout=140 -o ServerAliveInterval=15" --exclude-from='rsyncfilter.txt' --force --delete --log-file=c:/rsynclogs/rsync11-05-2020.log [email protected]: //sdceco/Apps/RSTUDIO/RpackagesNew

Changes

  • Added ServerAliveInterval=15 for the ssh transport. With ssh's default ServerAliveCountMax of 3, this forces a disconnect after about 45 seconds of no server response whatsoever
  • No -v (--verbose) or --stats because you overrode them with -q (--quiet)
  • No --timeout because it's managed by ssh at the transport level
  • The copying instructions for R warn that -l might need to be replaced with -L on Windows filesystems. I've not made that change here but you (and future readers) should be aware of it
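
The 45-second figure in the first bullet is the keepalive interval multiplied by the number of unanswered probes ssh tolerates (ServerAliveCountMax, which defaults to 3). A small sketch that spells out both values and builds the matching ssh command for rsync's -e option; the variable names are illustrative only, and the key and timeout options from the command above are left out for brevity:

```shell
#!/usr/bin/env bash
# Illustrative variable names. ServerAliveInterval and ServerAliveCountMax
# are real ssh_config options; 3 is ssh's default for the latter.
alive_interval=15    # seconds between keepalive probes
alive_count_max=3    # unanswered probes tolerated before disconnecting

# Approximate time before ssh gives up on an unresponsive server.
dead_after=$((alive_interval * alive_count_max))
echo "ssh disconnects after about ${dead_after}s of silence"

# Command string that could be passed to rsync via -e "$ssh_cmd".
ssh_cmd="ssh -o ServerAliveInterval=${alive_interval} -o ServerAliveCountMax=${alive_count_max}"
echo "$ssh_cmd"
```

Raising either value makes ssh more patient with a slow link; lowering them makes a dead connection fail sooner, so a retry can start earlier.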
13
  • I am trying to understand. Are you suggesting a 30 second gap between runs of rsync? So if rsync exits with a code of 11, wait 30 seconds before re-running? Thanks for the reply. Commented Nov 5, 2020 at 21:37
  • Thanks. The problem is that with 200k files being copied, the error might occur on some other file, and getting back to that point again will potentially take much more than 30 seconds. So the odds seem to be against me if I always let rsync restart. I hope that makes sense. Again, thanks for the replies. Commented Nov 5, 2020 at 22:22
  • Thanks roaima, I updated my command line as you suggested and I will run it a few times to see if the results are consistent. Will this mean that when a network error occurs, rsync repeats the file copy where the error occurred and moves on from there to finish the list? Thanks again, Tony. Commented Nov 6, 2020 at 17:50
  • Yes, exactly that. If they don't we will need to investigate further as it's then not normal behaviour Commented Nov 6, 2020 at 17:56
  • Could also be that the exit code is wrong. It seems to match the code value of 11 and if it does retry and succeed seems like the exit code should be a 0. Anyway, if the problem repeats I will leave the log entry around that code value to get feedback. Again thanks! Commented Nov 6, 2020 at 18:04
