When GNU grep tries to write its result, it fails with a non-zero exit status because it has nowhere to write the output: the SSH connection is gone.
This means that the if statement always takes the else branch.
To illustrate this (this is not exactly what's happening in your case, but it shows what happens if GNU grep is unable to write its output):
$ echo 'hello' | grep hello >&- 2>&-
$ echo $?
2
Here we grep for the string that echo produces, but we close both output streams for grep so that it can't write anywhere. As you can see, the exit status of GNU grep is 2 rather than 0.
This is particular to GNU grep; grep on BSD systems won't behave the same:
$ echo 'hello' | grep hello >&- 2>&- # using BSD grep here
$ echo $?
0
To remedy this, make sure that the script does not generate output. You can do this with exec >/dev/null 2>&1. Also, we should use grep with its -q option, since we're not at all interested in seeing its output. This would generally also speed up grep, as it can stop at the first match instead of reading the whole file, but in this case it makes very little difference since the file is so small.
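As a quick check of both remedies, here is a minimal sketch that runs grep with its output streams closed, as in the illustration above. The variable names are mine, and the subshell only simulates a script that starts without usable output streams; it is not exactly what happens when the SSH connection goes away.

```shell
#!/bin/sh
# Remedy 1: with -q, grep writes nothing, so closed streams cannot make it fail.
echo 'hello' | grep -q hello >&- 2>&-
quiet_status=$?

# Remedy 2: the subshell starts with both streams closed, then reopens them
# on /dev/null with exec before running a plain grep (no -q).
(
    exec >/dev/null 2>&1
    echo 'hello' | grep hello
) >&- 2>&-
redirected_status=$?

echo "grep -q: $quiet_status, after exec: $redirected_status"
```

Both statuses should come out as 0, since in neither case does grep have to write to a closed stream.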
In short:
#!/bin/sh
# redirect all output not redirected elsewhere to /dev/null by default:
exec >/dev/null 2>&1
while true; do
date >sdown.txt
ping -c 1 -W 1 myserver.net >pingop.txt
if ! grep -q "64 bytes" pingop.txt; then
mutt -s "Server Down!" [email protected] <sdown.txt
break
fi
sleep 10
done
You may also use a test on ping directly, removing the need for one of the intermediate files (and also getting rid of the other intermediate file that really only ever contains a datestamp):
#!/bin/sh
exec >/dev/null 2>&1
while true; do
if ! ping -q -c 1 -W 1 myserver.net; then
date | mutt -s "Server Down!" [email protected]
break
fi
sleep 10
done
In both variations of the script above, I choose to exit the loop upon failure to reach the host, just to minimise the number of emails sent. You could instead replace the break with e.g. sleep 10m (the m suffix is an extension to the standard sleep; sleep 600 is the portable spelling) if you expect the server to eventually come up again.
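If you go for the variant that keeps running, one way to arrange it is sketched below. This is only an illustration: check_host, notify and next_delay are hypothetical helper names of mine, the recipient is passed in as an argument, and the ten-minute back-off is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: check_host, notify and next_delay are hypothetical helpers.

# Succeeds if a single ping to host $1 is answered.
check_host() {
    ping -q -c 1 -W 1 "$1" >/dev/null 2>&1
}

# Sends the current date as the mail body to the recipient given in $1.
notify() {
    date | mutt -s "Server Down!" "$1" >/dev/null 2>&1
}

# Prints the number of seconds to wait before checking host $1 again,
# notifying recipient $2 first if the host did not answer.
next_delay() {
    if check_host "$1"; then
        echo 10
    else
        notify "$2"
        echo 600    # ten minutes, written in seconds for portability
    fi
}

# Main loop, left commented out so that the sketch can be sourced safely
# ($recipient is assumed to hold the address to notify):
# while true; do
#     sleep "$(next_delay myserver.net "$recipient")"
# done
```

The longer delay after a failure limits the emails to one every ten minutes for as long as the host stays down.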
I've also slightly tweaked the options used with ping as -i 1 does not make much sense with -c 1.
Shorter (unless you want it to continue sending emails when the host is unreachable):
#!/bin/sh
exec >/dev/null 2>&1
while ping -q -c 1 -W 1 myserver.net; do
sleep 10
done
date | mutt -s "Server Down!" [email protected]
As a cron job running every minute (would continue sending emails every minute if the server continues to be down):
* * * * * ping -q -c 1 -W 1 myserver.net >/dev/null 2>&1 || ( date | mail -s "Server down" [email protected] )
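If the email every minute is a problem, the cron job could call a small script that remembers whether it has already reported the current outage. The following is a rough sketch of that idea, not something from the original setup: should_notify and the flag file path are names I made up, and $recipient is assumed to hold the address to notify.

```shell
#!/bin/sh
# Sketch only: should_notify and the flag file location are hypothetical.

# should_notify FLAGFILE STATUS
# STATUS is the exit status of ping (0 means that the host answered).
# Returns 0 only the first time the host is seen down; the flag file
# records that this outage has been reported until the host is back.
should_notify() {
    flag=$1 status=$2
    if [ "$status" -eq 0 ]; then
        rm -f "$flag"               # host is up again: forget the outage
        return 1
    fi
    [ -e "$flag" ] && return 1      # this outage was already reported
    : >"$flag"                      # remember that we are reporting it now
}

# Intended use from the script that cron runs (commented out here):
# ping -q -c 1 -W 1 myserver.net >/dev/null 2>&1
# should_notify /tmp/myserver.down "$?" &&
#     date | mail -s "Server down" "$recipient"
```

With this, one email is sent per outage, and the next one only after the host has answered a ping again in between.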
: do? It would make sense to me if it were a semicolon; ...
: does nothing. This is what it is designed to do. Here, instead of inverting the test, they use it to do a no-op before else.
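To make the equivalence concrete, here is a small sketch of the two forms side by side. The sample input and the variable names are made up for the illustration; the grep pattern is the one used in the scripts above.

```shell
#!/bin/sh
# Form 1: ":" is the shell's null command, used as a placeholder so that
# the real work can go in the else branch.
if printf '%s\n' 'no reply' | grep -q '64 bytes'; then
    :
else
    with_noop='down'
fi

# Form 2: the same logic with the test inverted and no placeholder.
if ! printf '%s\n' 'no reply' | grep -q '64 bytes'; then
    inverted='down'
fi

echo "$with_noop $inverted"
```

Both forms set their variable to down here, because the pattern is not found in the sample input.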