We have a shell script, managed by systemd, that is meant to run forever. It ran fine until a few days ago, when our server's storage filled up to 100% because of some gigantic log files. I had to truncate those files to free up some space.
Today I got a report that the script had stopped running. But when I checked the status, it said:
```
● ImportantService.service - Important daemon
Loaded: loaded (/etc/systemd/system/ImportantService.service; enabled; vendor preset: disabled)
Active: active (exited) since Wed 2020-04-29 16:46:48 WIB; 5 days ago
Process: 48877 ExecStop=/usr/local/bin/importantScript stop --instance XYZ (code=exited, status=0/SUCCESS)
Process: 48889 ExecStart=/usr/local/bin/importantScript start --instance XYZ (code=exited, status=0/SUCCESS)
Main PID: 48889 (code=exited, status=0/SUCCESS)
Tasks: 0
Memory: 48.0K
CGroup: /system.slice/ImportantService.service
```
I noticed that it showed 0 tasks, so I restarted it manually, and now it runs normally. I suspect the full-disk problem described above caused this.
My question is: how can I make systemd respawn the process if this kind of problem occurs again?
Here's the .service file:
```
[Unit]
Description= Important daemon

[Service]
Type=oneshot
ExecStart=/usr/local/bin/importantScript start --instance XYZ
RemainAfterExit=true
ExecStop=/usr/local/bin/importantScript stop --instance XYZ

[Install]
WantedBy=multi-user.target
```
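A minimal sketch of a revised unit, assuming that `importantScript start` forks the event loop into the background and then exits (which is consistent with the status output above, where the ExecStart process itself exited with status 0 and is listed as the main PID). With `Type=oneshot` and `RemainAfterExit=true`, systemd marks the unit active once ExecStart finishes and never watches the loop afterwards, so it has no way to notice when the loop dies. `Type=forking` makes systemd treat the process left behind as the service's main process, and `Restart=always` asks systemd to start it again whenever it exits, while an explicit `systemctl stop` still stops it for good. The `RestartSec` value is an arbitrary choice.

```ini
[Unit]
Description=Important daemon

[Service]
# Assumption: "start" backgrounds the event loop and returns,
# so systemd treats the forked process as the main service process.
Type=forking
ExecStart=/usr/local/bin/importantScript start --instance XYZ
ExecStop=/usr/local/bin/importantScript stop --instance XYZ
# Start the loop again whenever it exits, except after an explicit stop.
Restart=always
# Wait 5 seconds (arbitrary) between restart attempts.
RestartSec=5

[Install]
WantedBy=multi-user.target
```

After editing the file, `systemctl daemon-reload` followed by `systemctl restart ImportantService` would apply it. One caveat: `PIDFile=` expects a file holding a single PID, so the script's shared instance file cannot be used for it. systemd then has to guess the main PID, and systemd.service(5) warns that automatic restarting may not work reliably if that guess fails (for example, if the script leaves several processes behind).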
Update: some explanation of my script
The main job of ImportantScript is to enter an endless loop that does something whenever some event occurs in a directory given as an argument. I need to be able to launch multiple instances for different directories. Before I can start an instance, I have to register its specification, like this:
```
ImportantScript add --name XYZ --dir /path/to/dir ..etc..
```
After that, I can start the XYZ instance.
Every time I start an instance, the script stores its PID in a file that lists all the instances defined with the add command. An instance with no PID listed is idle.
To stop an instance, I just call ImportantScript stop --instance name, which terminates the process, deletes its entry from the file, and cleans up after it.
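Since each instance is registered separately and started with `--instance`, one way to run several of them under systemd is a template unit. This is a hypothetical sketch: the file name `ImportantService@.service` is an illustrative choice, and it assumes that `importantScript start` forks the loop into the background and returns. systemd replaces the `%i` specifier with whatever follows the `@` in the unit name, so each instance gets its own unit, its own restart handling, and its own `systemctl status` output.

```ini
# /etc/systemd/system/ImportantService@.service (hypothetical file name)
[Unit]
Description=Important daemon, instance %i

[Service]
# Assumption: "start" forks the event loop into the background and returns.
Type=forking
# %i expands to the text after "@", e.g. XYZ for ImportantService@XYZ.
ExecStart=/usr/local/bin/importantScript start --instance %i
ExecStop=/usr/local/bin/importantScript stop --instance %i
# Respawn the instance whenever its loop exits, except after an explicit stop.
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Each instance would still have to be registered with the add command first; after that, `systemctl enable --now ImportantService@XYZ` starts it immediately and at every boot.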
I hope this is not a systemd House of Horror entry.
supervisor daemon with proper config. With start and stop verbs, that "importantScript" is far from important and is more likely a Poor Man's Dæmon Supervisor and Bad Logger that is in fact getting in the way. For best results, you should tell people how your real dæmon is actually run, in the depths of that script. Only with that information can one construct a service unit that isn't a systemd House of Horror entry.