So, you've got a shiny new shell script courtesy of ChatGPT, Copilot, or your favorite AI. It looks good, it even feels good. But that nagging doubt creeps in: "Is this thing really safe to run in production?"
This is the world of unit testing shell scripts generated by LLMs – a world where the stakes are high, sudo is a double-edged sword, and a single misplaced rm -rf can ruin your entire day. This post provides a battle-tested strategy to safely test and validate scripts that manage critical services like PM2, Docker, Nginx, or anything interacting with your system's state.
The Perils of Trusting LLM-Generated Shell Scripts
Large Language Models (LLMs) are fantastic for quickly generating shell scripts. However, even the best LLMs are prone to:
- Making assumptions about your environment: They might assume specific package installations or directory structures that don't exist on your server.
- Using incorrect binary names: For example, using pgrep -x PM2 instead of the correct, lowercase pm2.
- Overlooking side effects: Commands like systemctl restart docker aren't always harmless; they can cause unexpected downtime.
Even if the script's logic is 90% correct, that remaining 10% can lead to:
- Services restarting at the wrong time.
- Data written to incorrect log paths.
- Broken idempotency (repeated runs causing unintended changes).
That's why robust unit testing is crucial – not in the traditional pytest sense, but using shell-native methods to verify logic and safety.
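To make the idempotency concern concrete, one shell-native check is to run the same step twice against a throwaway file and confirm the second run changes nothing. This is a minimal sketch: the helper name ensure_line and the sample config line are hypothetical, not taken from any particular script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if it is not already there.
ensure_line() {
  local line="$1" file="$2"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Run the same step twice against a throwaway file.
tmp_conf="$(mktemp)"
ensure_line "worker_processes auto;" "$tmp_conf"
ensure_line "worker_processes auto;" "$tmp_conf"

# An idempotent step leaves exactly one copy of the line.
line_count="$(wc -l < "$tmp_conf" | tr -d ' ')"
echo "lines after two runs: $line_count"
```

If the count is anything other than 1, the step is not safe to repeat, which is the kind of defect worth catching before a cron job runs it every minute.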
Strategy 1: Embrace the --dry-run Mode
Every LLM-generated script should include a --dry-run flag. This allows you to preview the script's actions without executing them.
Here's how to implement it:
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true

# Log a command, then run it unless --dry-run was given.
log_action() {
  if [[ "$DRY_RUN" == true ]]; then
    echo "$(date): [DRY RUN] $1"
  else
    echo "$(date): $1"
    eval "$1"
  fi
}

# Example usage:
log_action "sudo systemctl restart nginx"
This approach makes every operation traceable and lets you inspect the intended actions before anything is executed.
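The eval in the helper above re-parses its string argument, which can behave unexpectedly when an argument contains spaces or quotes. A possible alternative, sketched here under the assumption that callers can pass the command as separate arguments, avoids eval entirely. The name run_cmd is hypothetical, and this sketch defaults to dry-run mode so it is safe to paste and try.

```shell
#!/usr/bin/env bash
# Sketch: dry-run wrapper that takes the command as separate arguments
# instead of a string, so no eval is needed. run_cmd is a hypothetical name.
DRY_RUN="${DRY_RUN:-true}"   # default to the safe mode in this sketch

run_cmd() {
  if [[ "$DRY_RUN" == true ]]; then
    # %q prints each argument shell-quoted, so the preview is unambiguous.
    printf '[DRY RUN]'
    printf ' %q' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# In dry-run mode this only prints the command; nothing is restarted.
run_cmd systemctl restart nginx
```

The trade-off is that pipelines and redirections can no longer be passed as a single string; those would need to be wrapped in a function and passed by name.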
Strategy 2: Mock External Commands
You don't want docker restart or pm2 resurrect running during your tests. We can override these commands using mocking:
- Create a mock-bin directory: mkdir mock-bin
- Create a mock docker script:
printf '#!/bin/bash\necho "[MOCK] $0 $*"\n' > mock-bin/docker
chmod +x mock-bin/docker
- Add the mock directory to the front of your PATH: export PATH="$(pwd)/mock-bin:$PATH"
Now, any call to docker will output a harmless message instead of interacting with your containers. Repeat this process for other potentially disruptive commands like systemctl, pm2, and rm.
This technique, commonly used alongside the Bash Automated Testing System (BATS), allows for isolated and safe testing.
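Printing a message proves the mock was hit, but a test usually also wants to check which arguments were passed. One way to do that, sketched below, is to have the mock append each call to a log file and then assert on the log. The temporary directory and the calls.log file name are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: a PATH mock that records every call so a test can assert on it.
# The directory layout and the calls.log file name are assumptions.
mock_dir="$(mktemp -d)"
calls_log="$mock_dir/calls.log"

# Write a fake docker that appends its arguments to the log and exits 0.
cat > "$mock_dir/docker" <<EOF
#!/usr/bin/env bash
echo "docker \$*" >> "$calls_log"
EOF
chmod +x "$mock_dir/docker"

# Put the mock first so it shadows any real docker binary.
PATH="$mock_dir:$PATH"

# Code under test (stand-in for the real script).
docker restart my-app

# Assert that the expected call was made.
if grep -qxF "docker restart my-app" "$calls_log"; then
  echo "PASS: docker restart was requested"
else
  echo "FAIL: docker restart was not requested"
fi
```

The same pattern can be repeated for systemctl, pm2, and rm by writing one small script per command into the mock directory.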
Strategy 3: Leverage shellcheck
LLMs sometimes make mistakes with quoting, variables, or command usage. shellcheck is your invaluable ally here.
Simply run:
shellcheck myscript.sh
shellcheck will identify:
- Unquoted variables ("$var" vs $var).
- Incorrect command usage.
- Malformed if conditions.
Think of it as a linter for your shell scripts, ensuring their structural integrity.
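To see why the unquoted-variable warning matters in practice, the sketch below shows word splitting changing the number of arguments a command receives. The helper count_args and the sample path are hypothetical; ShellCheck reports the unquoted form as SC2086.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

log_path="/var/log/my app/output.log"   # note the space in the path

# Unquoted: the shell splits the value on whitespace into two words.
# shellcheck disable=SC2086  # intentional, to demonstrate the problem
unquoted_count="$(count_args $log_path)"

# Quoted: the value is passed as a single argument.
quoted_count="$(count_args "$log_path")"

echo "unquoted: $unquoted_count, quoted: $quoted_count"
```

With rm or mv in place of count_args, the unquoted form would act on two separate paths, neither of which is the one intended.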
Strategy 4: Modularize with Functions
Break your script into smaller, testable functions:
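# Succeeds if a PM2 process is running.
check_pm2() {
  # [P]M2 keeps grep from matching its own command line.
  ps aux | grep -q '[P]M2'
}

# Restart every managed service in order.
restart_all() {
  pm2 resurrect
  docker restart my-app
  systemctl restart nginx
}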
This allows you to mock and call these functions individually within a test harness, avoiding the need to run the entire script each time.
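A minimal harness for restart_all might look like the sketch below. It relies on the fact that shell functions take precedence over binaries found on PATH, so defining functions named pm2, docker, and systemctl shadows the real commands for the duration of the test. The calls array and the pipe-separated comparison format are illustrative choices, not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch of a test harness: shell functions shadow the real commands,
# so restart_all can be exercised without touching any service.
calls=()
pm2()       { calls+=("pm2 $*"); }
docker()    { calls+=("docker $*"); }
systemctl() { calls+=("systemctl $*"); }

# Function under test, copied from the script above.
restart_all() {
  pm2 resurrect
  docker restart my-app
  systemctl restart nginx
}

restart_all

# Compare recorded calls with the expected sequence.
expected="pm2 resurrect|docker restart my-app|systemctl restart nginx"
actual="$(IFS='|'; echo "${calls[*]}")"
if [[ "$actual" == "$expected" ]]; then
  echo "PASS"
else
  echo "FAIL: got $actual"
fi
```

In a real project the function under test would be sourced from the script file rather than copied, which assumes the script can be sourced without running its main logic.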
Strategy 5: Log Everything (Seriously!)
Log every decision point. Why? Because "works on my machine" is unhelpful when a container fails to restart or PM2 silently exits.
log() {
echo "$(date '+%F %T') [LOG] $1" >> /var/log/pm2_watchdog.log
}
Comprehensive logging provides crucial debugging information when things go wrong.
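Writing to /var/log usually requires root, which makes the logger awkward to exercise in tests. One option, sketched here, is to keep the same format but let the destination be overridden. LOG_FILE is a hypothetical variable introduced for this example, and the log message is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: same log format, but the destination can be overridden.
# LOG_FILE is a hypothetical variable, not part of the original script.
LOG_FILE="${LOG_FILE:-/var/log/pm2_watchdog.log}"

log() {
  echo "$(date '+%F %T') [LOG] $1" >> "$LOG_FILE"
}

# In a test, point the log at a throwaway file instead of /var/log.
LOG_FILE="$(mktemp)"
log "PM2 not running, attempting resurrect"
cat "$LOG_FILE"
```

Because log reads LOG_FILE at call time, a test can redirect it after the function is defined and then grep the temporary file for the decisions it expects to see.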
Strategy 6: Sandbox Your Tests
If you have access to Docker or a virtual machine, create a replica environment to run your tests. It's far better to break a test server than your production system!
For example:
docker run -it ubuntu:20.04
# Then install necessary packages: pm2, docker, nginx, etc.
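For repeatable runs, the interactive session above can be wrapped in a small function. The sketch below assumes the script under test supports --dry-run and needs no network access; sandbox_run and the /work mount point are hypothetical names. Drop --network none if packages have to be installed inside the container first.

```shell
#!/usr/bin/env bash
# Sketch: run a script inside a disposable container.
# sandbox_run and the /work mount point are hypothetical names.
sandbox_run() {
  local script_path="$1"
  local script_dir script_name
  script_dir="$(cd "$(dirname "$script_path")" && pwd)"
  script_name="$(basename "$script_path")"

  # --rm           remove the container when it exits
  # --network none no network access from inside the sandbox
  # -v ...:ro      mount the script directory read-only
  docker run --rm --network none \
    -v "$script_dir:/work:ro" -w /work \
    ubuntu:20.04 bash "./$script_name" --dry-run
}

# Usage (requires Docker; not executed here):
#   sandbox_run ./myscript.sh
```

Mounting the directory read-only means a buggy script cannot modify its own source tree, and --rm keeps failed experiments from piling up as stopped containers.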
Bonus: Useful Tools
- BATS: A powerful Bash unit testing framework.
- shunit2: An xUnit-style testing framework for POSIX shells.
- assert.sh: A simple shell assertion helper.
- shellspec: A full-featured, RSpec-like testing framework.
Final Thoughts: Test Before You Trust
It's tempting to simply run an LLM-generated script, but in production environments, especially those managing critical services, testing is paramount. Use dry-run flags, mock commands, employ shellcheck, add comprehensive logging, and test in a sandbox. Prioritize safety – your sanity and uptime will thank you!