The following Python 2 function (os and sys are imported beforehand) returns the correct value when I run it directly, but returns the wrong value when it is invoked by Nagios nrpe:
    def get_proc1_open_files():
        # Set proc1_children list to empty, then run a system command to get a list of proc1 child processes
        proc1_children = []
        for pids in os.popen("pgrep -P `ps -ef | grep 'proc1 master' | grep -v grep | head -1 | awk '{print $2}'`").readlines():
            proc1_children.append(pids.strip())
        # Build an lsof command using the proc1_children list as the list of pids. Grep out the data files lines
        proc1_lsof = "lsof -p " + ','.join(map(str,proc1_children)) + " | grep -P .*\/[0-9]+\.yaml"
        # Finally, run the lsof and return the number of open files
        proc1_open_files = len(os.popen(proc1_lsof).readlines())
        return proc1_open_files
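As an aside, the grep step could also be done in Python, which would take the unquoted shell pattern out of the picture. This is only a sketch: count_yaml_lines is a hypothetical helper that is not part of the script, and it assumes the lsof output has already been read into a list of lines.

```python
import re

# The pattern the grep -P call is meant to apply:
# a path component made only of digits, followed by ".yaml".
YAML_PATH = re.compile(r"/[0-9]+\.yaml")


def count_yaml_lines(lines):
    """Count the lines that contain a path such as /data/1234.yaml."""
    return sum(1 for line in lines if YAML_PATH.search(line))
```

With something like this, the command would only need to be "lsof -p <pids>", and the filtering would no longer depend on how the shell treats the backslashes.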
By adding print statements throughout the function, un-nesting some of the calls, and re-running it, I determined that everything works properly under Nagios nrpe up to this line:
    proc1_open_files = len(os.popen(proc1_lsof).readlines())
Specifically, os.popen(proc1_lsof).readlines() returns an empty list, and I can't tell why.
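Because os.popen only gives me stdout, any error message from lsof or grep is lost. One way to see it would be to run the same command through subprocess and keep stderr and the exit status. A minimal sketch, assuming the command string is the same one built in proc1_lsof; run_and_report is a hypothetical helper name, not something in the script:

```python
import subprocess


def run_and_report(command):
    """Run a shell command and return (exit_code, stdout_lines, stderr_text)."""
    proc = subprocess.Popen(
        command,
        shell=True,                # the command is a shell pipeline
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,    # keep error output instead of discarding it
        universal_newlines=True,   # return text rather than bytes
    )
    out, err = proc.communicate()
    # For a pipeline, the exit code is that of the last command (grep here).
    return proc.returncode, out.splitlines(), err
```

Calling run_and_report(proc1_lsof) from the plugin and writing the three values to a temporary log file should show whether lsof prints an error when nrpe runs it.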
Notes:
- I did make sure to define the script as a Python 2 script
- Running on Debian Wheezy
- Nagios3 does successfully process output from the script. The resulting value simply isn't the correct value
- This script usually returns a value in the range of 5-25
- The output when run by a user is usually something like "WARNING - 12 proc1 open files."
- The exact output when run through Nagios nrpe is "OK - 0 proc1 open files." every time.
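Since the only difference I know of between the two cases is who starts the script, it may help to record the effective user and PATH in both runs and compare them. A minimal sketch; describe_runtime is a hypothetical helper name, and where its result gets logged is left open:

```python
import os
import pwd


def describe_runtime():
    """Return (effective user name, PATH) for the current process."""
    uid = os.geteuid()
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry for this uid; fall back to the numeric id.
        user = str(uid)
    path = os.environ.get("PATH", "")
    return user, path
```

If the user or PATH differs between the direct run and the nrpe run, that would at least narrow down where to look.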
Here is a link to the full script: nrpeplugin.py
I posted this in the UNIX stack exchange instead of Stack Overflow because I am primarily trying to find out why that tidbit of code would act differently when invoked through Nagios nrpe vs when it is invoked directly by a user. My apologies if this isn't the correct forum for this.