I have two different scripts, A1.sh and A2.sh. They start middleware services for different applications: A1.sh starts one application's service and A2.sh starts the other application's services. Both run on the same host (AIX).
Because the services take some time to start (around 7 to 15 minutes), both scripts contain the function below. It watches the log and waits until the services start, or times out after 1000 seconds if they have not started by then. The scripts work fine when run sequentially. However, if I run A1.sh in one session, then open another session on the same host and run A2.sh, one of the scripts fails with a timeout, even though its service does start and keeps running in the background. The timeout is spurious: nowhere near 1000 seconds have passed. Below is the code.
### wait_for_log
### Waits for a goal message in the specified log. If the message isn't found
### within the timeout period, writes an error message to the script log.
###
### usage: wait_for_log [ log_name ] [ start | stop ] [ app_name ] [ timeout ] [ goal_message ] [ goal_message2 ]
wait_for_log() {
    FILE_NAME=$1
    ACTION=$2
    APP_NAME=$3
    GOAL_MESSAGE=$5
    GOAL_MESSAGE2=$6
    TIMEOUT=$4
    ELAPSED_TIME=0
    START_TIME=$SECONDS
    alert "info" "${ACTION^^} ${APP_NAME^^}"
    alert "info" "Waiting for ${APP_NAME} ${ACTION}." -n
    tail -0lf $FILE_NAME | while read -t $TIMEOUT LOGLINE
    do
        echo -n "."
        if [ ! -z "$GOAL_MESSAGE2" ]; then
            if [[ "${LOGLINE}" == *$GOAL_MESSAGE2* ]]; then
                ps -ef | grep "[t]ail " | awk {'print $2'} | xargs kill
                return 2
            fi
        fi
        if [[ "${LOGLINE}" == *$GOAL_MESSAGE* ]]; then
            ps -ef | grep "[t]ail " | awk {'print $2'} | xargs kill
            return 2
        fi
    done
    EXIT_CODE=$?
    ELAPSED_TIME=$(($SECONDS - $START_TIME))
    if [ $EXIT_CODE -eq 2 ];then
        printf "\e[1;32m[OK]\e[0m\n"
        alert "success" "${APP_NAME} took ${ELAPSED_TIME}s to ${ACTION}."
        GLOBAL_ELAPSED_TIME=$((GLOBAL_ELAPSED_TIME + ELAPSED_TIME))
        RETVAL=0
        return 0
    fi
    printf "\e[1;31m[FAIL]\e[0m\n"
    alert "error" "${APP_NAME} ${ACTION} failure, exceed the ${ELAPSED_TIME}s timeout to ${ACTION}."
    RETVAL=1
    exit_script $ACTION
}
FILE_NAME is different for the two scripts. One of the scripts fails as shown below.
<Info> START RPM
Inside wait for log proc, recieved r2TIMEOUT value: 1000
<Info> STARTING NODEMANAGER
<Info> Waiting for NodeManager starting..[FAIL]
<Error> NodeManager starting failure, exceed the 6s timeout to starting.
<Error> Ocurred an ERROR when RPM trying to starting.
Any idea what is wrong with the while loop when the scripts run concurrently?
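One likely cause, offered as a guess rather than a confirmed diagnosis: `ps -ef | grep "[t]ail " | awk {'print $2'} | xargs kill` matches every tail process on the host, not just the one this function started. When one script sees its goal message, it would also kill the other script's tail; that script's `read` then hits end of input, the loop ends with a status other than 2, and the function reports FAIL after only a few seconds (the 6s in the log above). Below is a minimal sketch of a wait loop that stops only its own tail. The name `wait_for_goal`, the FIFO path under `${TMPDIR:-/tmp}`, and the `tail -n 0 -f` spelling are assumptions for illustration, and the sketch has not been tried on AIX.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_goal is a hypothetical helper, not the original wait_for_log.
# usage: wait_for_goal LOG_FILE TIMEOUT_SECONDS GOAL_MESSAGE
# returns 0 if GOAL_MESSAGE shows up before the deadline, 1 otherwise
wait_for_goal() {
    local file=$1 timeout=$2 goal=$3
    local fifo="${TMPDIR:-/tmp}/wait_for_goal.$$.fifo"   # assumed scratch path
    local deadline=$(( SECONDS + timeout ))
    local line tail_pid status=1

    rm -f "$fifo"
    mkfifo "$fifo" || return 1

    # Start this call's own tail in the background and remember its PID.
    tail -n 0 -f "$file" > "$fifo" &
    tail_pid=$!

    # Read in the current shell (no pipeline subshell) against one overall deadline.
    while (( SECONDS < deadline )); do
        IFS= read -r -t $(( deadline - SECONDS )) line || break   # timeout or EOF
        if [[ $line == *"$goal"* ]]; then
            status=0
            break
        fi
    done < "$fifo"

    # Stop only the tail started above; tails owned by other scripts are untouched.
    kill "$tail_pid" 2>/dev/null
    wait "$tail_pid" 2>/dev/null || true   # ignore the killed-by-signal status
    rm -f "$fifo"
    return "$status"
}
```

Because the loop is no longer the last stage of a pipeline, the result can be returned directly instead of being passed back through `return 2` and `$?`. Note also that the deadline here covers the whole wait, whereas `read -t $TIMEOUT` in the original restarts the timer on every log line.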
exceed the ${TIMEOUT}s instead of exceed the ${ELAPSED_TIME}s. In any case it is worth checking that TIMEOUT is actually set to the value you think it is.
return 2 means that the function is terminated at that point.
$SECONDS is a number between 0…59, because it's the seconds of the current time. So START_TIME also has the same range, and your ELAPSED_TIME will normally never be something related to the $TIMEOUT you set.
! -z should be -n to check if the string has nonzero length.
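On the $SECONDS remark: as far as the bash manual describes it, `SECONDS` is the number of seconds since the shell was started (or since the last assignment to it, plus the assigned value), not the seconds field of the wall clock. If that holds on your system, the `START_TIME`/`ELAPSED_TIME` arithmetic in the function is fine, and the "6s" in the error message is simply how long the loop ran before it ended. A quick check, assuming bash is the shell running the script:

```shell
#!/usr/bin/env bash
# SECONDS is not clamped to 0..59: a larger assigned value is kept.
SECONDS=100
after_assign=$SECONDS
echo "after assigning 100: $after_assign"

# Measure a duration the same way wait_for_log does.
start=$SECONDS
sleep 2
elapsed=$(( SECONDS - start ))
echo "elapsed: ${elapsed}s"
```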