Yesterday a friend of mine asked on U&L: How to optimize 'grep'-and-save?
I will summarise briefly. The output of qstat, the Torque PBS queueing system tool, should be parsed to contain only the Job ID, Queue, Jobname, and the node the job is running on. This should be appended to an existing file, filtering out duplicate lines (from an earlier run).
I personally am not a big fan of writing intermediate files; it might just be preference, but it is what my admin taught me. I have written the following script using arrays instead of temporary files, and I would like to know if this is the best way to do it or if there is something faster.
# For who is this?
username=$USER
# Store the original file in an array
mapfile -t original_File < ~/.qstat_history
# Get the newest data from qstat and store it in an array
mapfile -t append_File < <(qstat -u "$username" -n -1 | \
                          sed -n -e '/^[[:digit:]]\+/s/\/.*//p' | \
                          awk '{print $1 "\t" $3 "\t" $4 "\t" $NF}')
# combine two arrays, but delete duplicate lines
mapfile -t combined_File < <(printf "%s\n" "${original_File[@]}" "${append_File[@]}" | sort -u)
# Overwrite original file
printf "%s\n" "${combined_File[@]}" > ~/.qstat_history
# Echo new file
cat ~/.qstat_history
# Exit graciously
exit 0
Some explanation:
- The first line allows easy modification for the case where $USER does not match the username registered in the queueing system.
- The script creates ~/.qstat_history, so it needs to read the file into an array first.
- The second array contains the new data.
- qstat -u $username -n -1 gets the sample output below. -u ensures only the current user is used, -n requests the nodes, and -1 gives raw output (it suppresses the line break between the nodes field and the other info).
- sed -n -e '/^[[:digit:]]\+/s/\/.*//p' matches any line starting with a digit. Since only lines with Job IDs match this pattern, it filters out anything unnecessary. The entries in the last field, the nodes, contain forward slashes, even in the serial case. In this set-up it is assumed that there is only one node per job, hence the repetition can be deleted. Only lines in which the substitution succeeds, that is, lines that actually have a node list, are printed, which gets rid of all non-running jobs in the process. (In the example there are two queued (Q) jobs, but there could be other statuses, too.)
- Since the format is fixed, awk '{print $1 "\t" $3 "\t" $4 "\t" $NF}' will fetch everything that we want.
 
- The combining part is actually the part that I am most worried about. It simply prints both previous arrays, sorts them, and removes duplicate lines in the process.
- The resulting array overwrites the initial file. (Should I be deleting the initial file first?)
Should I be checking the file ~/.qstat_history first?
Is there anything I did not think about that could break the script?
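To make the question about checking the file concrete, here is a minimal sketch of the read, combine, and overwrite steps with a guard for a missing history file. The function name merge_history and the temporary directory are hypothetical stand-ins (not part of my script), and the two data lines are made up in the same format the awk step produces. It needs bash 4 or newer for mapfile.

```shell
# Hypothetical helper: merge new lines from stdin into a history file,
# dropping duplicates. Requires bash >= 4 for mapfile.
merge_history() {
    local history_file=$1
    local -a original=() fresh=() combined=()

    # Only read the history if it already exists (first run: it does not)
    [[ -r $history_file ]] && mapfile -t original < "$history_file"

    # New data arrives on stdin
    mapfile -t fresh

    # With two empty arrays, printf '%s\n' would still emit one empty line,
    # so skip the write entirely in that case
    (( ${#original[@]} + ${#fresh[@]} == 0 )) && return 0

    mapfile -t combined < <(printf '%s\n' "${original[@]}" "${fresh[@]}" | sort -u)
    printf '%s\n' "${combined[@]}" > "$history_file"
}

# Usage with a scratch file instead of ~/.qstat_history
tmp=$(mktemp -d)
printf '2976374.omega.cluste\tshort\tj-M0052_fr\tnode42\n' |
    merge_history "$tmp/history"
printf '2976373.omega.cluste\tshort\tj-M0051_fr\tnode51\n2976374.omega.cluste\tshort\tj-M0052_fr\tnode42\n' |
    merge_history "$tmp/history"
cat "$tmp/history"
```

The second call repeats one line from the first, so the scratch file should end up with two lines rather than three. Since the redirection with > truncates the file before writing, this sketch does not delete it first.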
Sample output from qstat -u my12name -n -1:
omega.cluster:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
2974949.omega.cluste my12name  short    j-M0044_td  21582     1   8 12288m 500:0 R 144:0   node54/7+node54/6+node54/5+node54/4+node54/3+node54/2+node54/1+node54/0
2975035.omega.cluste my12name  short    j-M0044_td  21833     1   8 12288m 500:0 R 140:3   node52/7+node52/6+node52/5+node52/4+node52/3+node52/2+node52/1+node52/0
2976373.omega.cluste my12name  short    j-M0051_fr  17782     1   8 12288m 500:0 R 15:55   node51/7+node51/6+node51/5+node51/4+node51/3+node51/2+node51/1+node51/0
2976374.omega.cluste my12name  short    j-M0052_fr   1350     1   8 12288m 500:0 R 12:11   node42/7+node42/6+node42/5+node42/4+node42/3+node42/2+node42/1+node42/0
2976375.omega.cluste my12name  short    j-M0053_fr  32008     1   8 12288m 500:0 R 11:00   node43/7+node43/6+node43/5+node43/4+node43/3+node43/2+node43/1+node43/0
2976376.omega.cluste my12name  short    j-M0054_fr  15650     1   8 12288m 500:0 R 10:57   node41/7+node41/6+node41/5+node41/4+node41/3+node41/2+node41/1+node41/0
2976377.omega.cluste my12name  short    j-M0055_fr  31076     1   8 12288m 500:0 R 07:26   node56/7+node56/6+node56/5+node56/4+node56/3+node56/2+node56/1+node56/0
2976378.omega.cluste my12name  short    j-M0056_fr  17148     1   8 12288m 500:0 R 02:11   node45/7+node45/6+node45/5+node45/4+node45/3+node45/2+node45/1+node45/0
2976379.omega.cluste my12name  short    j-M0057_fr    --      1   8 12288m 500:0 Q   --     --
2976380.omega.cluste my12name  short    j-M0058_fr    --      1   8 12288m 500:0 Q   --     --
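To show what the sed and awk steps do with this output, here is a small sketch that runs the same pipeline on one running and one queued job taken from the sample. The function name filter_qstat is hypothetical, the node list of the running job is shortened for readability, and the \+ in the sed pattern assumes GNU sed.

```shell
# Hypothetical wrapper around the sed/awk pipeline from the script.
# Assumes GNU sed (for \+ in a basic regular expression).
filter_qstat() {
    sed -n -e '/^[[:digit:]]\+/s/\/.*//p' |
        awk '{print $1 "\t" $3 "\t" $4 "\t" $NF}'
}

# One running (R) and one queued (Q) job from the sample output;
# the node list of the running job is shortened here
sample='2976378.omega.cluste my12name  short    j-M0056_fr  17148     1   8 12288m 500:0 R 02:11   node45/7+node45/6
2976379.omega.cluste my12name  short    j-M0057_fr    --      1   8 12288m 500:0 Q   --     --'

result=$(printf '%s\n' "$sample" | filter_qstat)
printf '%s\n' "$result"
```

Only the running job survives: the queued line contains no slash, so the substitution fails and sed does not print it. The remaining line is reduced to the tab-separated fields 2976378.omega.cluste, short, j-M0056_fr, and node45.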
