
I have access to a distributed computing cluster/server farm with a job scheduler (Slurm) that gives each parallel job an integer ID from 1 to n (I know the value of n; in the example below, n = 10).

I am using find -maxdepth 1 -name '2019 - *' to find the list of file names I want to pass to my program as an argument.

Sample file names:

2019 - Alphabet
2019 - Foo Bar
2019 - Reddit
2019 - StackExchange

The order does not matter, but each matching file should be used exactly once.

This is an example of a "template" script I can use:

#!/bin/bash

# in this case, from i = 1 to i = 10
#SBATCH --array=1-10

# pseudocode begins
    # it is given that filename_array has 10 unique elements
    filename_array="$(find -maxdepth 1 -name '2019 - *')"

    # SLURM_ARRAY_TASK_ID is the value of i, from i = 1 to i = 10
    filename=filename_array[$SLURM_ARRAY_TASK_ID]
# pseudocode ends

./a.out "$filename"

This is more or less what it does (but with each process running on a different computer in parallel):

./a.out "./2019 - Alphabet" &
./a.out "./2019 - Foo Bar" &
./a.out "./2019 - Reddit" &
./a.out "./2019 - StackExchange" &

How can I write a bash script that would run the template script exactly once for each of the file names given by find -maxdepth 1 -name '2019 - *'?

2 Answers


Using find is probably a mistake here, particularly as you are only interested in files in the current directory. You can just use a shell glob pattern.

#!/bin/sh

# run a.out in the background once for each matching file
for f in '2019 - '*
do
    [ -f "$f" ] && ./a.out "$f" &
done

The [ -f "$f" ] test is there for portability: if nothing matches, the pattern expands to itself, and the test stops a.out from being run on that non-existent name. If you are using bash you could instead use shopt -s nullglob to make a non-matching pattern expand to nothing, so the loop runs zero times rather than once when there are no matching files (see the sketch below). The portable test also has the benefit of skipping directories whose names happen to match the pattern.
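For illustration, here is a minimal sketch of that bash variant (note that it drops the [ -f ] test, so directories matching the pattern would no longer be skipped):

#!/bin/bash
# with nullglob set, a non-matching pattern expands to nothing,
# so the loop simply runs zero times instead of once
shopt -s nullglob
for f in '2019 - '*
do
    ./a.out "$f" &
done
wait   # block until all background jobs have finished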

Apparently what is required is a "template script", but I have only a limited idea of what this means.

Perhaps

#!/bin/bash
# magic string for slurm to run on 10 hosts
#SBATCH --array=1-10

filename_array=( '2019 - '* )
# SLURM_ARRAY_TASK_ID runs from 1 to 10, but bash array indices start at 0
filename=${filename_array[$SLURM_ARRAY_TASK_ID-1]}
./a.out "$filename"

is what is wanted?
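Assuming the script above is saved as template.sh (a name chosen here for illustration), it would be submitted once and Slurm would fan it out into ten array tasks:

# one submission; Slurm creates tasks 1..10, each with its own
# SLURM_ARRAY_TASK_ID, typically scheduled on different nodes
sbatch template.sh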

Edit: Another requirement change. Support regular expressions for the patterns.

#!/bin/bash
# magic string for slurm to run on 10 hosts
#SBATCH --array=1-10

# NUL-delimited names cope with spaces; sort -z gives every task the same order
# (readarray -d requires bash 4.4 or later)
readarray -d '' filename_array < <( find . -maxdepth 1 -regex '.*2019 -.*' -print0 | sort -z )
filename=${filename_array[$SLURM_ARRAY_TASK_ID-1]}
./a.out "$filename"
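If the number of matching files is not always exactly 10, one possible refinement (a sketch, not part of the answer above) is to count the matches at submission time; sbatch also accepts --array on the command line, where it takes precedence over the #SBATCH directive:

# hypothetical wrapper: size the job array to the number of matches
# (assumes the file names contain no newlines, as in the examples above)
n=$(find . -maxdepth 1 -regex '.*2019 -.*' | wc -l)
sbatch --array=1-"$n" template.sh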
  • I was not clear. I need to use a template script so that each of the processes is run on a different computer. Commented Jul 23, 2020 at 3:04
  • OK, is the issue that you have n (10 in this case) CPUs and an unknown number of files, which may be more or less than n? I am sorry, but I am struggling to understand what the problem is. Commented Jul 23, 2020 at 4:01
  • I have (for example) exactly 10 files that match the pattern (2019 - *) and a large number of CPUs on physically different computers (for example, 1000 computers with 2 CPUs each). I want to spread the processes across the computers. I need to do this with Slurm Workload Manager. Commented Jul 23, 2020 at 4:19
  • Did my updated answer provide you with a solution? If this script is run on 10 different hosts with access to the same filesystem, but each host has a different SLURM_ARRAY_TASK_ID (in the range 1 to 10), then it provides a different filename to your program. Each file will be used once. Commented Jul 23, 2020 at 6:30
  • This works for me, but what if I have a more complicated pattern than 2019 - * that can't be expressed as a glob? (or I don't know how to convert a find regex pattern to a glob) Commented Jul 23, 2020 at 15:50

Can you use $SLURM_JOB_NODELIST?

In that case GNU Parallel seems like an obvious solution:

find -maxdepth 1 -name '2019 - *' |
  parallel --slf "$SLURM_JOB_NODELIST" --wd . ./a.out {}
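One caveat: --slf (short for --sshloginfile) expects a file with one sshlogin per line, whereas SLURM_JOB_NODELIST usually holds a compressed range such as node[01-10]. A possible bridge (a sketch, assuming scontrol is available on the submitting host) is to expand the list first:

# expand the compressed nodelist into one hostname per line, then
# hand that file to GNU Parallel as its sshlogin file
# (nodefile.txt is a name chosen here for illustration)
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodefile.txt
find -maxdepth 1 -name '2019 - *' |
  parallel --slf nodefile.txt --wd . ./a.out {}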
