
I need to run a lot of similar commands in the shortest possible time, using all available resources.

For example, my case is processing images. When I use the following command: for INPUT in *.jpg; do some_command "$INPUT"; done the commands are executed one by one and do not use all the available resources.

On the other hand, executing for INPUT in *.jpg; do some_command "$INPUT" & done starts every job at once and makes the machine run out of resources in a very short time.

I know about at's batch command, but I'm not sure if I can use that in my case. Correct me if I am wrong.

So I was thinking about putting the commands in some kind of queue and executing just a part of them at once. I don't know how to do that in a quick way, and that's the problem. I'm sure someone has run into a similar problem before.

Please advise.


2 Answers

3

GNU Parallel is made for exactly this:

parallel some_command {} ::: *.jpg

It defaults to one job per CPU core. In your case you might want to run one more job than you have cores:

parallel -j+1 some_command {} ::: *.jpg
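Before starting a long batch, it can help to preview what would be run. The following is a small sketch, not part of the original answer: --dry-run prints the commands instead of executing them, -k keeps the output in input order, and {.} is the input name with its extension removed. some_command, the .png output name and the three file names are placeholders.

```shell
# Preview the commands GNU Parallel would run, without executing them.
# {}  expands to the input file name (a.jpg)
# {.} expands to the name without its extension (a)
# some_command and the .png suffix are placeholders for illustration.
parallel --dry-run -k some_command {} {.}.png ::: a.jpg b.jpg c.jpg
```

This should print one line per input file, such as some_command a.jpg a.png. Once the commands look right, drop --dry-run to actually run them.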

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]
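To illustrate the scheduling idea only (start a new job whenever a running one finishes), here is a minimal sketch in plain bash. It is not how GNU Parallel is implemented. It assumes bash 4.3 or newer for wait -n; process_one, run_limited and MAX_JOBS are hypothetical names, and the echo/sleep body stands in for the real command.

```shell
#!/usr/bin/env bash
# Sketch: never run more than MAX_JOBS background jobs at once, and start
# the next job as soon as any running job exits (needs bash 4.3+ for wait -n).

MAX_JOBS=4   # assumption: adjust to the number of cores you want to use

# Hypothetical stand-in for the real per-file command.
process_one() {
    echo "processing $1"
    sleep 0.2
}

run_limited() {
    local item running=0
    for item in "$@"; do
        if [ "$running" -ge "$MAX_JOBS" ]; then
            wait -n                    # block until one job finishes
            running=$((running - 1))
        fi
        process_one "$item" &          # start the next job in the freed slot
        running=$((running + 1))
    done
    wait                               # wait for the jobs still running
}

run_limited *.jpg
```

Compared with GNU Parallel, this sketch does no output grouping and no error handling, so it is mainly useful where installing extra tools is not an option.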

Installation

For security reasons you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

2

You can use GNU make with the --jobs option to run things in parallel but limited to the specified number of jobs. You can tailor that number to something that will not kill your machine.

Here's an example Makefile that uses targets a-h (these could be your output files, for example) and runs a (dummy) set of commands for each target:

all: a b c d e f g h

a b c d e f g h:
	echo $@; sleep 10

N.B. The indentation of the command must be a TAB character. See the GNU make documentation for the details of the syntax of Makefiles.

You can invoke make with make --jobs 4 and get the following output (I used time make --jobs 4 below to show the elapsed time):

echo a; sleep 10
echo b; sleep 10
echo c; sleep 10
echo d; sleep 10
b
a
c
d
echo e; sleep 10
echo f; sleep 10
echo g; sleep 10
e
f
echo h; sleep 10
g
h

real    0m20.009s
user    0m0.010s
sys     0m0.011s

The first four were executed in parallel, then the next four, so the total elapsed time is 20 seconds.
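Applied to the image case from the question, the same approach might look like the sketch below. The assumptions here are mine: each a.jpg produces an a.out file, cp stands in for the real command, and the Makefile is written to a file named Makefile.images. File names containing spaces do not work well with make.

```shell
# Write a Makefile with one target per input image.
# Assumptions: outputs are named <name>.out, and cp stands in for some_command.
cat > Makefile.images <<'EOF'
INPUTS  := $(wildcard *.jpg)
OUTPUTS := $(INPUTS:.jpg=.out)

all: $(OUTPUTS)

# "target: prerequisite ; recipe" keeps the recipe on the rule line,
# so no TAB-indented recipe line is needed in this sketch.
%.out: %.jpg ; cp $< $@
EOF

# Run at most 4 recipes at a time.
make --jobs=4 -f Makefile.images
```

A side benefit of this layout is that make skips outputs that are already newer than their input, so an interrupted run can be resumed by invoking the same command again.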
