
I have a bash script which processes each file in a directory:

for (( index=0; index<$COUNT; index++ ))
do
    srcFile=${INCOMING_FILES[$index]}
    ${SCRIPT_PATH}/control.pl ${srcFile} >> ${SCRIPT_PATH}/${LOG_FILE} &
    wait ${!}
    removeIncomingFile ${srcFile}
done

For a few files it works fine, but when the number of files is large it is too slow. I want to run this script in parallel on grouped files.

Example files:

server_1_1 | server_2_1 | server_3_1
server_1_2 | server_2_2 | server_3_2
server_1_3 | server_2_3 | server_3_3

The script should process the files related to each server in parallel:
First instance - server_1*
Second instance - server_2*
Third instance - server_3*

Is this possible using GNU Parallel, and how can it be achieved? Many thanks for any solution!

  • Why run just one command in background and then wait? Makes more sense if doing several at once... Commented Oct 30, 2018 at 20:32
  • 2
    this response might help you. It is logic to spawn commands in background and wait for them. There's even a POC version of a spooling script. That page also has lots of useful info about parallel. Commented Oct 30, 2018 at 20:35
  • Nothing in your code relates to the server numbers you mention! What are the pipe symbols (|) trying to tell me? Commented Oct 30, 2018 at 21:10

2 Answers


I can't make head or tail of what your question is trying to say, but I suspect the following will make a reasonable starting point. Put your actual code inside the quotes in place of the dummy actions I have used:

#!/bin/bash

# Do stuff for server 1
parallel -k 'echo server_1_{} ; date >> log_1_{}' ::: {1..3}

# Do stuff for server 2
parallel -k 'echo server_2_{} ; date >> log_2_{}' ::: {1..3}

# Do stuff for server 3
parallel -k 'echo server_3_{} ; date >> log_3_{}' ::: {1..3}

Sample Output

server_1_1
server_1_2
server_1_3
server_2_1
server_2_2
server_2_3
server_3_1
server_3_2
server_3_3

Log files created

-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_3
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_3
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_3



The grouping part confuses me.

I have the feeling you want them grouped because you do not want to overload the server.

Normally you would simply do:

parallel "control.pl {}; removeIncomingFile {}" ::: incoming/files* > my.log

This will run one job per CPU thread.

Consider spending 20 minutes reading chapters 1 and 2 of "GNU Parallel 2018" (printed, online). I think it will help you understand the basic uses of GNU Parallel.

1 Comment

Thanks for the answer. I have a lot of servers which should be monitored. Prepared files from each server are delivered to one machine, where the script processes them in date order. The date is part of the file name, e.g. TYPE1_server01_20181030_194002.out. After a file is processed, its data is inserted into a database. Based on this I can prepare availability reports, etc. I want to process those files in parallel, for each server in date order.
