
I have a bash script which processes each file in a directory:

for (( index=0; index<$COUNT; index++ ))
do
    srcFile=${INCOMING_FILES[$index]}
    ${SCRIPT_PATH}/control.pl ${srcFile} >> ${SCRIPT_PATH}/${LOG_FILE} &
    wait ${!}
    removeIncomingFile ${srcFile}
done

For a few files it works fine, but when the number of files is large it is too slow. I want to run this script in parallel on grouped files.

Example files:

server_1_1 | server_2_1 | server_3_1
server_1_2 | server_2_2 | server_3_2
server_1_3 | server_2_3 | server_3_3

The script should process the files related to each server in parallel:
First instance - server_1*
Second instance - server_2*
Third instance - server_3*

Is this possible using GNU Parallel, and how can it be achieved? Many thanks for any solution!

  • Why run just one command in background and then wait? Makes more sense if doing several at once... Commented Oct 30, 2018 at 20:32
  • 2
    this response might help you. It is logic to spawn commands in background and wait for them. There's even a POC version of a spooling script. That page also has lots of useful info about parallel. Commented Oct 30, 2018 at 20:35
  • Nothing in your code relates to the server numbers you mention! What are the pipe symbols (|) trying to tell me? Commented Oct 30, 2018 at 21:10

2 Answers


I can't make head or tail of what your question is trying to say, but I suspect the following will make a reasonable starting point. Put your actual code inside the quotes in place of the dummy actions I have used:

#!/bin/bash

# Do stuff for server 1
parallel -k 'echo server_1_{} ; date >> log_1_{}' ::: {1..3}

# Do stuff for server 2
parallel -k 'echo server_2_{} ; date >> log_2_{}' ::: {1..3}

# Do stuff for server 3
parallel -k 'echo server_3_{} ; date >> log_3_{}' ::: {1..3}

Sample Output

server_1_1
server_1_2
server_1_3
server_2_1
server_2_2
server_2_3
server_3_1
server_3_2
server_3_3

Log files created

-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_3
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_3
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_3



The grouping part confuses me.

I have the feeling you want them grouped because you do not want to overload the server.

Normally you would simply do:

parallel "control.pl {}; removeIncomingFile {}" ::: incoming/files* > my.log

This will run one job per CPU thread.

Consider spending 20 minutes reading chapters 1 and 2 of "GNU Parallel 2018" (printed, online). I think it will help you understand the basic uses of GNU Parallel.

1 Comment

Thanks for the answer. I have a lot of servers which should be monitored. Prepared files from each server are delivered to one machine, where the script processes them in date order. The date is part of the file name, e.g. TYPE1_server01_20181030_194002.out. After a file is processed, its data is inserted into a database. Based on this I can prepare availability reports, etc. I want to process those files in parallel, for each server in date order.
