Split the URLs into one file per host. Then run 'parallel -j5' on each file.
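A minimal sketch of that approach (the byhost/ directory and the awk field choice are assumptions, not part of the original answer):

# for http://host/path, field 3 of a '/'-split line is the host
mkdir -p byhost
awk -F/ '{ print > ("byhost/" $3) }' urls.txt
# up to 10 hosts at a time, at most 5 concurrent ./scan jobs per host
parallel -j10 "parallel -j5 ./scan :::: {}" ::: byhost/*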
Or sort the URLs and insert a NUL delimiter ('\0') whenever a new host appears, then split on '\0' (removing it) and pass each block to a new instance of parallel:
sort urls.txt |
  # print a NUL before every line whose host differs from the previous line's
  # (\Q...\E keeps metacharacters such as '.' in the hostname literal)
  perl -pe '(not m://\Q$last\E:) and print "\0"; m://([^/]+): and $last=$1' |
  # split the stream on NUL and hand each block to its own inner parallel
  parallel -j10 --pipe --rrs -N1 --recend '\0' parallel -j5 ./scan
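Here the outer parallel reads the stream in NUL-delimited records (--recend '\0'), takes one record at a time (-N1), i.e. one host's block of URLs, strips the record separator (--rrs), and pipes the block to an inner parallel, which runs at most 5 ./scan jobs against that host; up to 10 hosts are worked on at once (-j10).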
Edit:
I think this will work:
cat urls.txt | parallel -q -j50 sem --fg --id '{= m://([^/]+):; $_=$1 =}' -j5 ./scan {}
sem is part of GNU Parallel (it is shorthand for parallel --semaphore). {= m://([^/]+):; $_=$1 =} grabs the hostname from the URL. -j5 tells sem to create a counting semaphore with 5 slots. --fg forces sem not to spawn the job in the background, so each of the outer parallel's 50 slots stays busy until its job finishes. By using the hostname as the ID you get a separate counting semaphore per hostname.
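To see the counting-semaphore behaviour on its own, here is a toy example (the id 'demo' and the sleep jobs are made up, not part of the original command): at most 3 of the 10 jobs run at any moment.

for i in $(seq 10); do
  # sem blocks here until one of the 3 slots under id 'demo' is free
  sem -j3 --id demo "sleep 1; echo job $i"
done
# wait for the remaining 'demo' jobs to finish
sem --wait --id demo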
-q is needed for parallel if some of your URLs contain special shell characters (such as &). They need to be protected from shell expansion, because sem will shell expand them a second time.
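A quick way to see the effect (the URL is hypothetical): without -q the & reaches the shell that sem spawns unprotected, so it backgrounds half the command; with -q the URL should survive both rounds of expansion.

echo 'http://example.com/?a=1&b=2' | parallel -j1 sem --fg --id t echo {}
# prints only http://example.com/?a=1
echo 'http://example.com/?a=1&b=2' | parallel -q -j1 sem --fg --id t echo {}
# prints the full URL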