I'd like to write this script in bash so that I can understand it, and rewrite it later in Python (which I know much less well).

I have some files in data/. Three, nothing more:

https://www.example.com/data/file27-1.jpg
https://www.example.com/data/file51-2.jpg
https://www.example.com/data/file2576-3.jpg

Sequentially, my idea is to curl -I each candidate URL, check for an HTTP 200, and wget the file when one shows up:

URL='https://www.example.com/data/file'
i=1 ; j=1
while [ "$i" -le 3000 ]; do
  # HEAD request; grep the status line for a 200
  if curl -sI "${URL}${i}-${j}.jpg" | grep -q '^HTTP/.* 200'; then
    wget "${URL}${i}-${j}.jpg"   # found it: download,
    ((j++))                      # look for the next $j, keeping the same $i
  else
    ((i++))                      # not found: try the next $i
  fi
done
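
If I read the curl man page right, -w '%{http_code}' can print just the status code, which might be cleaner than grepping the header block:

code=$(curl -s -o /dev/null -I -w '%{http_code}' "${URL}${i}-${j}.jpg")
[ "$code" = 200 ]    # true when the file exists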

But curling 3000 links one at a time takes a while. I'd like to parallelize the curl -I requests in some way and, when one of them gets a 200 response, stop all the probing processes (there won't be two files with the same $j value), add 1 to $j, set $i and $j back to the appropriate values, and carry on.

I'm stuck on the parallelizing itself (I've found many threads on it), but the part that really blocks me is having a 200 kill all the curl processes and then resuming from the $i and $j values of that 200 OK.
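
To make it concrete, here is the rough shape of what I imagine. It's completely untested; the batch size of 20, the temp file and the rather blunt pkill are just placeholders for whatever the proper mechanism is, and I'm assuming there are only the three files to find:

URL='https://www.example.com/data/file'
hitfile=$(mktemp)                       # the $i that answered 200 gets written here

probe() {                               # probe <i> <j>
  if curl -sI "${URL}${1}-${2}.jpg" | grep -q '^HTTP/.* 200'; then
    echo "$1" > "$hitfile"
    pkill -x curl                       # blunt: kills every curl I have running
  fi
}

i=1 ; j=1
while [ "$i" -le 3000 ] && [ "$j" -le 3 ]; do
  : > "$hitfile"
  for k in $(seq "$i" $((i + 19))); do  # probe 20 candidates at once
    probe "$k" "$j" &
  done
  wait                                  # killed probes just come back without a 200
  if [ -s "$hitfile" ]; then
    i=$(cat "$hitfile")                 # resume from the $i that hit...
    wget "${URL}${i}-${j}.jpg"
    ((j++))                             # ...and look for the next $j from there
  else
    ((i += 20))
  fi
done
rm -f "$hitfile"

At least the batching keeps $i and $j in the parent shell, so resuming is just arithmetic; it's the "kill everything on a 200 and pick up from there" part that I'm not sure is done right.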

I hope I've been clear. I haven't written a proper script yet; I'm still looking into ways of achieving this.

Thanks


Edit

I figured out I can kill all the running curl requests with:

ps -ax | grep "curl" | grep -v "grep" | awk '{print $1}' | xargs kill -9

I can run that inside an "if 200" condition, then reset $i with i=i-1, increment $j, and go on with the loop.
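
I guess pkill can do the same thing in one go, matching the process name exactly:

pkill -x curl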

But at this stage nothing is actually parallelized: I can see how to run the curl requests in parallel with xargs, but I can't get the parallel jobs to increment $i and $j.

I've thought of generating the URLs into a temporary file, but I'd rather have them generated as the script goes.
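
For the xargs route, the closest I can picture is something like this (untested; the batch of 50 and the 10 parallel jobs are arbitrary, and $URL, $i and $j are the same variables as above). Since the children run in their own processes and can't touch $i or $j, each one prints its status code together with its $i, and the parent picks out the 200; seq feeds the candidate numbers straight in, so no temporary URL file is needed:

# probe the next 50 candidates for the current $j, 10 at a time;
# every curl prints "<status> <i>", the parent keeps the $i of the 200, if any
hit=$(seq "$i" $((i + 49)) \
      | xargs -P 10 -I N curl -s -o /dev/null -I -w '%{http_code} N\n' "${URL}N-${j}.jpg" \
      | awk '$1 == 200 { print $2 }')

if [ -n "$hit" ]; then
  wget "${URL}${hit}-${j}.jpg"
  i=$hit ; ((j++))          # same $i, next $j, as in the sequential version
else
  ((i += 50))               # no hit in this batch, move on to the next 50
fi

This still doesn't kill the in-flight curls when one of them answers 200, though; it just lets the batch of 50 finish, which is the part I'd still like to do better.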
