Skip to main content
added 8 characters in body
Source Link
Ed Morton
  • 35.9k
  • 6
  • 25
  • 60

Since you have GNU sed you can install GNU awk. With GNU awk for multi-char RS and RT:

$ awk -v RS='href="http[^"]+.pdf"' -F'"' 'RT{$0=RT; print $2}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

otherwise using any awk in any shell on every UNIX box:

$ awk '{
    while ( match($0,/href="http[^"]+.pdf"/) ) {
        split(substr($0,RSTART,RLENGTH),f,/"/)
        print f[2]
        $0 = substr($0,RSTART+RLENGTH)
    }
}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

Just pipe that output to xargs -n 1 curl -O, to download the PDFs (assuming no spaces in the URL).

Since you have GNU sed you can install GNU awk. With GNU awk for multi-char RS and RT:

$ awk -v RS='href="http[^"]+.pdf"' -F'"' 'RT{$0=RT; print $2}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

otherwise using any awk in any shell on every UNIX box:

$ awk '{
    while ( match($0,/href="http[^"]+.pdf"/) ) {
        split(substr($0,RSTART,RLENGTH),f,/"/)
        print f[2]
        $0 = substr($0,RSTART+RLENGTH)
    }
}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

Just pipe that output to xargs curl, to download the PDFs.

Since you have GNU sed you can install GNU awk. With GNU awk for multi-char RS and RT:

$ awk -v RS='href="http[^"]+.pdf"' -F'"' 'RT{$0=RT; print $2}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

otherwise using any awk in any shell on every UNIX box:

$ awk '{
    while ( match($0,/href="http[^"]+.pdf"/) ) {
        split(substr($0,RSTART,RLENGTH),f,/"/)
        print f[2]
        $0 = substr($0,RSTART+RLENGTH)
    }
}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

Just pipe that output to xargs -n 1 curl -O, to download the PDFs (assuming no spaces in the URL).

added 713 characters in body
Source Link
Ed Morton
  • 35.9k
  • 6
  • 25
  • 60

Since you have GNU sed you can install GNU awk. With GNU awk for multi-char RS and RT:

$ awk -v RS='href="http[^"]+.pdf"' -F'"' 'RT{$0=RT; print $2}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

otherwise using any awk in any shell on every UNIX box:

$ awk '{
    while ( match($0,/href="http[^"]+.pdf"/) ) {
        split(substr($0,RSTART,RLENGTH),f,/"/)
        print f[2]
        $0 = substr($0,RSTART+RLENGTH)
    }
}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

Just pipe that output to xargs curl, to download the PDFs.

Since you have GNU sed you can install GNU awk. With GNU awk for multi-char RS and RT:

$ awk -v RS='href="http[^"]+.pdf"' -F'"' 'RT{$0=RT; print $2}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

Since you have GNU sed you can install GNU awk. With GNU awk for multi-char RS and RT:

$ awk -v RS='href="http[^"]+.pdf"' -F'"' 'RT{$0=RT; print $2}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

otherwise using any awk in any shell on every UNIX box:

$ awk '{
    while ( match($0,/href="http[^"]+.pdf"/) ) {
        split(substr($0,RSTART,RLENGTH),f,/"/)
        print f[2]
        $0 = substr($0,RSTART+RLENGTH)
    }
}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf

Just pipe that output to xargs curl, to download the PDFs.

Source Link
Ed Morton
  • 35.9k
  • 6
  • 25
  • 60

Since you have GNU sed you can install GNU awk. With GNU awk for multi-char RS and RT:

$ awk -v RS='href="http[^"]+.pdf"' -F'"' 'RT{$0=RT; print $2}' file
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé-C1_L1.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2019_Henrot-Versillé_C1_L2.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Henrot-Versillé_C3.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_L1_Bayesian.pdf
https://ecole-euclid.cnrs.fr/wp-content/uploads/EDE2018_Martinelli_C2_TD_Bayesian.pdf