
How can I download hundreds of .pdf files from http://www.ncbi.nlm.nih.gov/pmc/articles using a loop? For example, for the following document IDs:

PMC3386155
PMC3625956
PMC3477654
PMC3531051
PMC3114846
PMC3117879
PMC3130560
PMC3531173
PMC3546115
PMC3354575
PMC3771521
  • Are these open access or do you have to enter your institution's credentials every time? Commented Sep 21, 2013 at 18:17
  • related: unix.stackexchange.com/questions/83687/… Commented Sep 21, 2013 at 18:22
  • pubmed Q: unix.stackexchange.com/questions/91696/…. Why is this one different? Commented Sep 21, 2013 at 18:39
  • @sami Have you checked that script? Is there any issue? Commented Sep 22, 2013 at 8:55

1 Answer


Here is a working, tested script.

Using wget

#!/usr/bin/env bash

# Base URL of the PMC article pages
Link="http://www.ncbi.nlm.nih.gov/pmc/articles/"

ID=(    PMC3386155 PMC3625956 PMC3477654 PMC3531051
        PMC3114846 PMC3117879 PMC3130560 PMC3531173
        PMC3546115 PMC3354575 PMC3771521 )

# Download each article's PDF and save it as <ID>.pdf
for f in "${ID[@]}"
do
   wget  --user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" \
         -l1 --no-parent -A.pdf "${Link}${f}/pdf/" -O "${f}.pdf"
done

The remote site rejects the default user agents of tools like wget and curl, so we have to specify a browser user agent explicitly in the wget call.
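With hundreds of IDs, a hard-coded array gets unwieldy. As a minimal sketch, the same wget call can read the IDs from a text file, one per line. The file name ids.txt, the function names pmc_pdf_url and fetch_ids, and the DRY_RUN switch are assumptions made for this example, not part of the script above.

```shell
#!/usr/bin/env bash
# Sketch: read PMC IDs from a file instead of hard-coding them.
# pmc_pdf_url, fetch_ids and DRY_RUN are hypothetical names for this example.

Link="http://www.ncbi.nlm.nih.gov/pmc/articles/"
UA="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"

# Print the PDF URL for one document ID.
pmc_pdf_url() {
    printf '%s%s/pdf/\n' "$Link" "$1"
}

# Read IDs (one per line) from the file given as $1 and fetch each PDF.
# With DRY_RUN=1 the wget commands are only printed, not executed.
fetch_ids() {
    while IFS= read -r id || [ -n "$id" ]; do
        if [ -z "$id" ]; then
            continue                      # skip blank lines
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'wget -O %s.pdf %s\n' "$id" "$(pmc_pdf_url "$id")"
        else
            wget --user-agent="$UA" -O "${id}.pdf" "$(pmc_pdf_url "$id")"
        fi
    done < "$1"
}

# Run only when a file name is passed, e.g.: ./fetch.sh ids.txt
if [ "$#" -gt 0 ]; then
    fetch_ids "$1"
fi
```

Running it once with DRY_RUN=1 shows which commands would be issued before any request is sent.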

Using curl

ID=( PMC3386155 PMC3625956 PMC3477654 PMC3531051 PMC3114846 PMC3117879 PMC3130560 PMC3531173 PMC3546115 PMC3354575 PMC3771521 )

Link="http://www.ncbi.nlm.nih.gov/pmc/articles/"

# Keep the options in an array so the quoted user agent stays a single argument
Args=( -O -J -L -A "Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" )

# Fetch each article's PDF; -J names the file from the response header
for f in "${ID[@]}"; do
    curl "${Args[@]}" "${Link}${f}/pdf/"
done

Some explanation

  • -O Write the output to a file instead of standard output
  • -J Take the output file name from the remote header (--remote-header-name; curl 7.21.2 or newer)
  • -L Follow redirects; the remote site redirects to another download page
  • -A Set the user agent
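The short options above may be easier to read in their long form. The sketch below spells them out for a single document ID and adds a small check that a downloaded file starts with the PDF signature, on the assumption that a rejected request would return an HTML page instead. The names fetch_pdf and looks_like_pdf are hypothetical helpers for this example.

```shell
#!/usr/bin/env bash
# Sketch: the curl call with long option names, plus a basic sanity check.
# fetch_pdf and looks_like_pdf are hypothetical helper names.

Link="http://www.ncbi.nlm.nih.gov/pmc/articles/"
UA="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"

# Download the PDF for the document ID given as $1.
#   --remote-name         same as -O: write to a file, not standard output
#   --remote-header-name  same as -J: file name from the response header
#   --location            same as -L: follow redirects
#   --user-agent          same as -A: send a browser-like user agent
fetch_pdf() {
    curl --remote-name --remote-header-name --location \
         --user-agent "$UA" "${Link}$1/pdf/"
}

# Succeed if the file given as $1 starts with the PDF signature "%PDF".
looks_like_pdf() {
    [ "$(head -c 4 "$1")" = "%PDF" ]
}
```

A typical use would be to call fetch_pdf PMC3386155 and then run looks_like_pdf on the resulting file before moving on to the next ID.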