Using Linux Bash, how can one turn a text file with:

http://example.org/
https://en.wikipedia.org/wiki/Main_Page
https://www.youtube.com/watch?v=mGQFZxIuURE

into:

http://example.org/ Example Domain
https://en.wikipedia.org/wiki/Main_Page Wikipedia, the free encyclopedia
https://www.youtube.com/watch?v=mGQFZxIuURE Mike Perry - The Ocean (ft. Shy Martin) - YouTube

or into:

http://example.org/
Example Domain

https://en.wikipedia.org/wiki/Main_Page
Wikipedia, the free encyclopedia

https://www.youtube.com/watch?v=mGQFZxIuURE
Mike Perry - The Ocean (ft. Shy Martin) - YouTube

?

How can one

  1. pull a URL from a list of URLs in a file,
  2. load the page,
  3. extract its page title,
  4. append that page title after the URL, either on the same line or on the line immediately below it, then

perform those steps 1-4 for each subsequent URL in that list?

If not using Linux Bash, what other way is there?
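Step 3 on its own can also be approximated without a dedicated HTML tool. The sketch below is a regex-based stand-in for an HTML parser and rests on stated assumptions: GNU `grep` (for `-i` and `-o`), a `<title>` element that is not inside a comment or script, and no decoding of entities such as `&amp;`. The function name `extract_title` is hypothetical.

```shell
# extract_title (hypothetical name): print the text of the first <title>
# element found in the HTML read from stdin. Regex-based sketch only:
# it is not an HTML parser and does not decode entities like &amp;.
extract_title() {
  tr '\n' ' ' |                               # join lines so a title split across lines still matches
    grep -io '<title[^>]*>[^<]*</title>' |    # GNU grep: case-insensitive, print each match on its own line
    head -n 1 |                               # keep only the first match (normally the one in <head>)
    sed -e 's/<[^>]*>//g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//'                # drop the tags, then trim surrounding whitespace
}
```

Combined with step 2, it would be used as `curl -sL "$url" | extract_title`. A real HTML parser such as pup, used in the answer, copes with far more of the HTML found in the wild.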

  • bash isn't (much of) a text processing tool Commented Jun 12, 2019 at 17:14
  • @JeffSchaller How can it be done then? How can one turn an extremely long URL list (e.g. a list of YouTube videos) into URL + title? Commented Jun 12, 2019 at 17:17
  • I'm sure you'll have several good answers momentarily; just because bash is your shell doesn't mean it has to do everything. If you can spell out exactly how you want the transformation to happen, that would help answerers. How did you get "Example Domain" out of "example.org", for example(!)? Are you sending a request to that URL and extracting an HTML tag? Commented Jun 12, 2019 at 17:18
  • That should go ^^^ up in your Question as an edit, please & thank you! Commented Jun 12, 2019 at 17:22
  • I'd recommend using Perl rather than bash scripting. Text processing is Perl's speciality. Commented Jun 12, 2019 at 17:32

1 Answer

With curl and pup (a command-line HTML parser that selects elements with CSS selectors):

while IFS= read -r url
do
   printf "%s " "$url"
   curl -sL "$url" | # fetch the page
       pup 'head title:first-of-type text{}' # get the text of the first title tag in head
done < input
  • Works but seems to require a blank line at the end of the input file. Otherwise, the last URL in the file isn't processed. Commented Jun 12, 2019 at 17:57
  • Not a blank line, just a newline at the end. That's the definition of a line. Commented Jun 12, 2019 at 17:59
  • You're right: a newline. I just pointed it out because the problem shows up if you paste a bunch of URLs into a text editor and forget to hit Enter after the last URL in the list. Commented Jun 12, 2019 at 18:03
  • @DavidYockey You don't need an additional Enter at the end if you use a proper text editor, e.g. vim, emacs. Commented Jun 13, 2019 at 7:22
  • @Sparhawk Wasn't aware of that difference between editors. Good to know. Since I use vim for serious work and a simple GUI editor for quick cut-and-paste jobs, I guess I never noticed. Commented Jun 13, 2019 at 11:32
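Picking up the trailing-newline point from these comments, a variant of the answer's loop can guard against it with the common `|| [ -n "$url" ]` idiom, and can also produce the question's second layout (URL, then the title on the next line, then a blank line). This is a sketch, not the answer author's code: `fetch_title` and `annotate_urls` are hypothetical names, stripping a trailing carriage return and skipping blank lines are added for robustness, and it still assumes `curl` and `pup` are installed.

```shell
# fetch_title (hypothetical helper): fetch a page and print its title,
# using the same curl + pup pipeline as the answer above.
fetch_title() {
  curl -sL "$1" | pup 'head title:first-of-type text{}'
}

# annotate_urls (hypothetical name): read URLs on stdin and print each one,
# its title on the next line, then a blank line.
annotate_urls() {
  local url title
  while IFS= read -r url || [ -n "$url" ]; do   # also handles a last line with no trailing newline
    url=${url%$'\r'}                             # drop a CR left by Windows-style line endings
    [ -z "$url" ] && continue                    # skip blank lines in the list
    title=$(fetch_title "$url")
    printf '%s\n%s\n\n' "$url" "$title"
  done
}
```

Usage would be something like `annotate_urls < input > output`. Keeping the fetch in its own function is only a convenience: it can be replaced by a stub to try the loop offline.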