grep command in curl

Question

I am trying to extract some URLs from a web page using the cURL command. Initially, I run cURL as below.

curl www.website.com/

Now, the website contains links to some other websites which I am interested in extracting. So, I pipe the cURL output through grep as below.

curl www.website.com/ | grep "<a href=" > new1.txt

This extracts all lines which have <a href= in them. But I am only interested in lines which start with <a href= and end with title=

How can I modify the grep command?

If that's all you need, you can do grep "<a href=.*title=" but this can get complicated when parsing HTML.
– terdon ♦, Commented Feb 10, 2014 at 19:31

Ketan · Accepted Answer · 2014-02-10 19:39:31Z


This should work:

curl www.website.com/ | grep '^<a href=.*title=$' > new1.txt

This will select all lines that begin with <a href= and end in title=
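To try the anchored pattern without hitting a real site, here is a minimal sketch. The sample_page function and the example.com URLs are made up for illustration and stand in for the curl output.

```shell
# Hypothetical sample lines standing in for what curl would print.
sample_page() {
  printf '%s\n' \
    '<a href="http://example.com/a" title=' \
    '<p><a href="http://example.com/b" title="B">B</a></p>' \
    '<a href="http://example.com/c">C</a>'
}

# Same pattern as above: the line must begin with <a href= and end with title=
# Only the first sample line satisfies both anchors.
sample_page | grep '^<a href=.*title=$'
```

The second sample line contains both strings but starts with <p>, so the ^ anchor rejects it. If the real page looks like that, the unanchored pattern from the comment is probably closer to what you want.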

Just saw terdon's comment. You can use grep's -P option (Perl-compatible regular expressions) to make the match non-greedy (lazy) as follows:

curl www.website.com/ | grep -P '^<a href=.*?title=$' > new1.txt
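Since grep prints whole matching lines by default, greedy and lazy versions of an anchored pattern select the same lines. The difference shows up once you add -o to print only the matched text. A small sketch, assuming GNU grep built with -P support; the two_links function and its URLs are hypothetical:

```shell
# Hypothetical line containing two links, standing in for curl output.
two_links() {
  printf '%s\n' \
    '<a href="http://example.com/x" title="first">one</a> <a href="http://example.com/y" title="second">two</a>'
}

# Greedy: .* runs to the LAST title= on the line, so there is one long match.
greedy_matches() { two_links | grep -oP '<a href=.*title='; }

# Lazy: .*? stops at the FIRST title= after each <a href=, so each link
# produces its own match.
lazy_matches() { two_links | grep -oP '<a href=.*?title='; }

greedy_matches
lazy_matches
```

With -o, the lazy form gives one output line per link, which is usually what you want when pulling out individual tags.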

edited Feb 10, 2014 at 19:39

answered Feb 10, 2014 at 19:34

Ketan


Community · Accepted Answer · 2017-05-23 12:40:03Z


Keeping in mind that HTML is not a regular language, and parsing it with regular expressions is nigh-impossible, you could try:

... | grep '^<a href=.*title=.*' > ...

Edit: Saw that you specified lines that start with <a href; the caret takes care of that condition.
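One concrete way the line-based approach breaks: HTML allows a tag to span several lines, and grep only ever sees one line at a time. The sketch below uses a made-up split_tag_page function in place of the curl output, and the tr workaround is only an illustration, not a robust parser.

```shell
# Hypothetical page where one <a> tag is split across two lines.
split_tag_page() {
  printf '%s\n' \
    '<a href="http://example.com/a"' \
    '   title="A">A</a>'
}

# Line by line, no single line contains both <a href= and title=,
# so the pattern finds nothing (grep -c prints 0 and exits non-zero).
split_tag_page | grep -c '^<a href=.*title=.*' || true

# Rough workaround: join the lines with tr, then print each opening tag
# that has a title attribute after href. [^>]* keeps the match inside one tag.
split_tag_page | tr '\n' ' ' | grep -o '<a href=[^>]*title=[^>]*>'
```

This still assumes href comes first and that no attribute value contains >, so for anything beyond a quick one-off a real HTML parser is the safer choice.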

edited May 23, 2017 at 12:40

Community Bot

answered Feb 10, 2014 at 19:31

DopeGhoti

