
I want to extract the URLs from a Bing search results page. I fetch the HTML, and when I apply the regex /<h2><a href="(.*?)"/g, I get:

["<h2><a href="https://www.test.com/"", "<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"", "<h2><a href="http://www.speedtest.net/"", "<h2><a href="http://test.psychologies.com/"", "<h2><a href="http://www.thefreedictionary.com/test"", "<h2><a href="http://fr.wikipedia.org/wiki/Test"", "<h2><a href="http://www.wordreference.com/enfr/test"", "<h2><a href="http://www.sedecouvrir.fr/"", "<h2><a href="http://www.jeuxvideo.com/tests.htm"", "<h2><a href="http://en.wikipedia.org/wiki/Test""]

In JavaScript, I used match:

html.match(/<h2><a href="(.*?)"/g);

I only want the URLs. The HTML comes from http://www.bing.com/search?q=test. I've been searching all day, and I think I may need to use a capture group?

  • /<h2><a href="([^"]+)"/g should do it. Commented Dec 20, 2014 at 15:23
  • Thanks for your reply, Ismael, but it gives the same result. Commented Dec 20, 2014 at 15:26
  • This might help you: stackoverflow.com/questions/3809401/… Commented Dec 20, 2014 at 15:37

3 Answers


Use Array.map to iterate over the list of matched strings, and run a non-global regular expression on each one with exec. The URL is then available as capture group 1 of the result.

"use strict";

var links = ['<h2><a href="https://www.test.com/"',
 '<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"', 
 '<h2><a href="http://www.speedtest.net/"', 
 '<h2><a href="http://test.psychologies.com/"',
 '<h2><a href="http://www.thefreedictionary.com/test"',
 '<h2><a href="http://fr.wikipedia.org/wiki/Test"',
 '<h2><a href="http://www.wordreference.com/enfr/test"',
 '<h2><a href="http://www.sedecouvrir.fr/"',
 '<h2><a href="http://www.jeuxvideo.com/tests.htm"',
 '<h2><a href="http://en.wikipedia.org/wiki/Test"'];

var result = links.map(function (link) {
  return /<h2><a href="(.*?)"/.exec(link)[1];
});

console.log(result);

1 Comment

The g flag in /g is not needed there. /g is for multiple matches. You're iterating over an array list of items guaranteed to provide only a single match.
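To illustrate the point about the g flag, here is a minimal sketch of how match behaves with and without it. The sampleHtml string is a hypothetical stand-in for the Bing markup, not the real page.

```javascript
// Hypothetical sample input standing in for the Bing result markup.
const sampleHtml =
  '<h2><a href="https://www.test.com/">Test</a></h2>' +
  '<h2><a href="http://www.speedtest.net/">Speedtest</a></h2>';

// With the g flag, match() returns every full match and drops capture groups.
const withGlobal = sampleHtml.match(/<h2><a href="(.*?)"/g);

// Without the g flag, match() returns only the first match, including its groups.
const withoutGlobal = sampleHtml.match(/<h2><a href="(.*?)"/);

console.log(withGlobal);       // two strings, each still starting with <h2><a href="
console.log(withoutGlobal[1]); // https://www.test.com/
```

This is why the original call returned the surrounding tag text: the group is matched, but match with g never exposes it.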

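The result of match is an array, so you need to map over it. You also need a capture group to pull out just the URL.

// match() returns null when nothing matches, so fall back to an empty array.
var urls = (html.match(/<h2><a href="[^"]+"/g) || []).map(function (str) {
   // Replace each full match with the contents of capture group 1 (the URL).
   return str.replace(/.*href="([^"]+).*/, "$1");
});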
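As an alternative sketch, engines that support String.prototype.matchAll (ES2020 and later) can return the capture groups for every match in one pass, which avoids the second regex. The pageHtml string below is hypothetical sample data, and the availability of matchAll in your runtime is an assumption.

```javascript
// Hypothetical sample input standing in for the fetched search page.
const pageHtml =
  '<h2><a href="http://fr.wikipedia.org/wiki/Test">Test</a></h2>' +
  '<h2><a href="http://www.wordreference.com/enfr/test">test</a></h2>';

// matchAll() requires the g flag and yields one match object per occurrence,
// each with its capture groups; the mapper keeps only group 1 (the URL).
const foundUrls = Array.from(
  pageHtml.matchAll(/<h2><a href="([^"]+)"/g),
  (m) => m[1]
);

console.log(foundUrls);
```

If nothing matches, foundUrls is simply an empty array, so no null check is needed.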


If this is being done within a browser, there's really no need to try to use a regex.

// Collect every anchor element on the page and log its resolved URL.
var myNodeList = document.getElementsByTagName('a');
for (var i = 0; i < myNodeList.length; ++i) {
    var anchor = myNodeList[i];
    console.debug(anchor.href);
}

But as hinted in the comments, if you really want to use regexes, all you need to do is iterate over the results, as shown in "How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()?". In particular, note the lines:

while (match = re.exec(url)) {
     params[decode(match[1])] = decode(match[2]);
}
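
Adapted to the regex from the question, the same loop might look like the sketch below. The function name extractLinks and the sample string are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: returns the href of every <h2><a ...> in the given HTML.
function extractLinks(html) {
  // The g flag makes exec() resume from re.lastIndex on each call.
  const re = /<h2><a href="([^"]+)"/g;
  const links = [];
  let match;
  // exec() returns null once there are no more matches, which ends the loop.
  while ((match = re.exec(html)) !== null) {
    links.push(match[1]); // capture group 1 holds the URL
  }
  return links;
}

// Hypothetical sample input standing in for the Bing result markup.
const sample =
  '<h2><a href="http://www.sedecouvrir.fr/">A</a></h2>' +
  '<h2><a href="http://www.jeuxvideo.com/tests.htm">B</a></h2>';

console.log(extractLinks(sample));
```

Creating the regex inside the function keeps lastIndex from leaking between calls, which is a common source of skipped matches when a global regex is reused.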
