
I am creating a 'generic' web scraper that should scrape any page containing a list of entries. I would like the config to drive which tags it extracts.

For example, given the following config entry:

```javascript
{
  name: "price",
  valueJQueryExpression: ".mt9 > .mt7.b"
},
```

...I parse it like this:

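```javascript
const $ = require('cheerio');

// getQuery() is my own helper; it returns the configured selector for a field
const jquery = getQuery("price");
// html is the page source being scraped
const keys = $(jquery, html);
```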
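The question does not show `getQuery()`. A minimal sketch of what such a helper might look like, assuming the config is an array of `{ name, valueJQueryExpression }` entries; the `config` array and the error message are hypothetical:

```javascript
// Hypothetical config: an array of entries shaped like the one above.
const config = [
  { name: "price", valueJQueryExpression: ".mt9 > .mt7.b" },
];

// Hypothetical getQuery(): returns the selector configured for a field name,
// and fails loudly instead of silently returning undefined.
function getQuery(name) {
  const entry = config.find((e) => e.name === name);
  if (!entry) {
    throw new Error(`No config entry named "${name}"`);
  }
  return entry.valueJQueryExpression;
}
```

Throwing on an unknown name surfaces config typos immediately, rather than letting `$(undefined, html)` quietly match nothing.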

However, some fields need trickier parsing, e.g. this one:

```javascript
let location = $('.mt9 > .b', html).not('.mt5').not('.mt7').text().trim();
```

In such cases I thought of using eval() and passing the full expression in the config. However, eval() is not recommended because of its security risks.

Do you have any recommendations for handling this differently?
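When a chain cannot be collapsed into one selector, a common eval()-free technique is to describe the chain as data and replay it through a whitelist of allowed method names. This is a minimal sketch of that technique, not the asker's code: the `selector`/`steps` config shape, `ALLOWED_METHODS`, `runRule`, and `locationRule` are all hypothetical, and `$` is assumed to accept `(selector, context)` as in the question's cheerio calls.

```javascript
// Hypothetical whitelist: cheerio/jQuery traversal methods plus String#trim.
// Only these names may appear in a rule's `steps`.
const ALLOWED_METHODS = new Set([
  "not", "filter", "first", "last", "eq", "find", "children", "parent",
  "text", "attr", "trim",
]);

// Replays a rule of the hypothetical shape { selector, steps: [{ method, args }] }
// against a jQuery-like `$`, one whitelisted method call at a time.
function runRule($, html, rule) {
  let current = $(rule.selector, html);
  for (const { method, args = [] } of rule.steps || []) {
    if (!ALLOWED_METHODS.has(method)) {
      throw new Error(`Method "${method}" is not allowed in config`);
    }
    if (!Array.isArray(args)) {
      throw new Error(`"args" for "${method}" must be an array`);
    }
    if (current == null || typeof current[method] !== "function") {
      throw new Error(`"${method}" cannot be called on the current value`);
    }
    current = current[method](...args);
  }
  return current;
}

// The question's location parser, expressed as data instead of code.
const locationRule = {
  name: "location",
  selector: ".mt9 > .b",
  steps: [
    { method: "not", args: [".mt5"] },
    { method: "not", args: [".mt7"] },
    { method: "text" },
    { method: "trim" },
  ],
};
```

Because only listed method names are ever called, a malicious or mistyped config can at most call those methods or throw an error; it can never run arbitrary code. Supporting a new kind of parser means adding a name to the whitelist rather than falling back to eval().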

  • Try using XPath instead of a CSS selector; then you won't have to chain jQuery functions. (Commented Aug 30, 2019 at 6:25)

2 Answers


You should be able to use the :not() pseudo-class here. Try the following:

```javascript
$('.mt9 > .b:not(.mt5):not(.mt7)', html).text().trim()
```

It works the same way as in jQuery: elements that match the selector inside :not() are excluded from the results, so chaining :not(.mt5):not(.mt7) drops any element that has either class.
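Because :not() folds the exclusions into the selector itself, the tricky location rule fits the question's existing config format again and no eval() is needed. A minimal sketch, assuming a hypothetical `rules` array in that format and a `$` that accepts `(selector, context)` like the calls above; `scrape` is an illustrative name:

```javascript
// Hypothetical config: with :not(), the "location" rule is a plain selector
// string again, so no chained calls (and no eval) are needed.
const rules = [
  { name: "price", valueJQueryExpression: ".mt9 > .mt7.b" },
  { name: "location", valueJQueryExpression: ".mt9 > .b:not(.mt5):not(.mt7)" },
];

// Applies every rule with the same call shape as the answer:
// $(selector, html).text().trim(). `$` is injected so the sketch does not
// depend on a particular cheerio version.
function scrape($, html) {
  return Object.fromEntries(
    rules.map(({ name, valueJQueryExpression }) => [
      name,
      $(valueJQueryExpression, html).text().trim(),
    ])
  );
}
```

Every field now goes through one code path, so adding a field is a config change only.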

You can see the same selector in action with plain CSS below:

```css
.mt9 > .b:not(.mt5):not(.mt7) {
  color: red;
}
```

```html
<div class="mt9">
  <div class="b">This should be red</div>
  <div class="b mt7">This should not be red</div>
  <div class="b mt5">This should not be red</div>
</div>
```


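```javascript
// Browser-only: `document` does not exist in Node, so this cannot run alongside cheerio.
// Appending a <script> element executes its text, so it carries the same security risks as eval().
var command = 'console.log("Hello")';
var s = document.createElement("script");
s.textContent = command;
document.head.appendChild(s);
```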
