Skip to main content

Questions tagged [xpath]

The primary purpose of XPath is to address parts of an XML document. It also provides basic facilities for manipulation of strings, numbers and booleans. XPath uses a compact, non-XML syntax. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax.

1 vote
1 answer
119 views

Counting Ads in webpage using XPath and EasyList in Python

I have the following the function the retrieves a given webpage and returns the number of adverts that are on the page using a shortened version of EasyList (17,000 rules). Using multiprocessing, this ...
NGH's user avatar
  • 167
1 vote
1 answer
62 views

Extracting information from HTML with XSLT 3.0 when data is grouped visually as siblings in a td separated by blank lines

I have a work-in-progress where I'm using XSLT 3 to extract information from some preprocessed archaic HTML. I'd like to produce JSON showing the relationships between the various entities for further ...
Forensic_07's user avatar
2 votes
1 answer
551 views

Using XSLT 3.0 to extract information from real-world HTML and produce JSON

For work, I extract information from HTML and XML sources to save in databases. The objective is to generate JSON representing the source document's information and its relationships in order to 1) ...
Forensic_07's user avatar
3 votes
1 answer
255 views

Getting xpath indices to count forward to preserve table structure

Scraping tables from the web get complicated when there are 2 or more values in a cell. In order to preserve the table structure, I have devised a way to count the row-number index of its xpath, ...
Sati's user avatar
  • 427
-2 votes
1 answer
114 views

Java xPath: simple expression evaluation [closed]

This code is using XPath to expose a library function that can check whether an XML file contains a string or an expression: ...
mattobob's user avatar
  • 107
2 votes
1 answer
109 views

Flattening XML and selecting nodes based on input to convert to CSV

I have some XML that I want to flatten based on its XPath. The task is to acquire certain nodes that are passed as input. That said, I have to get to parent nodes and look for nodes that might be part ...
noobie-php's user avatar
2 votes
1 answer
1k views

Beautifulsoup and lxml (xpath) too slow with respect to regex when parsing HTML

I agree that using regex to parse HTML is not a good way, in particular I am worried about their fragility with respect to change in the HTML. The problem is that any alternatives are really too slow....
Ruggero Turra's user avatar
2 votes
1 answer
5k views

Scrapy crawler to parse data recursively

I've written a script in python scrapy to parse "name" and "price" of different products from a website. Firstly, it scrapes the links of different categories from the upper sided bar located in the ...
SIM's user avatar
  • 2,501
5 votes
1 answer
3k views

Extracting necessary records from LinkedIn

I wanted to create a scraper in python which can fetch required data from LinkedIn. I tried with python in many different ways but I could not make it until I used selenium in combination with. ...
MITHU's user avatar
  • 545
3 votes
1 answer
6k views

Scraping content from a javascript enabled website with load more button

The script I've written is able to scrape name, address, phone and web address from a webpage using python and selenium. The main barrier I had to face was to exhaust the load more button to get the ...
SIM's user avatar
  • 2,501
2 votes
2 answers
305 views

Web-crawler for yellowpage designed using python

I've written some code to scrape name, address, and phone number from yellowpage using python. This scraper has got input parameter. If the input is properly filled in and the filled in url exists in ...
SIM's user avatar
  • 2,501
2 votes
1 answer
612 views

Scraping data unveiling a button from craigslist

I've written some code to parse the names and phone numbers from craigslist. It starts from the link in "m_url" then goes one layer deep to parse the name and then again another layer deep to parse ...
SIM's user avatar
  • 2,501
2 votes
1 answer
9k views

Scraping table contents from a webpage using vba with selenium

My script is able to harvest full contents of a table from a webpage with javascript encrypted using vba in combination with selenium. The table has got a drop-down option from where the full contents ...
SIM's user avatar
  • 2,501
3 votes
1 answer
1k views

Web scraper for parsing names and email addresses from Yellowpage

I've written a script using python to parse the names and email addresses of different pizza shops in USA. I am very new in writing classes using python so I'm not very sure I didn't do anything wrong ...
SIM's user avatar
  • 2,501
13 votes
3 answers
15k views

Evaluating an XPath with document.evaluate() to get an array of nodes

The Problem Statement: Filter all nodes with an existing attribute that starts with a specific string (temp for example purposes). Print an array of node string ...
alecxe's user avatar
  • 17.5k

15 30 50 per page