Questions tagged [xpath]
The primary purpose of XPath is to address parts of an XML document. It also provides basic facilities for manipulation of strings, numbers and booleans. XPath uses a compact, non-XML syntax. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax.
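As a rough illustration of "addressing parts of an XML document", here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It supports only a limited subset of XPath, so the number handling is done in Python rather than with XPath functions. The `catalog` document and the names `titles`, `b2_title`, and `total` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<catalog>
  <book id="b1"><title>XPath Basics</title><price>10</price></book>
  <book id="b2"><title>XSLT in Depth</title><price>25</price></book>
</catalog>
"""

root = ET.fromstring(XML)

# Address every <title> element anywhere below the root.
titles = [t.text for t in root.findall(".//title")]

# Address one <book> by attribute value, then step down to its <title>.
b2_title = root.find(".//book[@id='b2']/title").text

# ElementTree does not implement XPath's number functions such as sum(),
# so the arithmetic is done in Python on the addressed <price> elements.
total = sum(float(p.text) for p in root.findall("./book/price"))

print(titles, b2_title, total)
```

The paths refer to elements and attributes, that is, the logical tree, and not to how the markup happens to be written in the file. A full XPath engine would also allow the string, number, and boolean functions to be used inside the expression itself.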
37 questions
1
vote
1
answer
119
views
Counting Ads in webpage using XPath and EasyList in Python
I have the following function, which retrieves a given webpage and returns the number of adverts on the page using a shortened version of EasyList (17,000 rules). Using multiprocessing, this ...
1
vote
1
answer
62
views
Extracting information from HTML with XSLT 3.0 when data is grouped visually as siblings in a td separated by blank lines
I have a work-in-progress where I'm using XSLT 3 to extract information from some preprocessed archaic HTML. I'd like to produce JSON showing the relationships between the various entities for further ...
2
votes
1
answer
551
views
Using XSLT 3.0 to extract information from real-world HTML and produce JSON
For work, I extract information from HTML and XML sources to save in databases. The objective is to generate JSON representing the source document's information and its relationships in order to 1) ...
3
votes
1
answer
255
views
Getting xpath indices to count forward to preserve table structure
Scraping tables from the web gets complicated when there are 2 or more values in a cell. In order to preserve the table structure, I have devised a way to count the row-number index of its xpath, ...
-2
votes
1
answer
114
views
Java xPath: simple expression evaluation [closed]
This code uses XPath to expose a library function that can check whether an XML file contains a string or an expression:
...
2
votes
1
answer
109
views
Flattening XML and selecting nodes based on input to convert to CSV
I have some XML that I want to flatten based on its XPath. The task is to acquire certain nodes that are passed as input. That said, I have to get to parent nodes and look for nodes that might be part ...
2
votes
1
answer
1k
views
Beautifulsoup and lxml (xpath) too slow with respect to regex when parsing HTML
I agree that using regex to parse HTML is not a good approach; in particular, I am worried about its fragility with respect to changes in the HTML.
The problem is that the alternatives are really too slow....
2
votes
1
answer
5k
views
Scrapy crawler to parse data recursively
I've written a script using Python and Scrapy to parse the "name" and "price" of different products from a website. First, it scrapes the links of the different categories from the upper side bar located in the ...
5
votes
1
answer
3k
views
Extracting necessary records from LinkedIn
I wanted to create a scraper in Python that can fetch the required data from LinkedIn. I tried many different approaches in Python but could not make it work until I used Selenium in combination with. ...
3
votes
1
answer
6k
views
Scraping content from a javascript enabled website with load more button
The script I've written scrapes the name, address, phone number and web address from a webpage using Python and Selenium. The main barrier I had to face was to exhaust the "load more" button to get the ...
2
votes
2
answers
305
views
Web-crawler for yellowpage designed using python
I've written some code to scrape names, addresses, and phone numbers from Yellowpage using Python. The scraper takes an input parameter. If the input is properly filled in and the filled-in URL exists in ...
2
votes
1
answer
612
views
Scraping data unveiling a button from craigslist
I've written some code to parse the names and phone numbers from Craigslist. It starts from the link in "m_url", then goes one layer deep to parse the name, and then another layer deep to parse ...
2
votes
1
answer
9k
views
Scraping table contents from a webpage using vba with selenium
My script harvests the full contents of a table from a JavaScript-encrypted webpage using VBA in combination with Selenium. The table has a drop-down option from which the full contents ...
3
votes
1
answer
1k
views
Web scraper for parsing names and email addresses from Yellowpage
I've written a script using Python to parse the names and email addresses of different pizza shops in the USA. I am very new to writing classes in Python, so I'm not sure whether I did anything wrong ...
13
votes
3
answers
15k
views
Evaluating an XPath with document.evaluate() to get an array of nodes
The Problem Statement:
Filter all nodes with an existing attribute that starts with a specific string (temp for example purposes). Print an array of node string ...