I'm using react-pdf, which renders HTML based on the PDF document passed to it. There is a <div> for each PDF page, with 50+ <span> elements (one per line of text) nested inside it, and I need to parse their text.
The goal is to use this to call element.scrollIntoView() when a specific string is found.
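Once the span elements are available, the search-and-scroll step itself could look like the sketch below. The name scrollToText and the default scroll options are hypothetical choices, not part of react-pdf. The helper only assumes it receives an array, NodeList, or HTMLCollection of elements that have textContent and scrollIntoView.

```javascript
// Hypothetical helper (not part of react-pdf): find the first element whose
// text contains `searchText` (case-insensitive) and scroll it into view.
// Array.from accepts arrays, NodeLists, and HTMLCollections alike.
function scrollToText(elements, searchText, options = { behavior: "smooth", block: "center" }) {
  const needle = searchText.toLowerCase();
  // Copy the collection into an array and look for the first match.
  const match = Array.from(elements).find((el) =>
    (el.textContent || "").toLowerCase().includes(needle)
  );
  if (match) {
    match.scrollIntoView(options);
  }
  // Return the matched element, or null if nothing matched.
  return match || null;
}
```

Because each line of the PDF is its own <span>, a search string that wraps onto the next line is split across two elements and will not be found by this per-span check.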
There seem to be two options for getting the elements: an HTMLCollection or a NodeList. Currently I get an HTMLCollection with:
const spanElementCollection = page.getElementsByTagName("span");
I've found numerous resources recommending Array.from(HTMLCollection) to convert it into an array I can iterate over. However, after converting, the array is always empty:
const spanElements = Array.from(spanElementCollection);
Additionally, I've tried getting a NodeList instead and converting that to an array, using:
const spanElementList = page.querySelectorAll('div.textLayer > span');
but this always returns an empty NodeList as well, just like the empty array above.
Removing > span from the selector works to get the parent <div>, but I cannot get its ~50 <span> children into an array.
All of this is called from within an inputRef(() => {}) callback, as recommended by the package's developer.
I've already looked at the closest related issue here.



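Regarding getElementsByTagName: it returns a live HTMLCollection, which updates itself as the text layer's <span> elements are rendered, so the collection itself can appear populated while an array made from it is empty. Array.from copies only the elements present at the moment it is called, and querySelectorAll returns a static NodeList with the same limitation. Please make sure you only execute your code (to create an array, or to use another method) once the document has fully loaded with all relevant HTML elements.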
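One way to follow that advice is to wait until the spans actually exist before converting them. The sketch below simply polls with querySelectorAll on a timer; waitForElements, its polling interval, and its timeout are hypothetical choices, not react-pdf API. A MutationObserver, or react-pdf's own text-layer render callback (its name varies between versions, so check the docs for the one you use), would be reasonable alternatives.

```javascript
// Hypothetical helper (not part of react-pdf): resolve with an array of the
// elements matching `selector` under `root` once at least one exists,
// re-checking every `intervalMs` and rejecting after `timeoutMs`.
function waitForElements(root, selector, { intervalMs = 50, timeoutMs = 5000 } = {}) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const check = () => {
      // querySelectorAll returns a static NodeList, so re-query on every tick.
      const found = Array.from(root.querySelectorAll(selector));
      if (found.length > 0) {
        resolve(found);
      } else if (Date.now() - start >= timeoutMs) {
        reject(new Error(`No elements matched "${selector}" within ${timeoutMs} ms`));
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}
```

For example, calling waitForElements(page, 'div.textLayer > span') from the inputRef callback and handling the resolved array in .then() avoids converting the collection before the text layer has been rendered.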