I'm trying to get the text from a PDF document using pdf.js in JavaScript. However, pdf.js has no decent documentation. I've looked at the available examples and came up with this:

var pdfUrl = "http://localhost/test.pdf"
var pdf = PDFJS.getDocument(pdfUrl);
pdf.then(function(pdf) {
    var maxPages = pdf.pdfInfo.numPages;
    for (var j = 1; j < maxPages; j++) {
        var page = pdf.getPage(j);

        page.then(function() {
            var textContent = page.getTextContent();

        })
    }
});

The page bit is working, because I can see it is a promise. However, running this bit gives:

Warning: Unhandled rejection: TypeError: Object #<Object> has no method 'getTextContent'
TypeError: Object #<Object> has no method 'getTextContent'

It works this way in the examples I've seen. It is getting the page, and I can print out the number of pages.

Can anyone with experience shed some light on this?

*Bonus question: I'm only interested in parsing the PDF, not in rendering it in the browser. However, it has to be done client-side. Is pdf.js the right hammer for the job?

  • May not be the problem, but page.then(function() { should be page.then(function(page) { Commented Dec 15, 2013 at 18:58
  • It actually was the problem! Make it an answer and you're done. Commented Dec 15, 2013 at 19:04

3 Answers

page.then(function() { should be page.then(function(page) {
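
To spell out why this matters: in the question's code, `page` inside the callback still refers to the promise returned by pdf.getPage(j), and a promise has no getTextContent method. The resolved page object only arrives as the callback's argument. A minimal sketch of the corrected pattern follows; `getPageText` is a hypothetical helper name, and `pdf` is assumed to be the document object that PDFJS.getDocument(...) resolves with.

```javascript
// Hypothetical helper: resolves with the text content of a single page.
// `pdf` is assumed to be the resolved pdf.js document object.
function getPageText(pdf, pageNumber) {
  // pdf.getPage() returns a promise, not the page itself.
  return pdf.getPage(pageNumber).then(function (page) {
    // `page` is now the resolved page object, so getTextContent() exists on it.
    return page.getTextContent();
  });
}
```

Calling getPageText(pdf, 1).then(function (textContent) { ... }) would then hand the text content of the first page to the callback.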

PDF.js renders your PDF file, extracts the words, and outputs them as HTML elements. Each element is absolutely positioned over the rendered page with CSS such as {position:absolute; left:X; top:Y}.

These divs are also given the CSS property {color:transparent}. That is what makes selection highlighting work: it looks as if you are selecting text directly in the PDF, but you are actually selecting the generated HTML elements.

That is exactly how it works. If you want to render the PDF file, this is fine, but keep in mind that if you want to change the output technique (transparent HTML divs), you have to bring your own replacement.
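
As a rough illustration of the overlay idea described above (not pdf.js's actual implementation), the inline style for one such element could be built like this. `overlayStyle` and its parameters are hypothetical names, and the offsets are assumed to be in pixels.

```javascript
// Illustrative only: builds the inline style for one overlay element.
// `left` and `top` are assumed to be the pixel offsets of the word on the rendered page.
function overlayStyle(left, top) {
  return "position:absolute;" +
         "left:" + left + "px;" +
         "top:" + top + "px;" +
         "color:transparent"; // invisible text that can still be selected
}
```

In a browser, something like div.style.cssText = overlayStyle(40, 120) would place a transparent, selectable element at that position.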

You also need to change the loop condition in the question's code to

for (var j = 1; j <= maxPages; j++) {

otherwise you'll never get the last page.
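
Page numbers in pdf.js are 1-based, so the loop has to run from 1 through maxPages inclusive. A sketch of the full loop under that assumption is below; `getAllTextContent` is a hypothetical helper, `maxPages` is the page count (pdf.pdfInfo.numPages in the question's code), and a native Promise implementation is assumed for Promise.all.

```javascript
// Hypothetical helper: resolves with an array holding one text-content result per page.
// `pdf` is assumed to be the resolved pdf.js document; `maxPages` its page count.
function getAllTextContent(pdf, maxPages) {
  var pending = [];
  // Pages are numbered from 1, so include maxPages itself.
  for (var j = 1; j <= maxPages; j++) {
    pending.push(
      pdf.getPage(j).then(function (page) {
        return page.getTextContent();
      })
    );
  }
  // Promise.all keeps the results in the same order as the pages were requested.
  return Promise.all(pending);
}
```

Collecting the promises and waiting on all of them avoids relying on the loop variable inside the callbacks, which would already have its final value by the time they run.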
