I have some code to open and search through a folder full of pdfs. I'm using pdfminer to do the pdf conversion. But, some of my pdfs are not readable. I want my code to process those pdfs where the conversion works, and effectively skip over those pdfs where the conversion fails.

I'm trying to use try/except, but it doesn't seem to be working. For the pdfs that fail, the except block runs as expected. But for the pdfs where the conversion works, both the try and the except blocks are executed.

Here's my code:

fileNum = 0
d = shelve.open('PyDocSearch.db')
for file in fileList:
    fileNum += 1
    z = []
    try:
        doc = convert_pdf(filePath + '/' + file)
        print 'Success:',file
        docWords = doc.split()
        x = Counter(docWords)
        y = x.most_common()
        for i,j in enumerate(y):
            if j[0] not in commonWords:
                z.append(j)
        d[file] = z
    except:
        doc = 'fail'
        print 'Fail:',file
        d[file] = doc
d.close()

When the pdf conversion works, why are both blocks being executed? And, how can I prevent this from happening? Thanks!

  • For starters, don't use a bare except block, but specify what kinds of exceptions you want to catch. Then you'll narrow down the cases where the except is executed. Commented Oct 20, 2013 at 2:53
  • Your catch-all except will run for any exception. You really don't know if it's due to the conversion failure or something else from the try block. Commented Oct 20, 2013 at 2:55
  • Ok, that makes sense. How can I catch an exception for when the pdf conversion fails? Commented Oct 20, 2013 at 2:56
  • Also, only wrap as much code in the try as you are actually testing for exceptions. Move everything else below the except. Commented Oct 20, 2013 at 2:56
  • When a conversion fails, what exception does it raise? Commented Oct 20, 2013 at 3:48

1 Answer

One thing you can do is use the else clause of try ... except ... to execute code only if no exception was raised:

fileNum = 0
d = shelve.open('PyDocSearch.db')
for file in fileList:
    fileNum += 1
    z = []
    try:
        doc = convert_pdf(filePath + '/' + file)
    except:
        doc = 'fail'
        print 'Fail:',file
        d[file] = doc
    else:
        print 'Success:',file
        docWords = doc.split()
        x = Counter(docWords)
        y = x.most_common()
        for i,j in enumerate(y):
            if j[0] not in commonWords:
                z.append(j)
        d[file] = z
d.close()

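The code in the else block is only executed if the code in the try block completes without raising an exception. If an exception gets raised in the else block, the except clause does not handle it. This also explains what you are seeing: in your original code the conversion succeeds and 'Success' is printed, then a later line inside the try block raises, so the except block runs as well.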
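To make the control flow concrete, here is a minimal sketch in Python 3 syntax (the code above is Python 2). The `convert` function is a hypothetical stand-in for `convert_pdf`, not part of pdfminer, and `ValueError` is only an assumed failure type for the illustration:

```python
def convert(text):
    """Hypothetical stand-in for convert_pdf: fails on empty input."""
    if not text:
        raise ValueError("nothing to convert")
    return text.upper()


def process(text):
    """Return a list recording which blocks ran for this input."""
    ran = []
    try:
        # Only the call that is expected to fail lives in the try block.
        result = convert(text)
        ran.append("try")
    except ValueError:
        ran.append("except")
    else:
        # Runs only when the try block raised nothing.
        ran.append("else")
        ran.append(result)
    return ran


print(process("abc"))
print(process(""))
```

For a successful conversion only the try and else blocks run; for a failed one only the except block runs, so the two paths can no longer overlap.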

As others have said, using except on its own is bad practice. The exceptions you're getting are quite probably trying to tell you something helpful about why your program is going wrong, but by using a 'bare' except you are sticking your fingers in your ears and saying 'la la la can't hear you' to Python's attempts to help you.

Generally it's good practice to only handle exceptions you are expecting. If you know that your PDF library raises SomePDFException if something goes wrong, it would be better to write

    except SomePDFException as e:

instead of

    except:

However, if you don't know the type, you can catch most [1] exceptions with the following:

    except Exception as e:
        print "Got exception of type %s:" % type(e)
        print e

This then tells you the type of exception raised, and the message.
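As a rough illustration of the same idea in Python 3 syntax, the helper below wraps any call and reports the exception type and message instead of hiding them. The name `describe_failure` is made up for this sketch; it is not part of pdfminer or the standard library:

```python
def describe_failure(func, *args):
    """Call func(*args); return 'TypeName: message' on failure, else None."""
    try:
        func(*args)
    except Exception as e:
        # type(e).__name__ is the class name, str(e) is the message.
        return "%s: %s" % (type(e).__name__, e)
    return None


# int("abc") raises ValueError, so the type and message are reported.
print(describe_failure(int, "abc"))
# int("3") succeeds, so nothing is reported.
print(describe_failure(int, "3"))
```

Running something like this once against a PDF that fails should show which exception type the conversion raises, and that type can then replace the broad `except Exception`.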

[1] There are a few exceptions that do not inherit from Exception, namely SystemExit, KeyboardInterrupt and GeneratorExit (documentation: Python 2, Python 3). I would be surprised if you are getting one of these, and I would hope the PDF library you are using follows the Python guidance by deriving its exceptions from Exception rather than BaseException.
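The class hierarchy mentioned in the footnote can be checked directly. This is a small sketch in Python 3 syntax; the helper name `escapes_except_exception` is hypothetical:

```python
def escapes_except_exception(exc_type):
    """True if 'except Exception' would NOT catch this exception type."""
    return not issubclass(exc_type, Exception)


for exc_type in (SystemExit, KeyboardInterrupt, GeneratorExit, ValueError):
    print(exc_type.__name__, escapes_except_exception(exc_type))
```

The first three derive from BaseException only, which is why Ctrl-C still interrupts a loop that uses `except Exception` but not one that uses a bare `except`.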
