41

I would like to build an Android application that uses an OCR library to scan a picture and extract the text from it.

What Java library should I use?

1

4 Answers

21

I don't know how good it is (it definitely needs to be trained first), but there is Ron Cemer's Java OCR library.


7

If you are looking for a very extensible option, or have a specific problem domain, you could consider rolling your own using the Java Object Oriented Neural Engine.

I used it successfully in a personal project to identify the letter in an image such as this. You can find all the source for the OCR component of my application on GitHub, here.


6

Try Tesseract. Check out this article: http://www.itwizard.ro/interfacing-cc-libraries-via-jni-example-tesseract-163.html and this example: http://code.google.com/p/mezzofanti/

Edit: some more facts:

- Tesseract is one of the best open-source OCR engines and is used by Google.
- Training data is available for many languages.
- Mezzofanti is an Android app that uses Tesseract.
- Beware: OCR uses a lot of CPU power. Trying to OCR an A4 page on your T-Mobile G1 will take a long time, and the result may not impress you ;-)
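Since OCR is CPU-heavy, you probably want to keep it off the UI thread. Here is a minimal sketch of that idea, not a finished implementation. `OcrEngine` is a hypothetical interface I made up to stand in for whatever binding you end up with (for example a JNI wrapper around Tesseract), and `BackgroundOcr` and `OcrDemo` are likewise illustrative names. It also assumes `CompletableFuture` is available, which on Android means a fairly recent platform version; on older devices a plain `ExecutorService.submit` would do the same job.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a real OCR binding (e.g. a JNI wrapper around Tesseract). */
interface OcrEngine {
    String recognize(byte[] imageBytes);
}

/** Runs recognition on a single worker thread so the caller is not blocked. */
final class BackgroundOcr {
    private final OcrEngine engine;
    // One worker is enough here: OCR jobs are queued and processed one at a time.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    BackgroundOcr(OcrEngine engine) {
        this.engine = engine;
    }

    /** Returns immediately; the future completes when the engine has finished. */
    CompletableFuture<String> recognizeAsync(byte[] imageBytes) {
        return CompletableFuture.supplyAsync(() -> engine.recognize(imageBytes), worker);
    }

    /** Stops accepting new jobs; already queued jobs still run. */
    void shutdown() {
        worker.shutdown();
    }
}

class OcrDemo {
    public static void main(String[] args) {
        // Fake engine for illustration only: it just reports the input size.
        OcrEngine fake = bytes -> "recognized " + bytes.length + " bytes";
        BackgroundOcr ocr = new BackgroundOcr(fake);

        ocr.recognizeAsync(new byte[1024])
           .thenAccept(System.out::println) // in an app, update the UI here instead
           .join();                         // the demo waits; a real UI thread should not

        ocr.shutdown();
    }
}
```

In a real app you would replace the fake engine with the actual library call and hand the result back to the UI thread in the callback rather than calling `join()`.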

2 Comments

Tesseract does work, but its reading ability is quite poor even for the simplest text.
That's why you have to train it, @mP. I was able to get good results with the default training while implementing an ISBN reader. Try this link; I haven't tried what they describe yet, but I have had it in my bookmarks for a long time and I think it is a good source of info: vbridge.co.uk/2012/11/05/…
0

You can use the OCR feature of Google Docs. Check the Documents List Data API: http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#OCR

