
Does anyone know of a Python or Ruby library that analyzes images and extracts the text inside them?

Or a book about image processing, etc.?

PS: The text is in various fonts and formats, but it is clear. TL;DR: no CAPTCHAs or anything similar.

  • What does the last line you have written convey? Or was it written by mistake? Commented Jul 15, 2012 at 7:16
  • possible duplicate of OCR for recognising handwriting in .NET Commented Jul 15, 2012 at 7:17
  • @Angelbit I pointed out one particular duplicate, but this question is really a duplicate of almost any OCR question on StackOverflow. Commented Jul 15, 2012 at 7:18
  • Sorry, my English is very poor. The text inside the images is written in various sizes and formats (bold, italic, etc.). Commented Jul 15, 2012 at 7:20
  • @AdamMihalcin I have edited the question. I could not find any question specific to Ruby or Python. Commented Jul 15, 2012 at 7:26

1 Answer


You can use OpenCV, an open-source computer vision library that has a Python API. It is widely regarded as an industry-standard library.

OpenCV official site : http://opencv.org/

If you need tutorials on OpenCV-Python, visit: opencvpython.blogspot.com

You can also check this Stack Overflow question: Simple Digit Recognition OCR in OpenCV-Python

In addition, the OpenCV samples include some OCR implementations.

But I would recommend Tesseract for OCR. It is arguably the best open-source OCR engine; it was originally developed by HP, and its development was later sponsored by Google.

Tesseract site : https://github.com/tesseract-ocr/tesseract

A Python wrapper for Tesseract, Pytesser: https://github.com/RobinDavid/Pytesser
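If you would rather not depend on a wrapper, here is a minimal sketch of calling the Tesseract command-line tool directly from Python. It assumes the `tesseract` executable is installed and on your PATH. The helper names `build_tesseract_command` and `ocr_image` are made up for this illustration and are not part of Pytesser or Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Return the argument list for a Tesseract CLI call that prints text to stdout."""
    # Passing "stdout" as the output base makes the tesseract CLI print the
    # recognized text instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on an image file and return the recognized text."""
    # Assumption: the tesseract executable is installed and on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.strip()
```

Usage would look like `print(ocr_image("scan.png"))`, where `scan.png` is a placeholder file name. The `lang` value must match a language data file installed with Tesseract.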

Also check this Stack Overflow question: How do I choose between Tesseract and OpenCV?

So you can use OpenCV to preprocess the image and Tesseract for the OCR itself.
