How to Extract Text from Images with Python?
Last Updated: 04 Oct, 2025
OCR (Optical Character Recognition) is a technique used to convert text from images into editable and searchable digital text. For example, you can scan a printed page and turn it into editable text on your computer. In this article, we’ll use Python and the pytesseract library to extract text from images.
Installation
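To enable OCR in Python, we use the pytesseract library, a Python wrapper around the Tesseract OCR engine, together with Pillow (PIL) for loading images:
pip install pytesseract pillow
Note: pytesseract only calls the Tesseract engine, so Tesseract itself must also be installed. On Windows, that means installing the tesseract.exe binary. During installation, you'll choose (or be given) an install path. Commonly it's: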
C:\Program Files\Tesseract-OCR\tesseract.exe
or
C:\Users\<username>\AppData\Local\Programs\Tesseract-OCR\tesseract.exe
Make sure to update your code with the correct path based on your system.
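Rather than hard-coding one path, the lookup can be automated. The following is a minimal sketch, not part of pytesseract itself: find_tesseract and DEFAULT_CANDIDATES are hypothetical names introduced here, and the candidate list simply mirrors the two common Windows locations mentioned above.

```python
import os
import shutil
from typing import Iterable, Optional

# Hypothetical candidate locations, mirroring the common Windows paths above.
# %LOCALAPPDATA% is expanded on Windows so no username has to be hard-coded;
# on other systems the string is left unchanged and simply will not match.
DEFAULT_CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    os.path.expandvars(r"%LOCALAPPDATA%\Programs\Tesseract-OCR\tesseract.exe"),
]


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES,
                   name: str = "tesseract") -> Optional[str]:
    """Return a path to the Tesseract executable, or None if none is found."""
    # Prefer an executable that is already on PATH.
    on_path = shutil.which(name)
    if on_path:
        return on_path
    # Otherwise fall back to the first candidate that exists on disk.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

If the function returns a path, it can be assigned to pytesseract.pytesseract.tesseract_cmd; if it returns None, Tesseract is probably not installed or lives somewhere else on your system.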
1. Import the required libraries:
from PIL import Image
import pytesseract
2. Set the path to the Tesseract executable (only needed if tesseract is not already on your PATH):
pytesseract.pytesseract.tesseract_cmd = r"C:\Users\<username>\AppData\Local\Programs\Tesseract-OCR\tesseract.exe"
3. Open the image using PIL:
image = Image.open("example_image.png")
4. Convert the image to grayscale, which often helps OCR accuracy:
gray_image = image.convert("L")
5. Extract text using pytesseract:
text = pytesseract.image_to_string(gray_image)
6. Clean the extracted text by removing unwanted characters, such as the form-feed character ("\x0c") that Tesseract can append as a page-break marker:
clean_text = text.replace("\x0c", "").strip()
print(clean_text)
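The cleaning step can be wrapped in a small helper so it is reusable across images. This is only a sketch: clean_ocr_text is a hypothetical name, and trimming trailing spaces and collapsing runs of blank lines are extra assumptions about what counts as "clean", beyond the form-feed removal shown in step 6.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy raw OCR output: drop form feeds, trailing spaces, extra blank lines."""
    # Remove the form-feed (page-break) character, as in step 6.
    text = raw.replace("\x0c", "")
    # Strip trailing whitespace from every line (an extra assumption).
    text = "\n".join(line.rstrip() for line in text.splitlines())
    # Collapse three or more consecutive newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Remove leading and trailing whitespace from the whole result.
    return text.strip()


print(clean_ocr_text("Hello \n\n\n\nWorld\n\x0c"))
```

In the steps above, this helper would take the place of the text.replace("\x0c", "").strip() expression, receiving the string returned by pytesseract.image_to_string.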
Examples
Example 1:
Image for demonstration: an image of white text on a black background.
Code:
from PIL import Image
import pytesseract
# Path to tesseract.exe (update if different on your computer)
pytesseract.pytesseract.tesseract_cmd = r"C:\Users\gfg0753\AppData\Local\Programs\Tesseract-OCR\tesseract.exe"
# Open the image
img = Image.open("sample_text.png")
# Convert to grayscale (makes it easier for OCR)
img = img.convert("L")
# Extract text from the image
text = pytesseract.image_to_string(img)
# Remove extra characters and print the text
print(text.replace("\x0c", "").strip())
Output
now children state should after above same long made such
point run take call together few being would walk give
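To make the grayscale step in this example less of a black box, here is a rough pure-Python illustration of what convert("L") computes for each pixel. Pillow's documentation gives the formula L = R * 299/1000 + G * 587/1000 + B * 114/1000; the integer arithmetic below is an approximation, so Pillow's own rounding may differ slightly. The invert helper is an added assumption, included because this image is white text on black and Tesseract's documentation suggests dark text on a light background for recent versions. The names to_gray and invert are hypothetical.

```python
from typing import Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in the range 0-255


def to_gray(pixel: Pixel) -> int:
    """Approximate the luma value (0-255) of one RGB pixel."""
    r, g, b = pixel
    # Weighted sum from the documented formula, using integer division.
    return (r * 299 + g * 587 + b * 114) // 1000


def invert(gray: int) -> int:
    """Flip a grayscale value so light becomes dark and vice versa."""
    return 255 - gray


# Three sample pixels: white, black, and pure red.
row = [(255, 255, 255), (0, 0, 0), (255, 0, 0)]
gray_row = [to_gray(p) for p in row]
inverted_row = [invert(v) for v in gray_row]
print(gray_row)      # [255, 0, 76]
print(inverted_row)  # [0, 255, 179]
```

On a real image you would not loop over pixels by hand; Pillow's convert("L") and ImageOps.invert perform the same kind of operations on the whole image. Whether inversion actually improves the result depends on the image and the Tesseract version, so it is worth comparing both outputs.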
Example 2:
Image for demonstration: [image not shown]
Code:
from PIL import Image
import pytesseract
# Path to tesseract.exe (update if different on your computer)
pytesseract.pytesseract.tesseract_cmd = r"C:\Users\gfg0753\AppData\Local\Programs\Tesseract-OCR\tesseract.exe"
# Path to the image
image_path = r"d.jpg"
# Open the image and convert it to grayscale
img = Image.open(image_path).convert("L")
# Extract text from the image
text = pytesseract.image_to_string(img)
# Clean up unwanted characters and print result
print(text.replace("\x0c", "").strip())
Output
Geeksforgeeks