SandeepSangu/pdf-to-text-extractor

Real-World PDF Parsing with AWS

This repository supports the Dev.to blog series focused on building a document parsing pipeline using AWS Textract — starting from local testing and scaling to serverless automation.

📚 Contents

📍 Part 1 – Local Testing using Python
Simple script-based text extraction and validation, calling AWS Textract from a local machine.
🔗 Explore → /local/README.md
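
As a rough illustration of what Part 1 covers, here is a minimal sketch of calling Textract from a local Python script with Boto3. It is not the code in /local; the function names, the default region, and the file path in the usage note are assumptions made for this example. It also assumes a single-page document, since the synchronous `detect_document_text` call does not handle multi-page PDFs.

```python
from pathlib import Path


def lines_from_response(response: dict) -> list[str]:
    """Collect the text of every LINE block in a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def extract_text(path: str, region: str = "us-east-1") -> str:
    """Send a local single-page document to Textract and return its text.

    Hypothetical helper: the name and the default region are assumptions.
    Requires AWS credentials to be configured on the machine.
    """
    # Imported here so lines_from_response can be used without boto3 installed.
    import boto3

    client = boto3.client("textract", region_name=region)
    response = client.detect_document_text(
        Document={"Bytes": Path(path).read_bytes()}
    )
    return "\n".join(lines_from_response(response))
```

With credentials in place, `print(extract_text("sample.pdf"))` would print the detected lines for a hypothetical `sample.pdf`. Keeping the parsing in `lines_from_response` means the validation step can be checked against a saved response without calling AWS.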

📍 Part 2 – Serverless Automation with Lambda, S3 & Textract
Event-driven pipeline triggered by PDF uploads to S3, with the extracted text stored in DynamoDB.
🔗 Explore → /automation/README-automation.md
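
The flow in Part 2 could look roughly like the Lambda handler below. This is a sketch under stated assumptions, not the code in /automation: the `TABLE_NAME` environment variable, the `ParsedDocuments` default, and the item attribute names are all hypothetical. It uses the synchronous Textract call, so it assumes single-page PDFs; multi-page documents would need Textract's asynchronous API instead.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical table name; in practice it would be set on the Lambda function.
TABLE_NAME = os.environ.get("TABLE_NAME", "ParsedDocuments")


def s3_objects_from_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification.

    Object keys arrive URL-encoded in the event, so they are decoded here.
    """
    return [
        (rec["s3"]["bucket"]["name"], unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
        if "s3" in rec
    ]


def lambda_handler(event, context):
    """Run Textract on each uploaded object and store the text in DynamoDB."""
    # Imported here so the event-parsing helper can be used without boto3.
    import boto3

    textract = boto3.client("textract")
    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    objects = s3_objects_from_event(event)
    for bucket, key in objects:
        response = textract.detect_document_text(
            Document={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        text = "\n".join(
            block["Text"]
            for block in response["Blocks"]
            if block["BlockType"] == "LINE"
        )
        # Attribute names are assumptions; the table's key schema must match.
        table.put_item(Item={"document_key": key, "bucket": bucket, "text": text})

    return {"processed": len(objects)}
```

DynamoDB limits each item to 400 KB, so a variant of this sketch for long documents might store the text back in S3 and keep only a pointer in the table.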

🛠️ Technologies Used:

AWS · AWS Textract · Lambda · S3 · DynamoDB · Python · Boto3
