How I built DocTextExtractor to power NotteChat's AI-powered document chat, and how you can integrate it into your own Flutter apps.
As a Flutter developer with a passion for simplifying complex problems, I created DocTextExtractor, a lightweight, open-source Dart package that extracts text from .doc, .docx, .pdf, and .md files, as well as Google Docs URLs.
This tool was born from the challenges I faced while building NotteChat, an app that allows users to chat with document content using AI. In this article, I’ll share how I built DocTextExtractor, why it matters, and how you can integrate it into your own Flutter projects.
Why I Built DocTextExtractor
NotteChat empowers students, professionals, and educators to interact with documents conversationally. Users simply paste a document URL or upload a file, and can then summarize, explore, or ask questions about the content using AI.
However, supporting multiple document formats (.doc, .docx, PDF, Google Docs, Markdown) posed a serious challenge. Most existing Flutter solutions only worked for specific formats, and there was no unified solution.
So I built DocTextExtractor, with a goal to:
- Support .doc, .docx, .pdf, .md, and Google Docs URLs
- Handle both local files and URLs
- Enable offline parsing for .doc and .md
- Power AI by providing clean, structured text
- Extract real filenames for better UX
Building DocTextExtractor
1. Identifying the Need
The core feature of NotteChat—chatting with document content—meant I needed a consistent way to extract text, regardless of format or source.
Key Requirements:
- Unified API for all formats
- Clean filename extraction
- Minimal dependencies
- Cross-platform support (iOS, Android, Web)
2. Choosing the Tech Stack
I relied on the trusted Flutter/Dart ecosystem with the following tools and packages:
- http: Fetch documents via URLs
- syncfusion_flutter_pdf: Parse and extract PDFs
- archive + xml: Extract from .docx and .doc
- markdown: Convert .md to plain text
- VS Code + GitHub for development and version control
GitHub repo: github.com/Destiny-Ed/doc_text_extractor
3. Designing the Core Logic
At the heart of the package is the TextExtractor class with a single extractText() method.
Key Features:
- Unified Return Type: A Record(text, filename) for easy use
- Smart Format Detection: Checks HTTP Content-Type or file extension
- Offline Support: No internet required for .doc and .md
- Error Handling: Friendly exceptions (e.g., "Unsupported document type")
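To make the format detection step concrete, here is a minimal sketch of how checking the Content-Type first and the file extension second could look. The names DocFormat and detectFormat are hypothetical and chosen for illustration; they are not taken from the package, whose internals may differ.

```dart
// Hypothetical names: DocFormat and detectFormat are illustrative only,
// not part of the doc_text_extractor API.
enum DocFormat { doc, docx, pdf, markdown, unknown }

/// Guesses the document format from an HTTP Content-Type header first,
/// then falls back to the file extension of [pathOrUrl].
DocFormat detectFormat(String pathOrUrl, {String? contentType}) {
  // Drop parameters such as "; charset=utf-8" before comparing.
  final mime = contentType?.split(';').first.trim().toLowerCase();
  switch (mime) {
    case 'application/pdf':
      return DocFormat.pdf;
    case 'application/msword':
      return DocFormat.doc;
    case 'application/vnd.openxmlformats-officedocument.wordprocessingml.document':
      return DocFormat.docx;
    case 'text/markdown':
      return DocFormat.markdown;
  }

  // Fall back to the extension of the path, ignoring any query string.
  final path = (Uri.tryParse(pathOrUrl)?.path ?? pathOrUrl).toLowerCase();
  if (path.endsWith('.pdf')) return DocFormat.pdf;
  if (path.endsWith('.docx')) return DocFormat.docx;
  if (path.endsWith('.doc')) return DocFormat.doc;
  if (path.endsWith('.md')) return DocFormat.markdown;
  return DocFormat.unknown;
}
```

Checking the header first matters for download links that carry no extension in the URL, while the extension fallback covers local files, which have no headers at all.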
4. Format-Specific Logic
Each format was tackled with custom logic:
- .doc: No existing Dart parser, so I created one using raw XML parsing
- .docx: Unzipped and parsed word/document.xml
- .md: Used markdown package for plain-text conversion
- PDF: Parsed using syncfusion_flutter_pdf
- Google Docs: Converted /edit URLs to /export?format=pdf and parsed as PDF
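The Google Docs step is the easiest one to illustrate without extra packages. The sketch below rewrites an /edit link into an /export?format=pdf link using only dart:core. The helper name toPdfExportUrl is made up for this example, and it assumes the usual https://docs.google.com/document/d/<id>/edit link shape.

```dart
/// Rewrites a Google Docs link such as
/// https://docs.google.com/document/d/<id>/edit into its PDF export URL.
/// Returns null when the input is not a Google Docs document link.
/// (Illustrative helper; the name is not part of doc_text_extractor.)
Uri? toPdfExportUrl(String url) {
  final uri = Uri.tryParse(url);
  if (uri == null || uri.host != 'docs.google.com') return null;

  // Expected path segments: ['document', 'd', '<id>', 'edit'].
  final segments = uri.pathSegments;
  if (segments.length < 3 || segments[0] != 'document' || segments[1] != 'd') {
    return null;
  }

  final docId = segments[2];
  // Any original query string (e.g. ?usp=sharing) is dropped on purpose.
  return Uri.https(
    'docs.google.com',
    '/document/d/$docId/export',
    {'format': 'pdf'},
  );
}
```

The resulting URL can then be fetched and handed to the PDF parser like any other PDF link. The export request is only likely to succeed for documents the caller is allowed to read, such as publicly shared ones.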
5. Filename Extraction
To enhance UX, I added a _extractFilename() method that pulls names from:
- Content-Disposition headers (e.g., filename="report.docx")
- URL segments (e.g., https://example.com/readme.md)
- Google Docs metadata (fallback if unavailable)
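As a rough illustration of the first two sources, the following sketch tries the Content-Disposition header and then the last URL segment. It is not the package's private _extractFilename(); the function name, parameters, and the 'document' fallback are assumptions for this example, and the Google Docs metadata lookup is left out.

```dart
/// Picks a display filename from a Content-Disposition header if present,
/// otherwise from the last URL path segment, otherwise [fallback].
/// (Illustrative; the package's private _extractFilename() may differ.)
String extractFilename(
  String url, {
  String? contentDisposition,
  String fallback = 'document',
}) {
  if (contentDisposition != null) {
    // Matches filename="report.docx" as well as unquoted filename=report.docx.
    final match = RegExp(r'filename="?([^";]+)"?', caseSensitive: false)
        .firstMatch(contentDisposition);
    final name = match?.group(1)?.trim();
    if (name != null && name.isNotEmpty) return name;
  }

  final segments = Uri.tryParse(url)?.pathSegments ?? const <String>[];
  // Skip empty segments produced by a trailing slash.
  final nonEmpty = segments.where((s) => s.isNotEmpty).toList();
  // Only treat the last segment as a filename if it has an extension.
  if (nonEmpty.isNotEmpty && nonEmpty.last.contains('.')) {
    return nonEmpty.last;
  }
  return fallback;
}
```

Requiring a dot in the last segment keeps links that end in /edit or /download from producing a filename like "edit".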
6. Testing & Refinement
I tested with:
- .doc: Legacy Word files
- .docx: Modern reports
- .md: GitHub README files
- PDF: Academic papers
- Google Docs: Shared documents
Edge cases included:
- Missing headers
- Large files (>10MB)
- Unsupported formats
7. Publishing as a Package
To make DocTextExtractor reusable, I published it on pub.dev, which is also one of the reasons I'm writing this article:
- Published on Pub.dev
- MIT Licensed
- Included example app
- Wrote a detailed README with usage examples
Why It Matters to NotteChat
DocTextExtractor is the backbone of NotteChat's AI-powered chat with documents.
It enables:
- AI Chat: Clean text fed into AI (e.g., "Summarize this PDF")
- Offline Use: Great for areas with limited internet
- Smart UX: Real filenames and helpful error messages
- Versatile Support: For modern and legacy users
How to Use DocTextExtractor in Flutter
Step 1: Add the Dependency
Add doc_text_extractor to your app’s pubspec.yaml.
dependencies:
  doc_text_extractor: ^1.0.0
Run:
flutter pub get
Step 2: Import and Initialize
Import the package and create a TextExtractor instance:
import 'package:doc_text_extractor/doc_text_extractor.dart';
final extractor = TextExtractor();
Step 3: Extract Text from a URL
Use extractText to process URLs for .doc, .docx, .md, PDF, or Google Docs:
Future<void> processDocumentUrl(String url) async {
  try {
    final result = await extractor.extractText(url);
    final text = result.text;
    final filename = result.filename;
    print('Filename: $filename');
    // Guard the preview: substring(0, 100) throws on shorter documents.
    final preview = text.length > 100 ? text.substring(0, 100) : text;
    print('Text: $preview...');
    // Pass text to AI service (e.g., for NotteChat’s AI chat)
  } catch (e) {
    print('Error: $e');
    // Show user-friendly error (e.g., "Please convert .doc to .docx")
  }
}

// Example usage
processDocumentUrl('https://raw.githubusercontent.com/user/repo/main/README.md');
processDocumentUrl('https://docs.google.com/document/d/EXAMPLE_ID/edit');
Step 4: Extract Text from a Local File
For local files (e.g., user-uploaded .md or .doc), set isUrl: false:
import 'package:path_provider/path_provider.dart';

Future<void> processLocalFile(String filePath) async {
  try {
    final result = await extractor.extractText(filePath, isUrl: false);
    final text = result.text;
    final filename = result.filename;
    print('Filename: $filename');
    // Guard the preview: substring(0, 100) throws on shorter documents.
    final preview = text.length > 100 ? text.substring(0, 100) : text;
    print('Text: $preview...');
    // Use text in app logic
  } catch (e) {
    print('Error: $e');
  }
}

// Example usage (await must run inside an async function)
Future<void> runLocalExample() async {
  final dir = await getTemporaryDirectory();
  await processLocalFile('${dir.path}/sample.md');
}
Step 5: Integrate with your preferred AI API
You can now use the extracted text in your app with AI tools like OpenAI, Gemini, or Sonar APIs.
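import 'package:flutter/material.dart';

class ChatScreen extends StatelessWidget {
  const ChatScreen({super.key});

  Future<void> _handleDocument(String url) async {
    final result = await extractor.extractText(url);
    final text = result.text;
    final filename = result.filename;
    // Update session title
    final sessionTitle = 'Session ${DateTime.now().toIso8601String().split('T')[0]} - $filename';
    // Summarize with AI (e.g., Sonar API).
    // SonarService is your own API wrapper, not part of doc_text_extractor.
    final sonarService = SonarService();
    final summary = await sonarService.queryDocument(text, 'Summarize this document');
    // Display in UI
    print('Session: $sessionTitle');
    print('Summary: $summary');
  }

  @override
  Widget build(BuildContext context) {
    // Build your chat UI here and call _handleDocument when a URL is submitted.
    return const SizedBox.shrink();
  }
}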
Step 6: Enhance UX with Error Handling
Add loading dialogs for large files and user-friendly errors:
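try {
  final result = await extractor.extractText(url);
  // Use result.text and result.filename here
} catch (e) {
  if (e.toString().contains('Unsupported document type')) {
    ScaffoldMessenger.of(context).showSnackBar(
      SnackBar(content: Text('Unsupported format. Try converting to .docx or PDF.')),
    );
  }
}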
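If you handle more than one kind of failure, it can help to keep the message mapping in one place, separate from the widget code. The helper below is a sketch of that idea. Only the 'Unsupported document type' text comes from the package's exception mentioned earlier; the function name, the FormatException check, and the other messages are assumptions about what an app might want to show.

```dart
/// Maps an extraction error to a message suitable for a SnackBar or dialog.
/// Only 'Unsupported document type' comes from the article; the other
/// checks and wording are assumptions for illustration.
String friendlyErrorMessage(Object error) {
  final message = error.toString();
  if (message.contains('Unsupported document type')) {
    return 'Unsupported format. Try converting to .docx or PDF.';
  }
  if (error is FormatException) {
    return 'This file looks corrupted or is not a valid document.';
  }
  return 'Something went wrong while reading the document. Please try again.';
}
```

Inside a catch block, the result can be passed straight to SnackBar(content: Text(friendlyErrorMessage(e))), which keeps the UI code free of string matching.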
Final Thoughts
DocTextExtractor started as a necessity for NotteChat but evolved into a powerful, standalone Flutter package. It’s now available for anyone building document-based apps, AI tools, or productivity platforms.
Try it out: https://pub.dev/packages/doc_text_extractor
View the source: github.com/Destiny-Ed/doc_text_extractor
If you found this helpful or end up using the package, feel free to drop a ⭐ on GitHub or share your feedback. I’d love to hear how you’re using it!
Happy coding!