Introduction
In this tutorial, we will create a C# console tool that converts HTML content into a Word document (DOCX format). This is particularly useful for developers working with LLM-based agents (such as ChatGPT, GPT-4, or other AI models) that generate dynamic HTML content. By integrating this tool, you can turn the AI's responses into polished, shareable Word documents.
By the end of this guide, you'll have a working tool that takes an HTML string—generated by an LLM or any external source—and outputs a beautifully formatted Word document.
Prerequisites
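Before we get started, make sure you have the .NET SDK and an IDE such as Visual Studio. You'll add the following NuGet packages to the project (Package Manager Console commands shown):

```
Install-Package DocumentFormat.OpenXml
Install-Package HtmlToOpenXml
```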
Step 1: Use Cases for LLM Agents
This tool can be integrated into workflows built around large language models (LLMs). For example:
Document Generation:
Use an LLM to generate HTML-formatted reports, such as meeting minutes, invoices, or summaries.
Prompt:
Generate a professional HTML notice announcing scheduled maintenance for a company's website, using only HTML tags and attributes supported by the HtmlToOpenXml library. Do not include unsupported elements like <style>, <script>, or complex CSS. Instead, use basic inline styling (e.g., style="font-family:Arial;").
The notice should include:
- A header with the title "Scheduled Maintenance Notice."
- A brief message explaining the maintenance date, time, and expected downtime.
- A bulleted list of affected services.
- A contact email for support.
- A footer with the company's copyright information.
Example date: June 30th, 2025, from 1:00 AM to 5:00 AM.
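When an agent produces notices like this on demand, the date and time window usually come from your application rather than being typed into the prompt by hand. The sketch below shows one way to template the prompt. `NoticePrompt` and `Build` are hypothetical names (not part of any library), and the wording is a condensed version of the prompt above; only the final line changes per call.

```csharp
using System;

// Hypothetical helper (not part of any library): builds the maintenance-notice
// prompt with the date and time window supplied by the caller.
public static class NoticePrompt
{
    public static string Build(string date, string timeWindow)
    {
        // Reject empty values so the model is never asked about a blank date.
        if (string.IsNullOrWhiteSpace(date))
            throw new ArgumentException("A maintenance date is required.", nameof(date));
        if (string.IsNullOrWhiteSpace(timeWindow))
            throw new ArgumentException("A time window is required.", nameof(timeWindow));

        // Condensed version of the tutorial's prompt; only the last line varies per call.
        return string.Join(Environment.NewLine,
            "Generate a professional HTML notice announcing scheduled maintenance for a company's website.",
            "Use only basic HTML tags with simple inline styling (e.g., style=\"font-family:Arial;\").",
            "Do not include <style>, <script>, or complex CSS.",
            "The notice should include:",
            "- A header with the title \"Scheduled Maintenance Notice.\"",
            "- A brief message explaining the maintenance date, time, and expected downtime.",
            "- A bulleted list of affected services.",
            "- A contact email for support.",
            "- A footer with the company's copyright information.",
            $"Date: {date}, from {timeWindow}.");
    }
}
```

For example, `NoticePrompt.Build("June 30th, 2025", "1:00 AM to 5:00 AM")` reproduces the example date line from the prompt above.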
Here’s an example response the LLM might generate:
```html
<!DOCTYPE html>
<html>
<head>
  <title>Scheduled Maintenance Notice</title>
</head>
<body style="font-family:Arial; margin:20px;">
  <h1 style="color:#007acc; text-align:center;">Scheduled Maintenance Notice</h1>
  <p>Dear Users,</p>
  <p>We would like to inform you about our upcoming scheduled maintenance:</p>
  <p><strong>Date:</strong> June 30th, 2025</p>
  <p><strong>Time:</strong> 1:00 AM to 5:00 AM (UTC)</p>
  <p>The following services will be affected:</p>
  <ul>
    <li>Website browsing</li>
    <li>User account access</li>
    <li>Online payments</li>
  </ul>
  <p>We apologize for any inconvenience this may cause and appreciate your understanding as we work to improve our services.</p>
  <p>If you have any questions, please contact us at <a href="mailto:[email protected]">[email protected]</a>.</p>
  <p>Thank you for your patience and support.</p>
  <p>Sincerely,</p>
  <p><strong>Your Company Name</strong></p>
  <hr style="border:0; border-top:1px solid #ccc;">
  <p style="text-align:center; font-size:0.9em; color:#666;">© 2025 Your Company Name. All rights reserved.</p>
</body>
</html>
```
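In practice, a raw model reply is not always clean HTML: chat models often wrap code in a Markdown fence (```html ... ```), and they do not always honor instructions such as "no <style> or <script>". The sketch below tidies a reply before conversion. `LlmHtml.Clean` is a hypothetical helper, not part of HtmlToOpenXml, and it uses regular expressions as a lightweight filter only; for untrusted input, a real HTML parser is the safer choice.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of HtmlToOpenXml): tidies a raw LLM reply
// before it is passed to HtmlConverter.ParseBody.
public static class LlmHtml
{
    // A Markdown code fence wrapped around the whole reply, e.g. ```html ... ```
    private static readonly Regex Fence = new Regex(
        @"^```[A-Za-z]*[ \t]*\r?\n(?<body>.*)\r?\n```$",
        RegexOptions.Singleline);

    // <script> and <style> elements, including their contents (case-insensitive).
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<script\b[^>]*>.*?</script\s*>|<style\b[^>]*>.*?</style\s*>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string Clean(string reply)
    {
        string html = reply.Trim();

        // Unwrap the fence if the model returned the HTML as a Markdown code block.
        Match fence = Fence.Match(html);
        if (fence.Success)
        {
            html = fence.Groups["body"].Value;
        }

        // Drop elements the prompt asked the model to avoid, in case it ignored that instruction.
        return ScriptOrStyle.Replace(html, string.Empty).Trim();
    }
}
```

The cleaned string can then be passed to the converter in place of a hard-coded `html` value, e.g. `converter.ParseBody(LlmHtml.Clean(reply));`.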
Step 2: Setting Up the Project
Create a New Console App
Open your IDE and create a new Console App project.
Add the necessary NuGet packages as listed in the prerequisites.
Replace the contents of Program.cs with the following code:
```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using HtmlToOpenXml;

class Program
{
    static void Main(string[] args)
    {
        // Output path: point this at a folder that exists on your machine.
        const string filename = "C:\\Users\\User\\Downloads\\demo.docx";
        Console.WriteLine("Starting the HTML to DOCX conversion process...");

        // The HTML to convert. In an LLM workflow, this string would be the model's response.
        // Inside a verbatim string (@"..."), each double quote is written as "".
        string html = @"
<!DOCTYPE html>
<html>
<head>
<title>Scheduled Maintenance Notice</title>
</head>
<body style=""font-family:Arial; margin:20px;"">
<h1 style=""color:#007acc; text-align:center;"">Scheduled Maintenance Notice</h1>
<p>Dear Users,</p>
<p>We would like to inform you about our upcoming scheduled maintenance:</p>
<p><strong>Date:</strong> June 30th, 2025</p>
<p><strong>Time:</strong> 1:00 AM to 5:00 AM (UTC)</p>
<p>The following services will be affected:</p>
<ul>
<li>Website browsing</li>
<li>User account access</li>
<li>Online payments</li>
</ul>
<p>We apologize for any inconvenience this may cause and appreciate your understanding as we work to improve our services.</p>
<p>If you have any questions, please contact us at <a href=""mailto:[email protected]"">[email protected]</a>.</p>
<p>Thank you for your patience and support.</p>
<p>Sincerely,</p>
<p><strong>Your Company Name</strong></p>
<hr style=""border:0; border-top:1px solid #ccc;"">
<p style=""text-align:center; font-size:0.9em; color:#666;"">© 2025 Your Company Name. All rights reserved.</p>
</body>
</html>
";

        // Remove any output left over from a previous run.
        if (File.Exists(filename))
        {
            Console.WriteLine("File already exists. Deleting the old file...");
            File.Delete(filename);
        }

        // Build the .docx package in memory, then write it to disk in one step.
        using (MemoryStream generatedDocument = new MemoryStream())
        {
            using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
            {
                Console.WriteLine("Creating WordprocessingDocument...");

                // A newly created package has no main document part yet, so this branch runs on every run.
                MainDocumentPart mainPart = package.MainDocumentPart;
                if (mainPart == null)
                {
                    Console.WriteLine("MainDocumentPart not found. Creating new one...");
                    mainPart = package.AddMainDocumentPart();
                    new Document(new Body()).Save(mainPart);
                }

                // Convert the HTML into WordprocessingML elements and add them to the document body.
                Console.WriteLine("Parsing HTML content and injecting into document...");
                HtmlConverter converter = new HtmlConverter(mainPart);
                converter.ParseBody(html);
                mainPart.Document.Save();
                Console.WriteLine("Document saved in memory stream.");
            }

            // The package is flushed to the stream when it is disposed (end of the inner using block),
            // so the bytes are complete only at this point.
            Console.WriteLine("Writing document to disk...");
            File.WriteAllBytes(filename, generatedDocument.ToArray());
        }

        Console.WriteLine("Process completed successfully!");
    }
}
```
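The program above hard-codes `C:\Users\User\Downloads\demo.docx`, which only works on a machine with that exact user folder. A more portable option is to resolve the path from the current user's profile. `OutputPath.InDownloads` is a hypothetical helper, and it assumes the Downloads folder sits directly under the profile, which is the default but can be redirected.

```csharp
using System;
using System.IO;

// Hypothetical helper: resolves a file name inside the current user's Downloads
// folder instead of hard-coding "C:\Users\User\Downloads".
public static class OutputPath
{
    public static string InDownloads(string fileName)
    {
        // UserProfile is C:\Users\<name> on Windows and the home directory on Linux/macOS.
        string home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);

        // Assumption: Downloads lives directly under the profile (the default layout).
        return Path.Combine(home, "Downloads", fileName);
    }
}
```

Because the value is now computed at run time, the `const` declaration in `Main` becomes `string filename = OutputPath.InDownloads("demo.docx");`.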
Step 3: Running the Tool
Save your changes and build the project.
Run the program. If it succeeds, you'll see console output like the following. The "File already exists" line appears only when demo.docx is left over from a previous run:

```
Starting the HTML to DOCX conversion process...
File already exists. Deleting the old file...
Creating WordprocessingDocument...
MainDocumentPart not found. Creating new one...
Parsing HTML content and injecting into document...
Document saved in memory stream.
Writing document to disk...
Process completed successfully!
```
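The console messages only confirm that each step ran. A .docx file is a ZIP-based package, so one quick sanity check is to open the bytes as a ZIP archive and look for the expected parts. The sketch below is a hypothetical helper using only `System.IO.Compression`; it checks for `[Content_Types].xml` and the conventional `word/document.xml` part name that the OpenXml SDK uses by default. A strict check would instead follow the main-document relationship in `_rels/.rels`.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: a lightweight sanity check that bytes look like a DOCX package.
public static class DocxCheck
{
    public static bool LooksLikeDocx(byte[] bytes)
    {
        try
        {
            using var stream = new MemoryStream(bytes);
            using var zip = new ZipArchive(stream, ZipArchiveMode.Read);
            var names = zip.Entries.Select(e => e.FullName).ToHashSet();

            // Content-types part at the root plus the conventional main document part.
            return names.Contains("[Content_Types].xml") && names.Contains("word/document.xml");
        }
        catch (InvalidDataException)
        {
            // The bytes are not a ZIP archive at all.
            return false;
        }
    }
}
```

For example, after the program finishes you could call `DocxCheck.LooksLikeDocx(File.ReadAllBytes(filename))`.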
Conclusion
Congratulations! You’ve successfully built a tool that converts HTML content—potentially generated by LLM-based agents—into a DOCX file using OpenXML and HtmlToOpenXml. This tool can serve as a powerful addition to workflows involving AI-generated content.
Key Benefits:
- Works with HTML from any source, including LLM-generated content.
- Easy to adapt: change the HTML string or the output path to cover other use cases.
- Builds the document entirely in memory and writes it to disk in a single step.
Challenge for You! (Updated on 19/06/2025)
Combine this tool with my new article on audio-to-text solutions to create an application that:
- Converts audio to text using the Azure AI Speech SDK.
- Generates a polished DOCX document from the recognized text using the OpenXML SDK.
This integrated solution could be used to transcribe meetings, presentations, or interviews into shareable, professional documents. Share your implementation and insights in the comments!
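One piece of glue this challenge needs is turning recognized text into HTML the converter can consume. The sketch below assumes you already have the transcript as a plain string with one utterance per line; it does not call the Speech SDK. `TranscriptHtml.Build` is a hypothetical name. Each line is HTML-encoded so characters such as `<` and `&` in the speech are not parsed as markup.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper for the challenge: wraps a plain-text transcript in simple HTML
// so it can be passed to HtmlConverter.ParseBody like the notice above.
public static class TranscriptHtml
{
    public static string Build(string title, string transcript)
    {
        var sb = new StringBuilder("<html><body>");
        sb.Append("<h1>").Append(WebUtility.HtmlEncode(title)).Append("</h1>");

        // Assumption: one recognized utterance per line; blank lines are skipped.
        var lines = transcript
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0);

        foreach (string line in lines)
        {
            // Encode so transcript text is treated as content, not markup.
            sb.Append("<p>").Append(WebUtility.HtmlEncode(line)).Append("</p>");
        }

        sb.Append("</body></html>");
        return sb.ToString();
    }
}
```

The returned string can replace the hard-coded `html` value in `Program.cs`, leaving the rest of the conversion unchanged.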
Reference:
https://github.com/onizet/html2openxml/wiki
Love C#
I hope this reading inspires and sparks more brainstorming ideas for you. Whether you're integrating AI agents for document generation or exploring new possibilities with OpenXML, the potential for creating innovative solutions is limitless. Let your creativity flow, and happy coding!