Takahiro Sato

Streamline HTML to Markdown Conversion with mq: From Web Scraping to Document Processing

When working with web content, you often need to convert HTML to Markdown for documentation, content analysis, or processing in LLM workflows. Traditional tools require multiple steps and complex pipelines, but with mq, you can convert HTML to Markdown and process it in a single command.

The Problem: Complex HTML Processing Workflows

Imagine you need to:

  • Extract specific content from HTML pages
  • Convert HTML documentation to Markdown
  • Process web-scraped content for analysis
  • Prepare HTML content for LLM inputs

Workflows like these often chain several tools together with custom scripts. With mq, conversion and querying happen in one command.

Basic HTML to Markdown Conversion

mq supports HTML input natively. Here's how to convert HTML to Markdown:

# Convert HTML file to Markdown
$ mq -I html 'identity()' example.html

# Extract only headings (h1 to h3) from HTML
$ mq -I html 'select(or(.h1, .h2, .h3))' example.html

# Extract all code blocks from HTML
$ mq -I html '.code' example.html
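
If you don't have an HTML file at hand, a small sample page is enough to try these queries. The sketch below writes one and then runs two of the queries from above. The page contents are made up for illustration, and the mq calls are skipped when mq is not on your PATH.

```shell
# Write a small sample page (contents are invented for this sketch).
cat > example.html <<'EOF'
<html>
  <body>
    <h1>Install</h1>
    <p>Fetch the docs with curl.</p>
    <pre><code>curl -s https://mqlang.org</code></pre>
    <h2>Usage</h2>
    <p>Run a query against a file.</p>
  </body>
</html>
EOF

# Run the heading and code-block queries only when mq is available.
if command -v mq >/dev/null 2>&1; then
  mq -I html 'select(or(.h1, .h2, .h3))' example.html
  mq -I html '.code' example.html
else
  echo "mq not found on PATH; skipping queries" >&2
fi
```

With mq installed, the first query should return only the two headings and the second only the code block, which makes it easy to check that a selector matches what you expect before pointing it at real pages.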

Advanced Processing with mq-crawler

For batch processing, mq has a companion tool, mq-crawler (the mqcr command, installed separately). It crawls a site starting from a URL and converts the pages it fetches to Markdown:

# Convert a site's HTML to Markdown using mq-crawler
$ mqcr https://mqlang.org

# Same crawl, writing the converted Markdown to the docs directory
$ mqcr -o docs https://mqlang.org
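
Once a crawl has written its output, you can run ordinary mq queries over the saved files. The following is only a sketch: it assumes mqcr saved the pages with a .md extension under docs, and that mq reads Markdown when no -I flag is given. The list_markdown helper is a hypothetical name introduced here, not part of mq.

```shell
# Directory passed to `mqcr -o` in the example above.
out_dir="docs"

# List crawled files; assumes mqcr wrote them with a .md extension.
list_markdown() {
  find "$1" -type f -name '*.md' | sort
}

# Print a heading outline for every crawled page.
if command -v mq >/dev/null 2>&1 && [ -d "$out_dir" ]; then
  list_markdown "$out_dir" | while IFS= read -r file; do
    echo "== $file"
    mq 'select(or(.h1, .h2, .h3))' "$file"
  done
fi
```

Keeping the crawl and the query as separate steps means you can rerun different queries without fetching the site again.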

Integration with Web Scraping Tools

mq reads HTML from standard input, so it composes with command-line tools that fetch pages:

With curl and HTML processing

# Download and process HTML content
$ curl -s https://mqlang.org/book/start/example | mq -I html '.code | select(contains("curl"))'
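
In a script, it helps to stop early when the download fails, so that mq never receives an error page or empty input. Here is a minimal sketch of that idea. The html_query function is a hypothetical wrapper, not something mq ships, and it relies only on standard curl flags (-f to fail on HTTP errors, -sS for quiet output with errors, -L to follow redirects).

```shell
# html_query URL QUERY: fetch a page and run an mq query on it.
# Usage: html_query https://mqlang.org/book/start/example '.code | select(contains("curl"))'
html_query() {
  url="$1"
  query="$2"
  # Capture the page first so a failed fetch never reaches mq.
  page=$(curl -fsSL "$url") || { echo "fetch failed: $url" >&2; return 1; }
  printf '%s\n' "$page" | mq -I html "$query"
}
```

The function returns a non-zero status when the fetch fails, so callers can react with a plain `if html_query ...; then` check.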

Getting Started

Install mq and start processing HTML content immediately:

# Install mq via Homebrew
$ brew install harehare/tap/mq
# Install crawler via Homebrew
$ brew install harehare/tap/mqcr

Conclusion

mq turns HTML-to-Markdown conversion from a multi-step process into a single command. Whether you're processing web documentation, analyzing scraped content, or preparing data for LLM workflows, mq handles conversion and extraction in one place.

