When working with web content, you often need to convert HTML to Markdown for documentation, content analysis, or processing in LLM workflows. Traditional tools require multiple steps and complex pipelines, but with mq, you can convert HTML to Markdown and process it in a single command.
The Problem: Complex HTML Processing Workflows
Imagine you need to:
- Extract specific content from HTML pages
- Convert HTML documentation to Markdown
- Process web-scraped content for analysis
- Prepare HTML content for LLM inputs
Traditional workflows often chain several tools and custom scripts together. With mq, you can handle each of these tasks with a single command.
Basic HTML to Markdown Conversion
mq supports HTML input natively. Here's how to convert HTML to Markdown:
# Convert HTML file to Markdown
$ mq -I html 'identity()' example.html
# Extract only headers from HTML
$ mq -I html 'select(or(.h1, .h2, .h3))' example.html
# Extract all code blocks from HTML
$ mq -I html '.code' example.html
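To apply the same conversion to a folder of local files, one option is a small shell loop around the `identity()` command shown above. This is a minimal sketch, not part of mq itself: `html_dir_to_md` is a hypothetical helper name, and it assumes `mq` is already on your PATH.

```shell
# Sketch: convert every .html file in a directory to Markdown with mq.
# html_dir_to_md is a hypothetical helper name; assumes mq is on PATH.
html_dir_to_md() {
  local src_dir="$1" out_dir="$2"
  local html_file base
  mkdir -p "$out_dir" || return 1
  for html_file in "$src_dir"/*.html; do
    # Skip the unexpanded glob when the directory has no .html files.
    [ -e "$html_file" ] || continue
    base="$(basename "$html_file" .html)"
    # Same invocation as above: read HTML input, write it out as Markdown.
    mq -I html 'identity()' "$html_file" > "$out_dir/$base.md" || return 1
  done
}
```

Calling `html_dir_to_md ./site ./markdown` would write one `.md` file per `.html` file, stopping at the first file mq fails on.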
Advanced Processing with mq-crawler
For batch processing of HTML files, mq includes mq-crawler, a powerful tool for directory traversal and batch conversion:
# Convert all HTML to Markdown using mq-crawler
$ mqcr https://mqlang.org
# Crawl the site and save the converted Markdown to the docs directory
$ mqcr -o docs https://mqlang.org
Integration with Web Scraping Tools
mq works seamlessly with popular web scraping and conversion tools:
With curl and HTML processing
# Download and process HTML content
$ curl -s https://mqlang.org/book/start/example | mq -I html '.code | select(contains("curl"))'
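When the pipeline above runs inside a script, a failed download can go unnoticed, because `curl -s` hides errors and an HTTP error page would still be passed to mq. The sketch below wraps the same two commands in a function that stops if the fetch fails. `fetch_code_blocks` is a hypothetical name, and the wrapper assumes `curl` and `mq` are on your PATH.

```shell
# Sketch: fetch a page and print only its code blocks as Markdown.
# fetch_code_blocks is a hypothetical helper; assumes curl and mq are on PATH.
fetch_code_blocks() {
  local url="$1" html
  # -f fails on HTTP errors so an error page is never converted,
  # -sS hides the progress bar but still reports errors, -L follows redirects.
  html="$(curl -fsSL "$url")" || return 1
  # Same query as the basic examples: keep only code blocks.
  printf '%s\n' "$html" | mq -I html '.code'
}
```

For example, `fetch_code_blocks https://mqlang.org/book/start/example` prints the page's code blocks, or returns a non-zero status if the download fails.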
Getting Started
Install mq and start processing HTML content immediately:
# Install mq via Homebrew
$ brew install harehare/tap/mq
# Install crawler via Homebrew
$ brew install harehare/tap/mqcr
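After installing, it can help to confirm that both binaries are reachable before using them in scripts. The following is a small sketch using only the standard `command -v` builtin; `check_tools` is a hypothetical helper name.

```shell
# Sketch: report which of the given commands are missing from PATH.
# check_tools is a hypothetical helper name.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Non-zero exit status when at least one tool was not found.
  return "$missing"
}
```

Running `check_tools mq mqcr` exits with status 0 only when both commands are found.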
Conclusion
mq transforms HTML to Markdown conversion from a multi-step process into a single, powerful command. Whether you're processing web documentation, analyzing scraped content, or preparing data for LLM workflows, mq provides the efficiency and flexibility you need.