Structured text tools

The following is a list of text-based file formats and command line tools for manipulating each.

DSV

Delimiter-separated values, including CSV, TSV, etc.

Awk

Awk is a POSIX-standard command line tool and programming language for processing DSV data. If you use Linux, macOS or a BSD, you almost certainly have it installed. See below for Windows.

If you already know how to program, the nawk man page is a great way to learn Awk quickly. What you learn from it will apply to other implementations on different platforms. Read it first if you feel overwhelmed by the sheer size of the GNU Awk manual.
Awk.info archive — an extensive resource on Awk.
AWK Vs NAWK Vs GAWK — a comparison of implementations' features.
busybox-w32 includes a full implementation of POSIX Awk and other tools like sed in a single Windows executable.
GNU Awk 5 binaries for Windows by EZWinPorts.
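Awk reads its input one record (line) at a time, splits each record into fields on the field separator (set with `-F`), and runs pattern/action pairs against those fields. The Python sketch below is only a rough illustration of that model for the one-liner `awk -F'\t' '$3 > 100 { print $1 }'`. The function name `awk_like` and the sample data are hypothetical, and real Awk differs in details such as how it compares non-numeric strings.

```python
import io
from typing import Iterable, Iterator, Optional


def awk_like(lines: Iterable[str], fs: Optional[str] = None) -> Iterator[str]:
    """Rough sketch of: awk -F'<fs>' '$3 > 100 { print $1 }'."""
    for record in lines:
        record = record.rstrip("\n")
        # fs=None splits on runs of whitespace, similar to Awk's default FS.
        fields = record.split(fs) if fs is not None else record.split()
        # Awk's $3 is the third field; a missing field reads as an empty string.
        third = fields[2] if len(fields) > 2 else ""
        try:
            value = float(third)
        except ValueError:
            # Simplification: real Awk would fall back to a string comparison.
            continue
        if value > 100:
            yield fields[0]


if __name__ == "__main__":
    data = io.StringIO("alice\t3\t250\nbob\t5\t90\ncarol\t1\t101\n")
    print(list(awk_like(data, "\t")))  # ['alice', 'carol']
```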

POSIX commands

Name	Description
`comm`	Select lines common to two sorted files or those contained in only one of them. (Manual: `man 1 comm` on your system, GNU, FreeBSD.)
`cut`	Select portions of each line in one or several files. Can work with delimiter-separated fields. (Manual: `man 1 cut`, GNU, FreeBSD.)
`grep`	Select lines from one or several files. (Manual: `man 1 grep`, GNU, FreeBSD.)
`join`	Join the lines from two files on a common field. (Manual: `man 1 join`, GNU, FreeBSD.)
`paste`	Merge corresponding lines of several files side by side. Can also combine consecutive lines of one file (e.g., `paste - -` or `paste -s`). (Manual: `man 1 paste`, GNU, FreeBSD.)
`sort`	Sort lines by key fields. (Manual: `man 1 sort`, GNU, FreeBSD.)
`uniq`	Find or remove repeated lines. (Manual: `man 1 uniq`, GNU, FreeBSD.)
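`comm` and `join` expect sorted input because they walk both files in a single merge pass, much like the merge step of merge sort. The Python sketch below illustrates that idea for `comm` on two pre-sorted lists and returns its three columns: lines only in the first input, lines only in the second, and lines in both. The function name is hypothetical, and the sketch ignores locale collation, which the real tool honors.

```python
from typing import List, Tuple


def comm(a: List[str], b: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Merge two sorted lists into (only_a, only_b, both), like comm's columns."""
    only_a: List[str] = []
    only_b: List[str] = []
    both: List[str] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            only_a.append(a[i])
            i += 1
        elif a[i] > b[j]:
            only_b.append(b[j])
            j += 1
        else:
            # Equal lines are consumed from both inputs at once.
            both.append(a[i])
            i += 1
            j += 1
    # Whatever remains in either list has no counterpart in the other.
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b, both


if __name__ == "__main__":
    print(comm(["apple", "banana", "cherry"], ["banana", "cherry", "date"]))
```

Because each input is read only once, the approach works on streams of any size, which is why the POSIX tools require sorted input instead of loading whole files into memory.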

Other tools

Name and link	Description
csv-nix-tools	List *nix system information such as environment variables, files, processes, network connections, users as CSV. Manipulate and pretty-print CSV. Execute CSV rows as commands.
csv2md	Convert CSV to Markdown tables.
csv2html	Convert CSV to HTML tables.
csvfaker	Generate CSV files with fake data. Supports different types of fake data in different locales: names, cities, jobs, email addresses, and others.
csvfix	A multitool. Compare, filter, normalize, split, and validate CSV files. Reorder, remove, split, and merge fields. Convert data between fixed-width, multi-line, XML, and DSV formats. Generate SQL statements. Documentation.
csvkit	csvkit is a suite of command-line tools for converting to and working with CSV: convert, clean, cut, grep, join, sort, stack, format, render, query, analyze, etc.
csvquote	Transform CSV to and from a format processable with regular POSIX tools.
csvtk	Search, sample, cut, join, transpose, and sort CSV/TSV files. Rename columns. Replace fields and generate new fields from existing fields. Plot data as vector or raster histograms and box, line, and scatter plots. Convert CSV to Markdown. Convert XLSX to CSV. Split XLSX sheets.
GNU datamash	Perform statistical operations on text input.
jp (sgreben)	Plot data. See the JSON section.
Mario	See the JSON section.
MCMD (M-Command)	Select, sample, cut, join, sort, reformat, and generate CSV files. Contains a large set of commands.
Miller	`sed`, `awk`, `cut`, `join` and `sort` for name-indexed data such as CSV and tabular JSON.
pawk	Process text with Awk-like patterns, but Python code.
rows	A Python library with a CLI. Convert between a number of file formats for tabular data: CSV, XLS, XLSX, ODS, and others. Query the data (via SQLite). Combine tables. Generate schemas.
tab	A non-Turing-complete statically typed programming language for data processing. An alternative to Awk.
eBay's TSV utilities	Filtering, statistics, sampling, joins and other operations on TSV files. High performance, especially good for large datasets. Written in D.
tv	View delimited files in the terminal.
VisiData	Interactively explore data in TSV, CSV, XLS, XLSX, HDF5, JSON, and other formats. Introduction.
xsv	Index, slice, analyze, split, and join CSV files.
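Line- and delimiter-based POSIX tools break on CSV fields that contain quoted commas or newlines. As we understand it, csvquote works around this by temporarily replacing those characters inside quoted fields with non-printing characters and restoring them afterwards. The Python sketch below shows that idea; the function names `sanitize` and `restore` are hypothetical, and the choice of ASCII unit separator (0x1F) and record separator (0x1E) as substitutes is an assumption for illustration.

```python
def sanitize(text: str, delim: str = ",") -> str:
    """Replace delimiters and newlines inside quoted fields with control characters."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # An escaped quote ("") toggles the state twice, leaving it unchanged.
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch == delim:
            out.append("\x1f")
        elif in_quotes and ch == "\n":
            out.append("\x1e")
        else:
            out.append(ch)
    return "".join(out)


def restore(text: str, delim: str = ",") -> str:
    """Undo sanitize() after processing with line- and delimiter-based tools."""
    return text.replace("\x1f", delim).replace("\x1e", "\n")


if __name__ == "__main__":
    raw = 'id,comment\n1,"hello, world"\n2,"two\nlines"\n'
    safe = sanitize(raw)
    # Split on "\n" explicitly: str.splitlines() also treats "\x1e" as a line break.
    comments = [line.split(",")[1] for line in safe.rstrip("\n").split("\n")]
    print([restore(c) for c in comments])
```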

SQL-based tools

See the Grand Comparison Table of SQL-based Tools. It covers:

AlaSQL CLI
csvq
csvsql
fsql
q
RBQL
rows
Sqawk (dbohdan)
sqawk (tjunier)
Squawk
termsql
trdsql
textql
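A common approach for SQL over DSV (the rows entry above, for example, queries data via SQLite) is to load the rows into an embedded database and hand the query to it. The Python sketch below does this with the standard `csv` and `sqlite3` modules; the function name `query_csv` and the default table name `t` are hypothetical, and the sketch assumes the first row is a header whose names contain no double quotes.

```python
import csv
import io
import sqlite3
from typing import List, Tuple


def query_csv(csv_text: str, sql: str, table: str = "t") -> List[Tuple]:
    """Load CSV (first row = header) into an in-memory SQLite table and run a query."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        # Columns are declared without types, so values are stored as text;
        # use CAST in the query when numeric comparison or arithmetic is needed.
        cols = ", ".join(f'"{name}"' for name in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    text = "name,qty\napple,3\npear,5\napple,4\n"
    sql = "SELECT name, SUM(CAST(qty AS INTEGER)) FROM t GROUP BY name ORDER BY name"
    print(query_csv(text, sql))  # [('apple', 7), ('pear', 5)]
```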

XML, HTML

Name and link	Description
html-xml-utils	A number of simple utilities from W3C (like `hxcopy`, `hxpipe`, `hxunent`, `hxselect`) for manipulating HTML and XML files. Written in C, quite old-fashioned, but still relevant and maintained.
Mario	See the JSON section.
pup	Query HTML pages with CSS selectors. Static binaries available for releases. Inspired by jq.
Saxon	Query XML and HTML data with XPath. Documentation.
sml2	Convert between XML and SML, a simplified XML representation.
Temme	Query HTML with CSS-like selectors to extract JSON. Temme extends CSS selectors with value capture patterns.
tidy-html5	Validate, fix, and reformat HTML(5), XHTML, and XML documents. Convert HTML to XHTML.
tq	Query HTML with CSS selectors.
Xidel	Query or modify XML and HTML pages with XPath, XQuery 3, and CSS selectors.
xml-to-json-fast	Convert XML to JSON. Can handle very large XML files.
xml2	Convert XML and HTML to and from flat, greppable lists of "path=value" statements. Source code mirror.
xmljson	Convert multiple and large XML files to JSON. Written in Swift.
XMLLint	Query (with XPath), validate, and reformat XML documents.
XMLStarlet	Query, modify, and validate XML documents.
xq	jq wrapper for XML documents.
xsltproc	Transform XML documents using XSLT and EXSLT.

See also: Grep and Sed Equivalent for XML Command Line Processing on StackOverflow.
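Several of the tools above (XMLStarlet, XMLLint, Saxon, Xidel) select nodes with XPath. To give a feel for such queries, the Python sketch below uses the standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath; the function name `select_text` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def select_text(xml_text: str, path: str) -> List[str]:
    """Return the text of every element matching a (limited) XPath expression."""
    root = ET.fromstring(xml_text)
    # ElementTree understands child and descendant steps, attribute predicates
    # like [@id='2'], and positional predicates, but not full XPath 1.0.
    return [el.text or "" for el in root.findall(path)]


if __name__ == "__main__":
    doc = ('<catalog><book id="1"><title>Awk</title></book>'
           '<book id="2"><title>Sed</title></book></catalog>')
    print(select_text(doc, "./book/title"))           # ['Awk', 'Sed']
    print(select_text(doc, "./book[@id='2']/title"))  # ['Sed']
```

For full XPath, XQuery, or XSLT support, a dedicated tool from the table is the better choice.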

JSON

Name and link	Description
fx	Run arbitrary JavaScript on JSON input. Standalone binaries available.
gojq	A pure Go implementation of jq (see below). Supports YAML input and output.
gron	Convert JSON to and from flat, greppable lists of "path=value" statements.
JC	Convert the output of standard command line tools to JSON.
jello	Query JSON and JSON Lines with Python code. Output the result in a line-based format suitable for creating Bash arrays. Generate a grep-able schema.
jet	Convert between and query JSON, Clojure's edn, and Transit.
jfq	Query and transform JSON with the JSONata language.
jid	Explore JSON interactively with filtering queries like jq.
jj	Query and modify values in JSON or JSON Lines with a key path.
jl	Query and manipulate JSON using a tiny functional language.
jo	Create JSON objects from the shell.
jp (jmespath)	Query JSON with JMESPath.
jp (sgreben)	Plot JSON and CSV data in the terminal. Supports different kinds of plots: bar charts, line charts, scatter plots, histograms, and heatmaps.
jplot	Plot real-time JSON data in the terminal (works with terminals supporting graphic rendering).
jq	Create and manipulate JSON with a functional (as in "functional programming") DSL. Can convert JSON to other formats.
jtbl	Format JSON or JSON Lines as a plain-text table.
jtc	Create, manipulate, search, and validate JSON with path expressions. Can be used as a C++14 library.
emuto	CLI tool similar to jq. Create and manipulate JSON and other files. Can be compiled to JavaScript.
jshon	Create and manipulate JSON using getopt-style command-line options.
json2	Convert JSON to and from flat, greppable lists of "path=value" statements. Modeled after xml2.
jsonaxe	Create and manipulate JSON with a Python-based DSL. Inspired by jq.
json	Run arbitrary JavaScript on JSON input.
json-table	Convert nested JSON into CSV or TSV for processing in the shell.
json.tool (Python 3 docs)	Validate and pretty-print JSON. This module is part of the standard library of Python 2/3 and is likely to be available wherever Python is installed.
jsonwatch	Track changes in JSON data from the command line. Works like `watch -d`.
lobar	Explore JSON interactively or process it in batch with a wrapper for `lodash.chain()`. An alternative to jq with a JavaScript syntax.
Mario	Manipulate and convert between CSV, JSON, YAML, TOML, and XML with Python code.
quicktype	Infer a data model from JSON input and output it as types for various programming languages or as a JSON Schema. CLI and Web UI.
ramda-cli	Manipulate JSON with the Ramda functional library, and either LiveScript or JavaScript syntax.
RecordStream	Create, manipulate, and output a stream of records, or JSON objects. Can retrieve records from an SQL database, MongoDB, Atom feeds, XML, and other sources.
rq	Create and manipulate JSON with a DSL inspired by Rust, C and JavaScript. Similar to jq. Supports JSON, YAML and TOML as well as binary formats like Apache Avro and MessagePack.
validjson	Validate or pretty-print JSON.
VisiData	Explore data interactively. See the DSV/Other tools section.
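gron and json2 make JSON greppable by printing one "path = value" statement per leaf. The Python sketch below imitates gron's style of output for a parsed JSON value. The function name `flatten` is hypothetical, and the exact formatting and ordering are assumptions: this sketch keeps keys in input order, while gron's real output may differ.

```python
import json
import re
from typing import Any, Iterator

# Keys that can be written with dot notation; anything else uses ["..."].
_IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def flatten(value: Any, path: str = "json") -> Iterator[str]:
    """Yield gron-style 'path = value;' statements for a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            step = f".{key}" if _IDENT.fullmatch(key) else f"[{json.dumps(key)}]"
            yield from flatten(child, path + step)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars (strings, numbers, booleans, null) are printed as JSON.
        yield f"{path} = {json.dumps(value)};"


if __name__ == "__main__":
    doc = json.loads('{"name": "jq", "tags": ["cli", "json"], "a-b": null}')
    for statement in flatten(doc):
        print(statement)
```

Each output line carries its full path, so plain `grep` can filter the result and the matching lines still say where each value came from.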

YAML, TOML

With a format converter like Remarshal (below) you can use JSON tools to process YAML and TOML, but make sure you do not lose data in the conversion.

Name and link	Description
gojq	See the JSON section.
Mario	See the JSON section.
Remarshal	Convert between JSON, MessagePack, TOML, and YAML. Validate each of the formats. Pretty-print JSON, TOML, and YAML.
rq	See the JSON section.
shyaml	Query YAML. Can output null-terminated strings for use in shell scripts.
validtoml	Validate TOML.
validyaml	Validate or pretty-print YAML.
yaml-tools	A set of CLI tools to manipulate YAML files (merge, delete, etc.) while preserving comments. Based on ruamel.yaml.
yq (kislyuk)	jq wrapper for YAML.
yq (mikefarah)	Query, modify, and merge YAML. Convert to and from JSON.

Log files

Name and link	Description
Squawk	Query Apache and Nginx log files. See the SQL-based tool comparison.
lnav	Query and watch log files. Has batch and interactive mode. Supported formats include the Common Log Format, CUPS page_log, syslog, strace, and generic timestamped messages. Can perform SQL queries.
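Tools like lnav and Squawk recognize the Common Log Format, in which each line reads `host ident authuser [date] "request" status bytes`. The Python sketch below splits one such line into named fields with a regular expression; the pattern and the function name `parse_clf` are hypothetical, and the sketch does not handle escaped quotes inside the request.

```python
import re
from collections import Counter
from typing import Dict, Optional

CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Optional[Dict[str, str]]:
    """Split one Common Log Format line into named fields, or return None."""
    m = CLF.match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    log = [
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
        '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /missing HTTP/1.0" 404 -',
    ]
    # Count responses by HTTP status code, skipping lines that do not parse.
    statuses = Counter(rec["status"] for rec in map(parse_clf, log) if rec)
    print(statuses)
```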

Configuration files

/etc/hosts

Name and link	Description
hosts	Add and remove entries in `/etc/hosts`. Change a hostname's IP address. Idempotent. Preserves arbitrary comments. Can be used as a Tcl library.
hostess	Add and remove entries in `/etc/hosts`. Disable (comment out) and enable (uncomment) entries. Check if a hostname exists. Reformat the hosts file. Convert the entries to JSON. Idempotent. Removes arbitrary comments.
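Both tools above are described as idempotent: running the same "add" twice leaves the file as it was after the first run. The Python sketch below shows that property for adding an entry while keeping comments. The function name `add_host` is hypothetical, and unlike `hosts` it does not change the IP address of an existing hostname.

```python
from typing import List


def add_host(lines: List[str], ip: str, name: str) -> List[str]:
    """Return hosts-file lines with `ip name` present; comments are kept as-is."""
    for line in lines:
        # Ignore anything after '#'; an entry is "IP hostname [aliases...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip and name in fields[1:]:
            return list(lines)  # already present: nothing to change
    return list(lines) + [f"{ip}\t{name}"]


if __name__ == "__main__":
    hosts = ["# local names", "127.0.0.1\tlocalhost"]
    once = add_host(hosts, "10.0.0.5", "build")
    twice = add_host(once, "10.0.0.5", "build")
    print(once == twice)  # True: the second call changes nothing
```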

INI

Name and link	Platform	License	Description
cfget	Any with Python 2.x?	GNU GPLv2+	Retrieve properties as shell script commands to set the corresponding variables (with `--dump exports`). Retrieve properties' values as plain text. Substitute values from an INI file in an Autoconf-style template. Supports plug-ins. Chokes on section names and keys with spaces.
confget	Linux, FreeBSD	Two-clause BSD	Retrieve properties and sections as shell script commands to set the corresponding variables. Retrieve properties' values as plain text. Check for existence of properties. List sections. Find values that match a pattern. Read-only.
crudini	Any with Python 2.x	GNU GPLv2	Retrieve properties and sections as INI fragments or shell script commands to set the corresponding variables. Retrieve properties' values as plain text. Set properties. Remove properties and sections. Create empty sections. Merge INI files. Changes files in place.
inicomp	Windows, *nix	Apache 2.0	Compare INI (and also Windows .reg) files.
IniFile (DOS version)	Windows (x86, x86-64), MS-DOS	Closed-source freeware	Retrieve properties and sections as batch file commands to set the corresponding variables. Set properties. Remove properties and sections. Changes files in place.
initool	Linux, FreeBSD, Windows	MIT	Retrieve properties and sections as INI fragments. Retrieve properties' values as plain text. Set properties. Check for existence of properties and sections. Remove properties and sections. Outputs the updated INI file.
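Several of these tools can print INI properties as shell variable assignments. The Python sketch below does something similar with the standard `configparser` and `shlex` modules. The function name `ini_to_shell` is hypothetical and its output format is not that of confget, cfget, or crudini; it also assumes keys are valid shell variable names and, because of how `configparser` works, includes values inherited from a `[DEFAULT]` section.

```python
import configparser
import shlex
from typing import List


def ini_to_shell(ini_text: str, section: str) -> List[str]:
    """Render one INI section as shell variable assignments (values shell-quoted)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    lines = []
    for key, value in parser.items(section):
        # configparser lowercases keys by default; upper-case them for shell style.
        lines.append(f"{key.upper()}={shlex.quote(value)}")
    return lines


if __name__ == "__main__":
    ini = "[db]\nhost = localhost\npassword = it's secret\n"
    print("\n".join(ini_to_shell(ini, "db")))
```

Quoting every value with `shlex.quote` means the output can be passed to `eval` in a POSIX shell without values such as `it's secret` breaking the script.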

Multiple formats

Name and link	Description
Augeas	Query and modify a number of file formats. Not all formats are equally well supported by Augeas; for some, only a limited subset of valid files can be parsed.
Elektra	Query and modify configuration files. Shares Augeas' limitations when it comes to application-specific configuration files (it uses the same lenses), but has better support for generic formats such as JSON and INI.

Templating for structured text

Listed below are restricted programming language interpreters and templating tools that produce structured text output. They are generally intended to remove repetition in configuration files. They are distinct from unstructured templating tools like the jinja2 CLI program, which should not be added to this table.

Name and link	Output format	Turing-complete?	Syntax	I/O	Description
CUE	JSON	Yes?	Extended JSON	?	A constraint language for JSON configuration data. Can generate and validate JSON.
Dhall	JSON, YAML	No	Haskell-inspired	Limited to importing libraries from files and HTTP(S) URLs (with protection against leaking your data to the server)	A statically-typed functional configuration language. Has a standard formatting tool.
jk	JSON, YAML, plain text	Yes	JavaScript	Disk I/O	Generate configuration files using JavaScript (V8 VM).
Jsonnet	JSON, INI, XML, YAML, plain text	Yes	Extended JSON	None	A functional configuration language. Has a standard formatting tool.
rjsone	JSON, YAML	No?	Extended JSON	None	A CLI tool for the JSON-e templating language.
ytt	YAML	No	YAML/Python hybrid	None?	A templating tool for YAML built upon the Starlark configuration language.
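The difference between structured and unstructured templating is that structured tools build data values and serialize them at the end, so the output is always well-formed, instead of pasting strings into text. The Python sketch below is only an analogy for that idea and is not written in any of the languages in the table; the function names and configuration fields are hypothetical.

```python
import json
from typing import Any, Dict


def service(name: str, port: int, replicas: int = 1) -> Dict[str, Any]:
    """Hypothetical shared 'template': the common fields are written once."""
    return {
        "name": name,
        "port": port,
        "replicas": replicas,
        "healthcheck": {"path": "/health", "interval_s": 10},
    }


def render() -> str:
    # Build data structures and serialize them last, so the result is always
    # valid JSON regardless of what the individual values contain.
    config = {"services": [service("api", 8080, replicas=3), service("worker", 9090)]}
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render())
```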

Bonus round: CLIs for single-file databases

Name and link	Description	File format
Firebird	Firebird is a FOSS database that can be used from a single file, like SQLite. "isql is a program that allows the user to issue arbitrary SQL commands".	Binary
Fsdb	A flat-file database for shell scripting.	Text-based, TSV with a header or "key: value"
GNU Recutils	"[A] set of tools and libraries to access human-editable, plain text databases called recfiles."	Text-based, roughly "key: value"
SDB	"[A] simple string key/value database based on djb's cdb disk storage and supports JSON and arrays introspection."	Binary
sqlite3(1)	"[A] simple command-line utility [...] that allows the user to manually enter and execute SQL statements against an SQLite database."	Binary
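Fsdb and GNU Recutils can store records as human-editable "key: value" text. The Python sketch below reads blank-line-separated records of that shape to show why such files are easy to process. The function name `parse_records` is hypothetical, and it handles only a simplified subset of the recfile format: it skips `#` comments but does not understand record descriptors such as `%rec:` or `+` continuation lines.

```python
from typing import Dict, List


def parse_records(text: str) -> List[Dict[str, List[str]]]:
    """Parse blank-line-separated 'key: value' records (a simplified subset)."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        # A field name may repeat within a record, so keep every value.
        current.setdefault(key.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    text = "# people\nName: Alice\nEmail: a@example.com\n\nName: Bob\n"
    for record in parse_records(text):
        print(record)
```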

License

The contents of this document are licensed under the Creative Commons Attribution 4.0 International License. By contributing you agree to release your contribution under this license.

Disclosure

csv2html, hosts, Sqawk, jsonwatch, Remarshal and initool are developed by the curator of this document.

dbohdan / structured-text-tools
