The following script should filter out all ANSI/VT100/xterm control sequences (based on ctlseqs). It is only minimally tested, so please report any under- or over-match.
#!/usr/bin/env perl
## uncolor — remove terminal escape sequences such as color changes
while (<>) {
    s/ \e[ #%()*+\-.\/]. |                   # ESC, intermediate byte, final byte (e.g. charset designation)
       \e\[ [ -?]* [@-~] |                   # CSI ... Cmd
       \e\] .*? (?:\e\\|[\a\x9c]) |          # OSC ... (ST|BEL)
       \e[P^_] .*? (?:\e\\|\x9c) |           # (DCS|PM|APC) ... ST
       \e. //xg;
    print;
}
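To check the pattern against known inputs, here is a minimal sketch that wraps the same substitution in a function. The name `strip_escapes` and the sample strings are only illustrative; they are not part of the script above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the same substitution as the script above,
# applied to a copy of the argument instead of to each input line.
sub strip_escapes {
    my ($text) = @_;
    $text =~ s/ \e[ #%()*+\-.\/]. |
                \e\[ [ -?]* [@-~] |               # CSI ... Cmd
                \e\] .*? (?:\e\\|[\a\x9c]) |      # OSC ... (ST|BEL)
                \e[P^_] .*? (?:\e\\|\x9c) |       # (DCS|PM|APC) ... ST
                \e. //xg;
    return $text;
}

# Illustrative samples: an SGR color change, an OSC window title
# terminated by BEL, and a charset designation.
my @samples = (
    "\e[1;31mred\e[0m plain",
    "\e]0;window title\aprompt\$ ",
    "\e(Bascii",
);
print strip_escapes($_), "\n" for @samples;
```

With these samples the output should be `red plain`, `prompt$ ` and `ascii`, one per line.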
Known issues:
- Doesn't complain about malformed sequences. That's not what this script is for.
- Multi-line string arguments to DCS/PM/APC/OSC are not supported, because the input is processed one line at a time.
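- Bytes in the range 128–159 may be parsed as control characters (the 8-bit forms of CSI, OSC, DCS and so on), though this is rarely used. The script above leaves them alone.
Here's a version which also parses these non-ASCII control characters. Note that it will mangle non-ASCII text in some encodings, including UTF-8, where these byte values occur inside multi-byte characters.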
#!/usr/bin/env perl
## uncolor — remove terminal escape sequences such as color changes
while (<>) {
    s/ \e[ #%()*+\-.\/]. |                                   # ESC, intermediate byte, final byte (e.g. charset designation)
       (?:\e\[|\x9b) [ -?]* [@-~] |                          # CSI ... Cmd
       (?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) |                 # OSC ... (ST|BEL)
       (?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) |        # (DCS|PM|APC) ... ST
       \e.|[\x80-\x9f] //xg;
    print;
}
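For UTF-8 input, one possible way around the mangling is to decode the bytes into characters first, so that `\x9b` and friends match only the actual code points U+0080 to U+009F and never the continuation bytes of other characters. The following is a sketch under that assumption (input and output are both UTF-8); the function name `strip_c1_utf8` and the sample string are made up for illustration, and whether a given terminal honors control characters encoded this way is not something the sketch checks.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode UTF-8 bytes to characters, apply the
# 8-bit-aware substitution from the script above, re-encode to bytes.
sub strip_c1_utf8 {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    $text =~ s/ \e[ #%()*+\-.\/]. |
                (?:\e\[|\x9b) [ -?]* [@-~] |                      # CSI ... Cmd
                (?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) |             # OSC ... (ST|BEL)
                (?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) |    # (DCS|PM|APC) ... ST
                \e.|[\x80-\x9f] //xg;
    return encode('UTF-8', $text);
}

# Illustrative sample, written as raw UTF-8 bytes: "cafe" with an acute
# accent, U+009B used as an 8-bit CSI (encoded as C2 9B), and a euro
# sign whose encoding (E2 82 AC) contains the byte 0x82.
my $sample = "caf\xc3\xa9 \xc2\x9b1mbold\xc2\x9b0m \xe2\x82\xac5\n";
print strip_c1_utf8($sample);
```

Here the two CSI sequences are removed while the accented letter and the euro sign survive intact; the byte-wise version above would strip the 0x82 byte out of the euro sign.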