Regular Expression Matching Can Be Simple And Fast

With Peter's observation that everything good in Computer Science happened during the "Golden Age" freshly in mind, I found Russ Cox's recent article on regular expressions to be enjoyable reading.

This is a tale of two approaches to regular expression matching. One of them is in widespread use in the standard interpreters for many languages, including Perl. The other is used only in a few places, notably most implementations of awk and grep. The two approaches have wildly different performance characteristics ... The trends shown in the graph continue: the Thompson NFA handles a 100-character string in under 200 microseconds, while Perl would require over 10^15 years.

Combining implementation details, finite automata, and a foray into decades-old theory, this article shows how most of our favorite little languages have an enormous performance bottlenecks for certain categories of string comparisons.

An additional data point: The Shootout benchmarks have a large string comparison test. It's interesting that Tcl is at the top of the heap for performance. Guess which one is using the Thompson NFA algorithm for regular expressions?