Traditional WAFs typically use regular expressions to define attack patterns. Take the well-known ModSecurity engine as an example: 80% of WAFs in the world are powered by it. Let's analyze what its rules look like.
union[\w\s]*?select
: This rule defines an SQL injection attack pattern: traffic that contains the word "union" followed by the word "select".
\balert\s*\(
: This rule defines an XSS attack pattern: traffic that contains the word "alert" followed by a left parenthesis "(".
Real attackers can easily avoid these keywords and thus circumvent the protection of the WAF. Using the rules mentioned above, let's look at some examples of false negatives:
union /**/ select
: Inserting a comment between "union" and "select" disrupts the keyword pattern, so the attack goes undetected.
window['\x61lert']()
: Replacing the letter "a" with the escape sequence "\x61" disrupts the keyword pattern, so the attack goes undetected.
From these examples, we can conclude that traditional regex-based WAFs cannot reliably prevent attacks, because an attacker can rewrite a payload into an equivalent form that the patterns do not anticipate.
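The false negatives above can be reproduced with a few lines of Python. This is only a sketch using Python's standard `re` module, not ModSecurity itself. The exact pattern text is an assumption: the quantifiers are written as `*?` and `*`, and the parenthesis is escaped so that the expression compiles. The function name `regex_waf_blocks` and the sample payloads are made up for illustration.

```python
import re

# Assumed forms of the two keyword rules discussed above.
SQLI_RULE = re.compile(r"union[\w\s]*?select", re.IGNORECASE)
XSS_RULE = re.compile(r"\balert\s*\(", re.IGNORECASE)


def regex_waf_blocks(payload: str) -> bool:
    """Return True if either keyword rule matches the payload."""
    return bool(SQLI_RULE.search(payload) or XSS_RULE.search(payload))


samples = [
    "union select",            # plain SQL injection keywords
    "union /**/ select",       # same attack, with a comment inserted
    "alert(1)",                # plain XSS keyword
    r"window['\x61lert']()",   # same call, with "a" written as \x61
]

for payload in samples:
    verdict = "blocked" if regex_waf_blocks(payload) else "missed"
    print(f"{payload!r}: {verdict}")
```

The second and fourth payloads are reported as missed: neither "/" nor "*" belongs to the character class `[\w\s]`, and the raw text `\x61lert` never contains the literal word "alert".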
Furthermore, regular expressions also cause a high rate of false positives, resulting in genuine website users being affected. Let's look at some examples of false positives:
The union select members from each department to form a committee
: It triggers the above-mentioned rule and gets mistakenly identified as an SQL injection attack, while it is just a simple English sentence.
Her down on the alert(for the man) and walked into a world of rivers
: It triggers the above-mentioned rule and gets mistakenly identified as an XSS attack, while it is just a simple English sentence.
Here, we share two readings to see how the masters from the Black Hat conference automate bypassing regex-based WAF protections:
The syntax analysis algorithm is the core capability of SafeLine WAF. Instead of matching attack traffic against simple regex patterns, it parses the user input in the traffic and analyzes it in depth for potential attack behavior.
Taking SQL injection as an example, attackers need to meet two conditions to successfully carry out an SQL injection attack.
First, the injected input must be a syntactically valid SQL statement fragment:
"union select xxx from xxx where" is a syntactically valid SQL statement fragment.
"union select xxx from xxx xxx xxx xxx xxx where" is not a syntactically valid SQL statement fragment.
"1 + 1 = 2" is a syntactically valid SQL statement fragment.
"1 + 1 is 2" is not a syntactically valid SQL statement fragment.
Second, the fragment must be capable of malicious behavior:
"union select xxx from xxx where" has the potential for malicious behavior.
"1 + 1 = 2" has no practical meaning.
SafeLine WAF conducts attack detection based on the essence of SQL injection attacks, following a process similar to the one below:
SafeLine WAF has built-in compilers covering common programming languages. It deeply decodes the payload content of the HTTP request, selects the syntax compiler that corresponds to the language type, and then matches the result against the threat model to obtain a threat rating, which decides whether the request is allowed or blocked.
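To make the decode, parse, and rate steps concrete, here is a deliberately tiny Python sketch. It is not SafeLine's implementation: the two toy grammars, the function names (`tokenize`, `parse_union_select`, `parse_arithmetic`, `classify`), and the three verdict strings are all assumptions chosen to mirror the examples in this article. A real engine would use full SQL, HTML, and JavaScript grammars and a much richer threat model.

```python
import re
from urllib.parse import unquote

KEYWORDS = {"union", "select", "from", "where", "is"}
OPERATORS = {"+", "-", "*", "/", "="}


def tokenize(payload: str) -> list[str]:
    """Decode the payload and split it into lowercase tokens."""
    text = unquote(payload)                             # undo URL encoding
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.S)  # SQL comments act as whitespace
    # Toy lexer: characters outside these classes are silently skipped.
    return [t.lower() for t in re.findall(r"[A-Za-z_]\w*|\d+|[-+*/=,]", text)]


def is_name(token: str) -> bool:
    """An identifier that is not one of the reserved keywords."""
    return re.fullmatch(r"[a-z_]\w*", token) is not None and token not in KEYWORDS


def parse_union_select(tokens: list[str]) -> bool:
    """Toy grammar: union select NAME {, NAME} from NAME [where]"""
    pos = 0

    def accept(pred) -> bool:
        nonlocal pos
        if pos < len(tokens) and pred(tokens[pos]):
            pos += 1
            return True
        return False

    def word(expected: str):
        return lambda tok: tok == expected

    if not (accept(word("union")) and accept(word("select")) and accept(is_name)):
        return False
    while accept(word(",")):
        if not accept(is_name):
            return False
    if not (accept(word("from")) and accept(is_name)):
        return False
    accept(word("where"))      # optional trailing keyword
    return pos == len(tokens)  # every token must be consumed


def parse_arithmetic(tokens: list[str]) -> bool:
    """Toy grammar: NUMBER {OPERATOR NUMBER}"""
    if len(tokens) % 2 == 0:
        return False
    return all(
        tok.isdigit() if i % 2 == 0 else tok in OPERATORS
        for i, tok in enumerate(tokens)
    )


def classify(payload: str) -> str:
    """Condition 1: is it valid SQL? Condition 2: can it do harm?"""
    tokens = tokenize(payload)
    if parse_union_select(tokens):
        return "malicious SQL"  # valid fragment that can read other tables
    if parse_arithmetic(tokens):
        return "harmless SQL"   # valid fragment with no practical effect
    return "not SQL"            # fails the grammar, so it cannot be injected


for sample in [
    "union select xxx from xxx where",
    "union /**/ select xxx from xxx where",
    "union select xxx from xxx xxx xxx xxx xxx where",
    "1 + 1 = 2",
    "1 + 1 is 2",
    "The union select members from each department to form a committee",
]:
    print(f"{sample!r}: {classify(sample)}")
```

Because comments are removed before parsing, the `/**/` variant is still recognized, while the English sentence fails the grammar and is left alone.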
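Anyone who has studied compiler principles in a computer science course will have met Chomsky's grammar hierarchy, which divides formal languages into four types:
Type 0: unrestricted grammars (recursively enumerable languages)
Type 1: context-sensitive grammars
Type 2: context-free grammars
Type 3: regular grammars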
The expressive power of these four grammars weakens from Type 0 to Type 3. The programming languages we commonly use, such as SQL, HTML, and JavaScript, are usually Type 2 grammars (even including some elements of Type 1 grammars). Regular expressions, on the other hand, correspond to Type 3 grammars, the weakest of the four.
How weak is the expressive power of regular expressions? A classic example is that regular expressions cannot count: you cannot write a regular expression that recognizes every valid string of matched parentheses.
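The counting problem is easy to see in a small Python sketch. The pattern `\(*\)*` is just one naive attempt picked for illustration, and `is_balanced` is a hypothetical helper name. The helper keeps a depth counter, which is exactly the kind of unbounded memory that a classical regular expression does not have.

```python
import re

# A naive regular expression: some "(" followed by some ")".
NAIVE = re.compile(r"\(*\)*")


def is_balanced(text: str) -> bool:
    """Check matched parentheses with a depth counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                return False
    return depth == 0      # every "(" must have been closed


for text in ["(())", "(()", "()()"]:
    regex_says = NAIVE.fullmatch(text) is not None
    print(f"{text!r}: regex={regex_says}, counter={is_balanced(text)}")
```

The naive pattern accepts the unbalanced string "(()" and rejects the balanced string "()()", while the counter gets both right.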
Type 3 grammars are therefore too weak to match attack payloads that change dynamically. This is an inherent limitation of rule-based attack recognition, not a matter of writing better rules. In terms of expressive power, the languages described by Type 3 grammars are a subset of those described by Type 2 grammars, so rules written as regular expressions cannot fully cover attack payloads written in programming languages. This is the fundamental reason why rule-based attack recognition in WAFs is less effective than expected.
Therefore, compared to threat detection based on regex pattern matching, syntax analysis offers higher accuracy and a lower false positive rate.
Finally, I recommend that you try SafeLine: https://github.com/chaitin/SafeLine