l0b0
  • 11.6k
  • 2
  • 45
  • 49

Any implementation that tries to detect "malicious characters" is flawed once you look at the combined properties of such an implementation:

  • A "valid" subset of a character set is not so easy to define. Newline is a control character, and you definitely want to allow that in comments. You'd have your work cut out for days or weeks to create a sensible subset of Unicode (and combinations of characters) which could be considered "valid" across the globe.
  • The set is valid only for a single version of a single character set, so it's not future-proof.
  • You still need to test the full character range to see if there are any security holes.
  • If you're not careful with what you accept, you'll end up annoying users, and they will leave. If you're lucky, one in a thousand will file a bug report.
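To make the first and last points concrete, here is a minimal sketch of how a plausible-looking allow-list turns away ordinary input. The `ALLOWED` pattern and the `is_allowed` helper are hypothetical and not taken from any particular system.

```python
import re

# Hypothetical allow-list: ASCII letters, digits, space and basic punctuation.
ALLOWED = re.compile(r"[A-Za-z0-9 .,!?'-]*")


def is_allowed(text: str) -> bool:
    """Return True only if every character is in the hypothetical allow-list."""
    return ALLOWED.fullmatch(text) is not None


samples = [
    "Nice post!",                # plain ASCII
    "First line\nSecond line",   # contains a newline (a control character)
    "Zoë from Łódź",             # non-ASCII letters in ordinary names
    "こんにちは",                  # a perfectly normal Japanese greeting
]

# Everything except the first sample is turned away, although none of it is malicious.
rejected = [s for s in samples if not is_allowed(s)]
```

Widening the pattern fixes these particular samples, but every widening is a new judgment call about what "valid" means, which is the problem the list describes.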

I'd go so far as to say that validating allowed characters reduces security, because it encourages sloppy implementation (lack of testing/escaping). If you escape where necessary, you can instead just test the "nasty" characters; if those work, you have pretty much guaranteed that other nasty characters will also be harmless to the system.
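As a rough sketch of that approach, assuming a comment system that stores text in SQLite and renders it as HTML: accept any input, escape at each boundary, and run a handful of "nasty" strings through the whole path. The table name, the helper functions, and the `NASTY` list are illustrative assumptions, not a complete test suite.

```python
import html
import sqlite3

# Hypothetical "nasty" inputs; a real test suite would cover more cases.
NASTY = [
    "<script>alert(1)</script>",
    "'; DROP TABLE comments; --",
    "\"&<>'",
    "line one\nline two",
    "\u202eright-to-left override",
]


def store_and_load(conn: sqlite3.Connection, text: str) -> str:
    """Store text via a parameterized query (no string splicing), then read it back."""
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))
    row = conn.execute("SELECT body FROM comments ORDER BY id DESC LIMIT 1").fetchone()
    return row[0]


def render_comment(text: str) -> str:
    """Escape at the HTML output boundary instead of rejecting input."""
    return "<p>" + html.escape(text) + "</p>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

for nasty in NASTY:
    loaded = store_and_load(conn, nasty)
    assert loaded == nasty                      # storage did not mangle the input
    inner = render_comment(loaded)[3:-4]        # strip the surrounding <p>...</p>
    assert "<" not in inner and ">" not in inner  # no markup can leak through
    assert html.unescape(inner) == nasty        # the reader still sees the original text
```

Nothing here inspects which characters the user typed; the safety comes from the parameterized query and from `html.escape` at the point of output.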

None of this, of course, denies that some characters are nonsensical in some fields, such as two in a numeric field.
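Field-level validation of that kind checks what the field means rather than hunting for dangerous characters. A minimal sketch, assuming a hypothetical quantity field that should hold a non-negative whole number:

```python
def parse_quantity(raw: str) -> int:
    """Parse a hypothetical quantity field; reject anything that is not ASCII digits."""
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"not a whole number: {raw!r}")
    return int(raw)


def try_parse(raw: str):
    """Return the parsed number, or None if the field does not make sense."""
    try:
        return parse_quantity(raw)
    except ValueError:
        return None
```

Here `try_parse("42")` yields a number while `try_parse("4 2")` and `try_parse("forty-two")` yield `None`. The rejection is about the meaning of the field, not about security.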
