/((([\w -]+)|("[\w -]+"))( *, *)?)+/
I'm trying to use a PHP regex to sanitize user input for a list of fonts. The pattern above seems to work, but it feels long and redundant: it repeats [\w -]+ twice, allows a trailing comma, and may have problems I can't see. Can it be made more efficient (shorter, less redundant, more secure)?
Examples
The regex should match these:
Roboto
"Roboto"
"Roboto Condensed"
Roboto Condensed
Roboto Condensed, Roboto
"Roboto Condensed", Roboto
Roboto Condensed, "Roboto"
"Roboto Condensed", "Roboto", sans-serif
or any string that matches the W3C specification for a CSS font-family property
It must match these, but not any preceding (e.g. /*) or trailing (e.g. ; DROP TABLE foo) text that might cause errors or open up exploits. In the following example it should match only the font-family list, "Roboto Slab", "Helvetica Neue", "Arial", sans-serif, and not the surrounding, potentially malicious characters.
/*"Roboto Slab", "Helvetica Neue", "Arial", sans-serif; DROP TABLE foo --
Similarly, I want it to strip text that a user might not realize would produce broken code:
font-family:"Roboto Slab", "Helvetica Neue", "Arial", sans-serif; /* My fonts */
My input-handling function:
$userFontName = sanitizeFontName($_GET['font'] ?? '');

function sanitizeFontName($fontName)
{
    $match = null;
    // Only get a W3C font list. Proven here: http://refiddle.com/18ql
    preg_match('/((([\w -]+)|("[\w -]+"))( *, *)?)+/', $fontName, $match);
    return $match[0] ?? ''; // only the first match; empty string if nothing matched
}
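A minimal sketch for checking the pattern against the two sample inputs from this question. The helper name firstFontListMatch is hypothetical and exists only for this check; it uses the same pattern and returns an empty string when nothing matches.

```php
<?php
// Hypothetical helper (not part of the original code): applies the same
// pattern as sanitizeFontName() and returns the first match or ''.
function firstFontListMatch(string $input): string
{
    $pattern = '/((([\w -]+)|("[\w -]+"))( *, *)?)+/';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $input, $match) === 1) {
        return $match[0];
    }

    return '';
}

// The two sample inputs from the question.
$samples = [
    '/*"Roboto Slab", "Helvetica Neue", "Arial", sans-serif; DROP TABLE foo --',
    'font-family:"Roboto Slab", "Helvetica Neue", "Arial", sans-serif; /* My fonts */',
];

foreach ($samples as $sample) {
    var_dump(firstFontListMatch($sample));
}
```

With this pattern, the first sample yields the quoted font list as intended. The second yields only font-family, because the unquoted alternative [\w -]+ matches the property name before the colon and preg_match stops at the first match.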