
Using preg_replace for selective sanitization


Fmdpa


I just completed the BBCode parser that I was working on. Everything parses just fine, but then I have to sanitize it. My sanitization looked something like this:

$clean = convert_chars($con->real_escape_string(htmlspecialchars($_POST['text'])));

The problem with this is that the quotes are escaped before the text is parsed by my function, and since my parser does not match escaped quotes, the text won't parse at all. I can't figure out the regex for matching an escaped quote either; otherwise I would use that in the parser. If I use /[\\"\\']/, it is read as "match a backslash, a double quote, a backslash, or a single quote". I want it to be "match a backslash-escaped double quote or a backslash-escaped single quote". Another solution I was pondering was this:

$clean = $con->real_escape_string(convert_chars(htmlspecialchars($_POST['text'])));

This does parse correctly, but any tag that has an attribute value surrounded by quotes is useless. Take this anchor tag for example: <a href="#"></a> will become <a href=\"#\"></a>, and consequently the link will be useless. I then considered doing something like this:

$clean = $con->real_escape_string(convert_chars(htmlspecialchars($_POST['text'])));
$clean = preg_replace('/\<([^\/\>])+\>/e', "stripslashes('\$1')", $clean);

...but that didn't work right either. I've done everything I can think of, and I'm about to let it out on this keyboard and write one looong "regular expression [of my frustration]"! @#$%"{>@#{>132 ;4.2[]{?.';3^(.;@.;1'#.';@:4<?[_+%=2!`~*<:[ai5#)!^-%<?";&/
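For anyone trying that last snippet on current PHP: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, and with the + outside the parentheses the group only ever holds the last character of the tag. A minimal sketch of the same idea using preg_replace_callback follows. The sample string is made up, and it assumes the goal is only to drop backslashes inside tags. Keep in mind that a string which has been partly un-escaped like this should not be placed into a query as-is.

```php
<?php
// Hypothetical input: parser output that already went through
// real_escape_string(), so quotes carry a leading backslash.
$clean = 'See <a href=\"#\">this</a> and \"that\"';

// Match each whole tag and strip backslashes inside the tag only.
// $m[0] is the full match, so no capture group (and no /e) is needed.
$fixed = preg_replace_callback(
    '/<[^>]+>/',
    function (array $m): string {
        return stripslashes($m[0]);
    },
    $clean
);

echo $fixed, PHP_EOL;
```

Text outside the tags keeps its backslashes; only the attribute quotes inside < ... > are restored.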


How about a pattern like this for the quotes:

#(\")|(\')#

You don't need to use "/" as a delimiter, the delimiter can be any character. If it's in a string in PHP, just escape the appropriate quote:

$pattern = '#(\")|(\\\')#';
$pattern = "#(\\\")|(\')#";
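One thing worth checking with these: inside a PCRE pattern, \" is just a literal double quote, so the pattern above matches bare quotes, with or without a backslash in front. To require the backslash itself, the regex needs \\, which is four backslashes in a single-quoted PHP string. The sketch below compares the two; the $escaped pattern and the sample input are my own illustration, not something from the original posts.

```php
<?php
// Both string forms from the post produce the same pattern text.
$single = '#(\")|(\\\')#';
$double = "#(\\\")|(\')#";

// Illustrative pattern: a literal backslash followed by either quote.
// '\\\\' in a single-quoted string is \\ in the regex, i.e. one backslash.
$escaped = '#\\\\(["\'])#';

// Sample input with two escaped quotes and two bare ones.
$input = 'href=\"#\" title="x"';

echo $single, PHP_EOL;                        // inspect the pattern text
echo preg_match_all($single, $input), PHP_EOL;  // counts every double quote
echo preg_match_all($escaped, $input), PHP_EOL; // counts only the escaped ones

// Removing just the backslash in front of a quote:
echo preg_replace($escaped, '$1', $input), PHP_EOL;
```

A character class such as [\\"\\'] cannot express "backslash then quote" because a class always matches exactly one character; putting the backslash outside the class, as in $escaped, is what makes it a two-character match.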

You can echo out $pattern to make sure it is what you want it to be.

Other than that, if this data is going into a database, don't guess about where real_escape_string should go. It always goes last. First transform the data into whatever you want to store in the database, then escape that result to put it in the query. Escaping doesn't change the data that gets stored; it just makes sure the query doesn't fail.
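A rough sketch of that ordering, under some assumptions: parse_bbcode() is a hypothetical stand-in for the parser from the first post and only handles one tag, and addslashes is used purely as a placeholder so the snippet runs without a database connection. With a live mysqli connection you would pass [$con, 'real_escape_string'] as the callable instead.

```php
<?php
// Hypothetical stand-in for the BBCode parser; handles only
// [url="..."]label[/url]. After htmlspecialchars() the quotes in the
// input are &quot;, so that is what the pattern looks for.
// Not a complete sanitizer: it does not validate the URL itself.
function parse_bbcode(string $text): string
{
    return preg_replace(
        '#\[url=&quot;(.*?)&quot;\](.*?)\[/url\]#',
        '<a href="$1">$2</a>',
        $text
    );
}

// Transform first, escape for the query last.
function prepare_for_query(string $raw, callable $escape): string
{
    $html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'); // 1. neutralise raw HTML
    $html = parse_bbcode($html);                          // 2. BBCode to HTML
    return $escape($html);                                // 3. SQL escaping, last
}

// Demo only: addslashes stands in for $con->real_escape_string.
$forQuery = prepare_for_query('[url="#"]home[/url] <b>', 'addslashes');
echo $forQuery, PHP_EOL;
```

The backslashes in the result exist only in the query text. The SQL parser consumes them, so the stored value is <a href="#">home</a> with plain quotes, and the link works when it is read back.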


