niche Posted November 18, 2010
I need to add a blog to a site and am thinking about how it can be abused. After stripping slashes, what are some other issues I need to think about?
End User Posted November 18, 2010
> I need to add a blog to a site and am thinking about how it can be abused. After stripping slashes, what are some other issues I need to think about?
You need to sanitize your inputs rigorously. There's a lot more to it than just stripping slashes.
http://psoug.org/blogs/mike/2010/04/11/little-bobby-tables/
Sanitizing HTML
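To make the "Little Bobby Tables" link concrete, here is a minimal PHP 7+ sketch of the unsafe pattern it warns about: user input pasted straight into an SQL string. The table name `Students` and the helper `naive_insert_sql()` are hypothetical, and no database is touched; the point is only to show what the finished query text looks like.

```php
<?php
// Hypothetical helper that builds an INSERT by pasting user input into SQL.
// This is the UNSAFE pattern; it is shown only to make the problem visible.
function naive_insert_sql(string $name): string
{
    return "INSERT INTO Students (name) VALUES ('" . $name . "');";
}

$name = "Robert'); DROP TABLE Students;--";
echo naive_insert_sql($name), "\n";
// The quote inside $name closes the string literal early, so the rest of
// the input is read as a second SQL statement; "--" comments out the tail.
```

Because the quote in the input ends the string literal, a driver that accepts more than one statement per call would run the DROP TABLE as well. Escaping the value, or better, using a prepared statement, keeps the input inside the literal.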
thescientist Posted November 18, 2010
mysql_real_string_escape
jeffman Posted November 18, 2010
Consider this. Many blogs allow users to post text with HTML tags. That way they can add bold and italics and even lists and such. So you might be tempted to let that through. Now, what happens if a user posts a small JavaScript like this:
<script type="text/javascript"> while (1) alert("Gotcha!"); </script>
Left alone, this script will start executing as soon as the browser reaches it while loading the page, and the effect will be an unstoppable series of alerts. Your users will hate you, and this is just a mild instance of the havoc a malicious user can cause.
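One common defence against the script jeffman describes is to escape the comment before echoing it, so the browser displays the tags as text instead of running them. A minimal sketch, assuming PHP 7+ and UTF-8 pages; `escape_html()` is a hypothetical wrapper around the real `htmlspecialchars()` function:

```php
<?php
// Hypothetical wrapper: ENT_QUOTES escapes both double and single quotes,
// and the explicit 'UTF-8' avoids depending on the server's default charset.
function escape_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script type="text/javascript">while (1) alert("Gotcha!");</script>';
// Every < and > becomes &lt; / &gt;, so the browser shows the code as text.
echo escape_html($comment), "\n";
```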
niche Posted November 18, 2010 (Author)
These are very helpful posts, though I couldn't find anything definitive on mysql_real_string_escape. What is it?
iwato Posted November 18, 2010
> Though I didn't find anything definitive on mysql_real_string_escape. What is it?
Try mysql_real_escape_string.
Roddy
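mysql_real_escape_string() needs an open MySQL connection, and the whole mysql_* extension was removed in PHP 7.0, so this sketch shows the same idea with `PDO::quote()` on an in-memory SQLite database (assumes PHP 7+ with the pdo_sqlite extension; the `authors` table is hypothetical). Two differences from mysql_real_escape_string(): `quote()` also adds the surrounding quotes, and the escaping style depends on the driver (SQLite doubles the single quote instead of adding a backslash).

```php
<?php
// In-memory SQLite stands in for a real MySQL server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$name = "O'Reilly";
$quoted = $pdo->quote($name); // adds the outer quotes and escapes the inner one
echo $quoted, "\n";

// The quoted value can now sit safely inside a string literal in SQL.
$pdo->exec('CREATE TABLE authors (name TEXT)');
$pdo->exec('INSERT INTO authors (name) VALUES (' . $quoted . ')');
echo $pdo->query('SELECT name FROM authors')->fetchColumn(), "\n";
```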
jeffman Posted November 18, 2010
Pay special attention to the magic_quotes_gpc problem. Always test for it before using mysql_real_escape_string, or you will end up with extra slashes in your post: with magic_quotes_gpc on, PHP has already added backslashes to incoming GET, POST and cookie data, so escaping it a second time stores those backslashes in the database. Do this even if your current server does not have magic_quotes_gpc enabled. If you ever move your code to a different server, magic_quotes_gpc might be enabled, and your results could get seriously messed up.
I learned that lesson the hard way.
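A minimal sketch of the check jeffman recommends, assuming PHP 5.3 or later. Magic quotes were removed in PHP 5.4 and `get_magic_quotes_gpc()` itself was removed in PHP 8.0, so the `function_exists()` guard keeps the helper harmless on newer versions. The name `undo_magic_quotes()` is hypothetical.

```php
<?php
// Hypothetical helper: strip the backslashes magic_quotes_gpc added, if any.
function undo_magic_quotes($value)
{
    // On PHP 8+ the function no longer exists; on 5.4-7.x it returns false.
    if (!function_exists('get_magic_quotes_gpc') || !get_magic_quotes_gpc()) {
        return $value; // nothing was added, so nothing to strip
    }
    if (is_array($value)) {
        // Form fields like name="tags[]" arrive as arrays; recurse into them.
        return array_map('undo_magic_quotes', $value);
    }
    return stripslashes($value);
}
```

Call it on each incoming value, e.g. `$text = undo_magic_quotes($_POST['text']);`, and only then apply the escaping for the destination (SQL, HTML, and so on).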
boen_robot Posted November 18, 2010
As a generic rule, think about the context in which the data will be used, escape accordingly, and do not trust any input, even if it comes from "reliable" 3rd parties. For example:
1. Use mysql_real_escape_string() when the data is to be a string in an SQL query.
2. Use htmlspecialchars() when the data is to be part of (X)HTML output AND you want it to appear "as is".
3. Use regular expressions for other contexts in which there is no more appropriate function.
If you want to allow BBCode or other special codes that should then be output as (X)HTML, you're entering a whole new world of pain... Using the BBCode extension is best for BBCode, but your host needs to install it, like any other extension.
If you want to let users type XHTML and have it output safely (i.e. with no JavaScript), it's best to use a custom XHTML schema or DTD that is a subset of the actual XHTML schema or DTD.
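boen_robot's rule (escape for the place the data is going) can be sketched as one PHP helper. Assumptions: PHP 7+, UTF-8 text, and the hypothetical name `escape_for()`. The `url` and `js` branches use `rawurlencode()` and `json_encode()`, which go beyond the three functions listed in the post; SQL is left out because the database driver should handle it.

```php
<?php
// Hypothetical helper: one value, escaped differently per output context.
function escape_for(string $context, string $value): string
{
    switch ($context) {
        case 'html':
            return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        case 'url':
            return rawurlencode($value); // for a single query-string value
        case 'js':
            // Produces a quoted JS string literal; the HEX flags encode < > & ' "
            // so the value cannot close an inline <script> block.
            return json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
        default:
            throw new InvalidArgumentException("Unknown context: $context");
    }
}

$input = 'Tom & "Jerry" <b>';
foreach (['html', 'url', 'js'] as $ctx) {
    echo $ctx, ': ', escape_for($ctx, $input), "\n";
}
```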
Fmdpa Posted November 18, 2010
> If you want to allow BBCode or other special codes that should then be outputted as (X)HTML, you're entering a whole new world of pain... using the BBCode extension is best for BBCode, but your host needs to install it, like any other extension.
I was in this situation once. I contacted numerous web hosts, and only a small percentage said they could have it installed; the rest rigidly stuck with the default extensions. I ended up writing my own parser, but you might want to look at this.
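For anyone in Fmdpa's position, here is a deliberately tiny sketch of a home-grown BBCode converter (PHP 7+). It is not the BBCode extension's API and not a full parser; `bbcode_to_html()` is a hypothetical name, and only [b], [i] and [u] are handled. The key design choice is to escape everything with `htmlspecialchars()` first and then turn a short allow-list of tags back into HTML, so unknown markup never reaches the page as live HTML.

```php
<?php
// Hypothetical minimal converter: escape first, then re-enable known tags.
function bbcode_to_html(string $text): string
{
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    $tags = ['b' => 'strong', 'i' => 'em', 'u' => 'u'];
    foreach ($tags as $code => $element) {
        // Non-greedy (.*?) so "[b]a[/b] [b]c[/b]" becomes two elements;
        // "i" makes tags case-insensitive, "s" lets a tag span lines.
        $html = preg_replace(
            '/\[' . $code . '\](.*?)\[\/' . $code . '\]/is',
            '<' . $element . '>$1</' . $element . '>',
            $html
        );
    }
    return $html;
}

echo bbcode_to_html('[b]Hello[/b] <script>alert(1)</script> [i]world[/i]'), "\n";
```

Unsupported tags such as [url] are left as plain bracketed text, which is the safe failure mode.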
niche Posted November 18, 2010 (Author)
What do you think of a script like this? It's designed to allow only a narrowly defined range of characters (a-z, A-Z, hyphen, and space).
$len = strlen($text);
$counter = 0;
$text2 = '';
while ($counter < $len) {
    $char = substr($text, $counter, 1);
    $code = ord($char);
    // A-Z is 65-90, a-z is 97-122, hyphen is 45, space is 32
    // (65-122 as one range would also let [ \ ] ^ _ ` through)
    if (($code >= 65 and $code <= 90) or ($code >= 97 and $code <= 122) or $code == 45 or $code == 32) {
        $text2 = 'good';
        $counter = $counter + 1;
    } else {
        $text2 = 'bad';
        break;
    }
}
echo $text2;
Dilated Posted November 18, 2010
> What do you think of a script like this? It's designed to only allow a narrowly defined range of characters (a-z, A-Z, hyphen, and space).
You would be better off using a regex. For example, the code you posted above could be reduced to:
$txt = preg_replace('/[^\w\s\-]+/', '', $txt);
jeffman Posted November 18, 2010
We're just validating, right? So preg_match instead of preg_replace? Also:
\w matches the underscore, which niche did not specify.
\s matches newlines and tabs as well as spaces (users can paste tabs).
\d works fine, but I like matched sets :)
So maybe:
if (preg_match('/[^a-zA-Z0-9 -]/', $txt)) {
    // $txt contains an illegal character
}
To give you credit, niche, you did a decent job with the tools you knew. Start learning regex. It seems mystical at first, but when you get the hang of it, you'll wonder how you lived without it. (In fact, a good homework assignment is to figure out what the expressions Dilated and I posted actually DO, and how they are different.)
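One possible answer to the homework, as a PHP 7+ sketch: the two expressions wrapped in hypothetical helper functions and run on a few sample strings. It shows the two differences jeffman points out (underscores and tabs) plus the sanitize-versus-validate distinction: one quietly changes the input, the other only tells you whether to reject it.

```php
<?php
// Dilated's version: delete every character outside the allowed set (sanitizing).
function strip_disallowed(string $txt): string
{
    return preg_replace('/[^\w\s\-]+/', '', $txt);
}

// jeffman's version: report whether any character falls outside the set (validating).
function has_illegal_char(string $txt): bool
{
    return preg_match('/[^a-zA-Z0-9 -]/', $txt) === 1;
}

$samples = ["Hello world", "snake_case", "tab\there", "Bobby'); DROP"];
foreach ($samples as $s) {
    echo var_export($s, true), ' => stripped: ', var_export(strip_disallowed($s), true),
        ', illegal: ', has_illegal_char($s) ? 'yes' : 'no', "\n";
}
```

Note that jeffman's set also allows digits, which niche's original script did not.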
niche Posted November 18, 2010 (Author)
regex as in http://www.regular-expressions.info, or do you have a different preferred site?
Dilated Posted November 18, 2010
> regex as in http://www.regular-expressions.info, or do you have a different preferred site?
I've never used the site you posted, but the best references I've found for (Perl) regular expressions are the PHP PCRE documentation and the Perl documentation.
niche Posted November 18, 2010 (Author)
Thanks. I read the intros, but didn't get a feel for where regular expressions fit in, the way I know how HTML, CSS, JS (which I don't know nearly enough about, but I know how it fits), and PHP/MySQL fit together.
What's your point of view about how regular expressions fit in? Right now I'd say that it's shorthand for finding patterns in data (possibly some kind of OOP).
justsomeguy Posted November 19, 2010
That's what they are: a way to search for patterns inside text instead of specific things. You can use a function like strpos to search for a specific string inside text, but if you need to find all HTML tags, or anything else that fits a general pattern instead of a specific string, then regular expressions are what you use. JavaScript also supports regular expressions.
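justsomeguy's distinction in PHP terms: `strpos()` finds one exact string, while `preg_match_all()` finds everything that fits a pattern. A small sketch; the tag pattern `<[^>]+>` is a rough illustration and not a substitute for a real HTML parser.

```php
<?php
$text = 'Some <b>bold</b> and <i>italic</i> text.';

// strpos(): where does the exact string "<b>" start? (0-based offset)
$pos = strpos($text, '<b>'); // 5

// preg_match_all(): every substring shaped like a tag, i.e.
// "<", then one or more characters that are not ">", then ">".
preg_match_all('/<[^>]+>/', $text, $matches);
print_r($matches[0]); // <b>, </b>, <i>, </i>
```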
Dilated Posted November 19, 2010
> What's your point of view about how regular expressions fit in? Right now I'd say that it's shorthand for finding patterns in data (possibly some kind of OOP).
It's a way of looking for patterns in strings. It's hard to learn at first, but if you keep at it, you will thank yourself in the end. Perl-style regular expressions are used in a lot of languages, so it will be one of the most useful things you learn.
I can try to explain why the regular expression I posted above works.
[^\w\s\-]+
That is the pattern itself. In PHP, every pattern is surrounded by delimiters, usually slashes (/), to mark where the pattern begins and ends. After the closing delimiter you can add modifiers (options), but the pattern above doesn't use any, so I've shown it without delimiters here.
The \w "escape sequence", as it's called, represents any "word" character. A word character is just what it sounds like, any character you might find in a word: any letter or digit, plus the underscore. In the default locale this is equivalent to [a-zA-Z0-9_].
The \s escape sequence represents any whitespace character: a space, a tab, a newline, a carriage return, or a form feed (newer PCRE versions also include the vertical tab).
Now I should explain what the square brackets mean. Square brackets enclosing characters form a character class, or set. A set matches a single character based on what it contains. For example, the set [abc] matches "a", "b", or "c", but not "d" or any other letter. So it finds a match in "cat", as well as in "apple" or "ball", because each contains at least one of the letters in the set. If you were to do preg_replace('/[abc]/', 'x', 'cat') in PHP, it would return "xxt", because the set matches the "c" and the "a" (or a "b", if there is one) and each is replaced with "x".
When a caret (^) is placed at the beginning of a set, it negates the set. So if you were to use [^abc] as your pattern, it would match any character that is not "a", "b", or "c". So, if we run the function again as preg_replace('/[^abc]/', 'x', 'cat'), PHP returns "cax", because "t" is the only character that is not an "a", "b", or "c".
Now, the plus sign. The plus sign means "one or more of the preceding item". So preg_replace('/a+/', 'c', 'aaabbb') returns "cbbb": the whole run "aaa" is a single match, so it is replaced by a single "c". (Without the plus, preg_replace('/a/', 'c', 'aaabbb') returns "cccbbb", because each "a" is replaced separately.) In the pattern I originally posted, the plus sign isn't strictly necessary, since the replacement is an empty string and the result is the same either way, but it lets a run of consecutive unwanted characters be removed in one match instead of one match per character.
So now I can tell you that "[^\w\s\-]" matches any character that is not a word character, a whitespace character, or a hyphen. (It's good to put a backslash before the hyphen in a set, because inside a set a hyphen can otherwise be treated as a range operator, as in "a-z0-9".)
This is only one small area of regular expressions, though. I encourage you to look into it more.
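The examples from Dilated's explanation, run through PHP's `preg_replace()` so the return values can be checked directly. The extra `/a/` line (not in the post) shows what happens without the plus sign, and the last line applies the original pattern to a made-up sample string.

```php
<?php
echo preg_replace('/[abc]/', 'x', 'cat'), "\n";    // xxt: "c" and "a" are in the set
echo preg_replace('/[^abc]/', 'x', 'cat'), "\n";   // cax: only "t" is outside the set
echo preg_replace('/a+/', 'c', 'aaabbb'), "\n";    // cbbb: the run "aaa" is one match
echo preg_replace('/a/', 'c', 'aaabbb'), "\n";     // cccbbb: each "a" is its own match

// The original pattern: drop anything that isn't a word character,
// whitespace, or a hyphen.
echo preg_replace('/[^\w\s\-]+/', '', 'Hello, world! (re-post)'), "\n"; // Hello world re-post
```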
niche Posted November 19, 2010 (Author)
Will I need to install Perl? If so, is there a free version?
thescientist Posted November 19, 2010
(quoting Dilated's explanation of [^\w\s\-]+ above)
That was an excellent post, thanks for the contribution.
thescientist Posted November 19, 2010
> Will I need to install Perl? if so, is there a free version?
You can google regex and read up about it. They are now native to JavaScript and PHP.
http://www.w3schools.com/jsref/jsref_obj_regexp.asp
http://php.net/manual/en/book.regex.php
Edit: if you think about them in the context of how search engines work for sites like eBay, Amazon, and Google, you'll see why they are very useful.
iwato Posted November 19, 2010
> Will I need to install Perl? if so, is there a free version?
The answer to your question is here. In effect, PCRE is built in. One way to discover whether you have it is to search for the constants it defines. If the following code prints an array of predefined constants, then you have probably got it.
<?php
function returnConstants($prefix) {
    $dump = array();
    foreach (get_defined_constants() as $key => $value) {
        // keep only constants whose names start with $prefix
        if (substr($key, 0, strlen($prefix)) == $prefix) {
            $dump[$key] = $value;
        }
    }
    if (empty($dump)) {
        print "Error: No Constants found with prefix '" . $prefix . "'";
    } else {
        print_r($dump);
    }
}
returnConstants('PREG_');
?>
Finally, spend a couple of days exploring the sections associated with the PCRE reference that thescientist, Dilated, and I have given you, and you will be glad you did.
Roddy
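A shorter check than scanning the constant list, as a sketch: ask PHP whether the `pcre` extension is loaded. Since PHP 5.3 the extension cannot be disabled, so on any current PHP this should always report true. `pcre_available()` is a hypothetical helper name (the `: bool` return type needs PHP 7+), while `extension_loaded()` and the `PCRE_VERSION` constant are built into PHP.

```php
<?php
// Hypothetical helper: true when PHP's PCRE support is present.
function pcre_available(): bool
{
    return extension_loaded('pcre') && function_exists('preg_match');
}

// PCRE_VERSION is only referenced in the branch where PCRE is loaded.
echo pcre_available() ? 'PCRE ' . PCRE_VERSION : 'PCRE not available', "\n";
```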
jeffman Posted November 19, 2010
You do not need to install Perl (though most server installations have it by default anyway).
There are different variations on regular expressions. PHP's preg_* functions understand Perl Compatible Regular Expressions (PCRE). Support is built in: you do not need Perl installed, or even to know anything about Perl. Older PHP functions (the ereg family) recognize a different, POSIX variation of regular expressions, but they are all deprecated now. PCRE-style syntax is the most widely used in web development: PHP uses it, JavaScript's regular expressions are very close to it, and that's all you really need to know.
niche Posted November 19, 2010 (Author)
Wow! This has been a truly productive topic for me and for others. As usual, the w3schools team came through. I need to thank End User, thescientist, Deirdre's Dad, iwato, boen_robot, Fmdpa, Dilated, and justsomeguy.
I'm excited to learn about regular expressions. I didn't know they existed before today. As I've said in other topics, I spent over twenty years writing programs in a terminal environment to clean and match data so it could be meaningfully compiled. To find out that some of that work is already done is a very big deal to me.
For me this is another great example of how the w3schools forums routinely tell me what I didn't know that I didn't know. To think, I'm finishing today much better informed than I started it.
Thanks again for everyone's help.
Niche
chokk Posted November 19, 2010
You might also want to look into prepared statements. These bad boys separate the SQL logic from the supplied data, which stops injection attacks through any value you pass in as a parameter.
Read more
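A minimal prepared-statement sketch with PDO, assuming PHP 7+ and the pdo_sqlite extension so it runs without a server; the `posts` table and its columns are hypothetical. With MySQL you would change only the DSN passed to `new PDO(...)` (and add credentials). Placeholders protect values only: table and column names cannot be bound, so those still have to come from your own code, not from user input.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)');

// The SQL text is fixed; user data only ever travels as bound values.
$insert = $pdo->prepare('INSERT INTO posts (author, body) VALUES (:author, :body)');
$insert->execute([
    ':author' => "Robert'); DROP TABLE posts;--",
    ':body'   => 'First post!',
]);

// Positional placeholder version of the same idea.
$select = $pdo->prepare('SELECT author FROM posts WHERE id = ?');
$select->execute([1]);
echo $select->fetchColumn(), "\n"; // the hostile name comes back as plain text
```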
niche Posted November 19, 2010 (Author)
Thanks, chokk. It's a good thing the weekend is around the corner. Looks like I'll have to spend at least a little time on prepared statements too.
EDIT: Scrapbook
This topic is now archived and is closed to further replies.