
user input sanitizing


taltal


Hello all,

I have written a chat app for my site, and now I am facing a problem. I want to allow all characters to be written, since it is a free-text chat. How do I sanitize user input:
- for saving in MySQL
- for presenting to the other person in the chat

What is the best solution for doing so? All ideas are welcome.

I also have an idea, and would welcome any thoughts about it (is it good or not?):
- save user input to MySQL with a stored procedure (in UTF-8 encoding, with no HTML or other encoding)
- present special characters in user input (< ' > " ...) as images instead of text, and all other characters (a-z, 0-9) as text

Thank you for your help and time,
Tal


I sanitize user input by using PHP's htmlspecialchars() function and mysqli_real_escape_string(), stacked like so:

$message = htmlspecialchars(mysqli_real_escape_string($conn, $_POST["message"]));

When I display $message or fetch it from a database, tags appear as harmless symbols and text onscreen. You can try it here (it's ugly and clunky, but it was just a learning experience).


You can turn the "special" characters into entities so that they are displayed verbatim; no images necessary. For query sanitization, there is always mysql_real_escape_string(). I don't see why you would want to use stored procedures.

Edit: it is better to call htmlspecialchars() when the information is retrieved. That way the data in the database is more useful.
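A minimal sketch of escaping at retrieval time, assuming the raw UTF-8 text is what gets stored and that the hypothetical helper `render_chat_message()` is called only where a message is written into the page. `ENT_QUOTES` makes htmlspecialchars() convert single and double quotes in addition to `<`, `>` and `&`:

```php
<?php
// Hypothetical helper: escape stored chat text only at output time,
// so the database keeps the original characters the user typed.
function render_chat_message(string $raw): string
{
    // ENT_QUOTES converts both " and '; UTF-8 is assumed to be the page/storage encoding.
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// The tags come out as visible text instead of being interpreted by the browser.
echo render_chat_message('<script>alert("hi")</script>'), "\n";
// prints &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

Because the stored text is left untouched, the same row can later be rendered in a different context (plain text, JSON) with whatever escaping that context needs.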


If you're writing a customer-facing app, you really need to do some serious sanitizing, and not just for HTML tags. In addition to handling HTML tags, you should account for malicious exploit strings like 'passthru', 'alert', 'eval', 'cmd', 'exec', 'system', 'fopen', and others. You'll want to clean up quotes and backticks while you're at it. Then clean or sanitize JavaScript-specific exploits like 'onblur', 'onchange', 'onclick', 'onfocus', 'onkeypress', etc. There are about 50 or so of these to check for.

Then clean up potentially obfuscated strings like 'j a v a s c r i p t', 'a l e r t', and 'alert', where spaces or carriage returns have been inserted to try to fool or bypass sanitizer code. You're not done yet, though. Next you'll want to look for other encoded strings, e.g. %3Ca%20href%3D%22http%3A%2F%2Fw3schools.com%2Findex.php%22%3EW3Schools%3C%2Fa%3E

Look for control characters in ranges like \x00-\x08, \x0b-\x0c, and \x0e-\x20. Convert tabs to spaces, to avoid stuff like 'a lert'. Also look for percent-encoded strings like http://%77%77%77%2E%67%6F%6F%67%6C%65%2E%63%6F%6D">Google</a>. Keep in mind that obfuscating and encoding can be done using UTF-16 data, null characters, octal characters, or all of the above. Return characters may also be mixed in to try to fool a sanitizer function.

Some of these can be mitigated by using htmlspecialchars() and mysql_real_escape_string(), but it should be obvious that there's a lot of stuff that htmlspecialchars() and mysql_real_escape_string() won't catch. urldecode() isn't a solution because it turns "+" signs into spaces, and you may need to keep them in the data. There's more, but my fingers are tired, lol.

Think this is overkill? Hardly. Bots and hackers use these techniques to break into sites all the time. Before I started to sanitize my input data this thoroughly, I had a couple of sites compromised by these exact techniques, apparently by bots trying a library of exploits like the ones above. Eventually, one or more of them succeeded, and they cracked the server.
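Two of the steps above can be checked directly in PHP. The sketch below uses a hypothetical helper, `normalize_chat_text()`, that converts tabs to spaces and strips control characters with preg_replace(). It deliberately stops the last range at \x1F rather than \x20, because \x20 is the ordinary space a chat needs; that adjustment is an assumption. It also shows the urldecode() behaviour: `+` becomes a space, while rawurldecode() leaves `+` alone. This illustrates only those two steps, not the full blacklist filter described in the post.

```php
<?php
// Hypothetical helper: tabs become spaces (as the post suggests), then the
// control-character ranges \x00-\x08, \x0B-\x0C and \x0E-\x1F are removed.
// Tab (\x09), LF (\x0A) and CR (\x0D) are not in the stripped ranges.
function normalize_chat_text(string $text): string
{
    $text = str_replace("\t", ' ', $text);
    // Byte-wise pattern: these bytes never occur inside multi-byte UTF-8 sequences.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text);
}

var_dump(urldecode('1+1'));    // string(3) "1 1"  ('+' became a space)
var_dump(rawurldecode('1+1')); // string(3) "1+1"
```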


Thanks for your answers, chibineku and Synook.

I have done it like this so far:

Writing to MySQL: I only use mysql_real_escape_string() on user input before INSERTing it into MySQL.

Reading from MySQL for presentation to another user, I only use:
1. htmlspecialchars()
2. stripcslashes()
3. nl2br()
in that order. So far it looks good.

The use of stored procedures came in order to be on the safe side of SQL injection. As I understand it, this way user input is treated as simple text and not as SQL code. Did I get it right?

End User, you write that there is much more to do, but I cannot find an example that gets past what I have described above. Excuse me if I am a bit ignorant in this matter, but this is why I came to ask in the first place. I tried "%3Ch1%3Ehy%3C/h1%3E", which is the same as "<h1>hy</h1>", but it is written to the screen in its text form, meaning it doesn't display a BIG "hy".

I would thank you very much if you can give an example that will pass the sanitizing described above.

And what do you think of my idea, presenting all special characters as images?

Thank you,
Tal
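A sketch of the read-side steps described above, with the hypothetical helper name `chat_message_to_html()`. It assumes PHP 5.4 or later: magic quotes no longer exist there, and mysql_real_escape_string() only escapes the query text, so the stored value contains no extra backslashes and the stripcslashes() step is dropped. (stripcslashes() would also mangle a message where the user literally typed something like `C:\new`, turning `\n` into a newline.) The order that matters is htmlspecialchars() first, nl2br() second; the other way round, the `<br>` tags that nl2br() adds would themselves be escaped into visible text.

```php
<?php
// Hypothetical helper for displaying a stored chat message as HTML.
function chat_message_to_html(string $stored): string
{
    // 1. Escape first, so any <, >, &, " or ' typed by the user become entities.
    $escaped = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
    // 2. Then insert <br> tags before newlines; doing this first would escape the tags too.
    //    The second argument (false) selects HTML-style <br> instead of XHTML <br />.
    return nl2br($escaped, false);
}

echo chat_message_to_html("first line\n<h1>hy</h1>"), "\n";
```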


I would thank you very much if you can give an example that will pass the sanitizing described above.
Sure, just go to http://ha.ckers.org/xss.html to see some simple XSS examples. There are lots more on other parts of the web:
http://www.tenouk.com/Bufferoverflowc/Bufferoverflow6.html
http://en.wikipedia.org/wiki/Code_injection
http://www.phpclasses.org/blog/post/67-PHP...GIF-images.html
Exploit examples from Google
...and thousands more where those came from.
And what do you think of my idea, presenting all special characters as images?
I don't think that's a particularly viable solution.

Writing to MySQL: I only use mysql_real_escape_string() on user input before INSERTing it into MySQL. The use of stored procedures came in order to be on the safe side of SQL injection. As I understand it, this way user input is treated as simple text and not as SQL code. Did I get it right?
You should be using one thing or the other... so which is it again? mysql_real_escape_string() on inserting, and stored procedures on everything else? Why? Seems illogical.

Or is the question whether stored procedures are better than mysql_real_escape_string()? I'm not sure how they operate under the hood, but I'd assume stored procedures are more secure than mysql_real_escape_string(), exactly because, like you say, they process the input and the query separately. Still, if used properly, mysql_real_escape_string() is sufficient.
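The separation of query and data mentioned above is what a parameterized (prepared) statement provides; a stored procedure only gets it if the CALL itself is parameterized rather than built by string concatenation. A minimal sketch with PDO, where the table `messages` and the helper `save_message()` are hypothetical. It connects to an in-memory SQLite database only so the example runs without a MySQL server; for MySQL you would swap in a `mysql:` DSN.

```php
<?php
// Stand-in connection: in-memory SQLite so the sketch runs without a MySQL server.
// With MySQL you would use something like (hypothetical host/db names)
// new PDO('mysql:host=localhost;dbname=chat;charset=utf8mb4', $user, $password).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT NOT NULL)');

// Hypothetical helper: the SQL text and the user's message are sent separately,
// so quotes or SQL keywords inside $body are stored as plain data.
function save_message(PDO $pdo, string $body): void
{
    $stmt = $pdo->prepare('INSERT INTO messages (body) VALUES (:body)');
    $stmt->execute([':body' => $body]);
}

save_message($pdo, "Robert'); DROP TABLE messages;--");

// The table still exists and holds the message exactly as typed.
$rows = $pdo->query('SELECT body FROM messages')->fetchAll(PDO::FETCH_COLUMN);
print_r($rows);
```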
And what do you think of my idea, presenting all special characters as images?
Very inefficient. It's simply not worth it. Not to mention that it throws accessibility out the window... while you're at it, you might as well turn your whole site into one big image.
Reading from MySQL for presentation to another user, I only use: 1. htmlspecialchars() 2. stripcslashes() 3. nl2br(), in that order.
What is "in that order" supposed to mean? From the inside out, from the outside in, from top to bottom (one function per variable assignment)? More importantly, why stripcslashes() anyway? Seems redundant to me.

From what I see in the ha.ckers.org examples, it appears only pages that let users write raw HTML are vulnerable to those attacks; by using htmlspecialchars(), you aren't. But if you let users write HTML, or even BBCode (which would in turn be transformed into HTML), you need to ensure users can't write out these things directly or (in the case of BBCode) indirectly.

As for the other examples... End User... you are (yet again) confusing contexts. The samples you gave apply to other kinds of attacks, in other situations. It's PHP's responsibility to deal with buffer overflows, and for this, staying current is the only advice that could be given to a person using PHP. Code injection in the SQL context is what mysql_real_escape_string() and stored procedures are for, code injection in the HTML context is what htmlspecialchars() is for, and that other article applies if you allow file uploads (and btw, I have yet to see someone stupid enough to let users upload files without restricting extensions... at the very least, there are always a few blacklisted extensions).
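To make the BBCode point concrete: one common approach is to escape everything first and only then turn a small whitelist of BBCode tags back into HTML, so nothing the user typed reaches the page as raw markup. A minimal sketch supporting only [b] and [i]; the helper name and the tag set are assumptions. Tags that take arguments, such as [url], need extra validation (for example rejecting javascript: links) and are left out here.

```php
<?php
// Hypothetical BBCode renderer: escape first, then convert a fixed whitelist of tags.
function bbcode_to_html(string $text): string
{
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    // Only these exact tag pairs are converted; the /s flag lets the content span lines.
    $map = [
        '/\[b\](.*?)\[\/b\]/s' => '<strong>$1</strong>',
        '/\[i\](.*?)\[\/i\]/s' => '<em>$1</em>',
    ];
    return preg_replace(array_keys($map), array_values($map), $html);
}

echo bbcode_to_html('[b]bold <script>[/b]'), "\n";
// prints <strong>bold &lt;script&gt;</strong>
```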

It would help to first get familiar with most if not all of the attacks (XSS, CSRF and SQLi are the very basics), and then start researching what you could do to prevent them.

End User's ha.ckers.org link is a very good place to research XSS, and HTML Purifier passes all the tests there.


It might be easier to limit the characters to a very basic set, such as only letters, digits and a few punctuation characters, instead of evaluating the strings. For example:

[a-zA-Z0-9\.\,\;\?]

As you do the research suggested, you'll see that some characters are required for attacks, and removing them should help. = < > " ` (backtick) # % & are potential hazards.
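A minimal sketch of that whitelist in PHP, with the hypothetical helper name `keep_basic_chars()`. The character class used here is letters, digits, space and `. , ; ?`; the space is included on the assumption that chat messages need it. The `^` inside the brackets negates the class, so everything outside the whitelist is removed.

```php
<?php
// Hypothetical whitelist filter: keep letters, digits, space and . , ; ? and drop everything else.
function keep_basic_chars(string $text): string
{
    // [^...] matches any character NOT in the set; each match is replaced with ''.
    return preg_replace('/[^a-zA-Z0-9 .,;?]/', '', $text);
}

echo keep_basic_chars('Hi <b>there</b>! 100% sure?'), "\n";
// prints Hi bthereb 100 sure?
```

The trade-off is significant for this thread: the filter also strips accented letters and every non-Latin script, which conflicts with the original goal of allowing all characters in a free-text chat.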


As for the other examples... End User... you are (yet again) confusing contexts. The samples you gave are applicable for other kinds of attacks, applicable in other situations. It's PHP's responsibility to deal with buffer overflows, and for this, staying current is the only advice that could be given to a person using PHP.
No, I'm not confused. I'm sorry if you didn't understand the scope of what I was getting at; perhaps I should have explained it a bit more thoroughly. I was giving him examples of things that can constitute a multi-level attack. For example, not everyone can upgrade to the latest version of PHP; in fact, the majority of hosts don't keep current. So, if there's a buffer overflow vulnerability in your particular version of a PHP function or module and someone can successfully call the code to exploit it, congratulations, they own your server. In that "context", relying on a host to be up to date with their version of PHP to protect yourself isn't a smart thing to do. It's better to prevent the potential attack code from getting through in the first place.
Code injection in SQL context is what mysql_real_escape_string() and stored procedures are for, code injection in HTML context is what htmlspecialchars() is for, and that other article is applicable in case you allow file uploads (and btw, I have yet to see someone stupid enough to let users upload files that don't have certain extensions... at the very least, there are always a few blacklisted extensions).
It's not a matter of stupidity, it's a matter of not being able to foresee every possible future circumstance. Frankly, I don't think you can do that. For example, the SMF message board (a pretty well-written app) fell victim to a clever upload exploit that relied on improper checking and sanitizing of a filename. It allowed executable code hidden in an image to be run. They didn't catch that, and you probably wouldn't have either. No one predicted it, but someone managed to slip it past the upload-checking code. There are a billion similar examples.

A fair number of exploits rely on "staging up" or privilege escalation. Some of them make use of bugs in various versions of PHP, for example, and that's what I was showing with the URLs I listed. The stuff on ha.ckers.org is child's play; there are considerably more sophisticated tricks out there, and a combination of htmlspecialchars() and mysql_real_escape_string() doesn't come close to stopping them all. In short, you can't barricade the front door while leaving the back door unlocked. Security has to be end-to-end and take into account a reasonable set of threats... not just the ones that are easy to guard against.

The question was about sanitizing user input with regard to everything (s)he's doing up until now. This includes accepting text, dealing with MySQL and displaying text. Assuming the chat requires a login, it would also include dealing with sessions. It surely doesn't (at least not yet) include file uploads and buffer overflows, and because HTML or BBCode is not yet in the mix, it doesn't really need any XSS protection beyond htmlspecialchars().

Talking about security in general is fine, but... do you think anyone will understand when they aren't in a situation where they're vulnerable to these issues? Why should I know about possible email attacks like header injection when I don't deal with emails? Why should I know about file upload security issues if I don't want to do any file uploads? Why should I know about path injection if none of my file references depend on user input? Yes, the answer is obvious after you know those things, but from the perspective of a person who hasn't encountered the feature yet, all the extra things just slip through.

As for buffer overflows... I don't really know, so I must ask in case you do... do you have any recommendations on how a PHP programmer might solve a buffer overflow problem? AFAIK, because PHP first processes user input, and only then presents it to you in the superglobals, if PHP is vulnerable to such an attack, you won't be able to work around it. Checking the input length or filtering it doesn't help if the attack has occurred before you were able to do the check. That's why I say that staying current is the only way to protect yourself against such attacks. If a host knows about this and still doesn't upgrade, one should probably find a new host.


do you have any recommendations on how a PHP programmer might solve a buffer overflow problem?
Truncate user input to a reasonable limit. Limit users to 256 characters or fewer per chat post. Anyone who's putting that much text in a chat should be squelched anyway. :)
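A minimal sketch of the 256-character limit, with the hypothetical names `MAX_CHAT_CHARS` and `limit_chat_post()`. It enforces an application-level length limit only; as the next reply points out, it cannot reach overflows inside PHP itself, which would happen before this code sees the input. mb_substr() is used so the count is in characters rather than bytes, assuming UTF-8 input.

```php
<?php
const MAX_CHAT_CHARS = 256; // hypothetical constant; the limit suggested in the post

// Hypothetical helper: cut a chat post down to at most $max characters.
function limit_chat_post(string $raw, int $max = MAX_CHAT_CHARS): string
{
    // mb_substr() counts characters rather than bytes, so a multi-byte
    // UTF-8 character is never split in half at the cut-off point.
    return mb_substr($raw, 0, $max, 'UTF-8');
}

echo mb_strlen(limit_chat_post(str_repeat('é', 300)), 'UTF-8'), "\n"; // 256
```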

This prevents which buffer's overflow? Not PHP's buffers, that's for sure... like I said, PHP first reads the content and places it into its own structure, which it manages. If an attacker submits content that damages those buffers, it will do so before you're able to truncate the input yourself.


Why should I know about file upload security issues if I don't want to do any file uploads? Why should I know about path injection if none of my file references depend on user input?
Yes, if your program doesn't accept any form of user input it should be quite safe. :) But his does, and any form of input can be used maliciously, in ways that often can't be predicted ahead of time. Maybe a poorly sanitized text input leads to a remote file inclusion, which permits a binary file to be placed on the server, which opens the door to a known bug in some standard executable, which then allows the attacker to do privilege escalation, and so on. I think it's in every programmer's interest to be acquainted with, or at least aware of, the kinds of exploits that may be staged against their programs, even if they don't think a particular method applies to their code. Yes, keeping your executables up to date is important, but a lot of hosts just don't, won't, or can't do that. Many of the larger hosts won't do it for fear of breaking their custom setups or their clients' scripts. It's unfortunate, but true.

Archived

This topic is now archived and is closed to further replies.
