son Posted July 29, 2009

I have a script that shows categories along with their corresponding sub-categories:

    $q = 'SELECT category_id, parent_id, category FROM categories ORDER BY parent_id';
    $r = mysqli_query($dbc, $q);
    if (mysqli_num_rows($r) > 0) {
        function make_list ($parent) {
            global $categories;
            echo "\n<ul>\n";
            foreach ($parent as $category_id => $existing_categories) {
                echo "<li>$existing_categories: <span class=\"small\"><a href=\"categories.php?mode=del&id=$category_id\">Delete</a> | <a href=\"categories_update.php?id=$category_id\">Update</a></span>";
                if (isset($categories[$category_id])) {
                    make_list($categories[$category_id]);
                }
                echo "</li>\n";
            }
            echo "</ul>\n";
        }
        $categories = array();
        echo "<h2 class=\"largeMargin\">Update and delete categories</h2>";
        while (list($category_id, $parent_id, $category) = mysqli_fetch_array($r, MYSQLI_NUM)) {
            $categories[$parent_id][$category_id] = $category;
        }
        // echo "<pre>" . print_r ($categories,1) . "</pre>";
        make_list($categories[0]);
    }

Now I find that I have to use htmlentities(), as quite a few times characters have been entered (like &) that need converting into a web-safe format. With my function I am not sure where I would use htmlentities(). Has anyone done this sort of thing?

Son
Ingolme Posted July 29, 2009

I usually use the htmlentities() or htmlspecialchars() function before putting data in the database, rather than after extracting it.
justsomeguy Posted July 29, 2009

I actually typically save the data as-is and transform it for display, but I guess that's just preference.

son wrote: "I am not sure where I would use htmlentities"

When you display a variable that needs to be escaped, use it then.
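To make "escape it when you display it" concrete, here is a minimal sketch of the list function with the escaping done at output time. It is only an illustration, not the original poster's final code: the helper name h(), the choice to return a string instead of echoing, and the sample data are all assumptions made for this example.

```php
<?php
// Hypothetical helper (not from the thread): escape a value for use as HTML text.
function h($value) {
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// Builds the nested list as a string.
// $categories maps parent_id => array(category_id => category name).
function make_list(array $categories, $parent_id = 0) {
    if (!isset($categories[$parent_id])) {
        return '';
    }
    $html = "<ul>\n";
    foreach ($categories[$parent_id] as $category_id => $name) {
        $id = (int) $category_id; // ids are numeric, so a cast is enough here
        $html .= '<li>' . h($name) . ': '
            . '<a href="categories.php?mode=del&amp;id=' . $id . '">Delete</a> | '
            . '<a href="categories_update.php?id=' . $id . '">Update</a>'
            . make_list($categories, $id) // recurse into sub-categories
            . "</li>\n";
    }
    return $html . "</ul>\n";
}

// Sample data standing in for the rows fetched from the database.
$categories = array(
    0 => array(1 => 'Food & Drink'),
    1 => array(2 => 'Tea <green>'),
);
echo make_list($categories);
```

The stored category names stay untouched in the database; only the copy that is written into the page passes through h().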
son Posted July 29, 2009 Author

I have the habit of changing problematic characters, such as the weird ' from MS Word, when inserting data into the database, and then doing all the safe-display work when getting it from the db... Is this not a good idea?

Son
justsomeguy Posted July 29, 2009

It doesn't really matter.
son Posted July 29, 2009 Author

One more question: I just found that when I use htmlentities() to display the current fields from the database in a form and there is a £, for example, then after updating the £ turns into a lot of weird characters. Taking the htmlentities() function out and just displaying the data in the form works fine. Why is that? Although the £ displays OK in the form, it is not correct to have a raw £ in the HTML code...

Son
justsomeguy Posted July 29, 2009

What does it translate the character into?
Ingolme Posted July 30, 2009

It probably is converted to UTF-8. Some characters take 2, 3 or 4 bytes. Make sure that the page encoding is set to UTF-8, either with a PHP header or a <meta> tag.
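The point about multi-byte characters can be checked with a small sketch. The sample characters below are chosen for this illustration and are written as explicit byte escapes, so the result does not depend on how the PHP file itself is saved.

```php
<?php
// Sample characters as explicit UTF-8 byte sequences (illustrative choices).
$samples = array(
    'letter A (U+0041)'   => "A",
    'pound sign (U+00A3)' => "\xC2\xA3",
    'euro sign (U+20AC)'  => "\xE2\x82\xAC",
    'emoji (U+1F600)'     => "\xF0\x9F\x98\x80",
);

foreach ($samples as $label => $char) {
    // strlen() counts bytes, not characters.
    echo $label, ': ', strlen($char), ' byte(s), hex ', bin2hex($char), "\n";
}
```

A page or function that reads those bytes as a single-byte encoding sees one character per byte, which is one way a lone £ can turn into several odd characters.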
son Posted July 30, 2009 Author

I have this on the page (since the beginning):

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

The characters are: ���

Database columns are set to utf8_general_ci.

I cannot get my head around this. I have had issues with characters in the past and am still not able to resolve them...

Son
boen_robot Posted July 30, 2009

Three things:

1. Try to add the header with PHP:

    header('Content-Type: text/html;charset=utf-8');

2. Ensure that your PHP file is UTF-8 encoded. Open it with Notepad, click "Save As", and make sure the "encoding" menu says "UTF-8", not "ANSI".

3. Make sure your SQL connection uses UTF-8 prior to executing queries. To do that with the MySQLi extension, use mysqli_set_charset(), like:

    mysqli_set_charset($dbc, 'utf8');

BTW, why don't you use MySQLi OOP style? Is it uncomfortable or something? Personally, I love it, but that's perhaps just a preference.
son Posted July 30, 2009 Author

Although Dreamweaver saves as UTF-8 by default, Notepad showed 'ANSI' in the Save As dialog, so I saved as UTF-8 and overrode any previous setting. Then I made all the other changes... Same outcome, weird characters. The only thing that works is to take htmlentities() out for the table data to be displayed, and I end up with '£' in the code instead of '&pound;'... I do not get it...

Son
boen_robot Posted July 30, 2009

Oh... sorry, I didn't read THAT part. Yeah, htmlentities() reads the string in the encoding it expects, which here is not UTF-8, so the bytes of a multi-byte character such as £ are each converted separately. You should use htmlspecialchars() instead and/or specify the third argument as UTF-8, like:

    htmlentities($str, ENT_QUOTES, 'UTF-8');

I don't really understand much about encodings either, but one thing I've learned to accept as a rule of thumb: if there is an encoding option at a place, always specify "UTF-8" explicitly for it, and you'll be fine, regardless of what kind of symbols you have. The sacrifice you're making is that non-Latin data can take two, three or four bytes per character instead of one, but compared to the healthcare costs you'll be saving, that's a sacrifice well worth making in my opinion.
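A short sketch of the difference the third argument makes. The sample string is made up for illustration, and 'ISO-8859-1' is passed explicitly only to reproduce what happens when the function does not treat its input as UTF-8; recent PHP versions may already default to UTF-8.

```php
<?php
// "£5 & up" with the pound sign written as explicit UTF-8 bytes.
$pound = "\xC2\xA3";
$price = $pound . '5 & up';

// Input treated as ISO-8859-1: each of the two bytes of the pound sign
// is converted on its own, giving two unrelated entities.
$wrong = htmlentities($price, ENT_QUOTES, 'ISO-8859-1');

// Input treated as UTF-8: the pound sign becomes a single entity.
$right = htmlentities($price, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only touches &, <, > and quotes; the pound sign stays raw.
$minimal = htmlspecialchars($price, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n";   // &Acirc;&pound;5 &amp; up
echo $right, "\n";   // &pound;5 &amp; up
echo $minimal, "\n"; // raw pound sign, then 5 &amp; up
```

The first result is the kind of stray extra character described earlier in the thread; the other two both display correctly on a UTF-8 page.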
son Posted July 31, 2009 Author

That is really great help. It almost solved all my problems. Using htmlspecialchars() does not convert the pound sign, which I need. Therefore I used htmlentities($str, ENT_QUOTES, 'UTF-8'), which does 80% of what I want; only now, when something like '<strong>text</strong>' is entered into the db, the '<' and '>' also get converted into web-safe characters (which is not what I want in this case). What would you recommend? The most important ones are currency symbols, but there are also HTML tags etc...

Son
boen_robot Posted July 31, 2009

Wait just a second... do you want HTML text or HTML code?

If you want HTML text, using htmlentities() is the solution. It escapes all characters that need to be escaped if the content is to be displayed as text in an HTML document.

If you want HTML code, you can use it as is. You don't need to worry about the pound sign: if everything is UTF-8, it's safe not to escape it. In fact, if you do escape it, the entity takes six bytes ("&#163;") or seven ("&pound;") for a character that UTF-8 represents in two.

If you want to make sure you have a well-formed piece of (XHTML!!!) code, you can use DOM to load it and output it back. If there are errors, you'll know. If there aren't, you'll get some "namespace cleanup" if it's needed. Like so:

    $dom = new DOMDocument('1.0', 'UTF-8');
    if (@$dom->loadXML($str)) {
        $str = $dom->saveXML();
    } else {
        // Errors... handle them here. You can get the messages with the libxml functions if you want.
    }

If you want to get rid of all elements, but leave the text nodes in an escaped form, you can do a combination of the two:

    $dom = new DOMDocument('1.0', 'UTF-8');
    if (@$dom->loadXML($str)) {
        $str = htmlentities($dom->textContent, ENT_QUOTES, 'UTF-8');
    } else {
        // Errors... handle them here. You can get the messages with the libxml functions if you want.
    }

Either way, don't worry about the pound sign. It will display fine, as long as everything is UTF-8. And if it's not escaped, it will take less space.
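The "strip the elements, keep the escaped text" idea can be sketched as a small function. This is an illustration under assumptions, not the poster's exact code: the function name text_only() and the wrapper element <root> are made up here, htmlspecialchars() is swapped in for htmlentities() because the pound sign does not need escaping on a UTF-8 page, and libxml_use_internal_errors() replaces the @ operator for silencing parse warnings.

```php
<?php
// Returns the escaped text content of a well-formed XML/XHTML fragment,
// or null when the fragment cannot be parsed.
function text_only($fragment) {
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument('1.0', 'UTF-8');
    // loadXML() needs a single root element; "root" is an arbitrary
    // wrapper name chosen for this sketch.
    $ok = $dom->loadXML('<root>' . $fragment . '</root>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        return null; // not well formed; the caller decides what to do
    }
    return htmlspecialchars($dom->documentElement->textContent, ENT_QUOTES, 'UTF-8');
}

$pound = "\xC2\xA3"; // pound sign as explicit UTF-8 bytes
var_dump(text_only('<strong>Only ' . $pound . '5</strong> today'));
var_dump(text_only('<strong>unclosed'));
```

Wrapping the input in one root element lets fragments with several top-level elements or bare text parse, which a direct loadXML($str) call would reject.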
son Posted July 31, 2009 Author

So maybe my worries were for nothing. Currently it was only the £ sign that I was worried about. Without using the functions it displays fine; I only thought that it is not right to have '£' instead of '&pound;' in the code...

Many thanks,
Son
boen_robot Posted July 31, 2009

son wrote: "I only thought that is not right to have '£' instead of '&pound;' in the code..."

In most encodings, yes. In UTF-8, pretty much all symbols are safe... that's what UTF-8 was created for after all: to let text contain every symbol known to man without a need to switch encodings or escape characters.

(just something to keep in mind...)
son Posted August 1, 2009 Author

Many thanks for your valuable input, always learn something new... :-)

Son