Htmlentities() On Make_list Function


son

I have a script that shows categories along with corresponding sub-categories as:

$q = 'SELECT category_id, parent_id, category FROM categories ORDER BY parent_id';
$r = mysqli_query($dbc, $q);
if (mysqli_num_rows($r) > 0) {
    // Recursively prints one level of the category tree as a nested <ul>.
    function make_list($parent)
    {
        global $categories;
        echo "\n<ul>\n";
        foreach ($parent as $category_id => $existing_categories) {
            echo "<li>$existing_categories: <span class=\"small\"><a href=\"categories.php?mode=del&id=$category_id\">Delete</a> | <a href=\"categories_update.php?id=$category_id\">Update</a></span>";
            if (isset($categories[$category_id])) {
                make_list($categories[$category_id]);
            }
            echo "</li>\n";
        }
        echo "</ul>\n";
    }

    $categories = array();
    echo "<h2 class=\"largeMargin\">Update and delete categories</h2>";
    // Group the rows by parent: $categories[parent_id][category_id] = name
    while (list($category_id, $parent_id, $category) = mysqli_fetch_array($r, MYSQLI_NUM)) {
        $categories[$parent_id][$category_id] = $category;
    }
    // echo "<pre>" . print_r ($categories,1) . "</pre>";
    make_list($categories[0]);
}
} // presumably closes an outer block that is not part of this excerpt

Now I find that I have to use htmlentities(), because quite a few times characters have been entered (like &) that need converting into a web-safe format. With my function I am not sure where I would use htmlentities(). Has anyone done this sort of thing? Son

I usually use the htmlentities() or htmlspecialchars() function before putting data in the database, rather than after extracting it.

I actually typically save the data as-is and transform it for display, but I guess that's just preference.

I am not sure where I would use htmlentities
When you display a variable that needs to be escaped, use it then.
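
As a minimal sketch of what "escape when you display" could look like for the make_list() function from the first post: the helper name e(), the function name render_list() and the sample data are made up for illustration, and the list is returned as a string instead of echoed so it is easy to check. Only the category name is escaped, because it is the only value that comes from user input; the id is cast to an integer.

```php
<?php
// Hypothetical helper (not from the original post): escape a value at output time.
function e($value)
{
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// Same recursion as make_list(), but the tree is passed in as an argument
// and the markup is returned as a string.
function render_list(array $categories, $parent_id = 0)
{
    if (!isset($categories[$parent_id])) {
        return '';
    }
    $html = "\n<ul>\n";
    foreach ($categories[$parent_id] as $category_id => $name) {
        $id = (int) $category_id;
        $html .= '<li>' . e($name)
            . ': <span class="small">'
            . '<a href="categories.php?mode=del&amp;id=' . $id . '">Delete</a> | '
            . '<a href="categories_update.php?id=' . $id . '">Update</a></span>';
        $html .= render_list($categories, $id); // children, if any
        $html .= "</li>\n";
    }
    return $html . "</ul>\n";
}

// Made-up sample data in the same shape as $categories[parent_id][category_id].
$categories = array(
    0 => array(1 => 'Books & Media', 2 => 'Games'),
    1 => array(3 => 'Sci-Fi <new>'),
);
echo render_list($categories);
```

The data in the database stays untouched; the "&" and "<" only become "&amp;" and "&lt;" in the generated page.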

I have the habit of changing problematic characters, such as the weird ' from MS Word, when inserting data into the database, and then doing all the safe-display work when getting it back from the db... Is this not a good idea? Son

It doesn't really matter.
One more question: I just found that when I use htmlentities() to display the current fields from the database in a form and there is, for example, a £ in them after updating, the £ creates a lot of weird characters. Taking the htmlentities() call out and just displaying the data in the form works fine. Why is that? Although the £ displays OK in the form, it is not correct to have a raw £ in the HTML code... Son

It is probably converted to UTF-8, where some characters take 2, 3 or 4 bytes. Make sure that the page encoding is set to UTF-8, either with a PHP header or with a <meta> tag.
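
To make the "2, 3 or 4 bytes" point concrete, here is a small sketch. The variable names are made up, and the characters are written as byte escapes so the result does not depend on how the PHP file itself is saved. It assumes strlen() counts bytes, which holds unless the old mbstring function overloading is switched on.

```php
<?php
// UTF-8 byte sequences, written as escapes.
$letter = 'A';             // plain ASCII
$pound  = "\xC2\xA3";      // the pound sign
$euro   = "\xE2\x82\xAC";  // the euro sign

// strlen() counts bytes, not characters.
echo strlen($letter), "\n"; // 1
echo strlen($pound), "\n";  // 2
echo strlen($euro), "\n";   // 3

// The raw bytes behind the pound sign.
echo bin2hex($pound), "\n"; // c2a3
```

If anything along the way reads those two bytes as two single-byte characters, one pound sign turns into two wrong characters on the page.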

I have had <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> on the page since the beginning.
The characters are: ���
Database columns are set to: utf8_general_ci
I cannot get my head around this. I have had some issues with characters in the past and am still not able to resolve them... Son

Three things:

1. Try adding the header with PHP, as:

header('Content-Type: text/html;charset=utf-8');

2. Ensure that your PHP file is UTF-8 encoded. Open it with Notepad, click "Save As", and make sure the "encoding" menu says "UTF-8", not "ANSI".

3. Make sure your SQL connection uses UTF-8 prior to executing queries. To do that with the MySQLi extension, use mysqli_set_charset(), like:

mysqli_set_charset($dbc, 'utf8');

BTW, why don't you use MySQLi OOP style? Is it uncomfortable or something? Personally, I love it, but that's perhaps just a preference.

Although Dreamweaver saves as UTF-8 by default, Notepad showed 'ANSI' in the Save As dialog, so I saved as UTF-8 and overrode any previous setting. Then I made all the other changes... Same outcome, weird characters. The only thing that works is to take htmlentities() out for the table data to be displayed, and end up with '£' in the code instead of '&pound;'... I do not get it... Son

Oh... sorry, I didn't read THAT part. Yeah, htmlentities() converts every character that has an entity in the encoding it expects, so if it reads UTF-8 bytes under a different encoding, one character comes out as several wrong entities. You should use htmlspecialchars() instead and/or specify the third argument as UTF-8, like:
htmlentities($str, ENT_QUOTES, 'UTF-8');

I don't really understand much about encodings either, but one thing I've learned to accept as a rule of thumb: if there is an encoding option somewhere, always specify "UTF-8" explicitly for it, and you'll be fine, regardless of what kind of symbols you have. The sacrifice you're making is that non-Latin data can take two, three or four bytes per character instead of one, but compared to the healthcare costs you'll be saving, that's a sacrifice well worth making in my opinion.
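
A short sketch of the difference, with made-up variable names and the pound sign written as UTF-8 byte escapes. The first call deliberately passes the wrong encoding to reproduce the "weird characters"; ISO-8859-1 was the default for this argument before PHP 5.4.

```php
<?php
$pound = "\xC2\xA3"; // the pound sign as UTF-8 bytes

// Wrong encoding: each of the two bytes is treated as its own character.
$wrong = htmlentities($pound, ENT_QUOTES, 'ISO-8859-1');

// Right encoding: the two bytes are recognised as one character.
$right = htmlentities($pound, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only touches & < > " and ', so the pound sign stays raw.
$minimal = htmlspecialchars($pound . ' & <b>', ENT_QUOTES, 'UTF-8');

echo $wrong, "\n";   // &Acirc;&pound;
echo $right, "\n";   // &pound;
echo $minimal, "\n"; // the raw pound sign, then: &amp; &lt;b&gt;
```

The "&Acirc;" in the first result is the stray "Â" that typically shows up in front of a pound sign when UTF-8 is read as a single-byte encoding.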

That is really great help. It almost solved all my problems. Using htmlspecialchars() does not convert the pound sign, which I need. Therefore I used htmlentities($str, ENT_QUOTES, 'UTF-8'), which does 80% of what I want, only that now, when something like '<strong>text</strong>' is entered into the db, the '<' and '>' also get converted into web-safe characters (which is not what I want in this case). What would you recommend? The most important ones are currency symbols, but there are also HTML tags etc... Son

Wait just a second... do you want HTML text or HTML code?

If you want HTML text, using htmlentities() is the solution. It escapes all characters that need to be escaped if the content is to be displayed as text in an HTML document.

If you want HTML code, you can use it as is. You don't need to worry about the pound sign: if everything is UTF-8, it's safe not to escape it. In fact, if you do escape it, that's five (or seven in the case of "&pound;") more bytes for one character that could be represented by two or three bytes.

If you want to make sure you have a well-formed piece of (XHTML!!!) code, you can use DOM to load it and output it back. If there are errors, you'll know. If there aren't, you'll get some "namespace cleanup" if it's needed. Like so:

$dom = new DOMDocument('1.0', 'UTF-8');
if (@$dom->loadXML($str)) {
    $str = $dom->saveXML();
} else {
    // Errors... handle them here. You can get the messages with the libxml functions if you want.
}

If you want to get rid of all elements, but leave the text nodes in an escaped form, you can do a combination of the two:

$dom = new DOMDocument('1.0', 'UTF-8');
if (@$dom->loadXML($str)) {
    // Keep only the text content, then escape it (the flag is ENT_QUOTES, with an "S").
    $str = htmlentities($dom->textContent, ENT_QUOTES, 'UTF-8');
} else {
    // Errors... handle them here. You can get the messages with the libxml functions if you want.
}

Either way, don't worry about the pound sign. It will display fine, as long as everything is UTF-8. And if it's not escaped, it will take less space.
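
For the "get the messages with the libxml functions" part, a possible sketch follows. The function name xml_errors() is made up for illustration; it collects the parser messages instead of silencing them with @, and it assumes the input is meant to be a single root element, as loadXML() requires.

```php
<?php
// Hypothetical helper: returns the libxml error messages for $fragment,
// or an empty array when it is well-formed XML.
function xml_errors($fragment)
{
    if ($fragment === '') {
        return array('empty input'); // loadXML() rejects an empty string
    }
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->loadXML($fragment);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    // Restore the previous error handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

var_dump(xml_errors('<strong>text</strong>')); // no messages
var_dump(xml_errors('<strong>text'));          // at least one message
```

A raw UTF-8 pound sign inside the fragment is not reported as an error, which matches the advice above that it can be left unescaped.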

So maybe my worries were for nothing. Currently it was only the £ sign that I was worried about. Without using the functions it displays fine; I only thought that it is not right to have '£' instead of '&pound;' in the code... Many thanks, Son

I only thought that it is not right to have '£' instead of '&pound;' in the code...
In most encodings, yes. In UTF-8, pretty much all symbols are safe... that's what UTF-8 was created for after all: to let text contain all symbols known to man without a need to switch encodings or escape characters. (Just something to keep in mind...)

Many thanks for your valuable input, I always learn something new... :-) Son