The utf8_decode() Function

iwato · February 15, 2011

Background: Find below 8 lines of code that I have numerically annotated for the purpose of discussion. The output of each is provided where appropriate.

The Problem: I am having difficulty understanding what is going on between lines 5 and 6. My understanding of the utf8_decode() function is that it converts UTF-8 encoded strings to ISO-8859-1 encoded strings. Although this would surely explain why the string "J'y était comme ça, mais je ne le suis plus." becomes "J'y �tait comme �a, mais je ne le suis plus.", I find it difficult to understand why line 6 returns UTF-8. Can you explain what is going on?

If You're Still Game: In line 7, does the mb_convert_encoding() function simply throw out characters that it does not understand?

(1) print_r(mb_detect_order()); // Array ( [0] => ASCII [1] => UTF-8 )
(2) $string = 'J\'y était comme ça, mais je ne le suis plus.';
(3) echo mb_detect_encoding($string); // UTF-8
(4) echo $string; // J'y était comme ça, mais je ne le suis plus.
(5) echo $string = utf8_decode($string); // J'y �tait comme �a, mais je ne le suis plus.
(6) echo mb_detect_encoding($string); // UTF-8
(7) echo mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8'); // J'y tait comme a, mais je ne le suis plus.
(8) echo mb_detect_encoding($string); // UTF-8

Roddy

boen_robot · February 15, 2011

There's three vital pieces of information I'm missing...

1. What's the encoding of the file? UTF-8?
2. With the obvious exception of "<?php" and "?>", is that the full contents of the file? It helps if it is.
3. Are you sure all characters in your sample string are present in the ISO-8859-1 charset, or is the idea of this experimentation to find out what happens when they aren't?

iwato · February 15, 2011

There's three vital pieces of information I'm missing...

In answer to your three questions:

1. What's the encoding of the file? UTF-8?

The file is UTF-8 encoded.

2. With the obvious exception of "<?php" and "?>", is that the full contents of the file? It helps if it is.

No, the file contains HTML tags, CSS styles, and plain text.

3. Are you sure all characters in your sample string are present in the ISO-8859-1 charset, or is the idea of this experimentation to find out what happens when they aren't?

To the best of my understanding, UTF-8 treats the characters é and ç as double-byte characters and ISO-8859-1 treats them as single-byte characters of the extended-ASCII character set. I am trying to understand the behavior of the utf8_decode() function.

Roddy
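The byte counts described above can be checked directly. This is only a minimal sketch: it assumes the mbstring extension is loaded, and it uses mb_convert_encoding() in place of utf8_decode() because the latter is deprecated as of PHP 8.2. The characters are written as explicit byte escapes so that the encoding of the script file itself does not affect the result.

```php
<?php
// é and ç spelled out as their UTF-8 byte sequences.
$eUtf8 = "\xC3\xA9"; // é
$cUtf8 = "\xC3\xA7"; // ç

foreach ([$eUtf8, $cUtf8] as $char) {
    // Convert from UTF-8 to ISO-8859-1 (the same direction as utf8_decode()).
    $latin1 = mb_convert_encoding($char, 'ISO-8859-1', 'UTF-8');

    // strlen() counts bytes, not characters.
    printf(
        "UTF-8: %d byte(s), ISO-8859-1: %d byte(s)\n",
        strlen($char),
        strlen($latin1)
    );
}
// Each iteration prints: UTF-8: 2 byte(s), ISO-8859-1: 1 byte(s)
```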

justsomeguy · February 15, 2011

It may be useful to either print your strings and look at them in a hex editor, or convert to their hex values so you can see mathematically what is happening.

function strToHex($string)
{
	$hex = '';
	for ($i = 0; $i < strlen($string); $i++)
	{
		// Pad to two digits so that bytes below 0x10 do not shift the output.
		$hex .= sprintf('%02x', ord($string[$i]));
	}
	return $hex;
}
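As a rough illustration of what a hex dump shows here, the sketch below compares the word "était" before and after conversion. It uses PHP's built-in bin2hex() rather than the helper above, and mb_convert_encoding() (mbstring assumed to be available) rather than utf8_decode(), which is deprecated as of PHP 8.2. The input is written as byte escapes so the file's own encoding does not matter.

```php
<?php
// "était" as UTF-8: é is the two bytes C3 A9.
$utf8 = "\xC3\xA9tait";

// Same word converted to ISO-8859-1: é becomes the single byte E9.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), "\n";   // c3a974616974
echo bin2hex($latin1), "\n"; // e974616974
```

The two-byte sequence c3a9 collapsing to the single byte e9 is the whole effect of the conversion on this word; the ASCII letters are unchanged.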

iwato · February 16, 2011

It may be useful to either print your strings and look at them in a hex editor, or convert to their hex values so you can see mathematically what is happening.

function strHex($string) {
	$hex = '';
	for ($i = 0; $i < strlen($string); $i++) {
		// Pad to two digits so that bytes below 0x10 do not shift the output.
		$hex .= sprintf('%02x', ord($string[$i]));
	}
	return $hex;
}

It is a nifty function that works! Thank you.

Although I have not found the answer to my question yet, I was able to discover why line 8 was exhibiting UTF-8 instead of ASCII encoding. I was detecting the code of the wrong string! A really silly error.

Roddy
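On the open question about line 6, one possible explanation, offered as a sketch rather than a definitive answer: the PHP manual describes mb_detect_encoding() in its non-strict mode as returning the closest matching encoding when the string is not valid in any of the listed encodings. With a detect order of ASCII and UTF-8, a string containing the byte E9 is valid in neither, so "UTF-8" may simply be the closest guess. A check that validates instead of guessing is mb_check_encoding(); the example below assumes mbstring is loaded and uses a shortened sample string.

```php
<?php
// Shortened sample: "J'y était" as UTF-8 bytes.
$utf8 = "J'y \xC3\xA9tait";

// After conversion the é is the lone byte E9, which is not valid UTF-8.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```

Passing true as the third argument to mb_detect_encoding() requests the strict behaviour described in the manual, which is worth trying against line 6 as well.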
