Trademark symbol in XML using DOMDocument


ShadowMage

I'm using DOMDocument to load an XML document from a string generated in PHP. One particular node needs to have a trademark symbol in its value, and I'm having trouble getting that to load with DOMDocument.

 

First a little setup/background:

I have a value stored in a database field that needs to have the trademark. This value is both displayed on screen and written to the XML. Right now I'm using placeholder text ("{TM}") and using str_replace to replace the placeholder text with the HTML entity (&trade;) to display it onscreen. This works fine. It doesn't work when writing to XML, apparently because &trade; is not a recognized entity there. So, when writing to the XML I tried:

 

$systemDesc = str_replace('{TM}', '', $arrGlazeType[$this->GlazeType]['Description']);

Which sort of works. I get an extra goofy 'A' symbol, though:

 

Guardian 275Â™

If I just replace the placeholder with an empty string, it displays just "Guardian 275" as I'd expect. So I tried:

 

$systemDesc = str_replace('{TM}', '™', $arrGlazeType[$this->GlazeType]['Description']);

which gives me:

 

Guardian 275â„¢

and

 

$systemDesc = str_replace('{TM}', chr(153), $arrGlazeType[$this->GlazeType]['Description']);

which produces an error:

 

Warning: DOMDocument::loadXML() [function.DOMDocument-loadXML]: Input is not proper UTF-8, indicate encoding ! Bytes: 0x99 0x3C 0x2F 0x73 in Entity, line: 19

Line 19 is where the trademark symbol should appear.

 

You might ask "well, why are you using the placeholder anyway?" Good question. When I put the trademark symbol directly into the database, it showed up on screen as a black diamond with a question mark. The htmlentities() function didn't help. And in the XML, I get the same error as using the chr() function.

 

I didn't think this would be that hard... :fool:

Link to comment
Share on other sites

It sounds like you have a UTF-8 encoding issue. You might need to use utf8_encode, or make sure your files are saved as UTF-8. You might also be able to just escape the ampersand in the XML, e.g. &amp;trade;.
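
For anyone landing here later, a minimal sketch of two ways to get the symbol into the XML string so that loadXML() accepts it: the UTF-8 bytes for U+2122, or a numeric character reference. XML predefines only a handful of named entities, so the HTML name &trade; is not available without a DTD. The sample description and the <system> element name are made up for illustration and are not from the original code.

```php
<?php
// Hypothetical value standing in for the database field in the question.
$description = 'Guardian 275{TM}';

// Option 1: the UTF-8 byte sequence for U+2122 (TRADE MARK SIGN).
$utf8 = str_replace('{TM}', "\xE2\x84\xA2", $description);

// Option 2: a numeric character reference, which needs no DTD.
$ref = str_replace('{TM}', '&#8482;', $description);

$texts = [];
foreach ([$utf8, $ref] as $value) {
    // <system> is an assumed element name, used only for this sketch.
    $xml = "<?xml version='1.0' encoding='UTF-8'?><system>$value</system>";
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    // DOM hands text back as UTF-8, so both variants should match.
    $texts[] = $doc->documentElement->textContent;
}

// Print the bytes as hex so the result does not depend on the terminal encoding.
echo bin2hex($texts[0]), "\n", bin2hex($texts[1]), "\n";
```

Either way the parsed node holds the same text; which one is more convenient depends on whether the rest of the string is already UTF-8.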

Link to comment
Share on other sites

Hi,

 

I had the same issue. My solution was to declare an entity in the XML DTD, like:

<!ENTITY trademark "™">

so we can use it in the XML document like:

<brand>Coca Cola &trademark;</brand>

It's like creating your own entity syntax.
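
A rough sketch of how that could look with DOMDocument, assuming the entity is declared in an internal DTD subset at the top of the string. The LIBXML_NOENT flag asks libxml to substitute entities while parsing; it should only be used on XML you generate yourself, since entity substitution on untrusted input is a known risk.

```php
<?php
// The entity is declared inline, so no external DTD file is needed.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE brand [
  <!ENTITY trademark "&#8482;">
]>
<brand>Coca Cola &trademark;</brand>
XML;

$doc = new DOMDocument();
// LIBXML_NOENT: replace entity references with their declared value.
$doc->loadXML($xml, LIBXML_NOENT);

$text = $doc->documentElement->textContent;
echo $text, "\n";
```

The declared value here is a numeric character reference, so the source string stays pure ASCII no matter how the PHP file is saved.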

Link to comment
Share on other sites

I have the XML declaration included in the string I'm trying to load:

<?xml version='1.0' encoding='UTF-8'?>

I also tried specifying the encoding when I created the DOMDocument object:

 

$doc = new DOMDocument('1.0', 'UTF-8');

which didn't help.

 

I'm not working with any files yet. It just generates the string, then attempts to load it into DOMDocument. Or are you referring to the PHP script itself being saved as UTF-8?

As for utf8_encode, where do I use that? On the string I'm trying to load into DOMDocument? Like this:

$doc->loadXML(utf8_encode($xml));

Thanks, I didn't know that was possible. If I can't get this working any other way, I'll try creating my own entity.

Edited by ShadowMage
Link to comment
Share on other sites

OK, so I tried escaping the ampersand (&amp;trade;), which doesn't give me an error, but it doesn't print the trademark symbol either. It just prints "&trade;".

Also tried using utf8_encode on the xml string before loading it into DOMDocument. I tried it with the chr() function and it gave me the same result as using did before (ie, the goofy 'A' symbol). Using still gives me the same result. Tried it with the actual TM character and it gave me a lot more goofy symbols than before. Didn't get any errors using utf8_encode, though! :P
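
A small sketch of why utf8_encode() does not rescue chr(153): that function treats its input as ISO-8859-1, where byte 0x99 is a control character. The trademark sign lives at 0x99 only in Windows-1252. The sketch uses mb_convert_encoding() (mbstring extension assumed to be available) to show both interpretations side by side.

```php
<?php
$byte = chr(153); // 0x99: trademark sign in Windows-1252 only

// The same conversion utf8_encode() performs: input read as ISO-8859-1,
// so 0x99 becomes the control character U+0099.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// Reading the input as Windows-1252 gives U+2122, the real trademark sign.
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // c299
echo bin2hex($asCp1252), "\n"; // e284a2
```

The bytes C2 99, when displayed as Windows-1252, show up as an accented 'A' followed by the trademark sign, which matches the stray 'A' described above.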

Link to comment
Share on other sites

Your XML file and PHP file need to be saved as UTF-8 from your text editor. Just telling the browser that the encoding is UTF-8 is not enough. The encoding attribute is only an indication of what encoding the file is supposed to have; if the file is not actually saved in that encoding, things will not be displayed properly.

Link to comment
Share on other sites

As mentioned before, I haven't actually created an XML file yet. I'm just loading a string previously generated in the PHP into DOMDocument. I did try re-saving all my PHP files as UTF-8, but it didn't seem to make any difference... I also noticed that when I actually save the XML file to the filesystem, it doesn't put anything in the place where the trademark is supposed to be. No trademark, no other goofy symbols. When I output it to the browser, I get the behavior described above.

Link to comment
Share on other sites

It seems to recognize as a valid entity at least. I know that it works in HTML. I did try using the other two you mentioned. ™ doesn't seem to work in either the browser or when I save it as a file. Shows a bunch of odd symbols in the browser and puts the entity code itself into the file where the symbol should be. Maybe that's how it's supposed to work, not sure. However, while ™ doesn't work in the browser, it does put the actual symbol (not the entity code) in the right place when saved to a file. I think I'll run with it this way for now and see if I encounter any problems.

Link to comment
Share on other sites

This is the code I'm using:

 

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml);
// $tmp = $doc->saveXML();
$doc->save($FileLocation);

The saveXML() function returns the xml code as a string. The save() function saves it to a file. When I echo $tmp, I get all the goofy characters described above. However, I just tested printing out the xml string prior to loading it into DOMDocument and it prints out just fine... Must be something with DOMDocument messing up the character encoding.
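
One thing worth ruling out here is the client guessing the wrong charset when the string is echoed. A minimal sketch, assuming the XML string declares encoding='UTF-8' and really is UTF-8; the sample string and the <system> element name are placeholders, not the original data.

```php
<?php
// Placeholder XML; the numeric reference keeps this source file pure ASCII.
$xml = "<?xml version='1.0' encoding='UTF-8'?><system>Guardian 275&#8482;</system>";

$doc = new DOMDocument();
$doc->loadXML($xml);
$out = $doc->saveXML();

// Tell the client which charset the response uses instead of letting it guess.
if (!headers_sent()) {
    header('Content-Type: text/xml; charset=UTF-8');
}
echo $out;
```

If the output still looks wrong with the header in place, the bytes going into loadXML() are the more likely culprit than DOMDocument itself.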

Link to comment
Share on other sites

I just echoed the xml string. Nothing else. No html tags of any kind. This script is intended to be accessed through AJAX so there is no other output (aside from any error messages if/when they occur).

Link to comment
Share on other sites

I would check to see whether $xml is properly encoded or not. If it is not, then I would check the source where $xml came from.

 

If it's from a database then you need to make sure you set the proper encoding for the database connection.
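
A quick way to run that check, sketched with mb_check_encoding() (mbstring extension assumed). The thread never says which database is in use, so the DSN at the end assumes MySQL accessed through PDO, and the host and database names are placeholders.

```php
<?php
// True when the string is valid UTF-8, i.e. safe to pass to loadXML()
// together with an encoding='UTF-8' declaration.
function isUtf8(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

$good = "<system>Guardian 275\xE2\x84\xA2</system>"; // UTF-8 trademark sign
$bad  = "<system>Guardian 275\x99</system>";         // lone Windows-1252 byte

var_dump(isUtf8($good)); // bool(true)
var_dump(isUtf8($bad));  // bool(false)

// Assumption: MySQL via PDO. The charset part of the DSN sets the
// connection encoding; host and dbname below are placeholders.
$dsn = 'mysql:host=localhost;dbname=example;charset=utf8mb4';
// $pdo = new PDO($dsn, $user, $password);
```

Running the check on the value right after it comes out of the database shows whether the problem starts at the connection or later in the script.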

Link to comment
Share on other sites
