ShadowMage Posted May 5, 2016

I'm using DOMDocument to load an XML document from a string generated in PHP. One particular node needs to have a trademark symbol in its value, and I'm having trouble getting that to load with DOMDocument.

First, a little setup/background: I have a value stored in a database field that needs to have the trademark. This value is both displayed on screen and written to the XML. Right now I'm using placeholder text ("{TM}") and using str_replace to replace the placeholder with the HTML entity (&trade;) to display it on screen. This works fine. It doesn't work when writing to XML because &trade; is apparently not a recognized entity there.

So, when writing to the XML I tried:

$systemDesc = str_replace('{TM}', '&#153;', $arrGlazeType[$this->GlazeType]['Description']);

which sort of works. I get an extra goofy 'A' symbol, though: Guardian 275Â™

If I just replace the placeholder with an empty string, it displays just "Guardian 275" as I'd expect. So I tried:

$systemDesc = str_replace('{TM}', '™', $arrGlazeType[$this->GlazeType]['Description']);

which gives me: Guardian 275â„¢

and

$systemDesc = str_replace('{TM}', chr(153), $arrGlazeType[$this->GlazeType]['Description']);

which produces an error:

Warning: DOMDocument::loadXML() [function.DOMDocument-loadXML]: Input is not proper UTF-8, indicate encoding ! Bytes: 0x99 0x3C 0x2F 0x73 in Entity, line: 19

Line 19 is where the trademark symbol should appear.

You might ask, "Well, why are you using the placeholder anyway?" Good question. When I put the trademark symbol directly into the database, it showed up on screen as a black diamond with a question mark. The htmlentities() function didn't help. And in the XML, I get the same error as with the chr() function.

I didn't think this would be that hard...
justsomeguy Posted May 5, 2016

It sounds like you have an issue with UTF-8. You might need to use utf8_encode or make sure your files are saved as UTF-8. You might also be able to just escape the ampersand in the XML, e.g. &amp;trade;.
musicman Posted May 6, 2016

Hi, I had the same issue. My solution was to declare an entity in the XML DTD, like:

<!ENTITY trademark "™">

so it can be used in the XML document like:

<brand>Coca Cola &trademark;</brand>

It's like creating your own entity syntax.
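A minimal sketch of the custom-entity approach described above, as it might look with DOMDocument. The `<brand>` element and the entity name come from the post; the `<!DOCTYPE ... [ ]>` wrapper, the `&#8482;` replacement text, and the `LIBXML_NOENT` flag are assumptions of this example rather than anything the poster confirmed using.

```php
<?php
// Sketch only: the DOCTYPE wrapper and LIBXML_NOENT are assumptions.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE brand [
  <!ENTITY trademark "&#8482;">
]>
<brand>Coca Cola &trademark;</brand>
XML;

$doc = new DOMDocument();
// LIBXML_NOENT asks libxml to substitute entity references with their
// replacement text while parsing. Use it on trusted input only.
$doc->loadXML($xml, LIBXML_NOENT);

// The element text now ends with U+2122 encoded as UTF-8 (E2 84 A2).
$text = $doc->documentElement->textContent;
echo bin2hex(substr($text, -3)), "\n";
```

Because the entity is declared inside the document itself, the same string keeps working wherever it is parsed, at the cost of carrying a DOCTYPE in every generated document.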
ShadowMage (Author) Posted May 6, 2016 (edited)

Quote from justsomeguy: "It sounds like you have an issue with UTF, you might need to use utf8_encode or make sure your files are saved using UTF. You might also be able to just escape the ampersand in the XML, e.g. &amp;trade;."

I have the XML declaration included in the string I'm trying to load:

<?xml version='1.0' encoding='UTF-8'?>

I also tried specifying the encoding when I created the DOMDocument object:

$doc = new DOMDocument('1.0', 'UTF-8');

which didn't help. I'm not working with any files yet. It just generates the string, then attempts to load it into DOMDocument. Or are you referring to the PHP script itself being saved as UTF-8?

As for utf8_encode, where do I use that? On the string I'm trying to load into DOMDocument? Like this:

$doc->loadXML(utf8_encode($xml));

Quote from musicman: "Hi, I had same issue. My solution was to use entities with the XML DTD like: <!ENTITY trademark "™"> so we can use it in XML document like: <brand>Coca Cola &trademark;</brand> it's like create your own entity syntax."

Thanks, I didn't know that was possible. If I can't get this working any other way, I'll try creating my own entity.

Edited May 6, 2016 by ShadowMage
ShadowMage (Author) Posted May 6, 2016

OK, so I tried escaping the ampersand (&amp;trade;), which doesn't give me an error, but it doesn't print the trademark symbol either. It just prints "&trade;".

I also tried using utf8_encode on the XML string before loading it into DOMDocument. With the chr() function, it gave me the same result as &#153; did before (i.e., the goofy 'A' symbol). Using &#153; still gives me the same result. With the actual TM character, it gave me a lot more goofy symbols than before. I didn't get any errors using utf8_encode, though!
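One possible explanation for the result above: utf8_encode() treats its input as ISO-8859-1, where byte 0x99 is a control character, whereas the trademark sign sits at 0x99 only in Windows-1252. The sketch below uses iconv() as a stand-in for utf8_encode() (which is deprecated as of PHP 8.2) and assumes the iconv extension is available; it only illustrates the byte-level difference and is not a diagnosis of the poster's actual setup.

```php
<?php
$byte = chr(153); // 0x99: the trademark sign in Windows-1252 only

// Same source charset that utf8_encode() assumes.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

// Declaring the source as Windows-1252 maps 0x99 to U+2122 instead.
$asCp1252 = iconv('Windows-1252', 'UTF-8', $byte);

echo bin2hex($asLatin1), "\n"; // c299   (U+0099, a control character)
echo bin2hex($asCp1252), "\n"; // e284a2 (U+2122, the trademark sign)
```

A browser that reads the bytes C2 99 as Windows-1252 would show them as "Â™", which matches the extra 'A' symbol described in the thread.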
Ingolme Posted May 6, 2016

Your XML file and PHP file need to be saved as UTF-8 from your text editor. Just telling the browser that the encoding is UTF-8 is not enough. The encoding attribute is just an indication of what encoding the file is supposed to have; if the file is not saved in that encoding, things will not be displayed properly.
ShadowMage (Author) Posted May 9, 2016

As mentioned before, I haven't actually created an XML file yet. I'm just loading a string previously generated in the PHP into DOMDocument. I did try re-saving all my PHP files as UTF-8, but it didn't seem to make any difference...

I also noticed that when I actually save the XML file to the filesystem, it doesn't put anything in the place where the trademark is supposed to be. No trademark, no other goofy symbols. When I output it to the browser, I get the behavior described above.
dsonesuk Posted May 9, 2016

Are you sure &#153; is the correct encoding to use? Everywhere I look, for UTF-8 it's &#8482; for decimal or &#x2122; for hex encoding. From what I can gather, &#153; is the alt code (Alt+153) like you see in the Windows character map.
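A small sketch of the difference between these references when parsed as XML, assuming a hypothetical `<name>` element and helper function `tmBytes()` (neither appears in the thread). In XML, a numeric character reference names a Unicode code point, so `&#8482;` and `&#x2122;` both refer to U+2122, while `&#153;` refers to U+0099, a control character rather than the trademark sign.

```php
<?php
// Hypothetical helper: parse a one-element document and return, as hex,
// the bytes that follow the 12-byte prefix "Guardian 275".
function tmBytes(string $reference): string
{
    $doc = new DOMDocument();
    $doc->loadXML("<name>Guardian 275{$reference}</name>");
    return bin2hex(substr($doc->documentElement->textContent, 12));
}

echo tmBytes('&#8482;'), "\n";  // decimal reference to U+2122
echo tmBytes('&#x2122;'), "\n"; // hexadecimal reference to U+2122
echo tmBytes('&#153;'), "\n";   // U+0099, not the trademark sign
```

The first two calls should yield the same three UTF-8 bytes (e2 84 a2). Browsers tend to render `&#153;` as a trademark sign in HTML only because of a legacy Windows-1252 compatibility mapping that XML parsers do not apply.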
ShadowMage (Author) Posted May 10, 2016

It seems to recognize &#153; as a valid entity at least. I know that it works in HTML.

I did try using the other two you mentioned. ™ doesn't seem to work in either the browser or when I save it as a file. It shows a bunch of odd symbols in the browser and puts the entity code itself into the file where the symbol should be. Maybe that's how it's supposed to work, not sure. However, while ™ doesn't work in the browser, it does put the actual symbol (not the entity code) in the right place when saved to a file. I think I'll run with it this way for now and see if I encounter any problems.
dsonesuk Posted May 10, 2016

That's strange, because I placed them all side by side, and the decimal and hex alternatives show with no problem in the browser.
ShadowMage (Author) Posted May 10, 2016

This is the code I'm using:

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml);
// $tmp = $doc->saveXML();
$doc->save($FileLocation);

The saveXML() function returns the XML code as a string. The save() function saves it to a file. When I echo $tmp, I get all the goofy characters described above. However, I just tested printing out the XML string prior to loading it into DOMDocument and it prints out just fine... Must be something with DOMDocument messing up the character encoding.
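One thing that may be worth ruling out here: "â„¢" is what the three UTF-8 bytes of the trademark sign (E2 84 A2) look like when a browser decodes them as Windows-1252, so the string returned by saveXML() could be correct and merely mislabeled on the way out. The sketch below assumes a stand-in `$xml` string and a hypothetical `<name>` element; it declares the charset explicitly before echoing.

```php
<?php
// $xml stands in for the string generated earlier in the script.
$xml = "<?xml version='1.0' encoding='UTF-8'?>"
     . "<name>Guardian 275\xE2\x84\xA2</name>";

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml);
$out = $doc->saveXML();

// Without an explicit charset, a browser may guess a single-byte
// encoding and show the UTF-8 bytes as several odd characters.
if (!headers_sent()) {
    header('Content-Type: application/xml; charset=UTF-8');
}
echo $out;
```

If the response is consumed through AJAX and inserted into an HTML page, the page's own character set matters as well.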
dsonesuk Posted May 10, 2016

When you echo it, is it as a PHP web page with a UTF-8 character set, HTML tags, etc., or just a bare PHP code page?
ShadowMage (Author) Posted May 10, 2016

I just echoed the XML string. Nothing else. No HTML tags of any kind. This script is intended to be accessed through AJAX, so there is no other output (aside from any error messages if/when they occur).
dsonesuk Posted May 10, 2016

But it would eventually appear within an HTML page with a UTF-8 character set, yes? That is when it would show as it should, and I think you would be less likely to have the problems you have been experiencing.
Ingolme Posted May 10, 2016

I would check to see whether $xml is properly encoded or not. If it is not, then I would check the source where $xml came from. If it's from a database, then you need to make sure you set the proper encoding for the database connection.
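A quick way to run the check suggested above, sketched under the assumption that "properly encoded" means valid UTF-8. The helper name `isValidUtf8()` and the sample strings are hypothetical; the technique relies on preg_match() with the /u modifier failing on input that is not valid UTF-8.

```php
<?php
// Hypothetical helper: an empty pattern with /u matches any valid
// UTF-8 string and fails (returns false) on invalid byte sequences.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$good = "<name>Guardian 275\xE2\x84\xA2</name>"; // UTF-8 bytes for U+2122
$bad  = "<name>Guardian 275\x99</name>";          // lone Windows-1252 byte

var_dump(isValidUtf8($good)); // bool(true)
var_dump(isValidUtf8($bad));  // bool(false)
```

If the check fails on data read from a database, the connection character set is a likely suspect. With MySQL through mysqli, for example, that would be set with `$mysqli->set_charset('utf8mb4')`, though the thread does not say which database is in use.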
justsomeguy Posted May 10, 2016

If everything is properly encoded with UTF-8, then you can even write the actual trademark character itself instead of an entity for it.