
problem with disable-output-escaping


chandrapaulm


Hi all, I am using the attribute disable-output-escaping="no", and escaping is done only if the corresponding character has a friendly code.

Less-than sign "<"	Friendly code: &lt;	Numerical code: &#60;	Hex code: &#x3C;

If I give a numerical or hex code for a character that doesn't have a friendly code, the escaping is not done.

Single quote "'"	Friendly code: not available	Numerical code: &#39;	Hex code: &#x27;

Sample XSL:

<?xml version="1.0" encoding="iso-8859-1"?>
<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <output method="html" indent="yes" encoding="iso-8859-1"/>
  <xsl:value-of select="This is single quot &#39; in the middle of sentence." disable-output-escaping="no"/>
</stylesheet>

Output of XSL:

This is single quot ' in the middle of sentence.

Can anyone please provide a solution so that these special characters can be escaped?

Thanks in advance,
Chandra

Note: I am referring to http://webdesign.about.com/od/localization...codes-ascii.htm for ASCII codes.


Did you try disable-output-escaping="yes"? disable-output-escaping="no" is the default value, so your results are more than predictable.


Hi,

When disable-output-escaping="no", the XSLT should return the following:

This is single quot &#39; in the middle of sentence.

When disable-output-escaping="yes", the XSLT should return the following:

This is single quot ' in the middle of sentence.

I use the output of the XSL in HTML, in some places as a title, etc. The HTML becomes something like

<input type=submit title='This is single quot ' in the middle of sentence.'>

When it is displayed on the screen, only part of it is shown:

This is single quot

Please don't suggest changing the HTML code to title="....", because that will fail when I have a double quote in the string. This can be solved if the XSL returns the string as it is, without converting the escape characters.

Please help me find a solution.

Thanks in advance,
Chandra

Note: I added a space inside &#39; when posting, since the browser is converting this value.. :)


Your sample XML is completely wrong actually. I think you meant more like:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" indent="yes" encoding="iso-8859-1"/>
  <xsl:template match="/">
    <input title="This is single quote ' in the middle of sentence, and this is a double quote &quot; in the middle of another sentence." />
  </xsl:template>
</xsl:stylesheet>

Try this one and see how it goes, and try switching the quotes if you have to. Don't worry too much about entity completeness. With disable-output-escaping="no" (i.e. the default), XSLT escapes every character in the output that would otherwise make the XML not well-formed. That's all you need. Characters that don't need escaping are written literally, which makes the output HTML file smaller (I mean, wouldn't you like your output file to be smaller?).


It seems I have confused you a lot, so let me keep my requirement simple. I want the friendly/numerical/hex code, not the actual character, in the output. I want the output of the XSL to be as below, with the escape sequences not converted, the same as entered in the XML file:

This is a Single quote &#39; in the middle of the sentence.

Thanks,
Chandra.


This is a Single quote &#39; in the middle of the sentence.

Why? What's wrong with having the exact character there? If it's not going to make the XML non-well-formed, what's the problem?

If you do it like:

<input title='This is single quote &#39; in the middle of sentence, and this is a double quote " in the middle of another sentence.' />

It should work the way you want. Well, some serializers may instead output it the same way as

<input title="This is single quote ' in the middle of sentence, and this is a double quote &quot; in the middle of another sentence." />

But again, the output will not be malformed.
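For anyone who wants to see the attribute-quoting behaviour in action, here is a minimal sketch in Python rather than XSLT. It assumes that the standard-library xml.etree.ElementTree serializer is a reasonable stand-in for the serializer inside an XSLT processor; the element and the title text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical title text containing both kinds of quote.
title = "single quote ' and double quote \" in one title"

el = ET.Element("input")
el.set("title", title)

# ElementTree always delimits attributes with double quotes, so it
# escapes the double quote and leaves the apostrophe alone.
serialized = ET.tostring(el, encoding="unicode")
print(serialized)

# Parsing the result gives back exactly the original text.
round_tripped = ET.fromstring(serialized).get("title")
print(round_tripped == title)
```

Other serializers may pick the other delimiter or escape more than strictly necessary, but in each case a parser reading the result should recover the same attribute value.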


Hi boen_robot,

Thank you for the solution. It's working fine the way you suggested. But I am still confused about why the parser is not converting the special characters which don't have a friendly code.

Rgds,
Chandra

Well, it's a sort of always-on optimization. When the character is written out directly, it takes less space, and this optimization is always performed. The exact reasons for this are hard to pin down, but the bottom line is that this is a good thing, so you shouldn't try to stop it anyway.

If you're really interested in the why...

When XML files are parsed, the XML parser translates the XML files into "tokens" it can understand, much like with most programming languages. If you just look at the DOM specification, you'll know what all the tokens are. When the XML parser encounters an entity, it treats it as if you intended to write the character(s) behind it. So what eventually gets into memory is the original character. XSLT is XML, so the same rules apply to it. So, when in XSLT you write
&#39;

The XSLT processor, after first having parsed the XML with an XML parser, gets

'
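The parsing step can be checked outside XSLT. The following is only a sketch in Python, assuming the standard-library xml.etree.ElementTree parser behaves like the XML parser in front of an XSLT processor in this respect; the `<t>` element and sample text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same apostrophe in the source markup.
sources = [
    "<t>it&#39;s</t>",    # decimal character reference
    "<t>it&#x27;s</t>",   # hexadecimal character reference
    "<t>it's</t>",        # literal character
]

texts = [ET.fromstring(src).text for src in sources]
print(texts)

# After parsing, all three are the same string; the original
# spelling is not recorded anywhere in the tree.
all_same = len(set(texts)) == 1
print(all_same)
```

Since the tree no longer knows which spelling was used, nothing downstream of the parser can reproduce it.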

At this point, you might think that if you write

&amp;#39;

you should be getting

&#39;

but unless your output method is "text", you'll instead see

&amp;#39;
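The same round trip can be reproduced with a small Python sketch. The assumption here is that ElementTree's "xml" and "text" output methods are a fair analogue of the XSLT output methods of the same names; the `<t>` element is made up.

```python
import xml.etree.ElementTree as ET

# Source markup in which the ampersand itself is escaped.
source = "<t>&amp;#39;</t>"
el = ET.fromstring(source)

# In memory the text is five literal characters: & # 3 9 ;
in_memory = el.text

# The "xml" output method escapes the ampersand again...
as_xml = ET.tostring(el, encoding="unicode", method="xml")

# ...while the "text" output method writes the characters as they are.
as_text = ET.tostring(el, encoding="unicode", method="text")

print(in_memory, as_xml, as_text)
```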

You get the doubly escaped form because of the other part involved in XML processing: serialization. An XML serializer translates all tokens back into their text representations. Within text nodes, the special characters in XML (&, < and >) are always escaped in order to prevent malformedness while still expressing the same text. Quotes and apostrophes are escaped only in attribute values, and only when leaving the character unescaped would cause malformedness. All characters that cannot be represented in the specified encoding are also escaped into character references.

I'm calling this an "optimization" because the output text representation takes less space this way. Think about it... with

&#39;

you have &, #, 3, 9, ;... 5 characters. 5 characters in order to represent a single character. Those are 5 bytes you're spending on a single damn apostrophe. And if you write directly

'

you have '... 1 character. 1 character, taking 1 byte, in order to represent 1 character.

The same calculation also applies to multi-byte characters and still makes for a saving. With

&#1042;

you have &, #, 1, 0, 4, 2, ;.... 7 characters, or seven bytes, to represent 1 character. If you write directly

В

you need only 1 character, or 2 bytes in UTF-8, to represent this 1 character.
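The byte counts above are easy to verify. Here is a rough Python sketch that assumes UTF-8 for the size comparison, and also shows the case where the output encoding cannot represent the character, so a character reference is the only option. The helper name `utf8_len` and the `<t>` element are made up for illustration.

```python
import xml.etree.ElementTree as ET

def utf8_len(s: str) -> int:
    """Number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))

apostrophe_ref, apostrophe = "&#39;", "'"
ve_ref, ve = "&#1042;", chr(1042)  # chr(1042) is the Cyrillic letter В

sizes = {
    apostrophe_ref: utf8_len(apostrophe_ref),
    apostrophe: utf8_len(apostrophe),
    ve_ref: utf8_len(ve_ref),
    ve: utf8_len(ve),
}
print(sizes)

# When the output encoding cannot represent a character, the
# serializer has no choice but to fall back to a reference.
el = ET.Element("t")
el.text = ve
latin1 = ET.tostring(el, encoding="iso-8859-1")
print(latin1)
```

So the literal character is the shorter form whenever the encoding can hold it, and the reference only shows up when it has to, as with В in iso-8859-1.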


