
Cdata Section Parser Error


suryabuddula


Hi all,

My application generates an XML file that contains special characters (e.g. the pound sign) and declares its encoding as UTF-8. When the file is opened in IE, it gives this error: "An invalid character was found in the text content. Error processing resource...<filename>"

Tag causing the problem:
----
<InvoiceMessages><InvoiceMessage><![CDATA[Love the mobile web? You can now get unlimited web access for just £1 a day. Perfect for checking the news, having a peep on facebook, or sticking a bid on ebay.]]></InvoiceMessage></InvoiceMessages>
----
The header of the generated file says encoding: UTF-8. Header of the file:
<?xml version="1.0" encoding="UTF-8"?>

Question: Could someone suggest why the XML file cannot be opened by Internet Explorer? Is there a syntax error in the CDATA section of the "InvoiceMessages" tag above? The content between <![CDATA[ and ]]> should be ignored by the parser, so why do I still get the error when the file is opened in IE? In other words, the special character (e.g. the pound sign) within the CDATA section should not be parsed, so why am I still getting the error?


Are you sure that the £ symbol is actually UTF-8 encoded? When the raw source is viewed as ANSI, it should look like two or three random characters rather than a single symbol.
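
To make that check concrete, here is a minimal Python sketch (Python is my choice of language here, not something used in the thread). It assumes "ANSI" means Windows-1252 (`cp1252`), which is Notepad's default on Western European systems; the helper name `is_valid_utf8` is made up for illustration.

```python
# The pound sign as bytes in two encodings.
pound = "£"
utf8_bytes = pound.encode("utf-8")    # two bytes: C2 A3
ansi_bytes = pound.encode("cp1252")   # one byte: A3 (assuming ANSI == Windows-1252)

# A UTF-8 file viewed by a tool that assumes ANSI shows each byte as its own character.
seen_in_ansi_viewer = utf8_bytes.decode("cp1252")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_bytes.hex(), ansi_bytes.hex(), seen_in_ansi_viewer)
print(is_valid_utf8(utf8_bytes), is_valid_utf8(ansi_bytes))
```

If the file really were UTF-8, the pound sign would occupy the two bytes C2 A3; a lone A3 byte points to an ANSI-encoded file.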


Is the MIME type of the file application/xml? Are you opening it via HTTP at all? Is the file UTF-8 encoded (i.e. in Notepad, when you click "Save As...", does it say "UTF-8" in the encoding dropdown)?


The file is not being opened via HTTP at all. When we use "Save As..." in Notepad, it doesn't say UTF-8; it defaults to ANSI. But my question is: the XML tag was generated with a CDATA section, and the pound sign lies between "<![CDATA[" and "]]>", where the parser should ignore it.

Well then, change it to UTF-8 in that dropdown, and save it with this new encoding. It should then work fine.

I think the tutorial on CDATA is a little misleading in that it oversimplifies the situation. The contents of a CDATA section are not parsed for XML markup, but they are still read by the XML parser as a single text-like node (called a CDATA node, obviously). CDATA is basically syntactic sugar for shortening code that may contain various special characters such as "&" and "<", so that:

<InvoiceMessage><![CDATA[This <<<<<< (that && other)]]></InvoiceMessage>

is equivalent to

<InvoiceMessage>This &lt;&lt;&lt;&lt;&lt;&lt; (that &amp;&amp; other)</InvoiceMessage>

So, regardless of whether some content is in CDATA or not, it's subject to the document encoding. The pound sign, as stored in your ANSI file, is a single byte that is invalid in the UTF-8 encoding the document is being parsed as. Saved as UTF-8, it is valid.
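
Both points can be checked with a short Python sketch using the standard library's `xml.etree.ElementTree` (my choice of parser for illustration; IE uses MSXML, which may word its errors differently). The element name `m` and the sample strings are made up.

```python
import xml.etree.ElementTree as ET

PROLOG = b'<?xml version="1.0" encoding="UTF-8"?>'

# The same text written once with CDATA and once with escaped entities.
cdata_form = PROLOG + b"<m><![CDATA[a << (b && c)]]></m>"
escaped_form = PROLOG + b"<m>a &lt;&lt; (b &amp;&amp; c)</m>"

text_from_cdata = ET.fromstring(cdata_form).text
text_from_escaped = ET.fromstring(escaped_form).text

# A lone 0xA3 byte (the ANSI pound sign) inside CDATA, in a document declared as UTF-8.
bad = PROLOG + b"<m><![CDATA[just \xa31 a day]]></m>"
try:
    ET.fromstring(bad)
    cdata_protected = True
except ET.ParseError:
    # The bytes are decoded before CDATA is even recognised, so CDATA does not help.
    cdata_protected = False

print(text_from_cdata == text_from_escaped, cdata_protected)
```

CDATA only switches off markup recognition for "<" and "&"; the bytes inside it still have to be valid in the declared encoding.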


There is a difference between the encoding you want (the one in the XML prolog, UTF-8 in your case) and the encoding the file is actually in (the one displayed in Notepad; ANSI by default).

The encoding of the file (ANSI or UTF-8) determines how characters are written to the file. In other words, whether a single byte is used to store each character (ANSI), 1 to 4 bytes (UTF-8), or 2 or 4 bytes (what Notepad calls "Unicode", i.e. UTF-16). The encoding in the XML prolog tells the XML parser how to read the XML document. If the characters were not written in that encoding, you can't expect them to be read correctly. UTF-8 is designed so that byte sequences that are not valid UTF-8 are easy to detect, which is why you get an error when an ANSI document containing non-ASCII characters is read as UTF-8.

It's not the pound sign that's the problem per se. It's the very notion that you're trying to read an ANSI-encoded document as a UTF-8 document, when you should instead be reading it with the same ANSI encoding (e.g. ISO-8859-1) that you wrote it in OR (better yet, and the thing I'm suggesting you do from now on) write the document as UTF-8 (from Notepad's "Save As..." dialog), and then read it as such.
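
The mismatch and the two ways out of it can be sketched in Python as follows. This is only an illustration under assumptions: "ANSI" is taken to be Windows-1252, the parser is Python's `xml.etree.ElementTree` rather than IE's, and the helper `parses` is a made-up name.

```python
import xml.etree.ElementTree as ET

body = "<InvoiceMessage><![CDATA[just £1 a day]]></InvoiceMessage>"
ansi_body = body.encode("cp1252")   # how the file was actually written
utf8_body = body.encode("utf-8")    # how the prolog claims it was written


def parses(document: bytes) -> bool:
    """Return True if the document is accepted by the parser (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Prolog says UTF-8, bytes are ANSI: the situation described in the thread.
mismatch = b'<?xml version="1.0" encoding="UTF-8"?>' + ansi_body
# Option 1: keep the ANSI bytes, declare a matching single-byte encoding.
fix_prolog = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + ansi_body
# Option 2 (the recommended one): keep the prolog, write the bytes as UTF-8.
fix_bytes = b'<?xml version="1.0" encoding="UTF-8"?>' + utf8_body

print(parses(mismatch), parses(fix_prolog), parses(fix_bytes))
```

Either fix works because the declared and actual encodings agree again; for the pound sign, ISO-8859-1 and Windows-1252 use the same byte.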


Hi,

Thanks for the reply to my query. The file was generated with a UTF-8 encoding declaration, but the content of the file looks to be ANSI (Notepad's "Save As..." dialog defaults to ANSI encoding). To make it UTF-8, you are asking me to change the encoding to UTF-8 and then save it. The XML files are generated by a billing system in very large numbers, so they can't be changed after they are prepared.

If I remove the special character, the rest of the file opens in Internet Explorer. So I feel the file is prepared to the UTF-8 standard except for the representation of the special character.

Hence I am trying to understand: the XML tag giving the error contains a CDATA section, whose contents the parser should not parse. But it is parsing/reading them, and since the special character is not represented as per UTF-8 encoding, it throws the error... I guess.

Could you please look once again and see whether the XML tag below has any syntax problems? Is the CDATA section nested in a way that makes it ineffective, and so causes the error?

Tag causing the problem:
----
<InvoiceMessages><InvoiceMessage><![CDATA[Love the mobile web? You can now get unlimited web access for just £1 a day. Perfect for checking the news, having a peep on facebook, or sticking a bid on ebay.]]></InvoiceMessage></InvoiceMessages>
----
The header of the generated file says encoding: UTF-8.


Ask the developers of this billing app to write files as UTF-8 then.

Alternatively, create a program that would do that automatically. In PHP, for example, you could use fopen() to read the file and utf8_encode() to convert the text (utf8_encode() assumes the input is ISO-8859-1).

Bottom line is you need the file to be encoded as UTF-8. The only other way is to represent all non-ASCII characters as entities, but since the generation of the XML is out of your control, doing that is just as impractical as converting the file to UTF-8.
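
As a rough idea of what such a conversion program could look like, here is a Python sketch rather than the PHP route mentioned above. The function name `convert_to_utf8` and the default source encoding `cp1252` are assumptions for illustration; the billing system's real encoding would have to be confirmed.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> bool:
    """Write the contents of src to dst as UTF-8.

    Returns True if the bytes had to be re-encoded, False if src was
    already valid UTF-8 and was copied unchanged.
    """
    raw = src.read_bytes()
    try:
        raw.decode("utf-8")          # already valid UTF-8: leave the bytes alone
        converted, changed = raw, False
    except UnicodeDecodeError:
        # Assumed source encoding; decoding raises if a byte is undefined in it.
        converted, changed = raw.decode(source_encoding).encode("utf-8"), True
    dst.write_bytes(converted)
    return changed


# Example usage (hypothetical file names):
# convert_to_utf8(Path("invoice.xml"), Path("invoice.utf8.xml"))
```

Checking for valid UTF-8 first makes the function safe to run over a mixed batch of files, since files that are already correct are not double-encoded.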


Hi boen_robot,

Thanks a lot for the confirmation and your help. I have escalated this to my PM to be considered a core bug in the product. It took me a long time to understand it. I will ask the core product development team to generate the XML file in UTF-8 encoding instead of ANSI. Once again, thanks a lot for helping me understand the problem. :)

