
VERY INTERESTING, FIND ERRORS IN XML FILE


Ajmal


<?xml version="1.0" encoding="ISO-8859-1"?><Note><from>Ahmad</from><to>Yasir</to><body>hello yasir.</Note>

Above is my XML file, which is missing the closing tag of <body>. I want a PHP program that will detect this error, fix it, and insert the missing closing tag. Can someone help me?


XML was defined on the premise of not allowing this. PHP can report the reason for the error, but it can't fix the error, because it can't know for sure what you had in mind: nothing in the markup tells it what your elements mean. In other words, your XML document could be fixed with

<?xml version="1.0" encoding="ISO-8859-1"?><Note><from>Ahmad</from><to>Yasir</to><body></body>hello yasir.</Note>

or

<?xml version="1.0" encoding="ISO-8859-1"?><Note><from>Ahmad</from><to>Yasir</to><body>hello yasir.</body></Note>

Now, you know you want the second one when you look at it, because you know the semantics of "body"... but PHP doesn't. For PHP, both fixes would be equally good.

You need to write such a program yourself if you REALLY want it, but be warned that you shouldn't.

A basic algorithm to work with:

1. Check what the parser errors are.
2. Remove everything after the first point of error (but save it).
3. Write out the temporary file with XMLWriter, calling endDocument() to close it properly.
4. Enclose the remains in a dummy root and repeat from step 1.
5. Once everything parses successfully, take back the last remains, reparse the previous part, and insert the last remains at the appropriate place in the previous part. Repeat until you are back at the initial state.

This kind of repair may not necessarily produce the results you want (that's just it... no repair would... that's why XML doesn't allow it), but it will produce a parsable document.
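For the "PHP can report the reason for the error" part, here is a minimal sketch using libxml's internal error buffer. The function name describeXmlErrors is made up for this example; the libxml_* functions and DOMDocument are standard PHP, assuming the dom/libxml extensions are enabled.

```php
<?php
// Hypothetical helper: parses $xml strictly and returns the parser's
// complaints as readable strings. It does not attempt any repair.
function describeXmlErrors(string $xml): array
{
    // Collect libxml errors in a buffer instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError with line, column and message fields.
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Restore the previous error-handling mode for the rest of the script.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

$broken = '<?xml version="1.0" encoding="ISO-8859-1"?>'
        . '<Note><from>Ahmad</from><to>Yasir</to><body>hello yasir.</Note>';

foreach (describeXmlErrors($broken) as $message) {
    echo $message, PHP_EOL;
}
```

The first message in the list is usually the one that marks the "first point of error" from step 1 above; later messages are often just consequences of it.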


Actually, PHP can give it a good try, but I would not count on it correcting all errors. Consider:

<?php
	$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><Note><from>Ahmad</from><to>Yasir</to><body>hello yasir.</Note>';
	$doc = new DOMDocument();
	// Ask libxml to recover from well-formedness errors; must be set before loadXML()
	$doc->recover = true;
	$doc->loadXML($xml);
	header("Content-type: text/xml");
	echo $doc->saveXML();
?>

That code does exactly what you want. (The header and echo statements are only for demonstration.)

But even a simple error like the following cannot be corrected using this technique:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Note>
   Ahmad
   </from>
   <to>
      Yasir
   </to>
</Note>
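Since recovery is only a guess, it may be worth wrapping it so you can compare the input and the result yourself. This is a rough sketch, not a definitive implementation: recoverXml and isWellFormed are names invented here, and the only assumption is that the dom extension is loaded.

```php
<?php
// Hypothetical helper: true if $xml parses without errors in strict mode.
function isWellFormed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true);
    $ok = (new DOMDocument())->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Hypothetical helper: lets libxml repair what it can and returns the
// serialized result, or null if no root element could be salvaged.
function recoverXml(string $xml): ?string
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->recover = true; // must be set before loadXML()
    $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc->documentElement === null) {
        return null;
    }
    $out = $doc->saveXML();
    return $out === false ? null : $out;
}

$samples = [
    '<Note><from>Ahmad</from><to>Yasir</to><body>hello yasir.</Note>',
    '<Note> Ahmad </from> <to> Yasir </to></Note>',
];

foreach ($samples as $xml) {
    echo 'input well-formed: ', var_export(isWellFormed($xml), true), PHP_EOL;
    // Print the repaired document so it can be compared with what was intended.
    echo recoverXml($xml) ?? '(nothing recovered)', PHP_EOL;
}
```

A result that parses is not necessarily the document you meant, so inspect the printed output for the second sample rather than trusting that it is well-formed.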


Many thanks to boen_robot and Deirdre's Dad. Deirdre's Dad's code works for me. My next step is to correct the attributes of tags.

<?xml version="1.0" encoding="ISO-8859-1"?><Note><from id=5 name=test>Ahmad</from><to>Yasir</to></Note>

How do I correct these attributes so that they are quoted, i.e.

<from id="5" name="test">Ahmad</from>

Help will be appreciated.

Regards,
Ajmal


It seems the recover property Deirdre's Dad shows follows the basic algorithm I outlined (well... most likely a far more complicated variation of it, but still...). If it doesn't already recover attributes the way you like, you'll have to implement this algorithm, or another one, yourself.
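One possible "implement it yourself" route for the attribute case is to quote bare attribute values with a regular expression before handing the text to the parser. The sketch below is only that: quoteBareAttributes is a name invented for this example, and it assumes values contain no whitespace or quotes and that no attribute value contains ">". It knows nothing about comments or CDATA sections, so tag-like text inside those would be rewritten too.

```php
<?php
// Hypothetical helper: wraps unquoted attribute values in double quotes.
// Only start tags that carry attributes are touched; end tags and the
// XML declaration do not match the outer pattern.
function quoteBareAttributes(string $xml): string
{
    return preg_replace_callback(
        // "<name", then an attribute region, then an optional "/" and ">"
        '~<([A-Za-z_][\w:.\-]*)(\s[^<>]*?)(/?)>~',
        function (array $tag): string {
            $attrs = preg_replace_callback(
                // name=value, where value is "quoted", 'quoted' or bare
                '~(\s+[\w:.\-]+\s*=\s*)("[^"]*"|\'[^\']*\'|[^\s"\'>]+)~',
                function (array $attr): string {
                    $value = $attr[2];
                    // Already quoted values are passed through untouched.
                    if ($value[0] === '"' || $value[0] === "'") {
                        return $attr[0];
                    }
                    return $attr[1] . '"' . $value . '"';
                },
                $tag[2]
            );
            return '<' . $tag[1] . $attrs . $tag[3] . '>';
        },
        $xml
    );
}

$xml = '<?xml version="1.0" encoding="ISO-8859-1"?>'
     . '<Note><from id=5 name=test>Ahmad</from><to>Yasir</to></Note>';

$doc = new DOMDocument();
$doc->loadXML(quoteBareAttributes($xml));
echo $doc->saveXML();
```

Quoted values are matched first so that something like title="a b=c" is consumed as a whole and its contents are never rescanned as a bare attribute.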


Archived

This topic is now archived and is closed to further replies.
