marco perez, posted February 23, 2008

I have a question about using JDOM to parse an XML document. The output is not what I expect. I wrote the following program to parse an XML document. I treat the element body (a child of the document root) as the starting point:

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.io.File;
import java.io.IOException;
import java.util.*;

public class Ex04 {
    public static void main(String[] args) {
        String filename = "Test.xml";
        SAXBuilder b = new SAXBuilder();
        try {
            Document doc = b.build(new File(filename));
            Element root = doc.getRootElement();
            Element body = root.getChild("body");
            bodyExtract(body);
        }
        // indicates a well-formedness error
        catch (JDOMException e) {
            System.out.println(filename + " is not well-formed.");
            System.out.println(e.getMessage());
        }
        catch (IOException e) {
            System.out.println(e);
        }
    }

    // Prints this element's own text, then recurses into each child element.
    public static void bodyExtract(Element current) {
        String text = current.getText();
        System.out.print(text);
        List children = current.getChildren();
        Iterator iterator = children.iterator();
        while (iterator.hasNext()) {
            Element child = (Element) iterator.next();
            bodyExtract(child);
        }
    }
}

Part of the original Test.xml file is:

...
<body> The <a href=" http://www.linux.org/">Linux</a> is an open-source operating system, created by <a href=" http://technorati.com/tag/linus-torvals">Linus Torvalds</a> in the 80’s. ...

The output of the program above is:

The is an open-source operating system, created by in the 80’s. LinuxLinus Torvalds

I want to analyze the sentences semantically, so I need the output to be something like this:

The Linux is an open-source operating system, created by Linus Torvalds in the 80’s.

How can I solve this problem?
Thanks for your help,
MP
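The output order comes from getText(): in JDOM it returns only the text held directly under an element and does not recurse into child elements, so the text inside each <a> is printed later by the recursion. Walking the element's content in document order, appending text as it comes and descending into child elements in place, keeps the sentence intact. JDOM 1.x also offers Element.getValue(), which returns the concatenated text of all descendants in order. The sketch below shows the ordered walk with the JDK's built-in W3C DOM parser (javax.xml.parsers) instead of JDOM, so it runs without extra jars; the class name MixedContentText and the inline sample XML are illustrative assumptions, not part of the original program.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MixedContentText {

    // Appends the text under `node` to `out` in document order: text nodes are
    // appended as they appear, and child elements (such as <a>) are descended
    // into in place, so their text lands in the middle of the sentence.
    static void collectText(Node node, StringBuilder out) {
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            switch (kid.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    out.append(kid.getNodeValue());
                    break;
                case Node.ELEMENT_NODE:
                    collectText(kid, out);
                    break;
                default:
                    break; // skip comments and processing instructions
            }
        }
    }

    // Parses `xml` and returns the text of its first <body> element as one sentence.
    static String sentenceOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node body = doc.getElementsByTagName("body").item(0);
            if (body == null) {
                throw new IllegalArgumentException("no <body> element");
            }
            StringBuilder out = new StringBuilder();
            collectText(body, out);
            // Collapse whitespace runs and trim the ends.
            return out.toString().trim().replaceAll("\\s+", " ");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><body> The <a href=\"http://www.linux.org/\">Linux</a>"
                + " is an open-source operating system, created by"
                + " <a href=\"http://technorati.com/tag/linus-torvals\">Linus Torvalds</a>"
                + " in the 80's. </body></doc>";
        // prints: The Linux is an open-source operating system, created by Linus Torvalds in the 80's.
        System.out.println(sentenceOf(xml));
    }
}
```

The final whitespace collapse is a choice aimed at sentence analysis; drop it if the original spacing matters. In JDOM 1.x the same walk would iterate current.getContent() and test each item with instanceof Text or instanceof Element.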
Reg Edit, posted February 23, 2008

I don't know about JDOM, but to embed HTML in XML you have to take special action, or the XML is not well-formed. Your solution probably involves using CDATA and/or replacing "<" and ">"; please do a quick search to understand the issue and decide on a strategy that suits your JDOM world. For instance, see:
http://biglist.com/lists/lists.mulberrytec...8/msg00144.html
or google it.
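For comparison, here is a sketch of the CDATA route mentioned in the reply, again using the JDK's W3C DOM parser rather than JDOM; the class name CdataBody and the sample strings are assumptions for illustration. Inside a CDATA section the parser does not treat <a ...> as markup, so the body's text comes back in order but with the tags as literal characters, which would then have to be stripped separately. Escaping "<" and ">" as &lt; and &gt; yields the same parsed text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CdataBody {

    // Returns the full text content of the first <body> element.
    static String bodyText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getTextContent() concatenates all descendant text, including CDATA sections.
            return doc.getElementsByTagName("body").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String wrapped = "<doc><body><![CDATA[The <a href=\"http://www.linux.org/\">Linux</a>"
                + " is an open-source operating system.]]></body></doc>";
        String escaped = "<doc><body>The &lt;a href=\"http://www.linux.org/\"&gt;Linux&lt;/a&gt;"
                + " is an open-source operating system.</body></doc>";
        // Both print: The <a href="http://www.linux.org/">Linux</a> is an open-source operating system.
        System.out.println(bodyText(wrapped));
        System.out.println(bodyText(escaped));
    }
}
```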