marco perez Posted February 12, 2008
Hi, I have to write a Java program that builds an HTML element tree for any HTML document. Someone suggested JDOM, but it seems that this API requires me to know the structure of the XML in advance (I'd have to convert to XML). Can you recommend any other API or approach that, for any HTML document, saves its structure and lets me access any element? Thanks, MP
boen_robot Posted February 12, 2008
Any other API demands pretty much the same thing: converting the page to XHTML, and thus to XML. XML is easily processable, so there are lots of APIs for it. SGML, and therefore HTML, is not easily processable. There are very few parsers for it, and most of the ones that are out there convert HTML to XHTML anyway. You can use Tidy to convert your HTML pages to XHTML ones. From then on, JDOM, SAX, and a few more that don't come to mind are all APIs with which you can parse your XHTML document.
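To illustrate the second step (walking the tree once the page is already XHTML), here is a minimal sketch. It is an assumption-laden example, not something from the posts above: it uses the DOM parser that ships with the JDK (`javax.xml.parsers`) rather than JDOM, the class name `ElementTreeSketch` and the method `outline` are made up for illustration, and it assumes the input has already been converted to well-formed XHTML (for example by Tidy). The point is that the traversal discovers the structure as it goes, so nothing about the document needs to be known in advance.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here for illustration only.
class ElementTreeSketch {

    // Parses well-formed XHTML (assumed already tidied) and returns an
    // indented outline of its elements, one tag name per line.
    static String outline(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    // Recursively visits every element; text and comment nodes are skipped.
    private static void walk(Element element, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  "); // two spaces per nesting level
        }
        out.append(element.getTagName()).append('\n');
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, depth + 1, out);
            }
        }
    }
}
```

Calling `ElementTreeSketch.outline(...)` on a tidied page returns the nesting of its tags as indented text. The same recursive walk should carry over to JDOM or any other tree API, since each of them exposes an element's children in some form.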