Jump to content

Reporting Component (database To Xsl To Pdf)


behrk2

Recommended Posts

Hey everyone,I am going to be developing a reporting component that will report on data from a relational database. Reporting is a big thing, naturally. So I want to make sure I do it right. I have been doing a lot of research, and I want to see what you all think of the following approach:I am running a PostgreSQL 8.3.7 database. For now, I am going to focus on the back-end, and assume that I have been given some report specification to follow.First, I need to convert the contents of the database (based on the report specification) to XML. I figure I can do this by using the XML functions within PostgreSQL.Then, I want to use XSL to both transform and format the XML. Now, I definitely want to produce a .PDF report, but I also want to have the flexibility to develop an HTML report, too. So, for the HTML, I can take the resulting XSL document and attach a CSS stylesheet to it. I could also then convert the HTML to PDF.Alternatively, for the PDF, I could use an XSL-FO Processor to output a PDF report. I want to use an open-source processor. The main one that I found was Apache FOP. Here is a flow diagram for clarification: diagram.pngWhat do you guys think? Does anyone suggest a different method? I'm hoping to make this reporting component as good as possible, I want it to be "industry standard".Thanks!

Link to comment
Share on other sites

Well, if PostgreSQL doesn't have a direct (X)HTML report functionality, then using XSLT on the XML sounds fine to me.I don't like the fact that you're using XSLT to output another XSLT though. I hope that's a "typo" in your diagram... there's really no need to create XSLT using XSLT.Whether or not to use XSL-FO is something I've always wondered about... the question really is how much control you need over the resulting PDF, and how different it must be from the (X)HTML.If the answer to both questions is "a lot", then XSL-FO is a must. In other words, create two XSLT's - one for "XML to XHTML", and the other for "XML to XSL-FO". Use Apache FOP on the resulting XSL-FO.Otherwise, "HTML to PDF" is probably easier and better. In other words, create one XSLT - for "XML to XHTML", then use an "(X)HTML to PDF" converter for the PDF. Apache FOP is not the program for that one though. Depending on the environment in which you'll execute it all, there will be different solutions. For PHP, there's DOMPDF for example.You could in theory create an "XHTML to XSL-FO" XSLT stylesheet, and then use Apache FOP, but creating this XSLT is going to be a big pain for sure.

Link to comment
Share on other sites

I can't speak for PHP, but C# struggles significantly in parsing XML into an object when there are more than, say, 50,000 elements in the document. If you're ever going to do any PDF generation of recordsets which contain tens of thousands of records, you might want to look for an alternative approach.

Link to comment
Share on other sites

Thanks for your reply and your help, boen_robot. What you said makes sense. It seems that I am still having trouble getting used to all of the terms and understanding some things. I understand what you said though.Jesh - I'll also keep in mind the possibility of having that much data, thanks!I have "refined" my plan to the following:1.) Based on a report specification from a user, generate SQL queries (in PostgreSQL) that will return an XML document. I still have to think of the best way to do this, as my database follows the Master-detail philosophy, so I'm not sure if the PostgreSQL XML functions are enough to capture the aggregation and the details. I need some "extended SQL" that will allow for forward and reverse paths.2.) Once I have the XML, I need to "process" it with an XSLT Stylesheet. I will process it with PHP. My XSLT Stylesheet will need to contain both transformations and formatting (XSL-FO). I've decided to go down that "route", as I need a lot of control over the PDF. Now, here is my next question. I want to first come up with a standard report template, or XSLT Stylesheet. So, how can I generate that stylesheet based on what may be a different XML document every time? In other words, that stylesheet will have to be constructed based upon the type/names/amount of data in the XML document. Does that make sense? I'm really hung up on how to do that.3.) Once I process the XML and the XSLT Stylesheet, I will take the result of that and feed it into Apache FOP, which will give me my PDF.Can anyone offer some insight on the above points?Thanks!

Link to comment
Share on other sites

So... you don't want (X)HTML report (only a PDF one) or the (X)HTML report will be with a totally different look? If so, then your plan sounds fine.1.) I'm not at all familiar with PostgreSQL, so I can't help you with the SQL statements.2.) Even though I'm not familiar with PostgreSQL, I'm sure that the generated XML will have the same elements regardless of the database schema. XSLT is a template based (a.k.a. "event based") language, so it should be relatively easy to map one (or several) elements to a single action (e.g. a creation of a "fo:table-cell" element with the value of the element). As long as the XML has consistent element names (as should be the case here), it should all be relatively easy.3.) Good. Just keep in mind that calling FOP from PHP is costly, no matter how you call it (command line or JavaBridge).Also, to the DOM point, unfortunately the same goes for PHP and any DOM implementation. If the generated XML is simple enough, you could use PHP's XMLReader or use a streamable XSLT stylesheet with SAXON (and you'd invoke SAXON from PHP with something like the XML_XSLT2Processor from my signature). The idea in both cases is that you'll read the XML node by node, not the whole document.Since SAXON and FOP are both JAVA applications, using JAVA now probably sounds as a much better solution - in JAVA, you could easily use a streamable XSLT stylesheet, pass the output directly to FOP, and FOP would then generate the PDF... all in parallel, not as a sequence, as would be the case in PHP. Still, that would only be possible if PostgreSQL generates a simple enough XML document.

Link to comment
Share on other sites

So... you don't want (X)HTML report (only a PDF one) or the (X)HTML report will be with a totally different look? If so, then your plan sounds fine.1.) I'm not at all familiar with PostgreSQL, so I can't help you with the SQL statements.2.) Even though I'm not familiar with PostgreSQL, I'm sure that the generated XML will have the same elements regardless of the database schema. XSLT is a template based (a.k.a. "event based") language, so it should be relatively easy to map one (or several) elements to a single action (e.g. a creation of a "fo:table-cell" element with the value of the element). As long as the XML has consistent element names (as should be the case here), it should all be relatively easy.3.) Good. Just keep in mind that calling FOP from PHP is costly, no matter how you call it (command line or JavaBridge).Also, to the DOM point, unfortunately the same goes for PHP and any DOM implementation. If the generated XML is simple enough, you could use PHP's XMLReader or use a streamable XSLT stylesheet with SAXON (and you'd invoke SAXON from PHP with something like the XML_XSLT2Processor from my signature). The idea in both cases is that you'll read the XML node by node, not the whole document.Since SAXON and FOP are both JAVA applications, using JAVA now probably sounds as a much better solution - in JAVA, you could easily use a streamable XSLT stylesheet, pass the output directly to FOP, and FOP would then generate the PDF... all in parallel, not as a sequence, as would be the case in PHP. Still, that would only be possible if PostgreSQL generates a simple enough XML document.
Thanks so much for your help, boen_robot, I really appreciate your input! You helped me to understand the big picture a lot better.Thanks again!
Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...