Garulfo Posted March 17, 2006

Hi all,

I'm a newbie in XSLT and I'm having a hard time learning it. I have .doc and .pdf files (actually, they are curricula vitae) and I would like to convert them into structured XML files. I used OOo's tools to do a first approximate transformation, and now I would like to convert those badly written XML files into structured ones via an XSL filter.

Here is an example of what I get:

<?xml version="1.0" encoding="UTF-8"?><article lang="fr-FR"><para/><informaltable frame="all"><tgroup cols="1.6470588235294117"><tbody><row><entry namest="c1" nameend="c3"><sect5><title/></sect5><sect5><title>MOIMOI</title><para/><para/><para><inlinegraphic fileref="" width=""/></para><para/><para/><para/><para/><para/></sect5><para/><para/><para><inlinegraphic fileref="" width=""/></para><para/><para/><para/><para/><para/></entry></row><row><entry namest="c1" nameend="c3"><para>COMPÉTENCES TECHNIQUES</para></entry></row><row><entry namest="c1" nameend="c3"><para><inlinegraphic fileref="" width=""/></para></entry></row><row><entry namest="c1" nameend="c3"><para/></entry></row><row><entry namest="c1" nameend="c2"><para>Langages :</para></entry><entry><orderedlist><listitem><para>ADA, C, C++ (Rogue Wave, MFC), Java , HTML, XML, SQL, PL/SQL, Cobol, javascript, CGI-BIN, Script SHELL , Makefile</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>SGBD :</para></entry><entry><orderedlist><listitem><para>Oracle7,8i,9i (OWS 2.1, formation DB1, optimisation des requêtes applicatives, administration Oracle, Designer 2000), SQL Server 6.5,Teradata 2V5 NCR, Access, O2</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>Gestion de Configuration :</para></entry><entry><orderedlist><listitem><para>Visual Source Safe, ClearCase, EQM</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>Outils Décisionnels
:</para></entry><entry><orderedlist><listitem><para>ETL : Informatica 5 et 6, Powercenter 7, FastLoad, FastExport (BTEQ), BusinessObjects (Designer, BO5i, WebI)</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>Ordonnanceur</para></entry><entry><orderedlist><listitem><para>CTR-M, MAESTRO, CA7</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>Méthodes d’analyse :</para></entry><entry><orderedlist><listitem><para>Merise, OMT, UML (Rational Rose 2000), RMM (Relationship Management Methodology)</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>Systèmes d’exploitation :</para></entry><entry><orderedlist><listitem><para>Windows NT-2000, Unix (solaris 2.8, HP-UX)</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c3"><para/></entry></row><row><entry namest="c1" nameend="c3"><para>FORMATION ET LANGUES</para></entry></row><row><entry namest="c1" nameend="c3"><para><inlinegraphic fileref="" width=""/></para></entry></row><row><entry namest="c1" nameend="c3"><para/></entry></row><row><entry><para>2000</para></entry><entry namest="c2" nameend="c3"><para>Diplômé de Recherche en Technologie MIAGE</para></entry></row><row><entry><para>1998</para></entry><entry namest="c2" nameend="c3"><para>Maîtrise IUP MIAGE (Ingénieur-maître)</para></entry></row><row><entry><para>1996</para></entry><entry namest="c2" nameend="c3"><para>DUT Informatique</para></entry></row><row><entry><para/></entry><entry namest="c2" nameend="c3"><para/></entry></row><row><entry><para>Anglais</para></entry><entry namest="c2" nameend="c3"><para>lu, écrit, parlé</para></entry></row><row><entry><para>Espagnol</para></entry><entry namest="c2" nameend="c3"><para>notions</para></entry></row><row><entry><para/></entry><entry namest="c2" nameend="c3"><para/></entry></row><row><entry><para/></entry><entry namest="c2" nameend="c3"><para/></entry></row><row><entry namest="c1" 
nameend="c3"><para>COMPETENCES</para></entry></row><row><entry namest="c1" nameend="c3"><para><inlinegraphic fileref="" width=""/></para></entry></row><row><entry namest="c1" nameend="c3"><para/></entry></row><row><entry><para>2006</para></entry><entry namest="c2" nameend="c3"><para>Business Objects : Administering users and content XI (2 jours)</para></entry></row><row><entry><para>2006</para></entry><entry namest="c2" nameend="c3"><para>Business Objects : Administering Servers XI – Windows (2 jours)</para></entry></row><row><entry><para>2006</para></entry><entry namest="c2" nameend="c3"><para>Business Objects : Migration de Business Objects 5.x/6.x vers Business Objects XI (2 jours)</para></entry></row><row><entry><para>2002</para></entry><entry namest="c2" nameend="c3"><para>Business Objects : Designer (3 jours), Utilisateurs niveau 1 (2 jours)</para></entry></row><row><entry><para>2002</para></entry><entry namest="c2" nameend="c3"><para>Oracle France ORA "Optimisation des requêtes SQL applicatives" (3 jours)</para></entry></row><row><entry><para>2001</para></entry><entry namest="c2" nameend="c3"><para>Oracle France DBA-I "Architecture et Administration" (5 jours)</para></entry></row><row><entry><para/></entry><entry namest="c2" nameend="c3"><para/></entry></row><row><entry namest="c1" nameend="c3"><para/></entry></row></tbody></tgroup></informaltable></article> Of course, I can't use this file in this form. 
So I want to transform it into something "cleaner".

Here is the XSLT script I wrote:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml"/>
<xsl:template match="/">
<person>
<begFname><xsl:value-of select="substring(//entry[1],1,3)"/></begFname><br/>
<begLname><xsl:value-of select="substring(//entry[1],4,3)"/></begLname>
</person><br/>
<xsl:for-each select="//entry[@* and node()][para]">
<xsl:if test="contains(translate(normalize-space(.),'a,b,c,d,e,é,è,É,È,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z','A,B,C,D,E,E,E,E,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z'),'TECHNIQUE')">
<techSkills><xsl:value-of select="para"/><xsl:if test="string-length(.)!=0"> <xsl:value-of select="following-sibling"/></xsl:if></techSkills><br/>
</xsl:if>
<xsl:if test="contains(translate(normalize-space(.),'a,b,c,d,e,é,è,É,È,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z','A,B,C,D,E,E,E,E,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z'),'FORMATION')">
<formations><xsl:value-of select="para"/></formations><br/>
</xsl:if>
<xsl:if test="contains(translate(normalize-space(.),'a,b,c,d,e,é,è,É,È,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z','A,B,C,D,E,E,E,E,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z'),'EXPERIENCE PROFESSIONNELLE')">
<experiences><xsl:value-of select="para"/></experiences><br/>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

With this, I can extract the titles of the major parts of the CV. Now I would like to extract the non-empty <para> elements related to each major part. However, as you can see, the source files are not structured, and I don't see how I could navigate from (for example) the "<para>COMPÉTENCES TECHNIQUES</para>" node to the "<para>ADA, C, C++ (Rogue Wave, MFC), Java , HTML, XML, SQL, PL/SQL, Cobol, javascript, CGI-BIN, Script SHELL , Makefile</para>" node.

Do you see a way to do that with XPath?

Thanks in advance.

Regards,
Fabien.
boen_robot Posted March 17, 2006

Before getting deep into your XSLT, I think I should try to give you a basic XPath way to navigate:

<xsl:for-each select="//para">
<xsl:if test="para!= ' ' ">
<para> <xsl:value-of select="para" /> </para>
</xsl:if>
</xsl:for-each>

This simply means that if para has any content, it is output inside a para element; otherwise it is skipped. This is a simple kind of filter which I haven't tested, but I have seen it before in other XSLT stylesheets, so it should work.

I'm not completely sure if that was what you were asking for, so if not, I guess I'll take it deeper.
Garulfo Posted March 17, 2006

Hi boen_robot,

Thanks for your answer. I tested your example, but it didn't work even though it seems to be OK. It's the "<xsl:if test="para!= ' ' ">" which seems to be the problem. I tried things like "<xsl:if test="string-length(.) !=0">", but it doesn't change anything.

The only things I found that give me a result are things like this:

<xsl:for-each select="//entry[para]">
<xsl:if test="contains(., 'EXAMPLE')">
<para><xsl:value-of select="para"/></para>
</xsl:if>
</xsl:for-each>

But that doesn't solve my problem, since I don't know how to access other nodes from there.

I'm not sure if it's clear. Do you want me to post a more concrete example of what I would like to do?
boen_robot Posted March 17, 2006

What if you replace the XPath expression in string-length() with the current() function?

<xsl:if test="string-length(current())!=0">

Would this work?

And yeah, a more concrete example of input and desired output would make things easier.
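For reference, here is a minimal sketch of the non-empty filter discussed in the posts above, assuming the DocBook-like input from the first post. Inside an xsl:for-each over "//para" the context node is already the para itself, so the test has to look at "." rather than at a child named para; at that point "." and current() refer to the same node. Using normalize-space() also discards paragraphs that contain only whitespace. The wrapper element name "paras" is hypothetical and only there to keep the output well-formed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- "paras" is a hypothetical wrapper element, not part of the thread -->
    <paras>
      <!-- The predicate keeps only para elements whose text is non-empty
           once leading/trailing whitespace is removed -->
      <xsl:for-each select="//para[normalize-space(.) != '']">
        <para><xsl:value-of select="normalize-space(.)"/></para>
      </xsl:for-each>
    </paras>
  </xsl:template>
</xsl:stylesheet>
```

Putting the condition in a predicate instead of a nested xsl:if is a design choice, not a requirement; an xsl:if with test="normalize-space(.) != ''" inside the loop would filter the same nodes.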
Garulfo Posted March 17, 2006

In fact, it was due to the select="//para" clause. With "//*[para]" instead, it works.

However, once in the for-each, I'm not sure how I will navigate through the nodes. I will try to make progress, and I will post a more concrete example of the problems I encounter later.
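One possible way to navigate from a section title to the rows that belong to it is sketched below. It is not from the thread and rests on assumptions about the sample input: a "header" row is taken to be a row whose entry spans c1 to c3 and whose first para contains text, and every other row is attached to the nearest header row before it through the preceding-sibling axis and an xsl:key. The output element names (cv, section, item) and the key name rows-by-section are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every non-header row by the generated id of the closest
       preceding header row (assumption: header = full-width entry with text) -->
  <xsl:key name="rows-by-section"
           match="row[not(entry[@namest='c1' and @nameend='c3'][normalize-space(para[1]) != ''])]"
           use="generate-id(preceding-sibling::row[entry[@namest='c1' and @nameend='c3'][normalize-space(para[1]) != '']][1])"/>

  <xsl:template match="/">
    <cv>
      <!-- Visit each header row, e.g. the one holding COMPÉTENCES TECHNIQUES -->
      <xsl:for-each select="//row[entry[@namest='c1' and @nameend='c3'][normalize-space(para[1]) != '']]">
        <section title="{normalize-space(entry/para[1])}">
          <!-- Pull the non-empty para elements from the rows indexed under this header -->
          <xsl:for-each select="key('rows-by-section', generate-id())//para[normalize-space(.) != '']">
            <item><xsl:value-of select="normalize-space(.)"/></item>
          </xsl:for-each>
        </section>
      </xsl:for-each>
    </cv>
  </xsl:template>
</xsl:stylesheet>
```

With the sample file, the section built from the "COMPÉTENCES TECHNIQUES" row should then contain the "Langages :" label and the "ADA, C, C++ ..." paragraph among its items. For a single hop rather than a full grouping, an expression such as ../following-sibling::row[entry[2]][1]/entry[2]//para, evaluated from the header entry, reaches the first following two-column row directly.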