XSLT and XPath problem


Garulfo

Hi all,
I'm a newbie in XSLT and I'm having a hard time learning it. I have .doc and .pdf files (they are actually curricula vitae) that I would like to convert into structured XML files. I used OOo's tools to do a first, approximate transformation, and now I would like to convert those badly written XML files into structured ones via an XSL filter. Here is an example of what I get:

<?xml version="1.0" encoding="UTF-8"?><article lang="fr-FR"><para/><informaltable frame="all"><tgroup cols="1.6470588235294117"><tbody><row><entry namest="c1" nameend="c3"><sect5><title/></sect5><sect5><title>MOIMOI</title><para/><para/><para><inlinegraphic fileref="" width=""/></para><para/><para/><para/><para/><para/></sect5><para/><para/><para><inlinegraphic fileref="" width=""/></para><para/><para/><para/><para/><para/></entry></row><row><entry namest="c1" nameend="c3"><para>COMPÉTENCES TECHNIQUES</para></entry></row><row><entry namest="c1" nameend="c3"><para><inlinegraphic fileref="" width=""/></para></entry></row><row><entry namest="c1" nameend="c3"><para/></entry></row><row><entry namest="c1" nameend="c2"><para>Langages :</para></entry><entry><orderedlist><listitem><para>ADA, C, C++ (Rogue Wave, MFC), Java , HTML, XML, SQL, PL/SQL, Cobol, javascript, CGI-BIN, Script SHELL , Makefile</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>SGBD :</para></entry><entry><orderedlist><listitem><para>Oracle7,8i,9i (OWS 2.1, formation DB1, optimisation des requêtes applicatives, administration Oracle, Designer 2000), SQL Server 6.5,Teradata 2V5 NCR, Access, O2</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>Gestion de Configuration :</para></entry><entry><orderedlist><listitem><para>Visual Source Safe, ClearCase, EQM</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>Outils Décisionnels :</para></entry><entry><orderedlist><listitem><para>ETL : Informatica 5 et 6, Powercenter 7, FastLoad, FastExport (BTEQ), BusinessObjects (Designer, BO5i, WebI)</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>Ordonnanceur</para></entry><entry><orderedlist><listitem><para>CTR-M, MAESTRO, CA7</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>Méthodes d’analyse 
:</para></entry><entry><orderedlist><listitem><para>Merise, OMT, UML (Rational Rose 2000), RMM (Relationship Management Methodology)</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c2"><para>Systèmes d’exploitation :</para></entry><entry><orderedlist><listitem><para>Windows NT-2000, Unix (solaris 2.8, HP-UX)</para></listitem></orderedlist></entry></row><row><entry namest="c1" nameend="c3"><para/></entry></row><row><entry namest="c1" nameend="c3"><para>FORMATION ET LANGUES</para></entry></row><row><entry namest="c1" nameend="c3"><para><inlinegraphic fileref="" width=""/></para></entry></row><row><entry namest="c1" nameend="c3"><para/></entry></row><row><entry><para>2000</para></entry><entry namest="c2" nameend="c3"><para>Diplômé de Recherche en Technologie MIAGE</para></entry></row><row><entry><para>1998</para></entry><entry namest="c2" nameend="c3"><para>Maîtrise IUP MIAGE (Ingénieur-maître)</para></entry></row><row><entry><para>1996</para></entry><entry namest="c2" nameend="c3"><para>DUT Informatique</para></entry></row><row><entry><para/></entry><entry namest="c2" nameend="c3"><para/></entry></row><row><entry><para>Anglais</para></entry><entry namest="c2" nameend="c3"><para>lu, écrit, parlé</para></entry></row><row><entry><para>Espagnol</para></entry><entry namest="c2" nameend="c3"><para>notions</para></entry></row><row><entry><para/></entry><entry namest="c2" nameend="c3"><para/></entry></row><row><entry><para/></entry><entry namest="c2" nameend="c3"><para/></entry></row><row><entry namest="c1" nameend="c3"><para>COMPETENCES</para></entry></row><row><entry namest="c1" nameend="c3"><para><inlinegraphic fileref="" width=""/></para></entry></row><row><entry namest="c1" nameend="c3"><para/></entry></row><row><entry><para>2006</para></entry><entry namest="c2" nameend="c3"><para>Business Objects : Administering users and content XI (2 jours)</para></entry></row><row><entry><para>2006</para></entry><entry namest="c2" 
nameend="c3"><para>Business Objects : Administering Servers XI – Windows (2 jours)</para></entry></row><row><entry><para>2006</para></entry><entry namest="c2" nameend="c3"><para>Business Objects : Migration de Business Objects 5.x/6.x vers Business Objects XI (2 jours)</para></entry></row><row><entry><para>2002</para></entry><entry namest="c2" nameend="c3"><para>Business Objects : Designer (3 jours), Utilisateurs niveau 1 (2 jours)</para></entry></row><row><entry><para>2002</para></entry><entry namest="c2" nameend="c3"><para>Oracle France ORA "Optimisation des requêtes SQL applicatives" (3 jours)</para></entry></row><row><entry><para>2001</para></entry><entry namest="c2" nameend="c3"><para>Oracle France DBA-I "Architecture et Administration" (5 jours)</para></entry></row><row><entry><para/></entry><entry namest="c2" nameend="c3"><para/></entry></row><row><entry namest="c1" nameend="c3"><para/></entry></row></tbody></tgroup></informaltable></article>

Of course, I can't use the file in this form, so I want to transform it into something cleaner. Here is the XSLT script I wrote:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>
  <xsl:template match="/">
    <person>
      <begFname><xsl:value-of select="substring(//entry[1],1,3)"/></begFname>
      <br/>
      <begLname><xsl:value-of select="substring(//entry[1],4,3)"/></begLname>
    </person>
    <br/>
    <xsl:for-each select="//entry[@* and node()][para]">
      <xsl:if test="contains(translate(normalize-space(.),'a,b,c,d,e,é,è,É,È,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z','A,B,C,D,E,E,E,E,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z'),'TECHNIQUE')">
        <techSkills>
          <xsl:value-of select="para"/>
          <xsl:if test="string-length(.)!=0">
            <xsl:value-of select="following-sibling"/>
          </xsl:if>
        </techSkills>
        <br/>
      </xsl:if>
      <xsl:if test="contains(translate(normalize-space(.),'a,b,c,d,e,é,è,É,È,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z','A,B,C,D,E,E,E,E,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z'),'FORMATION')">
        <formations><xsl:value-of select="para"/></formations>
        <br/>
      </xsl:if>
      <xsl:if test="contains(translate(normalize-space(.),'a,b,c,d,e,é,è,É,È,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z','A,B,C,D,E,E,E,E,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z'),'EXPERIENCE PROFESSIONNELLE')">
        <experiences><xsl:value-of select="para"/></experiences>
        <br/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

With this, I can extract the titles of the major parts of the CV. Now I would like to extract the non-empty <para> elements that belong to each major part. However, as you can see, the source files are not structured, and I don't see how I could navigate from (for example) the "<para>COMPÉTENCES TECHNIQUES</para>" node to the "<para>ADA, C, C++ (Rogue Wave, MFC), Java , HTML, XML, SQL, PL/SQL, Cobol, javascript, CGI-BIN, Script SHELL , Makefile</para>" node. Do you see a way to do that with XPath?
Thanks in advance.
Regards,
Fabien.

Before getting deep into your XSLT, I think I should try to give you a basic XPath way to navigate:

<xsl:for-each select="//para">
  <xsl:if test="para!= ' ' ">
    <para>
      <xsl:value-of select="para" />
    </para>
  </xsl:if>
</xsl:for-each>

This simply means that if a para has any content, it is output inside a para element; otherwise it's skipped. This is a simple kind of filter that I haven't tested, but I have seen it before in other XSLT stylesheets, so it should work. I'm not completely sure that is what you were asking for, so if not, I guess I'll take it deeper :) .
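For comparison, here is a minimal sketch of the same kind of filter with the emptiness test moved into a predicate. Inside a for-each over //para the context node is already a para, so the sketch tests "." rather than a child named para. The wrapper element name paras is only an assumption for illustration, and this has not been run against the real CV files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- "paras" is a hypothetical wrapper so the result has a single root -->
    <paras>
      <!-- The predicate keeps only para elements that contain
           something other than whitespace -->
      <xsl:for-each select="//para[normalize-space(.) != '']">
        <para><xsl:value-of select="normalize-space(.)"/></para>
      </xsl:for-each>
    </paras>
  </xsl:template>
</xsl:stylesheet>
```

Using normalize-space() means a para that holds only spaces or line breaks is treated as empty too.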

Hi boen_robot,
Thanks for your answer. I tested your example, but it didn't work, even though it looks OK. The "<xsl:if test="para!= ' ' ">" seems to be the problem. I tried things like "<xsl:if test="string-length(.) !=0">", but that doesn't change anything. The only things I found that give me a result look like this:

<xsl:for-each select="//entry[para]">
  <xsl:if test="contains(., 'EXAMPLE')">
    <para><xsl:value-of select="para"/></para>
  </xsl:if>
</xsl:for-each>

But that doesn't solve my problem, since I don't know how to access other nodes from there. I'm not sure if that's clear. Do you want me to post a more concrete example of what I would like to do?

What if you replace the XPath expression in string-length() with the current() function?

<xsl:if test="string-length(current())!=0">

Would this work? And yeah, a more concrete example of input and desired output would make things easier.
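A small sketch of how "." and current() relate may help here. Directly in a test attribute inside a for-each, both refer to the same node, so swapping one for the other does not change the result. They only differ inside a predicate, where "." becomes the node being filtered. The element name cells, the element name cell and the attribute others-in-row are hypothetical names chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <cells>
      <xsl:for-each select="//entry">
        <!-- Here "." and current() are the same entry node -->
        <xsl:if test="string-length(normalize-space(current())) != 0">
          <!-- Inside the predicate, "." is the entry being filtered,
               while current() is still the entry of the for-each -->
          <cell others-in-row="{count(../entry[generate-id(.) != generate-id(current())])}">
            <xsl:value-of select="normalize-space(.)"/>
          </cell>
        </xsl:if>
      </xsl:for-each>
    </cells>
  </xsl:template>
</xsl:stylesheet>
```

The count() expression returns the number of other entry elements in the same row, which is only possible because current() keeps pointing at the entry being processed.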

In fact, it was due to the select="//para" clause. With "//*[para]" instead, it works. However, once inside the for-each, I'm not sure how to navigate through the nodes. I will try to make progress, and I will post a more concrete example of the problems I encounter later.
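One possible way to navigate from a section title to the rows that follow it, sketched under assumptions drawn only from the sample file above: a heading row is taken to be a row whose entry spans c1 to c3 and holds a non-empty para, and every other row belongs to the nearest heading row before it. Note that an axis is written with a double colon and a node test, as in preceding-sibling::row. The output names cv, section, title and item, and the key name rows-by-heading, are hypothetical. This is untested against the real CV files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every non-heading row by the id of the nearest heading row
       before it ([1] on a reverse axis means the closest one) -->
  <xsl:key name="rows-by-heading"
           match="row[not(entry[@namest='c1' and @nameend='c3']/para[normalize-space(.) != ''])]"
           use="generate-id(preceding-sibling::row[entry[@namest='c1' and @nameend='c3']/para[normalize-space(.) != '']][1])"/>

  <xsl:template match="/">
    <cv>
      <!-- Assumption: a heading row has a c1-to-c3 entry with a non-empty para -->
      <xsl:for-each select="//row[entry[@namest='c1' and @nameend='c3']/para[normalize-space(.) != '']]">
        <section title="{normalize-space(entry/para)}">
          <!-- All non-empty para elements in the rows indexed under this heading -->
          <xsl:for-each select="key('rows-by-heading', generate-id(.))//para[normalize-space(.) != '']">
            <item><xsl:value-of select="normalize-space(.)"/></item>
          </xsl:for-each>
        </section>
      </xsl:for-each>
    </cv>
  </xsl:template>
</xsl:stylesheet>
```

The key keeps the grouping rule in one place. If other CVs mark their headings differently, only the heading test in the key and in the outer for-each should need to change.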