XPath selection on tag contents


GrandiJoos

Hi, I have never used XPath before and could not find an example that comes close to my problem. I have the following XML file (a Wikinews dump):

<mediawiki>
  <page>
    <title>Image:Wiki.png</title>
    <id>1</id>
    <restrictions>move=sysop:edit=sysop</restrictions>
    <revision>
      <id>92375</id>
      <timestamp>2005-06-29T02:57:00Z</timestamp>
      <contributor>
        <username>NGerda</username>
        <id>2442</id>
      </contributor>
      <text xml:space="preserve">Wikinews logo.{{CopyrightByWikimedia}}</text>
    </revision>
  </page>
</mediawiki>

(with more <page> elements, of course)

I would like to select only those pages (the title and text elements) where the title does not contain 'category:', 'Image:', 'Template:', etc., or simply does not contain ':' (I do not have all the pages, only most of them).

What would be the right XPath expression?

Any help is greatly appreciated!

GrandiJoos


If you're sure that no real article has a <title> containing ":", you can use:

/mediawiki/page[contains(title,':')]


And how do I test that the <title> does not contain a ':' and then select only the title and text elements?

GrandiJoos :)

Oops... I forgot we're searching for negatives. Simply wrap the test in not() to invert the match:

/mediawiki/page[not(contains(title,':'))]
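To get only the title and text of those pages, one option is to pull the two child elements out of each matching page. Here is a minimal sketch in Python, assuming the dump has no default XML namespace (as in the sample above). The standard library's ElementTree does not support contains() or not(), so the ':' test is done in Python instead of in the path; the SAMPLE data and the articles() helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-page sample modelled on the dump in the question.
SAMPLE = """<mediawiki>
  <page>
    <title>Image:Wiki.png</title>
    <revision>
      <text xml:space="preserve">Wikinews logo.</text>
    </revision>
  </page>
  <page>
    <title>Example article</title>
    <revision>
      <text xml:space="preserve">Body of the article.</text>
    </revision>
  </page>
</mediawiki>"""


def articles(xml_string):
    """Return (title, text) pairs for pages whose title has no ':'."""
    root = ET.fromstring(xml_string)  # root element is <mediawiki>
    result = []
    for page in root.findall("page"):
        title = page.findtext("title", default="")
        # Same test as the predicate not(contains(title, ':'))
        if ":" not in title:
            text = page.findtext("revision/text", default="")
            result.append((title, text))
    return result


print(articles(SAMPLE))
```

With a full XPath 1.0 engine, the same selection could probably be written as a union: /mediawiki/page[not(contains(title,':'))]/title | /mediawiki/page[not(contains(title,':'))]/revision/text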


Archived

This topic is now archived and is closed to further replies.
