XPath selection on tag contents

GrandiJoos · October 6, 2007

Hi, I have never used XPath before and could not find an example that comes close to my problem. I have the following XML file (a Wikinews dump):

<mediawiki>
  <page>
    <title>Image:Wiki.png</title>
    <id>1</id>
    <restrictions>move=sysop:edit=sysop</restrictions>
    <revision>
      <id>92375</id>
      <timestamp>2005-06-29T02:57:00Z</timestamp>
      <contributor>
        <username>NGerda</username>
        <id>2442</id>
      </contributor>
      <text xml:space="preserve">Wikinews logo.{{CopyrightByWikimedia}}</text>
    </revision>
  </page>
</mediawiki>

(with more <page> elements, of course)

I would like to select only those pages (their title and text tags) where the title does not contain 'category:', 'Image:', 'Template:', etc., or simply does not contain ':' at all (I do not have all the pages, only most of them).

What would be the right XPath expression? Any help is greatly appreciated!

GrandiJoos

boen_robot · October 6, 2007

If you're sure there isn't any <page/> with a real title containing ":", as in, a real article, you can use:

/mediawiki/page[contains(title,':')]

GrandiJoos · October 6, 2007

And how do I test that the <title> does not contain a ':' and then select only the title and text elements?

GrandiJoos

boen_robot · October 6, 2007

Oops... I forgot we're searching for negatives. Simply wrap the test in not() to invert the match:

/mediawiki/page[not(contains(title,':'))]
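The expression above returns whole <page> elements. To get only the title and text, appending a step such as /title or /revision/text to it should work in a full XPath 1.0 engine. As a rough sketch of the same filter in a script: Python's standard-library ElementTree supports only a limited XPath subset (no contains() or not()), so the version below emulates the predicate with a plain string test instead of evaluating the XPath itself. The function name `articles`, the `SAMPLE` string, and the second page in it are made up for illustration, and the sketch assumes the dump has no XML namespace on <mediawiki>, as in the excerpt posted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first page is trimmed from the posted excerpt,
# the second page is invented so there is something left after filtering.
SAMPLE = """<mediawiki>
  <page>
    <title>Image:Wiki.png</title>
    <revision>
      <text xml:space="preserve">Wikinews logo.{{CopyrightByWikimedia}}</text>
    </revision>
  </page>
  <page>
    <title>Example article</title>
    <revision>
      <text xml:space="preserve">Hypothetical article body.</text>
    </revision>
  </page>
</mediawiki>"""


def articles(xml_source):
    """Return (title, text) pairs for pages whose title has no ':'.

    Emulates /mediawiki/page[not(contains(title, ':'))] in plain Python.
    """
    root = ET.fromstring(xml_source)  # root is the <mediawiki> element
    result = []
    for page in root.findall("page"):
        title = page.findtext("title", default="")
        if ":" in title:
            # Skip Image:, Template:, Category: and similar pages.
            continue
        text = page.findtext("revision/text", default="")
        result.append((title, text))
    return result


if __name__ == "__main__":
    for title, text in articles(SAMPLE):
        print(title, "->", text)
```

For a full dump it may be worth switching to a library that evaluates XPath 1.0 directly, so the expression from the post can be used unchanged.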
