
XPath selection on tag contents


GrandiJoos

Hi,

I have never used XPath before and could not find an example that comes close to my problem. I have the following XML file (a Wikinews dump):

<mediawiki>
  <page>
    <title>Image:Wiki.png</title>
    <id>1</id>
    <restrictions>move=sysop:edit=sysop</restrictions>
    <revision>
      <id>92375</id>
      <timestamp>2005-06-29T02:57:00Z</timestamp>
      <contributor>
        <username>NGerda</username>
        <id>2442</id>
      </contributor>
      <text xml:space="preserve">Wikinews logo.{{CopyrightByWikimedia}}</text>
    </revision>
  </page>
</mediawiki>

(with more <page> elements, of course)

I would like to select only those pages (the title and text tags) where the title does not contain 'category:' or 'Image:' or 'Template:' etc., or simply does not contain ':' (I do not have all the pages, only most of them). What would be the right XPath expression?

Any help is greatly appreciated!

GrandiJoos


If you're sure there isn't any <page/> for a real article whose title contains ":", you can use:

/mediawiki/page[contains(title,':')]


Oops... I forgot we're searching for negatives. Simply wrap the test in not() to invert the match:

/mediawiki/page[not(contains(title,':'))]
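For anyone who wants to try this from a script, here is a minimal sketch in Python. It rests on a few assumptions: the sample document, the second "Example article" page, and the helper name select_articles are made up for illustration, and the dump is assumed to have no XML namespace, as in the snippet posted above. Python's built-in xml.etree.ElementTree only supports a limited XPath subset without contains() or not(), so the sketch walks the <page> elements and applies the same "title has no ':'" test in plain Python. With a full XPath 1.0 engine the expression above can be passed in directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-page sample modelled on the dump in the question.
SAMPLE = """<mediawiki>
  <page>
    <title>Image:Wiki.png</title>
    <revision>
      <text xml:space="preserve">Wikinews logo.</text>
    </revision>
  </page>
  <page>
    <title>Example article</title>
    <revision>
      <text xml:space="preserve">Body of the article.</text>
    </revision>
  </page>
</mediawiki>"""


def select_articles(xml_string):
    """Return (title, text) pairs for pages whose title contains no ':'.

    Equivalent in effect to /mediawiki/page[not(contains(title,':'))],
    but the predicate is evaluated in Python rather than by XPath.
    """
    root = ET.fromstring(xml_string)  # root is the <mediawiki> element
    results = []
    for page in root.findall("page"):
        title = page.findtext("title", default="")
        if ":" not in title:
            text = page.findtext("revision/text", default="")
            results.append((title, text))
    return results


articles = select_articles(SAMPLE)
for title, text in articles:
    print(title, "->", text)
```

Filtering on ':' alone also drops any real article that happens to have a colon in its title, which is the caveat mentioned in the first reply.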
