Automated removal of nodes based upon sub-node data


PuppyOnTheRadio

My XML knowledge is fairly terrible, but I have an XML problem that someone here may have an easy answer for. I have an XML document that is structured roughly like this:

<mediawiki>
  <siteinfo>
    <sitename></sitename>
    <base></base>
    <generator></generator>
    <case></case>
    <namespaces>
      <namespace></namespace>
    </namespaces>
  </siteinfo>
  <page>
    <title></title>
    <id></id>
    <revision>
      <id></id>
      <timestamp>2011-03-23T11:55:10Z</timestamp>
      <contributor>
        <username></username>
        <id></id>
      </contributor>
      <comment></comment>
      <text></text>
    </revision>
    <revision>
      <id></id>
      <timestamp>2011-03-23T11:57:00Z</timestamp>
      <contributor>
        <username></username>
        <id></id>
      </contributor>
      <comment></comment>
      <text></text>
    </revision>
  </page>
</mediawiki>

So what I want to do is remove each <revision></revision> based on the date in its <timestamp></timestamp>. (A particular page will have numerous revisions, but each revision has a unique timestamp.) The timestamp is always in that format, but to keep it simple I was just going to sweep everything pre-2010 from the XML.

The second thing I want to do is split the file so that each <page></page> ends up in its own XML file. (That's a nice-to-have rather than a must-have.)

As we're looking at in excess of 10,000 pages, with some pages having in excess of 100 revisions, going through the XML manually to do this is a bit of a nightmare. What I'm looking for is a way to automate it. Does anyone know of an effective way to do this?


You have to use a server-side language to manipulate that file. Are you using one?


You can use getElementsByTagName() to get a list of <revision> elements and then check the <timestamp> element of each one. I'm imagining PHP for this, but any server-side language that has XML DOM implemented would do.
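To make that concrete, here is a minimal sketch of the same DOM idea in Python rather than PHP, using the standard library's xml.dom.minidom. The function name drop_old_revisions and the CUTOFF constant are made up for illustration, and it assumes every timestamp is in the exact format shown in the first post, so plain string comparison orders them correctly. It loads the whole document into memory, so it only suits files that fit comfortably in RAM.

```python
from xml.dom import minidom

# Hypothetical cutoff: anything stamped before this is treated as "pre 2010".
CUTOFF = "2010-01-01T00:00:00Z"


def drop_old_revisions(xml_text, cutoff=CUTOFF):
    """Return the XML with every <revision> older than `cutoff` removed."""
    doc = minidom.parseString(xml_text)
    # Copy the node list first, because we remove nodes while looping.
    for rev in list(doc.getElementsByTagName("revision")):
        stamps = rev.getElementsByTagName("timestamp")
        if not stamps or stamps[0].firstChild is None:
            continue  # no usable timestamp: leave the revision alone
        stamp = stamps[0].firstChild.data.strip()
        # Timestamps in this fixed format sort correctly as plain strings.
        if stamp < cutoff:
            rev.parentNode.removeChild(rev)
    return doc.toxml()
```

You would read the file into a string, pass it to drop_old_revisions(), and write the result back out.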


Sorry - I should have been clearer. I have no access to server-side scripting at all. I can simply upload the XML doc, but due to a sloooooow server, large XML files have a tendency to time out on upload. Hence I can download big files, but only upload small ones. So I was looking for something to manipulate the XML prior to upload.

I have XML Notepad, but that has no macro functionality (as far as I can see). I can manipulate the file using VBA through Excel/Word, but MS has a tendency to do UGLY stuff to XML files when saving.

My next option is writing a local VBS script which will go through and do the manipulation for me, but I'm not as skilled at VBS as I'd like to be. So I was really looking for a local XML editor that has automation included, or where it could be added fairly easily.

I'm assuming the only nodes I really need are the main node, the page node and its sub-nodes. The siteinfo node and its sub-nodes can probably be trimmed. And there is the possibility of generating the XML one page at a time, so my "nice to have" can be worked around manually.

Hope that clears up the question.


If you don't have access to a server-side language you're going to have to manipulate the files manually. Either that, or program a desktop application to do it. This doesn't sound like the way a site should be run. Why can't you use a server-side language?


This doesn't sound like the way a site should be run. Why can't you use a server-side language?
It's a wiki. And I'm not the site owner, just contributor/administrator. And given the content needed is around 30% of (estimated) 10GB, I don't really want to manually edit the files. Which is why I was looking for a desktop XML editor with macro functionality. Or a VBS approach, which it looks like I might have to go for.
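For files of that size a local script that streams the XML is probably a better fit than anything that loads the whole document at once. Below is a rough sketch in Python (not VBS) using the standard library's xml.etree.ElementTree.iterparse. Everything named here is an assumption for illustration: the function split_pages, the page_00001.xml naming scheme, the 2010 cutoff, and the decision to skip pages that have no revisions left after filtering. It also drops <siteinfo> and wraps each page in a bare <mediawiki> element. If the real export declares a default namespace on <mediawiki>, the output will carry generated prefixes such as ns0: unless you register the namespace first.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def split_pages(source, out_dir, cutoff="2010-01-01T00:00:00Z"):
    """Stream `source`, drop old revisions, write one XML file per page.

    Returns the number of page files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    context = ET.iterparse(str(source), events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or local(elem.tag) != "page":
            continue
        # Remove revisions whose timestamp sorts before the cutoff.
        for rev in [c for c in elem if local(c.tag) == "revision"]:
            stamp = next(
                (c.text or "" for c in rev if local(c.tag) == "timestamp"), ""
            ).strip()
            if stamp and stamp < cutoff:
                elem.remove(rev)
        # Assumption: a page with no surviving revisions is not worth keeping.
        if any(local(c.tag) == "revision" for c in elem):
            written += 1
            wrapper = ET.Element("mediawiki")
            wrapper.append(elem)
            target = out_dir / f"page_{written:05d}.xml"
            ET.ElementTree(wrapper).write(
                str(target), encoding="utf-8", xml_declaration=True
            )
        root.clear()  # release finished pages so memory use stays flat
    return written
```

Calling split_pages("dump.xml", "pages") would leave one small file per page in the pages folder, which should also help with the upload timeouts. Both file names are placeholders.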