PuppyOnTheRadio

Automated removal of nodes based upon sub-node data


My XML knowledge is fairly terrible, but I have an XML issue that someone may have an easy way to handle. I have an XML document structured roughly like this:

<mediawiki>
  <siteinfo>
    <sitename></sitename>
    <base></base>
    <generator></generator>
    <case></case>
    <namespaces>
      <namespace></namespace>
    </namespaces>
  </siteinfo>
  <page>
    <title></title>
    <id></id>
    <revision>
      <id></id>
      <timestamp>2011-03-23T11:55:10Z</timestamp>
      <contributor>
        <username></username>
        <id></id>
      </contributor>
      <comment></comment>
      <text></text>
    </revision>
    <revision>
      <id></id>
      <timestamp>2011-03-23T11:57:00Z</timestamp>
      <contributor>
        <username></username>
        <id></id>
      </contributor>
      <comment></comment>
      <text></text>
    </revision>
  </page>
</mediawiki>

So what I want to do is remove each <revision></revision> based on the date in its <timestamp></timestamp>. (A particular page will have numerous revisions, but each revision has a unique timestamp.) The timestamp is always in that format, but to keep it simple I was just going to sweep everything pre-2010 from the XML.

The second thing I want to do is split the file so that each <page></page> is in its own XML file. (That's a nice-to-have rather than a must-have.)

As we're looking at in excess of 10,000 pages, with some pages having in excess of 100 revisions, going through the XML manually is a bit of a nightmare. What I'm looking for is a way to automate this. Does anyone know of an effective way to do it?


You have to use a server-side language to manipulate that file. Are you using one?


You can use getElementsByTagName() to get a list of <revision> elements and then check the <timestamp> element of each one. I'm imagining PHP for this, but any server-side language that has XML DOM implemented would do.
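
As a rough illustration of that idea, here is a minimal sketch in Python rather than PHP, using the standard library's `xml.dom.minidom`, which offers the same `getElementsByTagName()` call. The sample document, the `CUTOFF` value, and the `drop_old_revisions` name are assumptions made for this example. It relies on the timestamps all sharing the one fixed format shown in the question, so a plain string comparison orders them correctly.

```python
from xml.dom import minidom

# Tiny stand-in for the real export (assumption: no XML namespace, as in the post).
SAMPLE = """<mediawiki>
  <page>
    <title>Example</title>
    <revision>
      <id>1</id>
      <timestamp>2009-03-23T11:55:10Z</timestamp>
    </revision>
    <revision>
      <id>2</id>
      <timestamp>2011-03-23T11:57:00Z</timestamp>
    </revision>
  </page>
</mediawiki>"""

# Hypothetical cutoff: anything strictly earlier than this is removed.
CUTOFF = "2010-01-01T00:00:00Z"


def drop_old_revisions(xml_text, cutoff=CUTOFF):
    """Return the XML with every <revision> older than `cutoff` removed."""
    doc = minidom.parseString(xml_text)
    # Copy the node list first so removals cannot disturb the loop.
    for revision in list(doc.getElementsByTagName("revision")):
        stamps = revision.getElementsByTagName("timestamp")
        if not stamps or stamps[0].firstChild is None:
            continue  # no usable timestamp: leave the revision alone
        timestamp = stamps[0].firstChild.data.strip()
        # Fixed-width ISO-style timestamps sort correctly as plain strings.
        if timestamp < cutoff:
            revision.parentNode.removeChild(revision)
    return doc.toxml()


result = drop_old_revisions(SAMPLE)
print(result)
```

A DOM parser loads the whole document into memory, so this shape probably only suits files of modest size.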


You could write it in XSLT, but that takes a bit of learning. You should also learn the DOM in your favorite language:
1. Parse your document.
2. Go to the DOM node and remove it.
3. Write the result document.
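
The parse, remove, and write steps could look something like the following sketch, which uses Python's `xml.etree.ElementTree`. The function name `filter_revisions`, the cutoff value, and the in-memory sample are assumptions for illustration only. It also assumes the document has no XML namespace, as in the outline posted above; a real export may declare one, in which case the tag names would need to be qualified.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical cutoff: revisions strictly earlier than this are dropped.
CUTOFF = "2010-01-01T00:00:00Z"


def filter_revisions(source, destination, cutoff=CUTOFF):
    """Parse `source`, remove old revisions, write to `destination`.

    Both arguments may be file paths or binary file objects.
    Returns the number of revisions removed.
    """
    tree = ET.parse(source)                      # 1. parse the document
    removed = 0
    for page in tree.getroot().findall("page"):
        # findall() returns a list, so removing while looping is safe.
        for revision in page.findall("revision"):
            timestamp = (revision.findtext("timestamp") or "").strip()
            if timestamp and timestamp < cutoff:
                page.remove(revision)            # 2. remove the node
                removed += 1
    # 3. write the result document
    tree.write(destination, encoding="utf-8", xml_declaration=True)
    return removed


# Small in-memory demonstration; real use would pass file names instead.
sample = io.BytesIO(
    b"<mediawiki><page><title>T</title>"
    b"<revision><id>1</id><timestamp>2008-05-01T09:00:00Z</timestamp></revision>"
    b"<revision><id>2</id><timestamp>2011-03-23T11:57:00Z</timestamp></revision>"
    b"</page></mediawiki>"
)
output = io.BytesIO()
removed_count = filter_revisions(sample, output)
filtered_xml = output.getvalue().decode("utf-8")
print(removed_count)
print(filtered_xml)
```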


Sorry - I should have been clearer. I have no access to server-side scripting at all. I can simply upload the XML doc, but due to a sloooooow server, large XML files have a tendency to time out on upload. Hence I can download big files, but only upload small ones. So I was looking for something to manipulate the XML prior to upload.

I have XML Notepad, but that has no macro functionality (as far as I can see). I can manipulate it using VBA through Excel/Word, but MS has a tendency to do UGLY stuff to XML files when saving.

My next option is writing a local VBS script which will go through and do the manipulation for me, but I'm not as skilled at VBS as I'd like to be. So I was really looking for a local XML editor that had automation included, or could have it added fairly easily.

I'm assuming the only nodes I really need are the main node, the page and its sub-nodes. The site and its sub-nodes can probably be trimmed. And there is the potential to generate the XML one page at a time, so my "nice to have" can be worked around manually.

Hope that clears up the question.
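
For the "one page per file" part, a local script is one option that needs no server access. The sketch below is in Python rather than VBS, and the names `split_pages` and `write_pages`, the file-name pattern, and the sample data are all assumptions for illustration. It keeps only the main node and one page per output, dropping <siteinfo> on the guess above that it can be trimmed; whether the wiki accepts such files on upload is not something this example can confirm.

```python
import copy
import xml.etree.ElementTree as ET
from pathlib import Path

# Stand-in data (assumption: no XML namespace, as in the posted outline).
SAMPLE = """<mediawiki>
  <siteinfo><sitename>Demo</sitename></siteinfo>
  <page><title>A</title><revision><id>1</id></revision></page>
  <page><title>B</title><revision><id>2</id></revision></page>
</mediawiki>"""


def split_pages(xml_text):
    """Yield (file_name, xml_string) pairs, one per <page> element."""
    root = ET.fromstring(xml_text)
    for index, page in enumerate(root.findall("page"), start=1):
        # New main node with the same tag and attributes, but no <siteinfo>.
        wrapper = ET.Element(root.tag, root.attrib)
        wrapper.append(copy.deepcopy(page))
        yield f"page_{index:05d}.xml", ET.tostring(wrapper, encoding="unicode")


def write_pages(xml_text, out_dir):
    """Write each split page into `out_dir` (created if missing)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in split_pages(xml_text):
        (out / name).write_text(text, encoding="utf-8")


pages = list(split_pages(SAMPLE))
for name, text in pages:
    print(name, len(text))
```

Calling `write_pages(SAMPLE, "out")` would place the files in a hypothetical `out` folder next to the script.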


If you don't have access to a server-side language you're going to have to manipulate the files manually. Either that, or program a desktop application to do it. This doesn't sound like the way a site should be run. Why can't you use a server-side language?

This doesn't sound like the way a site should be run. Why can't you use a server-side language?
It's a wiki. And I'm not the site owner, just a contributor/administrator. And given that the content needed is around 30% of an estimated 10GB, I don't really want to edit the files manually. Which is why I was looking for a desktop XML editor with macro functionality. Or a VBS approach, which it looks like I might have to go for.
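
With an export of that size, loading the whole document into a DOM may not be practical. One alternative is to stream it, handling one <page> at a time. The following is only a sketch using Python's `xml.etree.ElementTree.iterparse`; the function name `stream_filtered_pages`, the cutoff, and the sample data are assumptions, and the document is assumed to have no XML namespace.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical cutoff: revisions strictly earlier than this are dropped.
CUTOFF = "2010-01-01T00:00:00Z"


def stream_filtered_pages(source, cutoff=CUTOFF):
    """Yield each <page> as an XML string with old revisions removed.

    `source` may be a file name or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "page":
            continue  # only act once a whole page has been read
        for revision in elem.findall("revision"):
            timestamp = (revision.findtext("timestamp") or "").strip()
            if timestamp and timestamp < cutoff:
                elem.remove(revision)
        yield ET.tostring(elem, encoding="unicode")
        # Release the page's children so memory use stays low.
        elem.clear()


# Small in-memory demonstration; real use would pass the export's file name.
sample = io.BytesIO(
    b"<mediawiki>"
    b"<page><title>A</title>"
    b"<revision><id>1</id><timestamp>2008-01-01T00:00:00Z</timestamp></revision>"
    b"<revision><id>2</id><timestamp>2011-03-23T11:55:10Z</timestamp></revision>"
    b"</page>"
    b"<page><title>B</title>"
    b"<revision><id>3</id><timestamp>2012-06-01T12:00:00Z</timestamp></revision>"
    b"</page>"
    b"</mediawiki>"
)
filtered_pages = list(stream_filtered_pages(sample))
for page_xml in filtered_pages:
    print(page_xml)
```

Each yielded string could then be written to its own small file. The emptied page elements still hang off the root element until parsing finishes, so memory use is low rather than constant.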
