javanewbie22

Best way to parse odd XML file?


Dear Experts,

I'm trying to parse an odd XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Major Version</key><integer>1</integer>
    <key>Minor Version</key><integer>1</integer>
    <key>Application Version</key><string>9.2.1</string>
    <key>Features</key><integer>5</integer>
    <key>Show Content Ratings</key><true/>
    <key>Library Persistent ID</key><string>9210E514389C76D3</string>
    <key>Tracks</key>
    <dict>
        <key>732</key>
        <dict>
            <key>Track ID</key><integer>732</integer>
            <key>Name</key><string>Tears All Over Town</string>
            <key>Artist</key><string>A Girl Called Eddy</string>
            <key>Album Artist</key><string>A Girl Called Eddy</string>
            ...
        </dict>
        <key>760</key>
        <dict>
            <key>Track ID</key><integer>760</integer>
            <key>Name</key><string>The Nightingale</string>
            <key>Artist</key><string>Angelo Badalamenti</string>
            <key>Album</key><string>Twin Peaks</string>
            <key>Genre</key><string>Soundtrack</string>
            <key>Kind</key><string>MPEG audio file</string>
            <key>Size</key><integer>7109800</integer>
            <key>Total Time</key><integer>296150</integer>
            <key>Disc Number</key><integer>1</integer>
            <key>Disc Count</key><integer>1</integer>
            <key>Track Number</key><integer>4</integer>
            <key>Track Count</key><integer>11</integer>
            <key>Year</key><integer>1990</integer>
            <key>Date Modified</key><date>2006-01-04T06:51:56Z</date>
            <key>Date Added</key><date>2009-04-02T02:04:11Z</date>
            <key>Bit Rate</key><integer>192</integer>
            <key>Sample Rate</key><integer>44100</integer>
            <key>Play Count</key><integer>4</integer>
            <key>Play Date</key><integer>3493240930</integer>
            <key>Play Date UTC</key><date>2014-09-11T05:42:10Z</date>
            <key>Sort Name</key><string>Nightingale</string>
            <key>Persistent ID</key><string>9B002DC50111B620</string>
            <key>Track Type</key><string>File</string>
            <key>File Folder Count</key><integer>4</integer>
            <key>Library Folder Count</key><integer>1</integer>
        </dict>
        <key>762</key>
        ...

Every field in the iTunes format has a <key> element for the field name, followed by an element named after the data type (<integer>, <string>, <date>, ...) that wraps the value.

This looks a lot different from the XML format that I saw in this tutorial:

http://www.stat.purdue.edu/~mdw/490M/STAT490Mday29.txt

<CD>
    <TITLE>Picture book</TITLE>
    <ARTIST>Simply Red</ARTIST>
    <COUNTRY>EU</COUNTRY>
    <COMPANY>Elektra</COMPANY>
    <PRICE>7.20</PRICE>
    <YEAR>1985</YEAR>
</CD>

First I experimented with grep on Cygwin, but I don't know how to get a specific set of three lines out of each grouping with grep.

With R, I was trying to parse the file until I looked and saw that the format was different from all the examples.

This post made it look easy:

How can I export my iTunes library information to a spreadsheet?
https://www.quora.com/How-can-I-export-my-iTunes-library-information-to-a-spreadsheet

But it completely failed. The XML doc got imported into Open Office as a document, not a spreadsheet.

For each CD, I want to extract the Artist, Album, and Composer (if it's there).

Questions:

Is there a name for this type of XML file?

What would be the best tool and way to parse the info from the file?

  • grep on Cygwin
  • R
  • Python
  • Java
  • XQuery
  • something else???

Thanks a lot!
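
For reference, the DOCTYPE line in the file identifies the format as an Apple property list (plist). One possible approach, offered only as a rough sketch, is Python's standard plistlib module, which turns each <dict> into an ordinary Python dictionary. The SAMPLE data below is a trimmed copy of track 760 from the excerpt above, and the names FIELDS, extract_rows, and rows_to_csv are made up for illustration. It assumes every entry under "Tracks" is a dictionary and writes an empty string when a field such as Composer is missing.

```python
import csv
import io
import plistlib

# Trimmed sample based on track 760 from the post (most keys omitted).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Tracks</key>
    <dict>
        <key>760</key>
        <dict>
            <key>Track ID</key><integer>760</integer>
            <key>Name</key><string>The Nightingale</string>
            <key>Artist</key><string>Angelo Badalamenti</string>
            <key>Album</key><string>Twin Peaks</string>
        </dict>
    </dict>
</dict>
</plist>
"""

# Hypothetical choice of columns, taken from the fields the post asks for.
FIELDS = ("Artist", "Album", "Composer")


def extract_rows(plist_bytes):
    """Return one [Artist, Album, Composer] row per track in the plist."""
    library = plistlib.loads(plist_bytes)
    rows = []
    for track in library.get("Tracks", {}).values():
        # Missing fields (for example Composer) become empty strings.
        rows.append([track.get(field, "") for field in FIELDS])
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE)
print(rows_to_csv(rows))
```

For a real library file, the same idea would presumably work by opening the file in binary mode and calling plistlib.load on the file object instead of plistlib.loads on a bytes literal. The resulting CSV text can then be saved and opened in a spreadsheet program.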
