Jump to content

Best Way To Generate Complex XML Doc From Spreadsheet - Help!


Recommended Posts

Hello! So, first off I'm not really a "real" programmer. I handle a lot of tech stuff at the company I'm with though, and I'm pretty good at dabbling. I usually figure out how to do what I need to do when I need to do it, and I rarely can't figure things out! I know plenty of HTML, how to work with CSS style sheets, all kinds of fancy Excel formulas, a bit of XML etc. (Note there are a few tweaks to this post that might not seem to "flow well" with the rest as I've posted it elsewhere first and didn't get much back, but I have figured our a few more things) Sooo, the thing is, I now kind of need to figure out something new, and it's a little more in depth than other things I've tried to do in the past and I'm not sure the best way to go about it. So here's the score. I work for a record label, and we provide metadata for album releases to numerous sites in XML. We've been doing this for years, and the way most XML files display in Grid mode in Stylus has made it so easy to just copy and paste from Excel I've never seen the need to figure out a "proper" or more automated system. In the long haul I probably should have because it would have saved time in the long run, but we really custom tailor our XML to each sites system to optimize it, which requires a human touch anyway. Enter a new site we're delivering directly to. Their XML is very complicated and lethargic. There's a lot of duplicated data in different tags that shouldn't really exist, but are required. It's to the point of ridiculousness really. It'd be waaay too much work to manually enter things. So I started looking for a solution to automate it. I have kind of found a way. Basically I used Stylus to convert a CSV for an album to XML that looks like this <root> <fieldnames> <UPC>UPC</UPC> <catalog>Catalog Number</catalog> <tracknumber>Track Number</tracknumber> <release>Release Date</release> <albumtitle>Title</albumtitle> <artist>Artist</artist> <tracktitle>Track Title</tracktitle> <isrc>ISRC USM2U</isrc> <description>Description</description> <time1>Track Times</time1> <field>Source Master</field> <field>Notes</field> <field>Song Writers</field> </fieldnames> <Row> <UPC>XXXXXXXXX</UPC> <catalog> CD119</catalog> <tracknumber>01</tracknumber> <release>4/10/2003</release> <albumtitle>XXXXXXXXXX</albumtitle> <artist>YYYYYYYYYYYYY</artist> <tracktitle>ZZZZZZZZZZZZ</tracktitle> <isrc>XXXXXXXXXX</isrc> <time1>2:49</time1> <time2>0:02:49</time2> <time3>169</time3> <field>169000</field> </Row> Then that repeats for all tracks Then I found Xquery code { let $doc := . for $v in $doc//root/Row, $a in $doc//root/Row/tracknumber/text() where matches($a, '01') and $v/tracknumber/text() = $a return $v/UPC/text() } Which can pull data from whatever field based off of the track number. This would allow it to pull the track name, track time, etc on a per track basis. So in essence I could setup <File> <FileName>{ let $doc := . for $v in $doc//root/Row, $a in $doc//root/Row/tracknumber/text() where matches($a, '2') and $v/tracknumber/text() = $a return $v/UPC/text() }_01_01_FLAC.flac</FileName> <FilePath>resources</FilePath> <HashSum> <HashSum>CHECKSUM SONG 01</HashSum> <HashSumAlgorithmType>MD5</HashSumAlgorithmType> </HashSum> </File> Etc for each field that would pull the appropriate data in a query that features the actual XML in there as well, so the end result of a full query would be a valid XML document. The downside is that I have to have a manual set of XML code like the above in each spot where I need to fill something. If I created a query with say 30 tracks it would be enough to always auto fill all our albums, because they have less than 30 tracks... Except it would also leave a bunch of XML with empty fields if there were less than 30 tracks. So I'm wondering 2 things. Should I even use Xquery. It really seems like a ghetto way of going about this, especially because of having to manually create a CSV and import that album by album. (Addition, I did try to make one big CSV with the whole database, and then make it filter by using the UPC and the track number, problem is it returns the correct data I'm looking for... Once for every time a track 01 appears in the database! I can't find a way to make it only return it once, and it's maddening! If I could figure that out it might be workable in Xquery) If not, what should I use? We have Access, and MS SQL floating around. It seems like if I could get similar functionality as far as kicking out text out of either of those that alone would advance the cause considerably since they natively handle Excel files. Especially if I could make it automatically recognize when the UPC field changed in a single big long Excel spreadsheet but kick out separate text (XML) documents. As far as if Xquery is a valid way of going about this, can I embed the actual XML code portion in an "if then" sort of situation where it would NOT create data in the final query result if the track number didn't exist in the source XML created from the CSV? Or is there no way to include straight XML inside of code like that? ( Addition It seems like I should be able to use something like this return <video year="{$v/year}"> {$v/title} </video> within the code, which is something I found online. But it seems to always break when I try putting large amounts of XML in there. What are the rules for what can or can't go inside the code space?) If that is possible, I think even the ghetto Xquery way would be workable, otherwise there will be a lot of manual deletion in this big complex XML document that will have to be deleted out. That would still be easier than doing it completely manually, but certainly not optimal. As far as Access or SQL... I've never done anything much in there, but have worked with people who have done all kinds of awesome stuff with them. I know that SQL especially can create all of kinds of text being populated from spreadsheets, and honestly the code has to be so stupidly easy I know I'm going to want to shoot myself when I finally figure it out. It seems to me like there must be some kind of way to make one of them generate the XML tag I need, then pull from a particular field in a spreadsheet that's in the system, and only do it until the data ends or it reaches a new UPC in the spreadsheet. Perhaps create a separate table with all the XML start tags and end tags, so it's actually pulling everything from certain fields in spreadsheets to create the final product? I know this is a bit long winded, so sorry for that. I should have been a Russian novelist ;) And I'm sure this is pretty novice stuff, but I've found programmers are usually quite willing to help in the past. Since this must be pretty simple, I'm hoping I can figure this out and permanently expand my bag of tricks! All I really need is a way to auto pull fields based on perhaps a UPC/track number combo, and have it end when data stops. I could even set this up for all the other sites we deliver to and streamline things even more! Any thoughts or suggestions are welcome, and thank all of you in advance for your help!!!

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Create New...