
wbeck

Members
  • Posts: 2

Profile Information

  • Location: Seattle, WA

  1. Ok, so the long post didn't go over too well with anybody; fortunately I've figured out a solution on my own... ish. If someone could give me a nudge in the right direction I'd be eternally grateful! If anyone wants to look at the structure of what I'm dealing with, it's in my previous post from a few days ago.

     The following:

         { let $doc := .
           for $v in $doc//root/Row, $t in $doc//root/Row/tracknumber/text()
           where matches($t, 'THIS IS THE UPC-01') and $v/tracknumber/text() = $t
           return $v/UPC/text() }

     will do what I want. If I make a single field that combines the UPC and track number, it works as a unique identifier, and it can be populated in bulk in the source spreadsheet. The query returns the value if the field exists, and nothing if it doesn't. I even figured out a way to have it generate the XML tags around the value when the track exists:

         { let $doc := .
           for $v in $doc//root/Row, $t in $doc//root/Row/tracknumber/text()
           where matches($t, 'THIS IS THE UPC-01') and $v/tracknumber/text() = $t
           return '</ISRC> </SoundRecordingId> <ResourceReference>A29</ResourceReference> <ReferenceTitle> <TitleText>' }

     I can't get it to leave the < and > unescaped that way, but a simple find/replace fixes that right up. If there's a way to keep the brackets, it would be cool to know, though.

     In any event, I've basically solved all my issues for generating a complete, valid XML doc with only the fields that need to exist. Except when I run it on my full spreadsheet converted to XML, it chokes. It works great on a test source doc with 3 albums, finishing in 1.6 seconds. But with the full 3000+ song XML doc, I never let it run long enough to finish, as overnight seemed like more than enough time to generate a single album's worth of XML. I've noticed that processing time seems to go up exponentially as the number of tracks goes up: at ~400 tracks it's 188 seconds, at ~800 it's 677 seconds, etc. It seems to me this SHOULD be a linear increase in processing time; 3000 tracks SHOULD take about 145 seconds judging from my test setup. I'm sure it's some nuance of the code and how XQuery processes things. So if someone could show me a way to set this up so it runs through things a bit faster, that would be amazing. Otherwise I'm going to need to create several shorter source documents to run the query on, which will be a bit of a nuisance... and I'd just like to figure out why I'm having the problem, for future reference. Thank you in advance for your help!!!
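The slowdown described above is consistent with the nested `for`: it pairs every Row with every tracknumber text node in the document, so the engine evaluates roughly n × n combinations, which matches the roughly 4x jump in time from ~400 to ~800 tracks. Filtering the row sequence with a predicate keeps it to a single pass. A minimal sketch, reusing the combined UPC/track-number key from the post (the key string is a placeholder):

```xquery
(: Single pass: the predicate filters each Row directly instead of
   joining the Row sequence against every tracknumber in the doc,
   so runtime grows linearly with the number of rows. :)
{
  let $doc := .
  for $v in $doc//root/Row[tracknumber = 'THIS IS THE UPC-01']
  return $v/UPC/text()
}
```

Some processors optimize the two-variable join form automatically, but the predicate version does not depend on that.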
  2. Hello! So, first off, I'm not really a "real" programmer. I handle a lot of tech stuff at the company I'm with, though, and I'm pretty good at dabbling. I usually figure out how to do what I need to do when I need to do it, and I rarely can't figure things out! I know plenty of HTML, how to work with CSS style sheets, all kinds of fancy Excel formulas, a bit of XML, etc. (Note: there are a few tweaks to this post that might not seem to "flow well" with the rest, as I posted it elsewhere first and didn't get much back, but I have figured out a few more things since.)

     So, the thing is, I now need to figure out something new, and it's a little more in-depth than other things I've tried in the past, and I'm not sure of the best way to go about it. Here's the score: I work for a record label, and we provide metadata for album releases to numerous sites in XML. We've been doing this for years, and the way most XML files display in Grid mode in Stylus has made it so easy to copy and paste from Excel that I've never seen the need to figure out a "proper" or more automated system. I probably should have, because it would have saved time in the long run, but we custom-tailor our XML to each site's system to optimize it, which requires a human touch anyway.

     Enter a new site we're delivering directly to. Their XML is very complicated and lethargic. There's a lot of duplicated data in different tags that shouldn't really exist but is required. It's to the point of ridiculousness, really. It'd be way too much work to enter things manually, so I started looking for a solution to automate it, and I have kind of found a way.
     Basically, I used Stylus to convert a CSV for an album to XML that looks like this:

         <root>
           <fieldnames>
             <UPC>UPC</UPC>
             <catalog>Catalog Number</catalog>
             <tracknumber>Track Number</tracknumber>
             <release>Release Date</release>
             <albumtitle>Title</albumtitle>
             <artist>Artist</artist>
             <tracktitle>Track Title</tracktitle>
             <isrc>ISRC USM2U</isrc>
             <description>Description</description>
             <time1>Track Times</time1>
             <field>Source Master</field>
             <field>Notes</field>
             <field>Song Writers</field>
           </fieldnames>
           <Row>
             <UPC>XXXXXXXXX</UPC>
             <catalog>CD119</catalog>
             <tracknumber>01</tracknumber>
             <release>4/10/2003</release>
             <albumtitle>XXXXXXXXXX</albumtitle>
             <artist>YYYYYYYYYYYYY</artist>
             <tracktitle>ZZZZZZZZZZZZ</tracktitle>
             <isrc>XXXXXXXXXX</isrc>
             <time1>2:49</time1>
             <time2>0:02:49</time2>
             <time3>169</time3>
             <field>169000</field>
           </Row>

     The <Row> element then repeats for all tracks. Then I found XQuery code:

         { let $doc := .
           for $v in $doc//root/Row, $a in $doc//root/Row/tracknumber/text()
           where matches($a, '01') and $v/tracknumber/text() = $a
           return $v/UPC/text() }

     which can pull data from any field based on the track number. This would allow it to pull the track name, track time, etc. on a per-track basis. So in essence I could set up:

         <File>
           <FileName>{ let $doc := .
             for $v in $doc//root/Row, $a in $doc//root/Row/tracknumber/text()
             where matches($a, '2') and $v/tracknumber/text() = $a
             return $v/UPC/text() }_01_01_FLAC.flac</FileName>
           <FilePath>resources</FilePath>
           <HashSum>
             <HashSum>CHECKSUM SONG 01</HashSum>
             <HashSumAlgorithmType>MD5</HashSumAlgorithmType>
           </HashSum>
         </File>

     and so on for each field, in a query that includes the literal XML as well, so the end result of a full query would be a valid XML document. The downside is that I have to have a manual chunk of XML code like the above in each spot where I need to fill something.
     If I created a query with, say, 30 tracks, it would be enough to auto-fill all our albums, because they all have fewer than 30 tracks... except it would also leave a bunch of XML with empty fields whenever an album has fewer than 30. So I'm wondering two things.

     First: should I even use XQuery? It really seems like a ghetto way of going about this, especially because of having to manually create a CSV and import it album by album. (Addition: I did try to make one big CSV with the whole database and then filter using the UPC and the track number. The problem is it returns the correct data I'm looking for... once for every time a track 01 appears in the database! I can't find a way to make it return it only once, and it's maddening! If I could figure that out, it might be workable in XQuery.) If not, what should I use? We have Access and MS SQL floating around. It seems like if I could get similar functionality for kicking text out of either of those, that alone would advance the cause considerably, since they natively handle Excel files. Especially if I could make it automatically recognize when the UPC field changes in one big long Excel spreadsheet but kick out separate text (XML) documents.

     Second: if XQuery is a valid way of going about this, can I embed the actual XML code portion in an "if/then" sort of construct, so that it would NOT create data in the final query result if the track number didn't exist in the source XML created from the CSV? Or is there no way to include straight XML inside code like that? (Addition: it seems like I should be able to use something like

         return <video year="{$v/year}"> {$v/title} </video>

     within the code, which is something I found online. But it seems to always break when I try putting large amounts of XML in there. What are the rules for what can and can't go inside the code space?)
     If that is possible, I think even the ghetto XQuery way would be workable; otherwise there will be a lot of manual deletion in this big, complex XML document. That would still be easier than doing it completely manually, but certainly not optimal.

     As for Access or SQL... I've never done much in there, but I've worked with people who have done all kinds of awesome stuff with them. I know that SQL especially can generate all kinds of text populated from spreadsheets, and honestly the code has to be so stupidly easy I know I'm going to want to shoot myself when I finally figure it out. It seems like there must be some way to make one of them generate the XML tag I need, pull from a particular field in a spreadsheet that's in the system, and only do it until the data ends or it reaches a new UPC in the spreadsheet. Perhaps create a separate table with all the XML start and end tags, so it's actually pulling everything from certain fields in spreadsheets to create the final product?

     I know this is a bit long-winded, so sorry for that. I should have been a Russian novelist. And I'm sure this is pretty novice stuff, but I've found programmers are usually quite willing to help. Since this must be pretty simple, I'm hoping I can figure it out and permanently expand my bag of tricks! All I really need is a way to auto-pull fields based on, say, a UPC/track-number combo, and have it end when the data stops. I could even set this up for all the other sites we deliver to and streamline things even more! Any thoughts or suggestions are welcome, and thank you all in advance for your help!!!
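On the angle brackets and the "if/then" question raised above: when an XQuery expression returns a string containing < and >, the serializer must escape them, because the result is character data rather than markup. Returning constructed elements keeps real tags in the output, and a conditional that yields the empty sequence () makes a field vanish entirely when the source row lacks it. Inside a constructor, literal XML just has to be well-formed, and any { } braces switch back into code, which is the usual reason large pasted fragments "break". A sketch using field names from the sample CSV-derived XML (the wrapper element names are placeholders, not the target site's schema):

```xquery
(: Element constructors emit real markup; the if/else returns the
   empty sequence when <isrc> is missing or blank, so no empty tag
   is generated for that track. :)
{
  for $v in .//root/Row[tracknumber = '01']
  return
    <ReferenceTitle>
      <TitleText>{ $v/tracktitle/text() }</TitleText>
      {
        if (exists($v/isrc[. != '']))
        then <ISRC>{ $v/isrc/text() }</ISRC>
        else ()
      }
    </ReferenceTitle>
}
```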
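On the duplicate-results problem and the "new UPC means a new document" idea: iterating over every matching tracknumber text node returns one copy per match, whereas distinct-values() collapses the keys first, and grouping on the distinct UPCs gives one output block per album from a single big source file. A sketch with placeholder output names (<Release> and <Track> are illustrative, not the delivery schema):

```xquery
(: One outer iteration per distinct UPC, one inner iteration per
   Row belonging to that UPC: each value is emitted exactly once. :)
{
  for $upc in distinct-values(.//root/Row/UPC)
  return
    <Release UPC="{ $upc }">
      {
        for $v in .//root/Row[UPC = $upc]
        order by $v/tracknumber
        return <Track number="{ $v/tracknumber/text() }">{ $v/tracktitle/text() }</Track>
      }
    </Release>
}
```

Writing each <Release> to its own file is processor-specific (e.g. running the query once per UPC, or using a processor's document-output extension), so that last step would stay scripted outside the query.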