
Reading raw XML


Gokuu


First of all, let me apologize if this has already been posted, but I can't perform a search on "XML" :) Also, sorry for the long post :)

Now, to the actual problem I'm having...

Many of you must already know of the World of Warcraft Armory (http://armory.wow-europe.com). The data is XML transformed with an XSL stylesheet, and I'd like to get access to the raw XML, but I have been unable to do so up to now.

What puzzles me the most is that if I view the source in either Firefox or IE, or even open the file in Notepad directly from the Web, I get the raw XML I need, which makes me believe that the XSL is applied on the "client" side. From PHP, I've tried using the DOM and the filesystem functions, with the same results: no raw XML.

I even ran a packet sniffer to capture the packets Notepad sent to the server, and sent the same packets through a socket I had opened. That didn't work either.

The closest I got was when I tried using wget with a different user agent (a possible solution I saw somewhere else). But I get an incomplete result:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/layout/character-skills.xsl"?>
<page globalSearch="0" lang="en_gb" requestUrl="/character-skills.xml">
  <characterInfo>
    <character battleGroup="Reckoning" charUrl="r=Bronzebeard&n=Zethus" class="Hunter" classId="3" faction="Alliance" factionId="0" gender="Male" genderId="0" guildName="The Conclave" guildUrl="r=Bronzebeard&n=The+Conclave&p=1" lastModified="19 April 2007" level="49" name="Zethus" race="Night Elf" raceId="4" realm="Bronzebeard" title=""/>
    <skillTab>
      <skillCategory key="professions" name="Professions">
        <skill key="leatherworking" max="300" name="Leatherworking" value="257"/>
        <skill key="skinning" max="375" name="Skinning" value="300"/>
      </skillCategory>
      <skillCategory key="secondaryskills" name="Secondary Skills">
        <skill key="cooking" max="300" name="Cooking" value="265"/>
        <skill key="firstaid" max="300" name="First Aid" value="260"/>
        <skill key="fishing" max="225" name="Fishing" value="170"/>
        <skill key="riding" max="75" name="Riding" value="75"/>
      </skillCategory>
      <skillCategory key="weaponskills" name="Weapon Skills">
        <skill key="axes" max="245" name="Axes" value="127"/>
        <skill key="bows" max="245" name="Bows" value="194"/>
        <skill key="crossbows" max="245" name="Crossbows" value="1"/>
        <skill key="daggers" max="245" name="Daggers" value="94"/>
        <skill key="defense" max="245" name="Defense" value="209"/>
        <skill key="guns" max="245" name="Guns" value="243"/>
        <skill key="polearms" max="245" name="Polearms" value="213"/>
        <skill key="staves" max="245" name="Staves" value="1"/>
        <skill key="swords" max="245" name="Swords" value="123"/>
        <skill key="thrown" max="245" name="Thrown" value="1"/>
        <skill key="two-handedswords" max="245" name="Two-Handed Swords" value="1"/>
        <skill key="unarmed" max="245" name="Unarmed" value="40"/>
      </skillCategory>
      <skillCategory key="classskills" name="Class Skills">
        <skill key="beastmastery" max="0" name="Beast Mastery" value="245"/>
        <skill key="marksmanship" max="0" name="Marksmanship" value="245"/>
        <skill key="survival" max="0" name="Survival" value="245"/>
      </skillCategory>
      <skillCategory key="armorproficiencies" name="Armor Proficiencies">
        <skill key="cloth" max="0" name="Cloth" value="1"/>
        <skill key="leather" max="0" name="Leather" value="1"/>
        <skill key="mail" max="0" name="Mail" value="1"/>
      </skillCategory>
      <skillCategory key="languages" name="Languages">
        <skill key="language:common" max="300" name="Language: Common" value="300"/>
        <skill key="language:darnassian" max="300" name="Language: Darnassian" value="300"/>
      </skillCategory>
    </skillTab>
  </characterInfo>
</page>

I ran wget with the following command:

wget -O guild.xml --user-agent='Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)' 'http://armory.wow-europe.com/character-skills.xml?r=Bronzebeard&n=Zethus'

PS: Just for the sake of it, here is the code I used with fsockopen, without success (code changed to increase readability):

// Open a plain TCP connection to the Armory web server.
$fp = fsockopen("armory.wow-europe.com", 80, $errno, $errstr) or die("$errno: $errstr");

// Browser-like request headers.
$get = <<<EOF
GET /character-skills.xml?r=Bronzebeard&n=Zethus HTTP/1.1
Accept: */*
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Host: armory.wow-europe.com
Connection: Keep-Alive
Cache-Control: no-cache
EOF;

fputs($fp, $get);

// Echo the response line by line until the first empty line.
$end = false;
while (!$end) {
    $line = fgets($fp, 2048);
    if (trim($line) == "")
        $end = true;
    else
        echo $line;
}

Does anyone have any idea on how to do this? Or should I abandon my idea altogether?


Hi! Try changing this:

while (!$end) {
    $line = fgets($fp, 2048);
    if (trim($line) == "")
        $end = true;
    else
        echo $line;
}

to

while (!feof($fp)) {
    $line = fgets($fp, 2048);
    echo $line;
}

Your code stops at the first empty line. That isn't good: the HTTP headers are followed by an empty line, and there can be empty lines in the XML as well. It's better to check for EOF instead.

You could also use file_get_contents() to get the file:

$text = file_get_contents("http://armory.wow-europe.com/");

It could make a difference whether you send the default User-Agent or a Mozilla one, but it shouldn't. Other than that, I can't see why you would get just a part of the XML file.

Note: I visited the site, and it took a few clicks and a URL like this: http://armory.wow-europe.com/#search.xml?s...Type=characters before I got a character list. You can't get that list through view-source (as it uses Ajax), but if you know which file generates it, you should be able to request that file directly to get the XML.

Good luck and Don't Panic!
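In case the User-Agent does turn out to matter, here is a minimal sketch of how file_get_contents() can send one through a stream context. The helper names build_http_context_options() and fetch_with_user_agent() are made up for illustration, the 10-second timeout is an arbitrary choice, and the sketch assumes allow_url_fopen is enabled on your server.

```php
<?php
// Hypothetical helper (not from the thread): HTTP context options that set a User-Agent.
function build_http_context_options(string $userAgent): array
{
    return [
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // sent as the User-Agent request header
            'timeout'    => 10,         // seconds; an arbitrary choice for this sketch
        ],
    ];
}

// Hypothetical helper: returns the response body as a string, or false on failure.
function fetch_with_user_agent(string $url, string $userAgent)
{
    $context = stream_context_create(build_http_context_options($userAgent));
    return file_get_contents($url, false, $context);
}

// Example call (performs a network request, so it is left commented out):
// $xml = fetch_with_user_agent(
//     'http://armory.wow-europe.com/character-skills.xml?r=Bronzebeard&n=Zethus',
//     'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'
// );
```

Compared with a raw socket, this leaves the request formatting and header handling to PHP, so you only get the body back.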


Thanks, with that small change I discovered that I was sending a Bad Request (that's what you get for copying and pasting code from the web without actually thinking about it :)). So I investigated a little more and am now able to send a correct header to the socket, but I still get the XML after it has been transformed.

The current header I'm sending is as follows:

GET /character-skills.xml?r=Bronzebeard&n=Zethus HTTP/1.1
Host: armory.wow-europe.com
Content-Type: text/plain
Content-Length: 22
Connection: close

r=Bronzebeard&n=Zethus

Each of the lines ends with \r\n.

Do you know of any way I can change the header I send to make the server reply with raw XML?

Oh, and by the way, http://armory.wow-europe.com/character-ski...rd&n=Zethus takes you directly to the page I'm testing with.


Add the user agent to the header again:

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

While using wget, I discovered that with the default UA I got HTML, but when I used that UA string in the headers I got XML.
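Putting that together with the socket approach from earlier in the thread, a request with the User-Agent header could be built roughly like this. This is only a sketch: build_get_request() and send_request() are names invented here, and the 10-second connect timeout is an arbitrary choice.

```php
<?php
// Hypothetical helper: builds a GET request with CRLF line endings and
// the empty line that terminates the header block.
function build_get_request(string $host, string $path, string $userAgent): string
{
    $lines = [
        "GET $path HTTP/1.1",
        "Host: $host",
        "User-Agent: $userAgent",
        "Connection: close", // ask the server to close the socket when done
    ];
    return implode("\r\n", $lines) . "\r\n\r\n";
}

// Hypothetical helper: sends the request and returns the raw response
// (status line, headers and body), or false if the connection fails.
function send_request(string $host, string $request)
{
    $fp = fsockopen($host, 80, $errno, $errstr, 10);
    if ($fp === false) {
        return false;
    }
    fwrite($fp, $request);
    $response = '';
    while (!feof($fp)) {
        $chunk = fgets($fp, 2048);
        if ($chunk === false) {
            break; // nothing more to read
        }
        $response .= $chunk;
    }
    fclose($fp);
    return $response;
}

// Example call (performs a network request, so it is left commented out):
// $request = build_get_request(
//     'armory.wow-europe.com',
//     '/character-skills.xml?r=Bronzebeard&n=Zethus',
//     'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'
// );
// echo send_request('armory.wow-europe.com', $request);
```

Since this is an HTTP/1.1 request, the server may answer with chunked transfer encoding, so the body may still need decoding after the headers are split off.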


Hehe. I came here to say I had discovered the solution by adding

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

to the header, and I saw you had already posted it :)

One strange thing I noticed... With this user agent the server sends the XML untransformed, but with a Lynx browser user agent it sends the transformed output... Well, it's solved. Thanks for your help, Mr. CHISOL :)
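For anyone finding this later: once the raw XML comes back, reading the skill values could look roughly like the sketch below. It assumes the SimpleXML extension is available, extract_skills() is a name made up for this example, and the sample is a shortened excerpt of the response posted earlier in the thread.

```php
<?php
// Hypothetical helper: maps each skill name to its integer value.
function extract_skills(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return []; // not well-formed XML
    }
    $skills = [];
    foreach ($doc->xpath('//skill') as $skill) {
        $skills[(string) $skill['name']] = (int) $skill['value'];
    }
    return $skills;
}

// Shortened excerpt of the response posted earlier in the thread.
$sample = <<<XML
<page>
  <characterInfo>
    <skillTab>
      <skillCategory key="professions" name="Professions">
        <skill key="leatherworking" max="300" name="Leatherworking" value="257"/>
        <skill key="skinning" max="375" name="Skinning" value="300"/>
      </skillCategory>
    </skillTab>
  </characterInfo>
</page>
XML;

print_r(extract_skills($sample));
```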


"One strange thing I noticed... This user agent doesn't parse the XML, but a Lynx browser user agent does..."

:?) The reason for that is probably that Lynx doesn't support XSL, whereas IE, Firefox, etc. do. So for Lynx the server applies the transformation itself and sends the result, so that the site works there too.
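Since the server picks the output based on the User-Agent, a script may want to check which variant it actually received. A rough sketch, assuming the untransformed response still carries the xml-stylesheet processing instruction seen in the first post, and that the transformed variant does not; is_untransformed_xml() is a made-up name.

```php
<?php
// Hypothetical helper: true when the body still carries an xml-stylesheet
// processing instruction, i.e. the XSL has not been applied by the server.
function is_untransformed_xml(string $body): bool
{
    return preg_match('/<\?xml-stylesheet\b[^>]*type="text\/xsl"/', $body) === 1;
}

// The first lines of the response from the first post, and a stand-in for
// a server-side transformed page (assumed here to be plain HTML).
$raw  = '<?xml version="1.0" encoding="UTF-8"?>'
      . '<?xml-stylesheet type="text/xsl" href="/layout/character-skills.xsl"?><page/>';
$html = '<html><body>Zethus</body></html>';

var_dump(is_untransformed_xml($raw));
var_dump(is_untransformed_xml($html));
```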