DOM


ApocalypeX

So I have created a new DOMDocument so I can gather some data from a page:

$html = file_get_contents($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);

But when I do something like:

$dom->getElementById("ctl00_MainContentArea_results_repeater_ctl00_userLink")->parentNode;
$dom->getElementById("ctl00_MainContentArea_results_repeater_ctl00_userLink")->nextSibling->nodeValue;
$dom->getElementById("ctl00_MainContentArea_results_repeater_ctl00_userLink")->parentNode->nextSibling->nodeValue;

or anything else that involves moving around the DOM, I get this error:

Notice: Trying to get property of non-object

The getElementById bit works. I'm a JavaScript guy, so I think I know what I'm doing; I just don't know how to go about it in PHP. Could I have some help with how to use parentNode, nextSibling, etc.?

Thank you.

I don't like the @ at loadHTML(). Are you sure the file loaded properly? Try to replace

@$dom->loadHTML($html);

with

libxml_use_internal_errors(true);
if (!$dom->loadHTML($html)) {
    echo libxml_get_last_error()->message;
}
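
One caveat with that check: loadHTML() usually returns true even for sloppy markup, and libxml_get_last_error() returns false when nothing was recorded, so reading ->message from it can fail in its own right. A minimal sketch that lists every recorded parser message instead; the $html string here is a made-up sample, not the real page.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<div><span id="testspan">hello</span></div>';

// Collect parser messages internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// libxml_get_errors() returns an array of LibXMLError objects (possibly empty).
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

// Empty the error buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

if (!$loaded) {
    echo "Document could not be parsed\n";
}
foreach ($messages as $message) {
    echo $message, "\n";
}
```

Restoring the previous setting at the end keeps the change from leaking into other code that relies on normal libxml warnings.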

Starting with your code, this works just fine for me:

echo $dom->getElementById("testspan")->parentNode->getAttribute("id");

So chaining works fine. I'm not sure why you get the error you get.

Careful with siblings. They could easily be text nodes (whitespace). But they should still have a value. Any node should return some value without returning an error, even if it's just an empty string.

How do you know the getElementById works?
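
To illustrate the point about siblings, here is a rough sketch of a helper that walks nextSibling until it reaches an element, so whitespace text nodes are skipped if the parser kept them. The markup, the id "first", and the helper name nextElement are all made up for illustration.

```php
<?php
// Hypothetical markup; the id and text are made up for illustration.
$html = <<<HTML
<ul>
  <li id="first">One</li>
  <li>Two</li>
</ul>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

/**
 * Return the next sibling that is an element, skipping text and comment nodes.
 * (nextElement is a hypothetical helper name, not part of the DOM extension.)
 */
function nextElement(DOMNode $node): ?DOMElement
{
    for ($sibling = $node->nextSibling; $sibling !== null; $sibling = $sibling->nextSibling) {
        if ($sibling instanceof DOMElement) {
            return $sibling;
        }
    }
    return null;
}

// getElementById() returns null when nothing matches, so check before chaining.
$first = $dom->getElementById('first');
$next = ($first !== null) ? nextElement($first) : null;

echo $next === null ? "No following element\n" : trim($next->nodeValue) . "\n";
```

On PHP 8.0 and later, the built-in nextElementSibling property on DOMElement should do the same job without a helper.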

<?php
$username = "apocalypex";
$url = "http://www.bungie.net/Community/PeopleFinder.aspx?page=0&search_mode=0&search_criteria={$username}";
$html = file_get_contents($url);
$dom = new DOMDocument();
libxml_use_internal_errors(true);
if (!$dom->loadHTML($html)) {
    echo libxml_get_last_error()->message;
}
if ($dom->getElementById("ctl00_MainContentArea_results_repeater_ctl00_userLink") == null) {
    echo "User does not exist!";
    exit;
} else {
    $UID = str_replace('/Account/profile.aspx?uid=', '', $dom->getElementById("ctl00_MainContentArea_results_repeater_ctl00_userLink")->getAttribute("href"));
    echo $UID . '<BR>';

    $USERNAME = $dom->getElementById("ctl00_MainContentArea_results_repeater_ctl00_userLink")->nodeValue;
    echo $USERNAME . '<BR>';

    $url = "http://www.bungie.net/Account/profile.aspx?uid={$UID}";
    $html = file_get_contents($url);
    $dom = new DOMDocument();
    @$dom->loadHTML($html);

    if ($dom->getElementById("ctl00_mainContent_header_gamertagLinkFloat") == null)
        $GAMERTAG = "NULL";
    else
        $GAMERTAG = $dom->getElementById("ctl00_mainContent_header_gamertagLinkFloat")->nodeValue;
    echo $GAMERTAG . '<BR>';

    $MEMBERSINCE = str_replace('Member Since: ', '', $dom->getElementsByTagName("li")->item(123)->nodeValue);
    echo $MEMBERSINCE . '<BR>';

    echo $obj = $dom->getElementById("ctl00_MainContentArea_results_repeater_ctl00_userLink")->parentNode->getAttribute("class");
}
?>

And here is the main bit of source code from the page it's loading:

<ul>
    <li class="larger"><span id="ctl00_mainContent_header_lblUsername">ApocalypeX</span>
    <span id="ctl00_mainContent_header_gtFloatLabel" style="font-size: 15px; padding-left: 5px; font-weight: normal;">XBL: <a id="ctl00_mainContent_header_gamertagLinkFloat" href="/stats/halo3/default.aspx?player=ApocalypeX">ApocalypeX</a></span></li>
    <li>Member</li>
    <li>Last Active: <span id="ctl00_mainContent_header_lblLastActive">05.14.2010</span></li>
    <div id="ctl00_mainContent_header_msgNotDisabledPanel">
    </div>
</ul>

When I run your code, I get this output:
4559494<BR>ApocalypeX<BR>ApocalypeX<BR>Member Since:													04.15.2008<BR>

Is that incorrect?

Yep. I want to get the nodeValue of the <li> tag whose innerHTML is Member. So I thought parentNode and nextSibling would work, but they don't.

The line:

echo $obj = $dom->getElementById("ctl00_MainContentArea_results_repeater_ctl00_userLink")->parentNode->getAttribute("class");

was only a test, to make it display the class of that element. It should really read like this:

echo $obj = $dom->getElementById("ctl00_MainContentArea_results_repeater_ctl00_userLink")->parentNode->nextSibling->nodeValue;

One of us is confused. The first document you fetch is here:

http://www.bungie.net/Community/PeopleFinder.aspx?page=0&search_mode=0&search_criteria=apocalypex

You scrape that document to get the uid that takes you here:

http://www.bungie.net/Account/profile.aspx?uid=4559494

On this second page, you find the element with id "ctl00_mainContent_header_gamertagLinkFloat". You also look for the element with id "ctl00_MainContentArea_results_repeater_ctl00_userLink".

I do not find that element on the second page. I find it on the first page. I think that's what I find.
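
A quick way to check this kind of thing is to ask each document whether it contains the id at all. getElementById() returns null when the id is absent, and chaining ->parentNode onto null is what produces "Trying to get property of non-object". This is only a sketch: the two markup strings are stand-ins I made up for the search page and the profile page, and only the ids are taken from the thread.

```php
<?php
// Stand-in markup for the two pages; only the ids come from the thread.
$pages = [
    'search'  => '<div><a id="ctl00_MainContentArea_results_repeater_ctl00_userLink" href="#">ApocalypeX</a></div>',
    'profile' => '<ul><li><a id="ctl00_mainContent_header_gamertagLinkFloat" href="#">ApocalypeX</a></li></ul>',
];

$wanted = 'ctl00_MainContentArea_results_repeater_ctl00_userLink';

$found = [];
foreach ($pages as $name => $html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // getElementById() returns null when the id is absent from this document.
    $element = $dom->getElementById($wanted);
    $found[$name] = ($element !== null);

    echo $name, ': ', $found[$name] ? 'found' : 'not found', "\n";
}
```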

Omg... This is embarrassing. It's the wrong ID. Wow, sorry guys, I've been trying to fix this for 2 days and I never noticed the ID. Sorry for wasting your time.
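
For anyone who lands here later, a minimal sketch of the traversal using an id that does exist on the profile page. It assumes the profile markup looks like the trimmed snippet quoted earlier in the thread, and it uses DOMXPath rather than parentNode/nextSibling so that whitespace text nodes do not matter. The variable name $status is my own.

```php
<?php
// Trimmed copy of the profile markup quoted earlier in the thread.
$html = <<<HTML
<ul>
  <li class="larger"><span id="ctl00_mainContent_header_lblUsername">ApocalypeX</span>
  <span id="ctl00_mainContent_header_gtFloatLabel">XBL: <a id="ctl00_mainContent_header_gamertagLinkFloat" href="/stats/halo3/default.aspx?player=ApocalypeX">ApocalypeX</a></span></li>
  <li>Member</li>
  <li>Last Active: <span id="ctl00_mainContent_header_lblLastActive">05.14.2010</span></li>
</ul>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$link = $dom->getElementById('ctl00_mainContent_header_gamertagLinkFloat');
$status = null;
if ($link !== null) {
    // From the link, climb to its enclosing <li>, then take the first <li> after it.
    $nodes = $xpath->query('ancestor::li[1]/following-sibling::li[1]', $link);
    if ($nodes !== false && $nodes->length > 0) {
        $status = trim($nodes->item(0)->nodeValue);
    }
}

echo $status ?? 'not found', "\n";
```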

Archived

This topic is now archived and is closed to further replies.