
Simple HTML DOM Parser: how to get the nodeValue?


tinfanide


<?php
// Simple HTML DOM Parser
include('simple_html_dom.php');
$html = file_get_html('http://lifelearning.xtreemhost.com/');

foreach ($html->getElementsByTagName("ul",0)->getElementsByTagName("li") as $li) {
    echo $li->plaintext; // works; but return the whole nodeValues (From "Point 1" to "Reference")
    echo $li->childNodes(0)->nodeValue; // return nothing
    echo "<br />";
}
?>

<ul>
    <li>Point 1
        <ul>
            <li>Point 1.1</li>
            <li>Point 1.2</li>
            <pre class="CodeDisplay">
            some codes
            </pre>
            <li>Reference: <a href="link.html" target="_blank">link</a></li>
        </ul>
    </li>
</ul>

I just want to get the plain text of the first li ("Point 1") through PHP. How can I do that?


It's true that nodeValue and textContent concatenate the values of all text nodes in a node. But text nodes do exist in the object and can be accessed the old-fashioned way. You simply forgot the correct syntax. Try this:

echo $li->childNodes->item(0)->textContent;

Of course, if you don't know the exact location of the text node, you can loop through the childNodes and test the nodeType or nodeName.
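A minimal sketch of that loop, assuming PHP's built-in DOMDocument rather than Simple HTML DOM. The helper name firstOwnText and the inline sample markup are made up for illustration:

```php
<?php
// Sample markup adapted from the question (the stray <pre> is left out).
$markup = '<ul><li>Point 1 <ul><li>Point 1.1</li><li>Point 1.2</li></ul></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($markup);

// Hypothetical helper: returns the trimmed text of the first non-empty
// text node that is a direct child of $node, or '' if there is none.
function firstOwnText(DOMNode $node): string
{
    foreach ($node->childNodes as $child) {
        // XML_TEXT_NODE is the nodeType of plain text nodes.
        if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) !== '') {
            return trim($child->nodeValue);
        }
    }
    return '';
}

// The first <li> in document order is the outer one.
$li = $doc->getElementsByTagName('li')->item(0);
$firstText = firstOwnText($li);
echo $firstText, "\n";
```

Testing nodeType skips the nested ul element, so only the text that belongs directly to the li is returned.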

Edited by Deirdre's Dad

  • 3 weeks later...

It returns the error:
Fatal error: Call to a member function item() on a non-object in /home/a6016920/public_html/example_basic_selector.php on line 11

In PHP's DOMDocument there are no text nodes. The node value belongs to the <li> element itself.
echo $li->nodeValue;

The PHP page returns nothing (a blank page).

Did you try var_dump($li) to see what it actually is?

It returns a large chunk of unreadable text that I can't understand.

<body>
<ul id="ul1">
    <li id="pt1">Point 1
        <ul id="ul2">
            <li id="pt11">Point 1.1</li>
            <li id="pt12">Point 1.2</li>
            <pre class="CodeDisplay">
            some codes
            </pre>
            <li id="ref">Reference: <a href="link.html" target="_blank">link</a></li>
        </ul>
    </li>
</ul>
</body>
<script>
alert(document.getElementsByTagName("li")[0].childNodes[0].nodeValue); // Point 1
</script>

In JS, this returns what I want. Is there an equivalent in Simple HTML DOM for PHP?


The following code works, and it does not require an external library.

$url = 'http://www.someplace.com';
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$els = $doc->getElementsByTagName('li');
echo $els->item(0)->childNodes->item(0)->textContent;

Note: if your document really looks like this, PHP will issue a warning. The <pre> element is not permitted between <li> elements.

<li id="pt12">Point 1.2</li>
<pre class="CodeDisplay">
  some codes
</pre>
<li id="ref">Reference: <a href="link.html" target="_blank">link</a></li>
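If the markup cannot be fixed at the source, one option is to have libxml collect its parser messages instead of emitting them as PHP warnings. This is only a sketch: the inline markup stands in for the real page, and which messages libxml actually reports for it is not asserted here.

```php
<?php
// Stand-in markup with a <pre> placed between <li> elements, as in the thread.
$markup = '<ul><li id="pt12">Point 1.2</li>'
        . '<pre class="CodeDisplay">some codes</pre>'
        . '<li id="ref">Reference: <a href="link.html" target="_blank">link</a></li></ul>';

// Collect parser messages internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($markup);

// Fetch whatever libxml recorded, then restore the previous setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Print each recorded message with the line it refers to.
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

This keeps the page output clean while still letting you inspect what the parser complained about.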


Yes, you're right, and you've made a good point. First, the pre tag is not accepted by PHP. After removing the tags, the warning generated by PHP disappears. Second, the DOMDocument method achieves what I want, so I don't need an external library like Simple HTML DOM. By the way, I couldn't find this online. I wonder if I can do this in PHP:
<?php
echo $html->getElementsByTagName('li',0)->childNodes[0]->textContent;
// it generates warnings
// instead of using item()
?>
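A hedged sketch of how this could look with DOMDocument. DOMDocument::getElementsByTagName() accepts only the tag name, so the index has to be applied to the returned list. Reading a DOMNodeList with square brackets should work on reasonably recent PHP versions, but that is an assumption to verify on your own installation; item() is the safe choice. The inline markup is a stand-in for the real page.

```php
<?php
// Stand-in markup for the page in the thread.
$markup = '<ul><li>Point 1 <ul><li>Point 1.1</li></ul></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($markup);

// One argument only: the tag name. The result is a DOMNodeList.
$items = $doc->getElementsByTagName('li');

// Classic access through item().
$viaItem = $items->item(0)->childNodes->item(0)->textContent;

// Bracket access on the node lists (assumed available on recent PHP versions).
$viaBrackets = $items[0]->childNodes[0]->textContent;

// trim() removes the whitespace that surrounds the text in the markup.
echo trim($viaItem), "\n";
echo trim($viaBrackets), "\n";
```

Both expressions read the same text node, so the choice between them is a matter of style and PHP version.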
