
Extract Table Data From A Website


shobhitjain


Hey guys, I am stuck at one place. I am using simple_html_dom (from SourceForge) to extract the contents of a page from a website, but I got stuck when I tried to extract data from a table.

http://www.weatherbase.com/weather/weather...6200&refer=

This is the link to that website. I want the data from its table. My code was:

    <?php
    $original_file = file_get_contents("http://www.weatherbase.com/weather/weather.php3?s=606200&refer=");
    // $stripped_file = strip_tags($original_file);
    // preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);
    //preg_match_all('/<td class="[^"]+">([^<]*)<\/td>/', $stripped_file, $matches);
    //preg_match_all('/<td>(.*)a</td>/', $stripped_file, $matches);

    //DEBUGGING
    //$matches[0] now contains the complete A tags; ex: <a href="link">text</a>
    //$matches[1] now contains only the HREFs in the A tags; ex: link
    //header("Content-type: text/plain"); //Set the content type to plain text so the print below is easy to read!
    //print_r($matches); //View the array to see if it worked

    if (preg_match("#<table ([^>]*)>(.*?)</table>#is", $original_file, $matches)) {
        #echo '<pre>' . htmlentities(print_r($matches, true)) . '</pre>';
        // get our table attributes
        $attr = explode(" ", $matches[1]);
        // echo out our attributes
        echo '<pre>' . htmlentities(print_r($attr, true)) . '</pre>';
    }
    ?>

I tried all the combinations (in the comments), but I don't know where I am going wrong. Please help.


The first regular expression finds links in the document. For every link it matches, $matches[0] holds the complete tag (link text and any extra attributes included) and $matches[1] holds the URL that it links to.

The second regular expression returns the contents of every td element that has a class. It only matches td elements with a class attribute.

The last regular expression matches the content inside any td that ends with the letter "a". So if the code has this:

    <td>fdsa</td>

Since the content ends with the letter "a", the regular expression will match the "fds" and return that portion.

From looking at the source code of that page, the td elements don't have a class on them, so I'm not sure why you're trying to use that pattern. I'm also not sure why you're looking for the "a" character in the third pattern, or why you're looking for links at all, since the tds don't contain any links.
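To make the difference between the three patterns concrete, here is a minimal sketch that runs each of them against a small made-up snippet. The $html string is a hypothetical sample, not the markup of the weatherbase page, and the third pattern uses "#" as its delimiter because the "/" inside "</td>" would otherwise end a "/"-delimited pattern early.

```php
<?php
// Hypothetical sample markup, not taken from the weatherbase page.
$html = '<table><tr><td class="x">10</td><td>fdsa</td>'
      . '<td><a href="page.html">more</a></td></tr></table>';

// Pattern 1: links. Group 1 is the href value.
preg_match_all('/<a(?:[^>]*)href="([^"]*)"(?:[^>]*)>(?:[^<]*)<\/a>/is', $html, $links);

// Pattern 2: only td elements that carry a class attribute.
preg_match_all('/<td class="[^"]+">([^<]*)<\/td>/', $html, $withClass);

// Pattern 3: td content that ends with the letter "a".
// "#" is the delimiter so the "/" in "</td>" needs no escaping.
preg_match_all('#<td>(.*)a</td>#', $html, $endsInA);

print_r($links[1]);     // href values
print_r($withClass[1]); // contents of td elements that have a class
print_r($endsInA[1]);   // td content up to the final "a"
```

On this sample the first pattern captures "page.html", the second captures "10", and the third captures "fds". None of them captures "fdsa" as a whole, which is the value the original question is after.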


Sorry to say, but I usually get confused by regular expressions.

    <td>fdsa</td>

The table didn't have any class defined, so I was confused about what basis I should use to extract the data. I am still not able to extract fdsa from the whole table. I don't need the <td></td> tags, just fdsa.
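One way to get just the cell text when the td elements have no class is to match every td regardless of its attributes and then strip whatever markup is left inside. The following is only a sketch under that assumption: extractCellText() is a hypothetical helper name, and $sample is made-up input. It has to run on the raw HTML, since strip_tags() on the whole page would remove the very <td> tags the pattern looks for.

```php
<?php
// Hypothetical helper: returns the text of every <td> cell found in $html.
function extractCellText(string $html): array
{
    // [^>]* allows any attributes (or none); (.*?) is non-greedy so each
    // match stops at the nearest </td>; the "s" flag lets "." span newlines.
    preg_match_all('#<td[^>]*>(.*?)</td>#is', $html, $matches);

    // Remove any markup nested inside the cell and trim the whitespace.
    return array_map(
        fn (string $cell): string => trim(strip_tags($cell)),
        $matches[1]
    );
}

// Made-up input for illustration only.
$sample = "<table><tr><td>fdsa</td><td> <b>12</b> </td></tr></table>";
print_r(extractCellText($sample));
```

For the sample above this yields "fdsa" and "12". A regular expression like this does not handle nested tables; a DOM parser is likely the more robust choice for real pages.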


Well, thanks. I have figured out the solution. Part 3 turns out to be the value that I required:

    // $stripped_file must still contain the HTML tags for this pattern to match.
    // Group 1: opening tag, group 2: tag name, group 3: content, group 4: closing tag.
    preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $stripped_file, $matches, PREG_SET_ORDER);

    foreach ($matches as $val) {
        echo "matched: " . $val[0] . "\n";
        echo "part 1: " . $val[1] . "\n";
        echo "part 2: " . $val[2] . "\n";
        echo "part 3: " . $val[3] . "\n";
        echo "part 4: " . $val[4] . "\n\n";
    }

Regards
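For anyone reading later, here is a small sketch of how that pattern behaves on a made-up string ($html is hypothetical sample input). The backreference \2 forces the closing tag to repeat the tag name captured in group 2, and array_column() is used here as one possible way to collect only "part 3" from each match.

```php
<?php
// Hypothetical sample input standing in for the fetched page.
$html = '<td>fdsa</td><b class="n">42</b>';

// Group 2 captures the tag name; \2 in the closing tag must repeat it.
preg_match_all('/(<(\w+)[^>]*>)(.*?)(<\/\2>)/', $html, $matches, PREG_SET_ORDER);

// Collect group 3 (the content between the tags) from every match.
$contents = array_column($matches, 3);
print_r($contents);
```

On this input $contents holds "fdsa" and "42". Because the pattern accepts any tag name, it also picks up elements other than td, so filtering on group 2 may be needed on a full page.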


