Extract Table Data From A Website

shobhitjain · November 20, 2009

Hey guys, I got stuck at one place. I am using simple_html_dom (SourceForge) for extracting the contents of a page from a website, but I got stuck when I tried to extract data from the table. This is the link to that page: http://www.weatherbase.com/weather/weather...6200&refer=

I want that data. My code was:

<?php
$original_file = file_get_contents("http://www.weatherbase.com/weather/weather.php3?s=606200&refer=");
// $stripped_file = strip_tags($original_file);
// preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);
//preg_match_all('/<td class="[^"]+">([^<]*)<\/td>/', $stripped_file, $matches);
//preg_match_all('/<td>(.*)a</td>/', $stripped_file, $matches);

//DEBUGGING
//$matches[0] now contains the complete A tags; ex: <a href="link">text</a>
//$matches[1] now contains only the HREFs in the A tags; ex: link
//header("Content-type: text/plain"); //Set the content type to plain text so the print below is easy to read!
//print_r($matches); //View the array to see if it worked

if (preg_match("#<table ([^>]*)>(.*?)</table>#is", $original_file, $matches)) {
    #echo '<pre>' . htmlentities(print_r($matches, true)) . '</pre>';
    // get our table attributes
    $attr = explode(" ", $matches[1]);
    // echo out our attributes
    echo '<pre>' . htmlentities(print_r($attr, true)) . '</pre>';
}
?>

I tried all the combinations (in the comments), but I don't know where I am going wrong. Please help.
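For comparison, here is a minimal sketch of the same job using PHP's bundled DOM extension (DOMDocument plus DOMXPath) instead of regular expressions. This is a swapped-in technique, not simple_html_dom and not the code above. It assumes ext-dom is enabled, and the function name extractTableRows and the sample markup are hypothetical. The markup is inlined so the sketch runs offline; for the real page it would come from file_get_contents().

```php
<?php
// Sketch: read every row of every <table> into an array of cell strings
// with PHP's DOM extension rather than regular expressions.

/**
 * Hypothetical helper: returns one entry per <tr>, each holding the
 * trimmed text of that row's <td>/<th> cells.
 */
function extractTableRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        // td and th children of this row, in document order
        foreach ($xpath->query('./td | ./th', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Hypothetical stand-in for the downloaded page; note the cells carry no class.
$html = '<html><body><table border="1">'
      . '<tr><th>Name</th><th>Value</th></tr>'
      . '<tr><td>fdsa</td><td> 42 </td></tr>'
      . '</table></body></html>';

print_r(extractTableRows($html));
```

Because the parser follows the table/tr/td structure, this approach does not need a class on the td elements at all.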

justsomeguy · November 20, 2009

The first regular expression finds links in the document: for every link, $matches[0] holds the complete A element (extra attributes and link text included) and $matches[1] holds only the URL it points to. The second regular expression returns the contents of every td element written as <td class="...">; td elements without a class are skipped. The last regular expression is meant to match the content of any td that ends with the letter "a". So if the code has this:

<td>fdsa</td>

then, since the content ends with the letter "a", the regular expression will match and return the "fds" portion. As written, though, the unescaped / in </td> ends the pattern early, so PHP rejects it with an "Unknown modifier" warning; it needs to be <\/td>.

From looking at the source code on that page, the td elements don't have a class on them, so I'm not sure why you're trying to use the second pattern. I'm also not sure why you're looking for the "a" character in the third pattern, or why you're looking for links at all, since the tds don't contain any links.
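The three commented-out patterns can be checked against a small hypothetical string. This sketch uses only preg_match_all() and shows what each pattern captures, plus what happens to the third pattern exactly as it was posted.

```php
<?php
// Hypothetical sample markup: one link, one td with a class, one td without.
$html = '<a href="link">text</a><td class="x">42</td><td>fdsa</td>';

// Pattern 1: every <a ...href="...">...</a>; group 1 is the URL only.
preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $html, $links);

// Pattern 2: only cells written as <td class="...">, with no tags inside.
preg_match_all('/<td class="[^"]+">([^<]*)<\/td>/', $html, $withClass);

// Pattern 3 with the slash in </td> escaped: a td whose content ends in "a";
// group 1 is that content minus the final "a".
preg_match_all('/<td>(.*)a<\/td>/', $html, $endsInA);

// Pattern 3 as posted: the unescaped "/" ends the pattern early, PHP reports an
// unknown modifier and preg_match_all() returns false (warning silenced with @).
$broken = @preg_match_all('/<td>(.*)a</td>/', $html, $unused);

echo $links[0][0], "\n";     // <a href="link">text</a>
echo $links[1][0], "\n";     // link
echo $withClass[1][0], "\n"; // 42
echo $endsInA[1][0], "\n";   // fds
var_dump($broken);           // bool(false)
```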

shobhitjain · November 21, 2009

Sorry to say, but I usually get confused by regular expressions. Regarding <td>fdsa</td>: the table didn't have any class defined, so I wasn't sure what to base the extraction on. I am still not able to extract fdsa from the whole table. I don't need the <td> tags, just fdsa.
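A rough sketch of getting just the cell text with a regular expression, assuming the cells do not contain nested tables (a regex cannot follow nesting reliably). The function name cellTexts and the sample table are hypothetical.

```php
<?php
// Sketch: pull the text of every <td> cell, without the tags themselves.
// Using "#" as the delimiter means the "/" in </td> needs no escaping.
function cellTexts(string $html): array
{
    // [^>]* lets a <td> carry attributes or none; "s" lets (.*?) cross newlines;
    // "i" also accepts <TD>; the lazy (.*?) stops at the first </td>.
    preg_match_all('#<td[^>]*>(.*?)</td>#is', $html, $matches);
    // Group 1 holds the inner HTML; drop any nested tags and surrounding spaces.
    return array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $matches[1]);
}

$html = "<table>\n<tr><td>fdsa</td><td> 12 </td></tr>\n<tr><td><b>bold</b></td></tr>\n</table>";
print_r(cellTexts($html));
```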

shobhitjain · November 21, 2009

Well, thanks. I have figured out the solution. Part 3 turns out to be the value I required:

preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $stripped_file, $matches, PREG_SET_ORDER);
foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n";
    echo "part 1: " . $val[1] . "\n";
    echo "part 2: " . $val[2] . "\n";
    echo "part 3: " . $val[3] . "\n";
    echo "part 4: " . $val[4] . "\n\n";
}

Regards
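A quick way to see what the back-reference pattern from the last post captures, using a hypothetical two-line input instead of the downloaded page. When a match starts at an outer element such as <tr>, part 3 is the inner markup (<td>fdsa</td>) rather than the bare text, and because the pattern has no s modifier, an element whose content spans a line break is not matched as a whole.

```php
<?php
// Hypothetical input: a row on one line, then a bare cell on the next.
$html = "<tr><td>fdsa</td></tr>\n<td>12</td>";

preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    // $val[2] is the tag name; \2 in the pattern requires the same name in the closing tag.
    // strip_tags() turns inner markup such as <td>fdsa</td> back into plain text.
    echo $val[2], ': ', $val[3], ' => ', strip_tags($val[3]), "\n";
}
```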
