
Extract Table Data From A Website


shobhitjain

Hey guys, I got stuck at one place. I am using simple_html_dom (SourceForge) to extract the contents of a page from a website, but I got stuck when I tried to extract data from a table.

http://www.weatherbase.com/weather/weather...6200&refer=

This is the link to that website. I want that data. My code was:

<?php
$original_file = file_get_contents("http://www.weatherbase.com/weather/weather.php3?s=606200&refer=");
// $stripped_file = strip_tags($original_file);
// preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);
//preg_match_all('/<td class="[^"]+">([^<]*)<\/td>/', $stripped_file, $matches);
//preg_match_all('/<td>(.*)a</td>/', $stripped_file, $matches);

//DEBUGGING
//$matches[0] now contains the complete A tags; ex: <a href="link">text</a>
//$matches[1] now contains only the HREFs in the A tags; ex: link
//header("Content-type: text/plain"); //Set the content type to plain text so the print below is easy to read!
//print_r($matches); //View the array to see if it worked

if (preg_match("#<table ([^>]*)>(.*?)</table>#is", $original_file, $matches)) {
    #echo '<pre>' . htmlentities(print_r($matches, true)) . '</pre>';
    // get our table attributes
    $attr = explode(" ", $matches[1]);
    // echo out our attributes
    echo '<pre>' . htmlentities(print_r($attr, true)) . '</pre>';
}
?>

I tried all the combinations (in the comments), but I don't know where I am going wrong. Please help.


The first regular expression finds links in the document. It will find every link and return the URL that it links to and the link text, plus any extra attributes.

The second regular expression will return the contents of every td element that has a class. It will only match the td elements with a class.

The last regular expression will match the content inside of any td that ends with the letter "a". So if the code has this:

<td>fdsa</td>

Since the content ends with the letter "a", the regular expression will match and return the "fds" portion.

From looking at the source code on that page, the td elements don't have a class on them, so I'm not sure why you're trying to use that pattern. I'm also not sure why you're looking for the "a" character in the third pattern, or why you're looking for links at all, since the tds don't contain any links.
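To make the behaviour of the three patterns concrete, here is a minimal sketch that runs them against a small sample string. The markup in `$html` is hypothetical and not taken from the weatherbase page. One assumption to flag: the third pattern uses `#` as its delimiter here, because with `/` as the delimiter the unescaped slash in `</td>` ends the pattern early and PHP reports an unknown modifier.

```php
<?php
// Hypothetical sample markup for illustration only.
$html = '<table><tr>'
      . '<td class="hdr">Month</td>'
      . '<td>fdsa</td>'
      . '<td><a href="/jan">Jan</a></td>'
      . '</tr></table>';

// Pattern 1: links. Only the href value is a capturing group.
preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $html, $links);

// Pattern 2: td elements that carry a class attribute.
preg_match_all('/<td class="[^"]+">([^<]*)<\/td>/', $html, $classed);

// Pattern 3: td content followed by a literal "a" (delimiter changed to #).
preg_match_all('#<td>(.*)a</td>#', $html, $endsInA);

print_r($links[1]);   // href values
print_r($classed[1]); // contents of td elements with a class
print_r($endsInA[1]); // td content before the trailing "a"
```

These patterns only have something to match when they run against the raw HTML. The output of `strip_tags()` has no tags left, so a tag-based pattern applied to it finds nothing.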


Well, thanks. I have figured out the solution. Part 3 turns out to be the value I needed:

// $stripped_file must hold the raw HTML (tags intact) for this pattern to match
preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $stripped_file, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n"; // the whole element
    echo "part 1: " . $val[1] . "\n";  // opening tag
    echo "part 2: " . $val[2] . "\n";  // tag name
    echo "part 3: " . $val[3] . "\n";  // content between the tags
    echo "part 4: " . $val[4] . "\n\n"; // closing tag
}

Regards
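As a rough illustration of why part 3 holds the wanted value, the sketch below applies the same pattern to a few flat table cells. The sample cells and the `$cells` variable are made up for this example, and the real page's markup may differ.

```php
<?php
// Hypothetical sample cells; not copied from the actual page.
$html = "<td>Jan</td><td>14</td><td>Feb</td><td>17</td>";

// Group 2 captures the tag name; \2 requires the closing tag to use the same name.
preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

// Collect part 3 (the text between the opening and closing tag) of every match.
$cells = [];
foreach ($matches as $val) {
    $cells[] = $val[3];
}

print_r($cells);
```

Two limits are worth keeping in mind. Matching resumes after the end of each match, so when elements are nested (a td inside a tr inside a table), the outer element is matched and the inner ones are not reported separately. Also, without the `s` modifier the `.` does not cross line breaks, so an element whose content spans several lines is not matched.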

