confused and dazed, posted April 11, 2015:
Hello internet, there is a webpage I would like to start building, but I need to be able to pull data from other webpages. The information is public, so it's nothing sketchy, but I don't know where to start. I began with Google searches, but it appears to be more involved than I originally thought. If you have any recommendations for tutorials or forum discussions, I would be grateful. Thanks.
Ingolme, posted April 11, 2015:
You could start by searching for PHP cURL tutorials. cURL is used to make requests to remote servers.
confused and dazed, posted April 11, 2015:
Interestingly enough, I started this video series on cURL and it's getting me started. Thanks.
confused and dazed, posted April 12, 2015:
Yeah, that tutorial kind of stalled out and went nowhere... I looked and looked, and I can't seem to find one that starts from the beginning. Does anyone have any suggestions or links to tutorials?
Ingolme, posted April 12, 2015:
I just use the PHP manual when I need to know how to use a library or function: http://php.net/curl
Christopher.Burkhouse, posted April 12, 2015:
What kind of information are you pulling from the other pages? I'm not entirely sure how to help you out here because, depending on the data and how it's used, you may need very different code. Are you including user content? Scripting code? Video/music embedding? The more detail the better, so I know how to direct you in the best way. -Chris
confused and dazed, posted April 12, 2015:
I would like to pull dozens of href links from a webpage, as well as data within a Bootstrap callout, to export to a database. Basically, there is a webpage I am interested in that has many links to other pages as well as a lot of data I would like to export to a database.
Ingolme, posted April 12, 2015:
Have you checked to see if the website provides an API to get the information?
confused and dazed, posted April 12, 2015:
You mean like putting '.json' on the end of the webpage path? I did that and it did not launch a page with the data. Do you mean a different way?
Ingolme, posted April 12, 2015:
No, appending .json wouldn't do anything unless the site is built to support it. If they had an API they would say so in the documentation of their website. Have you learned to use cURL yet?
confused and dazed, posted April 12, 2015:
If they did, how would I access the info using the API?
Ingolme, posted April 12, 2015:
You still need cURL to use the API, but the API would save you the trouble of trying to pull out data from the HTML code.
confused and dazed, posted April 12, 2015:
I am familiar with APIs. I used one when creating a plugin, but I used .json to get to the HTML data. I don't know how to access an API any other way. How do I do it?
Ingolme, posted April 12, 2015:
Some APIs return JSON, others return XML, others might have their own format. But with PHP, you still need cURL to get the data from the API. I suggest you familiarize yourself with the cURL library. If after reading the PHP manual you still have trouble, I could help with a couple of examples.
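As a rough illustration of the JSON case, here is a minimal sketch of fetching an API response with cURL and decoding it. The helper names `apiGet()` and `decodeApiResponse()` are hypothetical, and whether the site in question has any API at all is an assumption; a real API's documentation would give the actual URL and response format.

```php
<?php
// Hypothetical helper: GET a URL with cURL and return the response body.
function apiGet(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Accept: application/json']);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Curl error: ' . curl_error($ch));
    }
    return $body;   // the handle is released when $ch goes out of scope
}

// Hypothetical helper: turn a JSON response body into a PHP array.
function decodeApiResponse(string $body): array
{
    $data = json_decode($body, true);   // true = associative arrays, not objects
    if (!is_array($data)) {
        throw new RuntimeException('Expected a JSON object or array');
    }
    return $data;
}
```

With an API that returns JSON, `decodeApiResponse(apiGet($url))` would give an array to loop over directly, with no HTML parsing involved. An XML API would need a different decoding step.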
confused and dazed, posted April 12, 2015:
Will do. Thanks.
confused and dazed, posted April 13, 2015:
This is as far as I have been able to get. I know the code below works because it grabs the site and displays it on the page. When I try to get the info I want out of the source code, I end up with nothing. I am trying to grab the href links, but my arrays come up empty.

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "some site");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $output = curl_exec($ch);
    if (curl_exec($ch) === false) {
        echo 'Curl error: ' . curl_error($ch);
    } else {
        echo 'Operation completed without any errors';
    }
    echo $output;
    curl_close($ch);
Ingolme, posted April 13, 2015:
You should check the value of $output in the if() statement, rather than calling curl_exec() a second time. What value does $output have?
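A minimal sketch of that suggestion: the request is made once, and the same return value is used both for the error check and as the page content. Wrapping it in a function named `fetchPage()` and throwing an exception on failure are choices made for this example only.

```php
<?php
// Fetch a page with a single curl_exec() call and check that call's result.
function fetchPage(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $output = curl_exec($ch);   // the only request that is sent
    if ($output === false) {
        // Report the error from the request that actually failed.
        throw new RuntimeException('Curl error: ' . curl_error($ch));
    }
    return $output;             // the HTML (or other body) as a string
}
```

Calling `curl_exec()` twice sends the request twice, and the second call's result is not the one stored in `$output`, so the error check and the data can disagree.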
confused and dazed, posted April 14, 2015:
OK, so I decided to go the DOMDocument route, and so far so good. I have been able to pull all the links from the site. Next is delimiting and saving to MySQL. More to come!!
confused and dazed, posted April 20, 2015:
So - I have been successful in pulling all the very specific links that I want from a webpage using the code below - it works well. However, I am now struggling to pull the text in between the overall <a></a> tags. Example: <a href=http...bla bla bla>THIS TEXT</a> How do I get the text "THIS TEXT"? I won't be able to search for THIS TEXT because the text will not actually be "THIS TEXT"; it will be different each time. Any thoughts?

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "some site");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $out = curl_exec($ch);
    if (curl_exec($ch) === false) {
        echo 'Curl error: ' . curl_error($ch);
    } else {
        echo 'Operation completed without any errors';
        echo "<br>";
    }
    curl_close($ch);
    $dom = new DOMDocument();
    @$dom->loadHTML($out);
    foreach ($dom->getElementsByTagName('a') as $links) {
        $try = $links->getAttribute('href');
        if (preg_match('#^some very specific links#i', $try) === 1) {
            print_r($try);
            echo "<br>";
        }
    }
Ingolme, posted April 20, 2015:
If you know where the links are going to be, then you can get the value of the first child, which would be a text node.
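A minimal sketch of reading the first child's value, assuming each matching link contains plain text directly inside the <a> tag. The function name `linkTexts()` and the sample pattern are hypothetical; the fallback to `textContent` is an addition for links whose first child is not a text node (for example an image or a span).

```php
<?php
// Collect the href and visible text of every link whose href matches $pattern.
function linkTexts(string $html, string $pattern): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html);   // @ silences warnings about sloppy real-world HTML

    $found = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        if (preg_match($pattern, $href) !== 1) {
            continue;   // not one of the links we care about
        }
        $first = $link->firstChild;
        if ($first !== null && $first->nodeType === XML_TEXT_NODE) {
            $text = trim($first->nodeValue);    // the text node's value
        } else {
            $text = trim($link->textContent);   // fallback: all text inside the link
        }
        $found[] = ['href' => $href, 'text' => $text];
    }
    return $found;
}
```

Because the text is read from the node itself, it does not matter that the link text is different each time.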
confused and dazed, posted April 20, 2015:
That worked. Thanks! Each time I progress I figure out I need more... So here is my next quest: I need to get the text after the class="sort", but I need to be able to group the 22 with THIS TEXT1 and the 0 with THIS TEXT2. Basically, I am sending the data to MySQL, and THIS TEXT1 and 22 need to be in the same row in the database.

    <td align="left"><a href='some site' target='_blank'>THIS TEXT1</a></td><td align="center" class="sort">22</td>
    <td align="left"><a href='some site' target='_blank'>THIS TEXT2</a></td><td align="center" class="sort">0</td>
Ingolme, posted April 20, 2015:
One way to get it would be to look through the <td> elements inside the current <tr> element and if the class attribute is "sort" (using getAttribute()) then get the value of the child node.
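A minimal sketch of that approach, assuming each pair of cells sits inside its own <tr> and that the class attribute is exactly "sort", as in the sample above. The function name `pairRows()` and the 'label'/'value' keys are hypothetical.

```php
<?php
// For each table row, pair the link text with the value of the class="sort" cell.
function pairRows(string $html): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html);   // @ silences warnings about sloppy real-world HTML

    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $label = null;
        $value = null;
        foreach ($tr->getElementsByTagName('td') as $td) {
            $links = $td->getElementsByTagName('a');
            if ($links->length > 0) {
                $label = trim($links->item(0)->textContent);   // e.g. "THIS TEXT1"
            }
            if ($td->getAttribute('class') === 'sort') {
                $value = trim($td->textContent);               // e.g. "22"
            }
        }
        // Keep only rows where both pieces were found, so they stay together.
        if ($label !== null && $value !== null) {
            $rows[] = ['label' => $label, 'value' => $value];
        }
    }
    return $rows;
}
```

Each element of the returned array holds one label and its matching number, which maps onto one database row. Nested tables would need extra care, since `getElementsByTagName()` also returns cells from inner tables.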
confused and dazed, posted April 21, 2015:
Again, thanks for setting me off in the right direction. I was able to resolve that issue as well. So... now I have a new problem. I was able to send the href links to my database, and I was going to pull them individually into a cURL session (with a while loop) when I realized you need a login and password to get to the page. I have both, but I don't know how to code them in so the cURL session can access the data from that page. Where do I go from here?
confused and dazed, posted April 21, 2015 (edited April 21, 2015):
I'm using this but it is not working... It's displaying the login page, but it's not logging in. What do I do?

    $username = 'usr1';
    $password = 'pasw1';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $link);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    curl_setopt($ch, CURLOPT_USERPWD, "$username:$password");
    $out = curl_exec($ch);
    if (curl_exec($ch) === false) {
        echo 'Curl error: ' . curl_error($ch);
    }
    echo $out;
    curl_close($ch);
justsomeguy, posted April 22, 2015:
You're trying to use HTTP basic authentication there. I doubt the server is using that. If you're trying to use cURL to log a user in, then you need to do the same thing that the user would do with their browser, i.e. submit a POST request that contains the data from the login form with the correct field names, and get the cookies that the server sends back.
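A minimal sketch of a form login with cURL, under several assumptions: the login URL and the form field names are not known from this thread and must be read from the site's actual login form, and the function name `loginAndFetch()` is hypothetical. Sites that put a hidden token in the login form would also need that value fetched and included.

```php
<?php
// Log in by POSTing the form fields, then fetch a protected page with the same cookies.
function loginAndFetch(string $loginUrl, array $formFields, string $targetUrl, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);      // follow the redirect after login
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);    // where cookies are saved
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);   // turns on cookie handling for this handle

    // Step 1: submit the login form the way a browser would.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($formFields));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Step 2: reuse the same handle, so the session cookie is sent along.
    curl_setopt($ch, CURLOPT_URL, $targetUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true);             // switch back from POST to GET
    $page = curl_exec($ch);
    if ($page === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }
    return $page;
}
```

It could be called as `loginAndFetch($loginUrl, ['username' => 'usr1', 'password' => 'pasw1'], $link, sys_get_temp_dir() . '/cookies.txt')`, where 'username' and 'password' stand in for whatever the form's input names really are. If the result is still the login page, the field names or a missing hidden field are the first things to check.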