

businessman332211@hotmail.com


I have tried a few ideas. I tested some stuff, and I know that if you use file_get_contents, append the result to a variable, and echo it, the whole page gets displayed. The next thing I need to do is set up a process, one page at a time. So right now I have this code:

<?php
$file = file_get_contents("http://whitepages.addresses.com/zip_codes_by_state/AL/A.html");
?>

What I want to do next is where I need a little guidance to point me in the right direction. From here, I want PHP to follow the URLs on that page. When it goes to one of the cities, I want it to pull the Zip Code, City, County, Area Code, and Time Zone from those pages and record them. Then I want to get the contents of the same state's B.html page, which will get all the info for the B cities, then C through Z; after that, I have all the contents recorded. Then I use flush() to give the server a second to rest, get the contents of the next state, and repeat the process (the whole time saving the results in one giant array). At the end of the entire process I want to dynamically echo out statements so I can copy and paste them: I am going to dynamically build my INSERT SQL queries from the captured data, allowing me to do one huge copy and paste and put all the data in the database at once. I know this is possible, and honestly I know a large portion of how I am going to develop this system. What I need to know is how to get PHP to follow a URL. I know file_get_contents can get a page, but then I need PHP to follow the URLs on it so I can do what I want, then go back and follow another URL, and so forth.
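For the final "echo out statements to copy and paste" step described above, a minimal sketch might look like this. Note that the table name (zip_codes) and the column names are illustrative assumptions, not something from this thread, and the sample row is made up:

```php
<?php
// Hypothetical sketch: turn captured rows into INSERT statements for
// one big copy-and-paste. Table/column names are assumptions.
function build_insert($row) {
    // Escape quotes in the values before embedding them in SQL text.
    $vals = array_map('addslashes', array(
        $row['zip'], $row['city'], $row['county'],
        $row['area_code'], $row['time_zone'],
    ));
    return "INSERT INTO zip_codes (zip, city, county, area_code, time_zone) "
         . "VALUES ('" . implode("', '", $vals) . "');";
}

// One made-up example row, standing in for the scraped results array.
$rows = array(
    array('zip' => '35004', 'city' => 'Moody', 'county' => 'St. Clair',
          'area_code' => '205', 'time_zone' => 'CST'),
);
foreach ($rows as $row) {
    echo build_insert($row), "\n";
}
?>
```

Echoing the statements inside `<pre>` tags would make the copy-and-paste cleaner in a browser.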

Edited by businessman332211@hotmail.com

I don't know if it will help you, but if you go to my website -> Downloads and download my spider, you can view the source code (DotNetCrawler.cs) and see how I did it. In mine, though, I used a regexp to get all the <a> hrefs from each page and then added them to the list to be followed.


I understand that, but when I use file_get_contents it doesn't work out. I mean, it displays the data, but how do I get it to follow a link? If I knew how to get PHP to pick up a link, then I could put the results in an array; I know how to do all of that. The things I need to figure out are:
1. How to record link information so I can put the links into an array.
2. How to follow the links I have in the array.
That is what I can't figure out. I could look at your code, but I am really busy and I don't know if I have time to sign up right now. The other thing is, it's ASP.NET, so I don't think it will help much since I am basing this in PHP. I just looked at it, but it didn't help; I was surprised it was in ASP.NET. I understood most of it (I guess because I know PHP), but it didn't make sense in relation to what I was trying to do with PHP.


You use a regular expression to match the <a> tags, with the preg_match_all function, I think. After that, you store the links in an array or a file or something, and then call file_get_contents on each location in the array.

