carefree Posted March 8, 2009

Hi, this is just a hobby for me: I want to build my own search engine. I have submissions and URLs for sites; I'm just missing the crawler. How can I go about building my own spider? I'll only need to scan the index page of each site and insert the meta tags, keywords, title, etc. into my database. I have good SQL/PHP/HTML knowledge, so if someone points me in the right direction I'll have no problems. I'm assuming spiders run on cURL; if so, I won't have too much trouble.
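The fetch-and-extract step described above can be sketched roughly as follows. This is only a minimal example, not the method anyone in the thread actually used: extract_page_meta() is a hypothetical helper name, and it uses PHP's built-in DOMDocument rather than regular expressions to pull out the title and meta tags.

```php
<?php
// Hypothetical helper: pull <title>, meta keywords, and meta description
// out of an HTML string using PHP's built-in DOMDocument.
function extract_page_meta(string $html): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);   // @ suppresses warnings from messy real-world HTML
    $meta = ['title' => '', 'keywords' => '', 'description' => ''];
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $meta['title'] = trim($titles->item(0)->textContent);
    }
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = strtolower($tag->getAttribute('name'));
        if (isset($meta[$name])) {
            $meta[$name] = trim($tag->getAttribute('content'));
        }
    }
    return $meta;
}

// Fetching the index page itself would use PHP's cURL extension, e.g.:
//   $ch = curl_init('http://example.com/');
//   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//   $html = curl_exec($ch);
//   curl_close($ch);
// ...then INSERT $meta['title'], $meta['keywords'], etc. into your table.

$sample = '<html><head><title>My Site</title>'
        . '<meta name="keywords" content="php, crawler">'
        . '<meta name="description" content="A test page">'
        . '</head><body></body></html>';
$meta = extract_page_meta($sample);
// $meta['title'] is 'My Site'; $meta['keywords'] is 'php, crawler'
```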
justsomeguy Posted March 9, 2009

cURL is one way to do it. Beyond fetching pages, you need some clever regular expressions to find the links to other pages. There's also the robots.txt file, which you should follow if you're going to build a crawler. Most search engines now don't look at things like meta tags; they look at the actual body content and get the keywords from that.
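The two points above (finding links, honoring robots.txt) might look something like this sketch. The helper names are my own, and is_allowed() is deliberately simplified: real robots.txt handling also deals with User-agent groups, Allow lines, and wildcards.

```php
<?php
// Hypothetical helper: collect href values from an HTML string.
function extract_links(string $html): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

// Hypothetical, simplified robots.txt check: refuse any path that starts
// with a Disallow prefix, ignoring User-agent groups and Allow rules.
function is_allowed(string $robotsTxt, string $path): bool {
    foreach (explode("\n", $robotsTxt) as $line) {
        if (preg_match('/^Disallow:\s*(\S+)/i', trim($line), $m)) {
            if (strpos($path, $m[1]) === 0) {
                return false;   // path falls under a disallowed prefix
            }
        }
    }
    return true;
}
```

Before crawling a site, you would fetch its /robots.txt with the same cURL code used for pages and run each candidate URL's path through is_allowed().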
carefree Posted March 10, 2009 (Author)

Truthfully, it will take me a month of building and testing the spider, so I need to find an open-source spider engine to build off. I also got a few quotes from programmers to knock one up for me, but the average quote is $1,200 to $4,000, which isn't worth it for a hobby.
jesh Posted March 10, 2009

This article, written for C#, went a long way in helping me understand the basics of crawlers: http://www.codeproject.com/KB/IP/Spideroo.aspx
This topic is now archived and is closed to further replies.