carefree Posted March 8, 2009

Hi, this is just a hobby for me: I want to build my own search engine. I have submissions and URLs for sites; I'm just missing the crawler. How can I go about building my own spider? I'll only need to scan the index page of each site and insert the meta tags, keywords, title, etc. into my database. I have good SQL/PHP/HTML knowledge, so if someone points me in the right direction I'll have no problems. I'm assuming spiders run on cURL; if so, I won't have too much trouble.
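The fetch-and-extract step described above can be sketched roughly as follows. This is only a minimal example, not the method anyone in the thread actually used: extract_page_meta() is a hypothetical helper name, and it uses PHP's built-in DOMDocument rather than regular expressions to pull out the title and meta tags.

```php
<?php
// Hypothetical helper: pull <title>, meta keywords, and meta description
// out of an HTML string using PHP's built-in DOMDocument.
function extract_page_meta(string $html): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);   // @ suppresses warnings from messy real-world HTML
    $meta = ['title' => '', 'keywords' => '', 'description' => ''];
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $meta['title'] = trim($titles->item(0)->textContent);
    }
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = strtolower($tag->getAttribute('name'));
        if (isset($meta[$name])) {
            $meta[$name] = trim($tag->getAttribute('content'));
        }
    }
    return $meta;
}

// Fetching the index page itself would use PHP's cURL extension, e.g.:
//   $ch = curl_init('http://example.com/');
//   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//   $html = curl_exec($ch);
//   curl_close($ch);
// ...then INSERT $meta['title'], $meta['keywords'], etc. into your table.

$sample = '<html><head><title>My Site</title>'
        . '<meta name="keywords" content="php, crawler">'
        . '<meta name="description" content="A test page">'
        . '</head><body></body></html>';
$meta = extract_page_meta($sample);
// $meta['title'] is 'My Site'; $meta['keywords'] is 'php, crawler'
```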
justsomeguy Posted March 9, 2009

cURL is one way to do it. Beyond fetching pages, you need some clever regular expressions to find the links to other pages. There's also the robots.txt file, which you should follow if you're going to build a crawler. Most search engines now don't look at things like meta tags; they look at the actual body content and get the keywords from that.
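The two points above (finding links, honoring robots.txt) might look something like this sketch. The helper names are my own, and is_allowed() is deliberately simplified: real robots.txt handling also deals with User-agent groups, Allow lines, and wildcards.

```php
<?php
// Hypothetical helper: collect href values from an HTML string.
function extract_links(string $html): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

// Hypothetical, simplified robots.txt check: refuse any path that starts
// with a Disallow prefix, ignoring User-agent groups and Allow rules.
function is_allowed(string $robotsTxt, string $path): bool {
    foreach (explode("\n", $robotsTxt) as $line) {
        if (preg_match('/^Disallow:\s*(\S+)/i', trim($line), $m)) {
            if (strpos($path, $m[1]) === 0) {
                return false;   // path falls under a disallowed prefix
            }
        }
    }
    return true;
}
```

Before crawling a site, you would fetch its /robots.txt with the same cURL code used for pages and run each candidate URL's path through is_allowed().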
carefree Posted March 10, 2009 (Author)

Truthfully, it will take me a month of building and testing the spider, so I need to find an open-source spider engine to build off. I also got a few quotes from programmers to knock one up for me, but the average quote is $1,200 to $4,000, which isn't worth it for a hobby.
jesh Posted March 10, 2009

This article, written for C#, went a long way in helping me understand the basics of crawlers: http://www.codeproject.com/KB/IP/Spideroo.aspx
This topic is now archived and is closed to further replies.