Html Posted October 17, 2007
Hi there. Can bots travel on to a new page via a link on the previous page?
justsomeguy Posted October 17, 2007
They can find links on pages. A bot is as smart as the programmer wanted to (or could) make it.
Html Posted October 17, 2007 (Author)
Right, and the bot can travel on to the page it has found, from reading the link?
justsomeguy Posted October 17, 2007
It can get whatever links are on the page using regular expressions and send out a request for those pages the same way it got to the first page.
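To make the regular-expression idea concrete, here is a minimal sketch in Python (not the PHP discussed later in the thread). The pattern is deliberately simplified and will miss unquoted or unusual attributes; `extract_links` and the sample page are hypothetical names made up for illustration.

```python
import re
from urllib.parse import urljoin

# Simplified pattern (an assumption, not a full HTML parser):
# matches href="..." or href='...' inside an <a ...> tag.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def extract_links(html, base_url):
    """Return absolute URLs for every <a href> found in html, in page order."""
    # urljoin resolves relative links against the page they were found on.
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]

# Hypothetical page content standing in for a downloaded document.
page = '<p><a href="/about.html">About</a> <a class="x" href="http://example.org/">Ext</a></p>'
links = extract_links(page, "http://example.com/index.html")
print(links)
```

Each URL in `links` could then be requested the same way the first page was.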
Html Posted October 20, 2007 (Author)
Quote: "It can get whatever links are on the page using regular expressions and send out a request for those pages the same way it got to the first page."
Interesting. I'm not up to anything, honest.
boen_robot Posted October 20, 2007
Quote: "Interesting. I'm not up to anything, honest."
I saw no accusations whatsoever. How does the saying go... a guilty conscience is a self-accuser.
Just imagine you can read the source code of any page, interpret its parts (i.e. see which parts are links - the "<a href=" combo; determine what's an email address - the name@domain.tld combo; etc.) and keep every finding in a database, making new requests on every URI found in search for... something (like emails maybe).
That's basically how bots function. If you want to write your own (if you ARE up to something), read up on PHP, regular expressions, maybe XML and DOM too.
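The loop described above (read a page, pick out links and email-like strings, store the findings, request every new URI) might look roughly like this in Python. It is only a sketch under stated assumptions: a plain dict stands in for the database, `fetch` is a caller-supplied function, and `FAKE_SITE` is an in-memory stand-in for real pages so that nothing is actually requested over the network. Both regular expressions are simplified.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Simplified patterns (assumptions): the "<a href=" combo and the name@domain.tld combo.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl; returns {url: set of email-like strings on that page}."""
    findings = {}              # stands in for the database
    queue = deque([start_url])
    seen = {start_url}         # avoids requesting the same URI twice
    while queue and len(findings) < max_pages:
        url = queue.popleft()
        html = fetch(url)      # hypothetical fetcher; returns None when the page is missing
        if html is None:
            continue
        findings[url] = set(EMAIL_RE.findall(html))
        for href in HREF_RE.findall(html):
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return findings

# In-memory stand-in for a website (hypothetical content).
FAKE_SITE = {
    "http://example.com/": '<a href="/contact.html">Contact</a> <a href="/">Home</a>',
    "http://example.com/contact.html": 'Write to info@example.com or <a href="/missing.html">here</a>',
}
result = crawl("http://example.com/", FAKE_SITE.get)
print(result)
```

Passing `fetch` in as a parameter keeps the crawl logic separate from how pages are actually downloaded.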
Synook Posted October 21, 2007
Bots can be used for non-malicious purposes too - think search engines...
Html Posted October 21, 2007 (Author)
What about if you wanted to find some information from your visitors? You'd use a cookie, made using PHP?
But then, if a cookie doesn't work, can one of these bots do this?
Synook Posted October 22, 2007
Depends what sort of information you want...
kgun Posted October 22, 2007
Quote: "Depends what sort of information you want..."
But you can block them (in .htaccess), confuse them, and you can write traps for them. Bad bots most often do not respect the robots.txt file.
Google:
spider trap
bad bot trap
crawler trap
or variations thereof.
Some great related links:
http://browsers.garykeith.com/downloads.asp
http://www.iplists.com/
http://www.crawlwall.com/
http://www.dnsstuff.com/
Good ? forum
Html Posted October 22, 2007 (Author)
Quote: "Depends what sort of information you want..."
Maybe their IP, browser, OS, and country of origin?
justsomeguy Posted October 22, 2007
Server-side languages have a way of getting the IP. In PHP, it is in $_SERVER['REMOTE_ADDR']. Any information about the browser and operating system can either be found in the user agent string ($_SERVER['HTTP_USER_AGENT']) or you can use the get_browser function to try and provide details based on the user agent string. Some browsers will include the OS as part of the user agent. The only thing you can do to determine geographical information is to check which country their IP address is allocated to, but it's easy enough for someone to use a proxy that goes through another country.
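The same two values are available outside PHP under the same CGI-style names. As a rough illustration in Python, a WSGI application receives them in its `environ` dict as `REMOTE_ADDR` and `HTTP_USER_AGENT`. The helper `visitor_info`, the substring-based OS guess, and the sample request below are all assumptions made for this sketch; real user agent strings vary widely and can be faked.

```python
def visitor_info(environ):
    """Pull the visitor's IP and user agent from a WSGI/CGI-style environ dict."""
    ua = environ.get("HTTP_USER_AGENT", "")
    os_name = "unknown"
    # Crude guess (an assumption): look for well-known OS tokens in the user agent.
    for token in ("Windows", "Mac OS X", "Linux"):
        if token in ua:
            os_name = token
            break
    return {
        "ip": environ.get("REMOTE_ADDR"),  # same key PHP exposes in $_SERVER
        "user_agent": ua,
        "os_guess": os_name,
    }

# Hypothetical request data; 203.0.113.7 is a documentation-only address.
sample = {
    "REMOTE_ADDR": "203.0.113.7",
    "HTTP_USER_AGENT": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/2.0",
}
info = visitor_info(sample)
print(info)
```

Country lookup is left out on purpose: it needs an external IP-allocation database, and as noted above a proxy defeats it anyway.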
kgun Posted October 22, 2007
Quote: "The only thing you can do to determine geographical information is to check which country their IP address is allocated to, but it's easy enough for someone to use a proxy that goes through another country."
Yes exactly, but by setting up a spider trap you can:
- See if the bot respects the robots.txt file.
- See which part of your site it visits.
- Confuse it if it goes around robots.txt, and block it if it continues on the second or third visit.
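One possible shape for such a trap, sketched in Python under several assumptions: robots.txt disallows a `/trap/` directory that no well-behaved bot should enter, and any request under that prefix counts as a strike against the requesting IP. The class name `SpiderTrap`, the prefix, the `block_after` threshold, and the three return values are all hypothetical; what "confuse" means in practice (junk pages, delays) is left to the site owner.

```python
from collections import Counter

# robots.txt telling all bots to stay out of the trap directory (assumed layout).
ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"

class SpiderTrap:
    def __init__(self, trap_prefix="/trap/", block_after=2):
        self.trap_prefix = trap_prefix
        self.block_after = block_after  # block on this many trap hits
        self.trap_hits = Counter()      # ip -> number of requests into the trap
        self.visited = {}               # ip -> list of paths, to see where it goes

    def handle(self, ip, path):
        """Record the request and return 'serve', 'confuse', or 'block'."""
        self.visited.setdefault(ip, []).append(path)
        if path.startswith(self.trap_prefix):
            self.trap_hits[ip] += 1
        hits = self.trap_hits[ip]
        if hits >= self.block_after:
            return "block"
        if hits > 0:
            return "confuse"            # it already ignored robots.txt once
        return "serve"

trap = SpiderTrap()
bot = "198.51.100.9"  # documentation-only address
decisions = [trap.handle(bot, p) for p in ("/index.html", "/trap/a", "/index.html", "/trap/b")]
print(decisions)
```

In this sketch the blocking decision lives in application code; the .htaccess approach mentioned earlier in the thread would instead deny the offending IPs at the web server level.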
This topic is now archived and is closed to further replies.