
Spam bots




It can get whatever links are on the page using regular expressions and send out a request for those pages the same way it got to the first page.
Interesting. :) I'm not up to anything, honest. :)

Interesting. :) I'm not up to anything, honest. :)
I saw no accusations whatsoever. How does the saying go... a guilty conscience is a self-accuser :)
Just imagine you can read the source code of any page, interpret its parts (i.e. see which parts are links - the "<a href=" combo; determine what's an email address - the name@domain.tld combo; etc.) and keep every finding in a database, making new requests to every URI found, in search of... something (like emails, maybe).
That's basically how bots function. If you want to write your own (if you ARE up to something), read up on PHP, regular expressions, and maybe XML and DOM too.
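To make the "read the source, pick out the parts" idea concrete, here is a rough sketch in Python rather than PHP. The two patterns and the `scan_page` helper are my own illustration, not anything a particular bot is known to use, and they are deliberately naive: real HTML and real addresses are messier than a regex can handle.

```python
import re
from urllib.parse import urljoin

# Deliberately simple patterns, for illustration only.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

def scan_page(html, base_url):
    """Return (links, emails) found in one page's source (hypothetical helper)."""
    # Resolve relative hrefs against the page they were found on.
    links = [urljoin(base_url, href) for href in HREF_RE.findall(html)]
    emails = EMAIL_RE.findall(html)
    return links, emails

sample = ('<p>Contact name@domain.tld or see <a href="/about.html">About</a> '
          'and <a href="http://example.org/">elsewhere</a>.</p>')
links, emails = scan_page(sample, "http://example.com/index.html")
print(links)
print(emails)
```

A bot would then put each entry of `links` in a queue, request it, and run the same scan on the response. That loop is left out here.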

Depends what sort of information you want...
But you can block them (in .htaccess), confuse them, and you can write traps for them. Bad bots most often do not respect the robots.txt file.
Google:
spider trap
bad bot trap
crawler trap
or variations thereof.
Some great related links:
http://browsers.garykeith.com/downloads.asp
http://www.iplists.com/
http://www.crawlwall.com/
http://www.dnsstuff.com/ Good ? forum
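On the robots.txt point: this is roughly what "respecting" the file looks like from the bot's side, sketched with Python's standard `urllib.robotparser`. The robots.txt content, the `/trap/` path and the `ExampleBot` name are made up for the example. A bad bot simply skips this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything is allowed except the /trap/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_fetch(user_agent, url):
    """True if the parsed robots.txt rules allow this agent to request the URL."""
    return parser.can_fetch(user_agent, url)

print(may_fetch("ExampleBot", "http://example.com/index.html"))
print(may_fetch("ExampleBot", "http://example.com/trap/page.html"))
```

A well-behaved crawler calls something like `may_fetch` before every request, so any hit on a disallowed path is a strong hint that the visitor is ignoring the file.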

Server-side languages have a way of getting the IP. In PHP, it is in $_SERVER['REMOTE_ADDR']. Any information about the browser and operating system can either be found in the user agent string ($_SERVER['HTTP_USER_AGENT']) or you can use the get_browser function to try and provide details based on the user agent string. Some browsers will include the OS as part of the user agent. The only thing you can do to determine geographical information is to check which country their IP address is allocated to, but it's easy enough for someone to use a proxy that goes through another country.
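For comparison, here is a small Python sketch of the same idea. WSGI and CGI environments use the same key names as PHP's `$_SERVER` (`REMOTE_ADDR`, `HTTP_USER_AGENT`). The `describe_visitor` helper and its substring checks are assumptions for illustration and nowhere near what a real browser database provides. The user agent is sent by the client, so it can be forged.

```python
def describe_visitor(environ):
    """Summarise a request from a WSGI/CGI-style environ dict (hypothetical helper)."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    # Crude substring checks. Android is tested before Linux because
    # Android user agents usually mention both.
    os_hints = [("Windows", "Windows"), ("Mac OS X", "macOS"),
                ("Android", "Android"), ("Linux", "Linux")]
    os_name = next((name for needle, name in os_hints if needle in user_agent),
                   "unknown")
    return {
        "ip": environ.get("REMOTE_ADDR", "unknown"),
        "user_agent": user_agent,
        "os": os_name,
    }

example = {
    "REMOTE_ADDR": "203.0.113.7",  # documentation-range address
    "HTTP_USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}
info = describe_visitor(example)
print(info)
```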


The only thing you can do to determine geographical information is to check which country their IP address is allocated to, but it's easy enough for someone to use a proxy that goes through another country.
Yes, exactly, but by setting up a spider trap you can:
  1. See if the bot respects the robots.txt file.
  2. See which parts of your site it visits.
  3. Confuse it if it goes around robots.txt, and block it if it continues on the second or third visit.
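The three points above could be sketched like this in Python. Everything here is an assumption for illustration: the `/trap/` path (which you would list under Disallow in robots.txt), the threshold of two trap visits, and the `SpiderTrap` class itself. It only decides what to do with a request. Actually serving junk pages or denying access is up to the server.

```python
from collections import defaultdict

TRAP_PREFIX = "/trap/"   # hypothetical path, disallowed in robots.txt
BLOCK_AFTER = 2          # block from the second trap visit on

class SpiderTrap:
    def __init__(self):
        self.trap_hits = {}                 # ip -> number of trap visits
        self.visited = defaultdict(list)    # ip -> paths requested, in order

    def handle(self, ip, path):
        """Return 'serve', 'confuse' or 'block' for one request."""
        self.visited[ip].append(path)       # point 2: record what it visits
        in_trap = path.startswith(TRAP_PREFIX)
        if in_trap:                         # point 1: it ignored robots.txt
            self.trap_hits[ip] = self.trap_hits.get(ip, 0) + 1
        if self.trap_hits.get(ip, 0) >= BLOCK_AFTER:
            return "block"                  # point 3: repeat offender
        if in_trap:
            return "confuse"                # point 3: first offence
        return "serve"

trap = SpiderTrap()
bot = "198.51.100.9"  # documentation-range address
decisions = [trap.handle(bot, p)
             for p in ("/index.html", "/trap/a", "/trap/b", "/index.html")]
print(decisions)
```

Keying on the IP address alone is the weak spot, for the proxy reason mentioned in the quote above.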


