
Spam bots



Interesting. :) I'm not up to anything, honest. :)
I saw no accusations whatsoever. How does the saying go... a guilty conscience is a self-accuser :) . Just imagine you can read the source code of any page, interpret its parts (i.e. see which parts are links, the "<a href=" combo; determine what's an email address, the name@domain.tld combo; etc.) and keep every finding in a database, making new requests to every URI found in search of... something (emails, maybe).

That's basically how bots function. If you want to write your own (if you ARE up to something), read up on PHP, regular expressions, and maybe XML and the DOM too.
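To make that loop concrete, here is a minimal Python sketch (Python rather than PHP, purely for illustration). The regular expressions are deliberately naive assumptions, since real HTML and real addresses are messier. `fetch` is a hypothetical hook you'd supply (e.g. an HTTP client), and a plain set stands in for the database.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Deliberately naive patterns, for illustration only:
# links come from <a ... href="..."> tags, emails from name@domain.tld.
HREF_RE = re.compile(r"""<a\s[^>]*?href=["']([^"']+)["']""", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract(html, base_url):
    """Return (absolute links, sorted unique email addresses) from one page."""
    links = [urljoin(base_url, href) for href in HREF_RE.findall(html)]
    emails = sorted(set(EMAIL_RE.findall(html)))
    return links, emails

def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; `fetch(url) -> html` is a hypothetical hook."""
    seen = set()
    found_emails = set()  # stands in for the database
    queue = deque([start_url])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        links, emails = extract(fetch(url), url)
        found_emails.update(emails)
        # Every newly found URI becomes another request.
        queue.extend(link for link in links if link not in seen)
    return seen, found_emails
```

The `max_pages` cap is there so the sketch can't run forever on a site that keeps generating new links.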

Bots can be used for non-malicious purposes too :) - think search engines...

Depends what sort of information you want...
But you can block them (in .htaccess), confuse them, and write traps for them. Bad bots most often do not respect the robots.txt file. Google "spider trap", "bad bot trap", "crawler trap" or variations thereof.

Some great related links:
http://browsers.garykeith.com/downloads.asp
http://www.iplists.com/
http://www.crawlwall.com/
http://www.dnsstuff.com/

Good ? forum
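For comparison, this is roughly what a well-behaved crawler does before requesting a page, sketched with Python's standard `urllib.robotparser`. The robots.txt content and the `/trap/` path are hypothetical. Nothing forces a bot to run this check, which is exactly why bad bots can simply ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /trap/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

def polite_can_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt allows user_agent to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A polite bot would skip `http://example.com/trap/page.html` here. A bot that requests it anyway has told you something about itself.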

Server-side languages have a way of getting the visitor's IP address. In PHP, it is in $_SERVER['REMOTE_ADDR']. Information about the browser and operating system can be found in the user agent string ($_SERVER['HTTP_USER_AGENT']), or you can use the get_browser() function to try to get details based on that string (it needs the browscap setting in php.ini pointing to a browscap.ini file). Some browsers include the OS as part of the user agent. The only thing you can do to determine geographical information is to check which country their IP address is allocated to, but it's easy enough for someone to use a proxy that goes through another country.
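For comparison, Python web apps written against WSGI get CGI-style variables in the `environ` dict, so `REMOTE_ADDR` (typically provided by the server) and `HTTP_USER_AGENT` (present when the client sends the header) correspond to the PHP `$_SERVER` entries above. The `OS_HINTS` substring table is a hypothetical, deliberately crude heuristic, not a replacement for get_browser()/browscap, and like the user agent itself it is trivially faked.

```python
# Hypothetical substring -> OS label table. Order matters: Android user
# agents also mention "Linux", and iPhone ones mention "Mac OS X".
OS_HINTS = [
    ("Android", "Android"),
    ("iPhone", "iOS"),
    ("Windows", "Windows"),
    ("Mac OS X", "macOS"),
    ("Linux", "Linux"),
]

def guess_os(user_agent):
    """Return a rough OS label from a user agent string, or 'unknown'."""
    for needle, label in OS_HINTS:
        if needle in user_agent:
            return label
    return "unknown"

def describe_visitor(environ):
    """Pull the client IP and user agent out of a WSGI environ dict."""
    # REMOTE_ADDR is the immediate peer: a proxy's address if one is in use.
    ip = environ.get("REMOTE_ADDR", "")
    user_agent = environ.get("HTTP_USER_AGENT", "")
    return {"ip": ip, "user_agent": user_agent, "os_hint": guess_os(user_agent)}

def app(environ, start_response):
    """Minimal WSGI app that echoes what it learned about the visitor."""
    info = describe_visitor(environ)
    body = f"{info['ip']} {info['os_hint']}\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]
```

To try it locally you could serve `app` with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.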

The only thing you can do to determine geographical information is to check which country their IP address is allocated to, but it's easy enough for someone to use a proxy that goes through another country.
Yes, exactly, but by setting up a spider trap you can:
  1. See whether the bot respects the robots.txt file.
  2. See which parts of your site it visits.
  3. Confuse it if it ignores robots.txt, and block it if it keeps going on the second or third visit.
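The three steps above could be sketched in Python roughly like this. Everything here is an illustrative assumption: the `/trap/` prefix (hypothetically listed under `Disallow` in robots.txt, so only bots that ignore it should land there), the `BLOCK_AFTER` threshold, and the idea that the caller turns the returned action into a real response or an .htaccess/firewall rule.

```python
from collections import defaultdict

TRAP_PREFIX = "/trap/"  # hypothetical path, disallowed in robots.txt
BLOCK_AFTER = 2         # hypothetical: block on the second trap hit

class SpiderTrap:
    def __init__(self):
        self.trap_hits = defaultdict(int)  # ip -> trap requests (step 1)
        self.paths = defaultdict(list)     # ip -> paths requested (step 2)

    def handle(self, ip, path):
        """Return 'serve', 'confuse' or 'block' for one request (step 3)."""
        if self.trap_hits[ip] >= BLOCK_AFTER:
            return "block"  # already blocked: don't even log it
        self.paths[ip].append(path)
        if path.startswith(TRAP_PREFIX):
            # Requesting a disallowed path means robots.txt was ignored.
            self.trap_hits[ip] += 1
            return "block" if self.trap_hits[ip] >= BLOCK_AFTER else "confuse"
        return "serve"

def confuse_page(path, n=5):
    """Junk page whose links lead deeper into the trap."""
    links = "".join(
        f'<a href="{path.rstrip("/")}/{i}/">more</a>\n' for i in range(n)
    )
    return f"<html><body>\n{links}</body></html>"
```

Keying on the IP address keeps the sketch simple, but per the point above about proxies, a determined bot can rotate addresses.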

