Google.com


scott100

Okay, seems logical. Is there a way to test whether it is a bot retrieving the page, e.g. by IP address or request method? I am having difficulty detecting how a request was made (via bookmark, link, address bar, bot, and so on). I would like to identify all of them.

You can check the IP or the user agent. I believe Google publishes a list of IPs that the Googlebot will use. You can also search the user agent string for the word 'bot'. There are also published lists online of which user agent strings different bots use.

User agent string sounds good. You mean the server variable $_SERVER["HTTP_USER_AGENT"]? I could use preg_match on it. Could you give an example of how to do that? I won't be able to test whether it works, so I need help doing it correctly :)

Then I will be able to count only the site visitors that are human :) And finally my count won't be increased by Google.com or whatever.

You don't need to use regular expressions; it would be faster to use strpos and test for boolean false for each bot. Regular expressions would work better if all of the bot strings followed a pattern that browser strings don't, but that's not the case.

Here are some lists of bot user agent strings:
http://en.wikipedia.org/wiki/User_agent#Bots
http://www.pgts.com.au/pgtsj/pgtsj0208d.html

You probably don't need to test for all of them; just choose the 10 or 20 that you see the most in your logs or that belong to the largest sites. You would use strpos to test like this:

if (strpos($_SERVER["HTTP_USER_AGENT"], "bot") !== false)

This statement would be true if the string contains "bot", which would match a lot of different strings. You would probably just have a pretty big if statement with a condition for each string you want to check for. Make sure to use !== or === instead of != or ==, because strpos returns 0 when the match is at the very start of the string, and 0 == false is true.
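
As a rough sketch of the "one condition per string" idea above, the checks can be kept in an array and looped over instead of written as one long if statement. The function name looks_like_bot and the marker list are made up for illustration and are not exhaustive. The sketch also assumes you want a case-insensitive match, so it swaps in stripos for strpos, and it falls back to an empty string when the header is missing.

```php
<?php
// Illustrative substrings only; adjust to whatever shows up in your own logs.
$bot_markers = array("bot", "crawler", "spider", "slurp");

// Hypothetical helper: returns true if the user agent contains any marker.
function looks_like_bot($user_agent, $markers)
{
    foreach ($markers as $marker) {
        // stripos is the case-insensitive variant of strpos.
        // Compare with !== false, because a match at position 0 returns 0.
        if (stripos($user_agent, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// The header may not be sent at all, so fall back to an empty string.
$user_agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";

if (!looks_like_bot($user_agent, $bot_markers)) {
    // Increase the visitor counter here.
}
```

Note that a request with no user agent at all is treated as a normal visitor in this sketch; you may prefer to skip counting those as well.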

I can use that strpos function, that would not be difficult, but that list :) That's long. I don't have any kind of logs from which I can determine which user agents visit. That is why I wanted to stop bot views from increasing the counter in the first place :) I built the whole site myself, you know, and I am not yet advanced enough to have great logs, unfortunately. But never mind that, strpos would do it by itself, right? Just check whether the user is likely a bot, and if so, don't increase the visitor number. Hope it works :)

Right, but you might want more conditions in the if statement to account for more bots (other than those that just have "bot" in the string). Your web server should have raw access logs though, regardless of whether you keep separate logs of your own.
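
If you do get hold of the raw access logs, something like the following could show which user agents visit most often. This is only a sketch under assumptions: it assumes the log uses the common "combined" format, where the user agent is the last double-quoted field on each line. The function name count_user_agents is hypothetical, and the sample lines are made up for illustration.

```php
<?php
// Hypothetical helper: tally user agents from an array of log lines.
// Assumes the "combined" log format (user agent = last quoted field).
function count_user_agents($lines)
{
    $counts = array();
    foreach ($lines as $line) {
        // Capture the last double-quoted field at the end of the line.
        if (preg_match('/"([^"]*)"\s*$/', $line, $m)) {
            $agent = $m[1];
            $counts[$agent] = isset($counts[$agent]) ? $counts[$agent] + 1 : 1;
        }
    }
    arsort($counts); // most frequent user agents first
    return $counts;
}

// Made-up sample lines for illustration only.
$sample = array(
    '192.0.2.1 - - [10/Oct/2006:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [10/Oct/2006:13:56:01 +0000] "GET / HTTP/1.1" 200 2326 "-" "ExampleBrowser/1.0"',
    '192.0.2.1 - - [10/Oct/2006:13:57:12 +0000] "GET /about HTTP/1.1" 200 1120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
);

print_r(count_user_agents($sample));
```

With a real log you would pass in something like file($path, FILE_IGNORE_NEW_LINES), where $path is wherever your host keeps the log.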

I just recently got an IPB, and found some settings that would be of some interest. [removed]

Edited by RahXephon

You could add a space to the start and the end of the string, and to the search term. Then you could check for the whole word:

$user_agent = $_SERVER["HTTP_USER_AGENT"];
$user_agent = " " . $user_agent;
$user_agent .= " ";
if (strpos($user_agent, " bot ") !== false)

Googlebot, Yahoo bot, Ask Jeeves, and one other thing that I can't remember regularly visit this forum. :)

You could add a space to the start and the end of the string, and to the search term. Then you could check for the whole word:
That would only find things where "bot" is a whole word. A user agent that has "bot" anywhere, like "googlebot", will not be found. I can't think of any browser that has the word "bot" in it, so it's probably better to search for just "bot", unless you want to write a different condition for each thing you want to block.
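
To make the difference concrete, here is a small sketch that runs both tests against one sample Googlebot user agent string. The variable names are made up for illustration.

```php
<?php
// Sample Googlebot user agent string, used only for illustration.
$user_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

// Padded whole-word test: looks for "bot" surrounded by spaces.
$padded = " " . $user_agent . " ";
$whole_word_match = (strpos($padded, " bot ") !== false);

// Plain substring test: looks for "bot" anywhere in the string.
$substring_match = (strpos($user_agent, "bot") !== false);

var_dump($whole_word_match); // bool(false): "bot" never appears as a separate word
var_dump($substring_match);  // bool(true): "Googlebot" contains "bot"
```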

Googlebot, Yahoo bot, Ask Jeeves, and one other thing that I can't remember regularly visit this forum. :)

webarchive.org I think...

But where can I find these server logs? Are they commonly open to the public, and can I access them? Or should I ask for permission? :)
If you aren't sure, you'll probably need to ask your host if you can see the raw access logs for your domain.