ROBOTS


user4fun

How do I start with the robots.txt thing? I need tutorials - I need all the web help I can get. Do I just need to make a file called robots.txt and leave it alone? I don't care what robot crawls what; my important directories are password protected, and everything else is up for anyone that wants to look. How do I know that robots are crawling, or whatever it is they call it?
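One way to tell whether robots are crawling is to look for their user-agent strings in your web server's access log. Here is a minimal sketch in Python, assuming an Apache/Nginx combined-format log at access.log (both the path and the format are assumptions - check your own server's configuration):

# Count hits from common crawler user agents in a web server access log.
# Assumes the "combined" log format, where the user agent is the last
# quoted field on each line. The log path "access.log" is an assumption.
from collections import Counter

BOT_MARKERS = ("Googlebot", "Bingbot", "Slurp", "Exabot", "bot", "spider")

counts = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        # The user agent is the final "..."-quoted field in combined format.
        parts = line.rsplit('"', 2)
        if len(parts) < 3:
            continue
        agent = parts[1]
        for marker in BOT_MARKERS:
            if marker.lower() in agent.lower():
                counts[agent] += 1
                break

for agent, hits in counts.most_common(10):
    print(f"{hits:6d}  {agent}")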

You shouldn't worry if the folders/files are password protected, as a search engine (be it Googlebot or anyone else) won't be able to index them anyway. The robots.txt file is only for indexable (read: non-password- and non-IP-protected) folders/files that you want to forbid indexing of. A simple Google search for "robots.txt" reveals this page as well as many others with information.

After all, how would the robot get through your password protection? It is just like a normal client, only automatically controlled.
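To make that concrete, here is a minimal sketch of a bot's request in Python - the URL is a placeholder. The point is that a password-protected directory answers the bot with the same 401 Unauthorized it would give any browser without credentials:

# A crawler is just an HTTP client: it sends a request and reads the
# response. A password-protected directory (HTTP Basic auth, for example)
# answers with 401 Unauthorized, so the bot never sees the content.
# The URL below is a placeholder, not a real site.
import urllib.request
import urllib.error

url = "http://www.yourdomain.com/private/"
request = urllib.request.Request(url, headers={"User-Agent": "ExampleBot/1.0"})

try:
    with urllib.request.urlopen(request) as response:
        print("Fetched:", response.status)
except urllib.error.HTTPError as err:
    # 401 means the server demanded credentials the bot does not have.
    print("Blocked with HTTP", err.code)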

The reason I was talking about password protection is that all the tutorials I have seen show you how to stop robots from going inside certain directories, and I really don't care where they go. That is what I meant - thank you all for the help. I would like to see the contents of an actual robots.txt file.

First, don't trust anything - be it password protection or the rules the robots are supposed to obey. Having said that, you are going in the right direction. You have already password-protected the unwanted areas - perfect. Now for your robots.txt file. Here is a quick link to reference for more detail, but basically, I would make sure that any web-accessible folders that are password protected are also included in your robots.txt file - redundancy doesn't hurt. For example:

# robots.txt for http://www.yourdomain.com/
#
# $Id: robots.txt,v 1.26 2006/02/19 00:07:43 DTV Exp $
#
# For use by yourdomain.com

# Exabot - exava.com
# Do not allow this bot - doesn't follow rules all the time
User-agent: Exabot
Disallow: /

# flag folders to not be indexed
User-agent: *
Disallow: /includes
Disallow: /adwords
Disallow: /files
Disallow: /forsale
Disallow: /samples
Disallow: /tasks
Disallow: /tools

This will allow all bots (except exava.com's) to access the entire site except the folders listed there. I have "include" files that I do not want indexed, since their content already gets pulled into the site - I want to be certain that a bot doesn't misrepresent repetitive information. The other folders are listed for other reasons, but it gives you an idea of how you would exclude any web-accessible folder that you have password protected. Here is Google's robots.txt file - it's almost tempting to explore those directories. (Anything ending in a "?" is a functioning search tool - did you know about this one: http://www.google.com/patents?) :)
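If you want to see how a rule-following bot reads a file like that, Python's standard library includes a parser for the robots exclusion rules. A quick sketch, using an abbreviated version of the example above:

# Parse a cut-down copy of the example robots.txt above and ask which
# bots may fetch what. urllib.robotparser is the standard-library
# implementation of the robots exclusion rules.
import urllib.robotparser

rules = """\
User-agent: Exabot
Disallow: /

User-agent: *
Disallow: /includes
Disallow: /adwords
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# Exabot is shut out entirely; everyone else only loses the listed folders.
print(parser.can_fetch("Exabot", "/index.html"))         # False
print(parser.can_fetch("Googlebot", "/index.html"))      # True
print(parser.can_fetch("Googlebot", "/includes/x.php"))  # False

If the three prints match the comments, the file behaves exactly as described above - at least for bots that obey the rules.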

Not even that :) All you need is:

User-agent: *
Disallow: /folder
Disallow: /anotherfolder
Disallow: /etc

Any line starting with # is a comment and is unnecessary.

If I copy and paste this into a robots.txt file, change yourdomain.com to my domain, and change the disallowed names to the directories I would like to disallow, would that be considered done? Or what else do I need to do?
Yes, you would be done; there would be nothing else to do on your end. And yes, anything after the pound symbol (#) is a comment. It's always good to comment your code, so I would not recommend removing them - it's good practice - but technically speaking you don't need all that.
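One detail worth confirming: the file has to be named robots.txt and sit in your web root, because that is the only place crawlers look for it. A quick sketch to check that it is live, with a placeholder domain:

# Fetch your own robots.txt to confirm it is served from the site root,
# which is where crawlers look for it. The domain is a placeholder.
import urllib.request

url = "http://www.yourdomain.com/robots.txt"
with urllib.request.urlopen(url) as response:
    print(response.status)           # expect 200
    print(response.read().decode())  # should echo the file you uploaded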
Here is what I have - is there anything else I need to do?

# robots.txt for http://www.i-s-a-f-e.com/
#
# $Id: robots.txt,v 1.26 2006/02/19 00:07:43 DTV Exp $
#
# For use by i-s-a-f-e.com

# Exabot - exava.com
# Do not allow this bot - doesn't follow rules all the time
User-agent: Exabot
Disallow: /

# flag folders to not be indexed
User-agent: *
Disallow: /inc
Disallow: /image
Disallow: /images
Disallow: /logs
Disallow: /php
Disallow: /error
Disallow: /tmp

Sorry for asking so many questions; apparently these robots.txt files are a big deal.

Archived

This topic is now archived and is closed to further replies.
