Warning: This blog entry was written two or more years ago. Therefore, it may contain broken links, out-dated or misleading content, or information that is just plain wrong. Please read on with caution.
The robots.txt file is a commonly overlooked tool in the SEO toolbox. Despite being incredibly simple, time and again I see sites with it missing or badly implemented. I have written a short guide here explaining its uses.
What is the robots.txt file?
The robots.txt file is a small text file which resides in the web root of a website. Its purpose is to serve as a behaviour guideline for search engine spiders. In essence, it tells Googlebot (and other spiders) which pages of a website it should and should not crawl and index for use in web searches. This makes it very important when you are trying to do SEO.
Before I go on, I want to discuss an often overlooked and misunderstood aspect of the robots.txt file: security. The robots.txt file is a guideline only, and web spiders can and often do ignore it. Even the ones that do obey it often have different interpretations of how the rules work.
For this reason it is a very bad idea to list sensitive directories in the robots.txt file: an attacker can read it to see which directories you do not want indexed and use that to hone an attack. I personally recommend denying access to everything (without naming the resources being denied) and then granting access to named resources, rather than granting access to everything and denying access to named resources.
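As a rough illustration of that deny-by-default policy, here is a minimal sketch that checks a sample file with Python's standard urllib.robotparser module. The paths (/index.html, /blog/, /admin/login) are hypothetical, and the Allow directive is an extension that most major spiders support but not necessarily all of them. Python's parser applies the first rule that matches, so the Allow lines come before the blanket Disallow; other spiders may resolve conflicts differently, for example by preferring the most specific rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allow-list: only public paths are named, so nothing
# sensitive is advertised to anyone reading the file.
ROBOTS_TXT = """\
User-agent: *
Allow: /index.html
Allow: /blog/
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "/admin/login" stands in for any resource that is never named in the file.
for path in ("/index.html", "/blog/first-post", "/admin/login"):
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{path}: {verdict}")
```

This only shows how one well-behaved parser reads the rules. As noted above, a spider is free to ignore them entirely, so the file is no substitute for real access control.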
Creating the robots.txt File
The first step is simply to create a blank text file named robots.txt and place it in the web root of your site. You can then visit 'http://yoursitedomain/robots.txt' to check that it is visible.
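If you would rather check visibility from a script than from a browser, something along these lines should work. It is only a sketch using Python's standard library: the function names robots_url and is_visible are my own, 'http://yoursitedomain' is the same placeholder as above, and a 200 response is assumed to mean "visible".

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(site_url):
    """Build the robots.txt address for the site that site_url belongs to."""
    # A leading slash makes urljoin resolve against the web root,
    # whatever page of the site site_url points at.
    return urljoin(site_url, "/robots.txt")


def is_visible(site_url, timeout=10.0):
    """Return True if the site's robots.txt answers with HTTP 200."""
    try:
        with urlopen(robots_url(site_url), timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (e.g. 404) and timeouts;
        # ValueError covers malformed addresses.
        return False


# Example call (makes a real network request, so it is left commented out):
# print(is_visible("http://yoursitedomain"))
```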
Once you have created your robots.txt file, the first thing to do is write a comment. The robots.txt file uses the '#' symbol to start a comment. A comment can appear on its own line or be appended to a directive.
# Comments appear after the "#" symbol at the start of a line,
User-agent: * # or after a directive
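To make the comment rule concrete, here is a small sketch of how a spider might discard comments before reading the directives. The helper name strip_comment is made up for illustration; real spiders may handle edge cases differently.

```python
def strip_comment(line):
    """Return the directive part of a robots.txt line, without any comment."""
    # Everything from the first "#" to the end of the line is a comment.
    return line.split("#", 1)[0].strip()


SAMPLE = [
    '# Comments appear after the "#" symbol at the start of a line,',
    "User-agent: * # or after a directive",
]

# Lines that were nothing but a comment become empty and are dropped.
directives = [d for d in map(strip_comment, SAMPLE) if d]
print(directives)  # ['User-agent: *']
```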