WebSites-n-Services Google Sitemaps Knowledge Base Article 2
02/07/2006
robots.txt
Topics Covered:
Why have a robots.txt file?
How to create a robots.txt file: UNIX Line Terminators needed!
Format for robots.txt
robots.txt example
Special robots.txt examples
robots.txt validation
Why have a robots.txt file?
A robots.txt file makes your website search engine friendly. Without one, you will also see errors in your Google Account for your website(s). The robots.txt file is a simple set of instructions that tells search engine bots where they can and cannot go. A site can have only one robots.txt file. An incorrect robots.txt file can prevent search engines from entering, and therefore from spidering, your site.
How to create a robots.txt file: UNIX Line Terminators needed!
Create the file in a text editor that can save files with Unix line terminators. Other line terminators may work, but the available information on this is conflicting; we know for a fact that Unix line terminators work for this file. The file should also be encoded as UTF-8. White space characters and comments are allowed but strongly discouraged. For a little background on line terminators, see http://en.wikipedia.org/wiki/Newline
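If you would rather generate the file than depend on editor settings, the step above can be sketched in Python. This is only a minimal sketch: the function names write_robots_txt and has_only_unix_terminators are hypothetical, and it assumes the file is written locally and uploaded to your site afterwards.

```python
from pathlib import Path


def write_robots_txt(path, lines):
    """Write robots.txt lines using LF terminators and UTF-8 encoding."""
    # newline="\n" stops Python from translating "\n" into "\r\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")


def has_only_unix_terminators(path):
    """Return True if the file contains no carriage-return bytes."""
    return b"\r" not in Path(path).read_bytes()


if __name__ == "__main__":
    # Illustrative rules only; replace them with your own.
    write_robots_txt("robots.txt", ["User-agent: *", "Disallow: /cgi-bin"])
    print(has_only_unix_terminators("robots.txt"))
```

The second function is a quick way to confirm that an existing file saved by an editor does not contain Windows or old Mac line terminators.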
Format for robots.txt
The format of the file is best shown here (W3C). Take note of the W3C robots.txt tips: URIs are case-sensitive, and the "/robots.txt" string must be all lower-case. Blank lines are not permitted within a single record in the "robots.txt" file. Also, at least one Disallow field must be present for the file to be valid. If you are giving instructions to more than one bot, see Special robots.txt examples below.
robots.txt example
Sample robots.txt file:
User-agent: *
Disallow: /cgi-bin
Disallow: /bin
The sample file's instructions apply to all robots (User-agent: *) and tell the bots that they cannot enter the directories /cgi-bin and /bin.
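To see how a parser reads the sample file, you can feed the same three lines to Python's standard-library urllib.robotparser module. This is a sketch under assumptions: the host www.example.com, the bot name AnyBot and the helper is_allowed are hypothetical, and the result shows how Python's parser interprets the rules, which is not guaranteed to match every search engine's crawler.

```python
from urllib.robotparser import RobotFileParser

# The sample file from this article.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /bin
"""


def is_allowed(robots_text, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/cgi-bin/form.pl",
        "http://www.example.com/bin/tool",
    ):
        print(url, is_allowed(SAMPLE, "AnyBot", url))
```

Python's parser matches each Disallow value as a path prefix, so with these rules any path that begins with /bin is treated as off limits, not just the directory itself.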
Special robots.txt examples
For instructions to more than one bot, only one record per user agent is allowed, at least one Disallow field must be present, and there must be at least one blank line between the records. With all given examples, modify the directory names to your needs.
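The three rules above can be checked mechanically before you upload the file. The following Python sketch is one possible way to do that; the function names split_records and check_records are hypothetical, and it only looks at these structural rules, so it is not a substitute for a full validator.

```python
def split_records(robots_text):
    """Split robots.txt text into records, using blank lines as separators."""
    records, current = [], []
    for raw_line in robots_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            continue  # ignore comment-only lines
        if line:
            current.append(line)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def check_records(robots_text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen_agents = set()
    for number, record in enumerate(split_records(robots_text), start=1):
        seen_disallow = False
        agents = []
        for line in record:
            field, _, value = line.partition(":")
            field = field.strip().lower()
            if field == "user-agent":
                if seen_disallow:
                    # A new User-agent after a Disallow suggests two records ran together.
                    problems.append(f"record {number}: User-agent after Disallow (missing blank line?)")
                agents.append(value.strip().lower())
            elif field == "disallow":
                seen_disallow = True
        if not agents:
            problems.append(f"record {number}: no User-agent field")
        if not seen_disallow:
            problems.append(f"record {number}: no Disallow field")
        for agent in agents:
            if agent in seen_agents:
                problems.append(f"record {number}: duplicate record for {agent}")
            seen_agents.add(agent)
    return problems
```

Calling check_records on the text of your file returns an empty list when none of these structural problems are found, and one message per problem otherwise.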
This example restricts Googlebot from entering your cgi-bin directory and prevents Googlebot-Image from entering your images directory:
User-agent: Googlebot
Disallow: /cgi-bin

User-agent: Googlebot-Image
Disallow: /images
This example restricts Googlebot-Image from entering your domain:
User-agent: Googlebot-Image
Disallow: /
robots.txt validation
You can validate your robots.txt file against the Robots Exclusion Standard here.