WebSites-n-Services Google Sitemaps Knowledge Base Article 2

02/07/2006

robots.txt

Topics Covered:

Why have a robots.txt file?

How to create a robots.txt file: UNIX Line Terminators needed!

Format for robots.txt

robots.txt example

Special robots.txt examples

robots.txt validation

   

Why have a robots.txt file?

    A robots.txt file is a simple set of instructions for search engine robots (bots), telling them which parts of your site they may and may not visit. Having one makes your website search engine friendly, and without one you may see errors reported for your website(s) in your Google account. A site can have only one robots.txt file, and it must sit in the top-level (root) directory of the site. An incorrect robots.txt file can prevent search engines from entering, and therefore spidering, your site.

 

How to create a robots.txt file: UNIX Line Terminators needed!

    Create the file in a plain-text editor that can save files with Unix line terminators (a single LF character at the end of each line). Other line terminators may also work, but the information on this is conflicting; Unix line terminators are known to work. Save the file with UTF-8 encoding. Whitespace characters and comments are allowed, but we strongly recommend against using them. For background on line terminators, see http://en.wikipedia.org/wiki/Newline
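
    If your editor cannot save with Unix line terminators, the conversion can also be done with a short script. The Python sketch below is one possible approach, not a required tool, and the function names to_unix_utf8 and write_robots_txt are our own: it turns Windows (CRLF) and old Mac (CR) line endings into Unix LF and encodes the result as UTF-8.

```python
from pathlib import Path


def to_unix_utf8(text: str) -> bytes:
    """Return robots.txt content with Unix (LF) line endings, encoded as UTF-8."""
    # Convert Windows CRLF first, then any remaining lone CR (old Mac style).
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # End the file with a newline so the last line is terminated too.
    if not normalized.endswith("\n"):
        normalized += "\n"
    # The plain "utf-8" codec does not add a byte-order mark.
    return normalized.encode("utf-8")


def write_robots_txt(text: str, path: str = "robots.txt") -> None:
    """Write the file in binary mode so no platform newline translation occurs."""
    Path(path).write_bytes(to_unix_utf8(text))
```

    For example, to_unix_utf8("User-agent: *\r\nDisallow: /cgi-bin") returns b"User-agent: *\nDisallow: /cgi-bin\n". Writing in binary mode matters: a text-mode write on Windows would turn each LF back into CRLF.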

   

Format for robots.txt

    The format of the file is best described here (W3C). Take note of the W3C's robots.txt tips: URIs are case-sensitive, and the "/robots.txt" string must be all lower-case. Blank lines are not permitted within a single record of the "robots.txt" file. Also, at least one Disallow field must be present in each record for the file to be valid. If you are giving instructions to more than one bot, see Special robots.txt examples below.
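
    Because robots look for the file only at the lower-case path /robots.txt on each host, its address can be worked out from any page URL on the site. A minimal Python sketch using the standard urllib.parse module; the helper name robots_url is our own:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the one robots.txt URL that covers page_url's host.

    The path is always the lower-case "/robots.txt" at the root; any
    directory, query string, or fragment in page_url is discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

    For example, robots_url("http://www.example.com/dir/Page.html?x=1") gives http://www.example.com/robots.txt, whatever the capitalization of the page path.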

 

robots.txt example

    Sample robots.txt file:

User-agent: *
Disallow: /cgi-bin
Disallow: /bin

    The sample file's instructions apply to all robots (User-agent: *) and tell them not to enter the /cgi-bin and /bin directories.
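
    One way to see how a well-behaved robot reads this sample is Python's standard urllib.robotparser module, which implements the Robots Exclusion Standard. Other crawlers, including Googlebot, use their own parsers, so treat this as an approximation. Disallow values are matched as case-sensitive path prefixes, so /bin also blocks a page such as /binder.html, while /CGI-BIN/ is not blocked. The robot name AnyBot and the paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The sample file from above, as a string.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /bin
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no HTTP request is made here.
parser.parse(SAMPLE.splitlines())

# Illustrative (robot name, path) pairs.
CASES = [
    ("Googlebot", "/cgi-bin/form.cgi"),  # blocked by "Disallow: /cgi-bin"
    ("AnyBot", "/bin/tool"),             # blocked by "Disallow: /bin"
    ("AnyBot", "/binder.html"),          # also blocked: prefix match on "/bin"
    ("AnyBot", "/CGI-BIN/form.cgi"),     # allowed: paths are case-sensitive
    ("AnyBot", "/index.html"),           # allowed: not listed in the file
]

for agent, path in CASES:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent:<10} {path:<20} {verdict}")
```

    If you want a directory rule that does not also catch similarly named files, end the value with a slash (for example Disallow: /bin/).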

 

Special robots.txt examples

When giving instructions to more than one bot, follow these rules: only one record is allowed per user agent, each record must contain at least one Disallow field, and records must be separated by at least one blank line. In all of the examples below, change the directory names to suit your site.
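
If you generate robots.txt from a script, these rules can be enforced by construction. The Python sketch below is only an illustration; the function name build_robots_txt and its dictionary input are our own choices. Each dictionary key becomes exactly one record, a record with no Disallow paths is refused, and records are separated by one blank line:

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Build robots.txt text from {user_agent: [disallowed paths]}.

    Each dict key becomes exactly one record. Records are separated by a
    single blank line, and every line ends with a Unix LF terminator.
    """
    if not rules:
        raise ValueError("at least one record is required")
    records = []
    for agent, paths in rules.items():
        if not paths:
            # Every record needs at least one Disallow field.
            raise ValueError(f"record for {agent!r} has no Disallow field")
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        records.append("\n".join(lines))
    # One blank line between records, and a final newline after the last one.
    return "\n\n".join(records) + "\n"
```

For example, build_robots_txt({"Googlebot": ["/cgi-bin"], "Googlebot-Image": ["/images"]}) produces the text of the first example in this section.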

This example prevents Googlebot from entering your cgi-bin directory and prevents Googlebot-Image from entering your images directory:

User-agent: Googlebot
Disallow: /cgi-bin

User-agent: Googlebot-Image
Disallow: /images

 

This example prevents Googlebot-Image from entering any part of your domain:

User-agent: Googlebot-Image
Disallow: /
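
Disallow: / matches every path, because every URL path on a site begins with "/". Python's urllib.robotparser, used here only as a stand-in for a real crawler, shows that this record shuts out Googlebot-Image while robots it does not name are unaffected. The page paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# The example file above, passed in as a list of lines.
parser.parse(["User-agent: Googlebot-Image", "Disallow: /"])

# Every path starts with "/", so Googlebot-Image is blocked everywhere.
print(parser.can_fetch("Googlebot-Image", "/images/logo.png"))  # False
# No record names Googlebot and there is no "*" record, so it is allowed.
print(parser.can_fetch("Googlebot", "/index.html"))             # True
```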

 

robots.txt validation 

    You can validate your robots.txt file against the Robots Exclusion Standard here.
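
    For a quick local check, the rules from this article can also be tested with a short script. The Python sketch below is our own illustration, not the validator linked above, and it checks only these rules: every record starts with User-agent, contains at least one Disallow field, is separated from the next record by a blank line, and no user agent has two records. Comparing agent names case-insensitively is an assumption of this sketch.

```python
def split_records(text: str) -> list[list[tuple[int, str, str]]]:
    """Split robots.txt text into records at blank lines.

    Each record is a list of (line_number, field, value) tuples. A line
    holding only a comment is skipped and does not end a record.
    """
    records, current = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():  # a truly blank line ends the current record
            if current:
                records.append(current)
                current = []
            continue
        line = raw.split("#", 1)[0].strip()  # drop any comment
        if not line:
            continue
        field, _, value = line.partition(":")
        current.append((lineno, field.strip().lower(), value.strip()))
    if current:
        records.append(current)
    return records


def check_robots_txt(text: str) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    first_seen = {}  # lower-cased agent name -> line where it first appeared
    for record in split_records(text):
        start = record[0][0]
        if record[0][1] != "user-agent":
            problems.append(f"line {start}: record does not start with User-agent "
                            "(blank line inside a record?)")
        has_disallow = False
        for lineno, field, value in record:
            if field == "disallow":
                has_disallow = True
            elif field == "user-agent":
                if has_disallow:
                    problems.append(f"line {lineno}: User-agent follows Disallow "
                                    "without a blank line between records")
                    # Treat this line as the start of a new, unseparated record.
                    start, has_disallow = lineno, False
                key = value.lower()
                if key in first_seen:
                    problems.append(f"line {lineno}: second record for {value!r} "
                                    f"(first on line {first_seen[key]})")
                else:
                    first_seen[key] = lineno
        if not has_disallow:
            problems.append(f"line {start}: record has no Disallow field")
    return problems
```

    Run against the Googlebot and Googlebot-Image example above, check_robots_txt returns an empty list; removing the blank line between the two records produces one problem report for line 3.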

 
