
The robots exclusion protocol is a convention that asks cooperating web spiders and robots not to access all or part of a website.

To use this protocol, create a plain text file named robots.txt and place it in the root directory of the website (for example, http://www.example.com/robots.txt).


The file is a plain text file containing one or more groups of "User-agent" and "Disallow" clauses. Each group describes how a specific set of crawlers may access the site.

A group begins with a "User-agent" clause naming the robots it applies to. The asterisk (*) may be used as a wildcard.

  • To specify a bot named "Wahoo", use:
User-agent: Wahoo
  • To mark all bots, use:
User-agent: *
  • To specify bots starting with "Gougle", use:
User-agent: Gougle*
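The matching behaviour described above can be sketched in Python. `agent_matches` is a hypothetical helper written for illustration, not part of any standard robots.txt library; it assumes case-insensitive comparison and exact-name matching when no wildcard is present:

```python
def agent_matches(pattern: str, agent: str) -> bool:
    """Return True if a User-agent pattern applies to a bot name.

    Illustrative helper (not a standard API): "*" matches every bot,
    and a trailing "*" matches any bot whose name starts with the
    preceding prefix. Comparison is case-insensitive.
    """
    pattern, agent = pattern.lower(), agent.lower()
    if pattern == "*":
        return True
    if pattern.endswith("*"):
        return agent.startswith(pattern[:-1])
    return agent == pattern

print(agent_matches("Wahoo", "Wahoo"))        # True
print(agent_matches("*", "AnyBot"))           # True
print(agent_matches("Gougle*", "Gouglebot"))  # True
print(agent_matches("Gougle*", "Wahoo"))      # False
```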

After a User-agent clause, you may use one or more "Disallow" clauses to specify the directories you want to "hide" from the bot.

  • To allow the bot to access all files, specify a blank Disallow clause:
Disallow:
  • To disallow the bot from all files, use:
Disallow: /
  • Disallow the bot from specific directories:
Disallow: /cgi-bin/
Disallow: /private/
  • Individual files may also be excluded:
Disallow: /private/secret.html
  • Comments are preceded by the number sign (#):
# A line may start with a comment
User-agent: Wahougle* # You can also place comments here
Disallow: /private # Disallows matching bots from any path starting with /private
  • Groups are processed in order from top to bottom. Once a bot matches a group, later groups are ignored for that bot, so the catch-all User-agent: * group usually goes at the bottom.
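Putting the pieces together, a small robots.txt with a specific group followed by a catch-all group can be checked with Python's standard urllib.robotparser module (the bot names are the made-up examples used above):

```python
import urllib.robotparser

# A robots.txt with a specific group first and a catch-all group last.
robots_txt = """\
# Keep Wahoo out of the private area
User-agent: Wahoo
Disallow: /private/

# Every other bot may fetch anything
User-agent: *
Disallow:
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("Wahoo", "/private/index.html"))         # False
print(parser.can_fetch("Wahoo", "/public/index.html"))          # True
print(parser.can_fetch("SomeOtherBot", "/private/index.html"))  # True
```

Note that Wahoo only ever hits the first group: the catch-all group never applies to it, matching the top-to-bottom processing described above.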

Extended syntax

  • A "Crawl-delay" clause asks a compliant bot to wait the specified number of seconds between successive requests to the same website:
Crawl-delay: 5
  • An "Allow" clause counteracts a following "Disallow" clause. The following allows access to the musicfiles.html file but not to other files in the music directory:
Allow: /music/musicfiles.html
Disallow: /music
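Both extended clauses are understood by Python's urllib.robotparser (Crawl-delay support requires Python 3.6 or later), which offers a quick way to sanity-check a file. The bot name Wahoo is the made-up example from above:

```python
import urllib.robotparser

# Allow one file inside an otherwise disallowed directory,
# and request a 5-second delay between fetches.
robots_txt = """\
User-agent: Wahoo
Crawl-delay: 5
Allow: /music/musicfiles.html
Disallow: /music
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.crawl_delay("Wahoo"))                          # 5
print(parser.can_fetch("Wahoo", "/music/musicfiles.html"))  # True
print(parser.can_fetch("Wahoo", "/music/hits.mp3"))         # False
```

Because rules are applied in order, the Allow clause is consulted before the Disallow clause that follows it, which is why musicfiles.html remains fetchable.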