Disallow and Allow directives

  1. Disallow
  2. Allow
  3. Combining directives
  4. Allow and Disallow directives without parameters
  5. Using the special characters * and $
  6. Examples of how directives are interpreted

Disallow

Use this directive to prohibit indexing site sections or individual pages. Examples:
  • Pages that contain confidential data.
  • Pages with site search results.
  • Site traffic statistics.
  • Duplicate pages.
  • Various logs.
  • Database service pages.

Note. To exclude pages whose addresses contain GET parameters from search, we recommend using the Clean-param directive rather than Disallow. With Disallow, it may not be possible to identify the URL without the parameter as a duplicate or to transfer some metrics of the forbidden pages, such as reference metrics.

Examples:

User-agent: Yandex
Disallow: / # prohibits crawling the entire site

User-agent: Yandex
Disallow: /catalog # prohibits crawling pages that start with /catalog

User-agent: Yandex
Disallow: /page? # prohibits crawling /page URLs that contain parameters

Allow

This directive allows indexing site sections or individual pages.

Examples:

User-agent: Yandex
Allow: /cgi-bin
Disallow: /
# prohibits downloading anything except for pages
# starting with '/cgi-bin'

User-agent: Yandex
Allow: /file.xml
# allows downloading file.xml

Note. Empty lines aren't allowed between the User-agent, Disallow, and Allow directives.

Combining directives

The Allow and Disallow directives from the corresponding User-agent block are sorted according to URL prefix length (from shortest to longest) and applied in order. If several directives match a particular site page, the robot selects the last one in the sorted list. This way the order of directives in the robots.txt file doesn't affect the way they are used by the robot.

Note. If there is a conflict between two directives with prefixes of the same length, the Allow directive takes precedence.

# Source robots.txt:
User-agent: Yandex
Allow: /
Allow: /catalog/auto
Disallow: /catalog

# Sorted robots.txt:
User-agent: Yandex
Allow: /
Disallow: /catalog
Allow: /catalog/auto
# prohibits downloading pages that start with '/catalog',
# but allows downloading pages that start with '/catalog/auto'.
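
The sorting rule above can be sketched in a few lines of Python. This is an illustration, not the robot's actual implementation: the names sort_rules and is_allowed are hypothetical, rules are treated as plain prefixes (no * or $), and a page that matches no rule is assumed to be allowed.

```python
from typing import List, Tuple

# (directive, path prefix), for example ("Disallow", "/catalog")
Rule = Tuple[str, str]

def sort_rules(rules: List[Rule]) -> List[Rule]:
    # Shortest prefix first. At equal length, Disallow sorts before Allow,
    # so Allow ends up later in the list and wins the tie.
    return sorted(rules, key=lambda r: (len(r[1]), r[0] == "Allow"))

def is_allowed(rules: List[Rule], url_path: str) -> bool:
    verdict = True  # assumption: no matching rule means the page is allowed
    for directive, prefix in sort_rules(rules):
        if url_path.startswith(prefix):
            # Later (longer) matches overwrite earlier ones.
            verdict = directive == "Allow"
    return verdict

rules = [("Allow", "/"), ("Allow", "/catalog/auto"), ("Disallow", "/catalog")]
print(is_allowed(rules, "/catalog/auto/sedan"))  # True
print(is_allowed(rules, "/catalog/trucks"))      # False
```

Because the rules are sorted before they are applied, passing them in a different order gives the same result.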

Common example:

User-agent: Yandex
Allow: /archive
Disallow: /
# allows everything that starts with '/archive'; the rest is prohibited

User-agent: Yandex
Allow: /obsolete/private/*.html$ # allows HTML files
                                 # in the '/obsolete/private/...' path
Disallow: /*.php$  # prohibits all '*.php' on the site
Disallow: /*/private/ # prohibits all subpaths containing
                      # '/private/', but the Allow above negates
                      # a part of this prohibition
Disallow: /*/old/*.zip$ # prohibits all '*.zip' files containing
                        # '/old/' in the path

User-agent: Yandex
Disallow: /add.php?*user=
# prohibits all 'add.php?' URLs with the 'user' parameter

Allow and Disallow directives without parameters

If directives don't contain parameters, the robot handles the data as follows:

User-agent: Yandex
Disallow: # same as Allow: /

User-agent: Yandex
Allow: # isn't taken into account by the robot
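
One way to picture this handling is as a normalization step applied before the rules are matched. A minimal sketch, assuming a hypothetical helper normalize that receives the directive name and its (possibly empty) value:

```python
from typing import Optional, Tuple

def normalize(directive: str, value: str) -> Optional[Tuple[str, str]]:
    """Return the effective rule, or None if the line is ignored."""
    value = value.strip()
    if value:
        return (directive, value)  # ordinary rule, kept as written
    if directive == "Disallow":
        return ("Allow", "/")      # empty Disallow: same as Allow: /
    return None                    # empty Allow: not taken into account

print(normalize("Disallow", ""))          # ('Allow', '/')
print(normalize("Allow", ""))             # None
print(normalize("Disallow", "/catalog"))  # ('Disallow', '/catalog')
```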

Using the special characters * and $

You can use the special characters * and $ when specifying paths in the Allow and Disallow directives, thus setting certain regular expressions.

The * character indicates any sequence of characters (or none). Examples:

User-agent: Yandex
Disallow: /cgi-bin/*.aspx # prohibits '/cgi-bin/example.aspx'
                          # and '/cgi-bin/private/test.aspx'
Disallow: /*private # prohibits both '/private',
                    # and '/cgi-bin/private'

By default, the * character is appended to the end of every rule described in the robots.txt file. Example:

User-agent: Yandex
Disallow: /cgi-bin* # blocks access to pages
                    # that start with '/cgi-bin'
Disallow: /cgi-bin # the same

To cancel * at the end of the rule, use the $ character, for example:

User-agent: Yandex
Disallow: /example$ # prohibits '/example',
                    # but doesn't prohibit '/example.html'

User-agent: Yandex
Disallow: /example # prohibits both '/example'
                   # and '/example.html'

The $ character doesn't cancel a * that is explicitly written at the end of the rule, that is:

User-agent: Yandex
Disallow: /example$  # only prohibits '/example'
Disallow: /example*$ # same as 'Disallow: /example' 
                     # prohibits /example.html and /example
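
The behavior of * and $ described above can be approximated by translating a rule into an ordinary regular expression. The sketch below illustrates that idea and is not the robot's actual matcher; rule_to_regex and matches are hypothetical names, and only a trailing $ is treated as special.

```python
import re

def rule_to_regex(rule_path: str) -> re.Pattern:
    # A trailing '$' anchors the rule to the end of the URL.
    # Without it, the rule behaves as if '*' were appended.
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    # Escape every literal part, then join the parts with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + (r"\Z" if anchored else ""))

def matches(rule_path: str, url_path: str) -> bool:
    # re.match anchors at the start only, which gives the implicit trailing '*'.
    return rule_to_regex(rule_path).match(url_path) is not None

print(matches("/example$", "/example"))        # True
print(matches("/example$", "/example.html"))   # False
print(matches("/example*$", "/example.html"))  # True
print(matches("/*private", "/cgi-bin/private"))  # True
```

Using re.match rather than re.fullmatch is what models the implicit * at the end of every rule; the \Z anchor is added only when the rule ends with $.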

Examples of how directives are interpreted

User-agent: Yandex
Allow: /
Disallow: /
# everything is allowed

User-agent: Yandex
Allow: /$
Disallow: /
# everything is forbidden except for the main page

User-agent: Yandex
Disallow: /private*html
# prohibits '/private*html',
# '/private/test.html', '/private/html/test.aspx', and so on

User-agent: Yandex
Disallow: /private$
# only prohibits '/private'

User-agent: *
Disallow: /
User-agent: Yandex
Allow: /
# since the Yandex robot identifies records by the presence of
# 'User-agent:' in the line, it uses the 'User-agent: Yandex' record,
# so everything is allowed