Disallow and Allow directives

Disallow

Use this directive to prohibit crawling of sections or individual pages of a site. For example:

  • Pages that contain confidential data.
  • Pages with site search results.
  • Site traffic statistics.
  • Duplicate pages.
  • Various logs.
  • Database service pages.

Note

When selecting a directive for pages that should be left out of search, note: if their addresses contain GET parameters, it's recommended to use the Clean-param directive rather than Disallow. With Disallow, the robot may be unable to merge a page with its parameter-free duplicate or to transfer metrics from the disallowed pages.
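For example, a Clean-param rule like the one below (the “ref” parameter and “/catalog/” path are purely illustrative) tells the robot to treat the parameterized addresses as duplicates instead of dropping them:

User-agent: Yandex
Clean-param: ref /catalog/
# the robot treats /catalog/page?ref=123
# and /catalog/page as the same page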

Examples:

User-agent: Yandex
Disallow: / # prohibits crawling of the entire site

User-agent: Yandex
Disallow: /catalogue # prohibits crawling of pages whose addresses start with /catalogue

User-agent: Yandex
Disallow: /page? # prohibits crawling of pages whose addresses start with “/page?”, that is, “/page” with parameters

Allow

This directive allows crawling of sections or individual pages of a site.

Examples:

User-agent: Yandex
Allow: /cgi-bin
Disallow: /
# prohibits downloading of everything except pages 
# starting with “/cgi-bin”

User-agent: Yandex
Allow: /file.xml
# allows downloading of the file.xml file

Note

Empty lines aren't allowed between the User-agent, Disallow, and Allow directives.

Combining directives

The Allow and Disallow directives from the corresponding User-agent block are sorted by URL prefix length (from shortest to longest) and applied in order. If several directives match a particular site page, the robot selects the last one in the sorted list, that is, the one with the longest matching prefix. So the order of directives in the robots.txt file doesn't affect how the robot applies them.

Note

If there is a conflict between two directives with prefixes of the same length, the Allow directive takes precedence.

# Source robots.txt:
User-agent: Yandex
Allow: /
Allow: /catalog/auto
Disallow: /catalog

# Sorted robots.txt:
User-agent: Yandex
Allow: /
Disallow: /catalog
Allow: /catalog/auto
# prohibits downloading of pages starting with “/catalog”,
# but allows downloading of pages starting with “/catalog/auto”.
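Expressed as code, the selection rule looks like this. The sketch below is a minimal Python illustration (not Yandex's actual implementation) that assumes plain prefixes without the * and $ wildcards: rules are sorted by prefix length with Allow after Disallow on ties, and the last matching rule wins.

def is_allowed(path, rules):
    # rules: ("Allow" | "Disallow", prefix) pairs from one User-agent block.
    # Sort shortest to longest; on equal lengths Allow sorts last, so it wins ties.
    ordered = sorted(rules, key=lambda r: (len(r[1]), r[0] == "Allow"))
    verdict = True  # a path matched by no rule is allowed
    for directive, prefix in ordered:
        if path.startswith(prefix):
            verdict = (directive == "Allow")  # the last match wins
    return verdict

rules = [("Allow", "/"), ("Allow", "/catalog/auto"), ("Disallow", "/catalog")]
print(is_allowed("/catalog/auto/page1", rules))  # True
print(is_allowed("/catalog/page2", rules))       # False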

Common example:

User-agent: Yandex
Allow: /archive
Disallow: /
# allows everything that starts with “/archive”; the rest is prohibited

User-agent: Yandex
Allow: /obsolete/private/*.html$ # allows HTML files
                                 # in “/obsolete/private/...”
Disallow: /*.php$  # prohibits all “*.php” files on this site
Disallow: /*/private/ # prohibits all subpaths containing
                      # “/private/”, but the Allow above negates
                      # part of this prohibition
Disallow: /*/old/*.zip$ # prohibits all “*.zip” files whose
                        # paths contain “/old/”

User-agent: Yandex
Disallow: /add.php?*user=
# prohibits all “add.php?” scripts with the “user” parameter

Allow and Disallow directives without parameters

If directives don't contain parameters, the robot handles the data as follows:

User-agent: Yandex
Disallow: # same as Allow: /

User-agent: Yandex
Allow: # isn’t taken into account by the bot

Using the special characters * and $

You can use the special characters * and $ in the paths of the Allow and Disallow directives to set regular expressions.

The * character indicates any sequence of characters (or none). Examples:

User-agent: Yandex
Disallow: /cgi-bin/*.aspx # prohibits “/cgi-bin/example.aspx”
                          # and “/cgi-bin/private/test.aspx”
Disallow: /*private # prohibits both “/private”
                    # and “/cgi-bin/private”

By default, the * character is appended to the end of every rule described in the robots.txt file. Example:

User-agent: Yandex
Disallow: /cgi-bin* # blocks access to pages 
                    # starting with “/cgi-bin”
Disallow: /cgi-bin # the same

To cancel the implicit * at the end of a rule, use the $ character. For example:

User-agent: Yandex
Disallow: /example$ # prohibits “/example”, 
                    # but allows “/example.html”

User-agent: Yandex
Disallow: /example # prohibits both “/example” 
                   # and “/example.html”

The $ character doesn't cancel an explicitly specified * at the end of a rule, that is:

User-agent: Yandex
Disallow: /example$  # only prohibits “/example”
Disallow: /example*$ # same as “Disallow: /example” 
                     # prohibits both “/example.html” and “/example”
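As a rough illustration of the matching rules above, here is a small Python sketch (an assumption about how a robot could implement them, not Yandex's actual code) that translates a rule path into a regular expression: * becomes “.*”, a trailing $ anchors the pattern, and an unanchored rule matches as a prefix, which is equivalent to an implicit trailing *.

import re

def rule_to_regex(rule_path):
    # A trailing "$" anchors the pattern; otherwise the rule matches as a
    # prefix, which is the same as having an implicit "*" at the end.
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape everything literally except "*", which matches any character sequence.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return "^" + body + ("$" if anchored else "")

assert re.match(rule_to_regex("/example$"), "/example")
assert not re.match(rule_to_regex("/example$"), "/example.html")
assert re.match(rule_to_regex("/example"), "/example.html")
assert re.match(rule_to_regex("/cgi-bin/*.aspx"), "/cgi-bin/private/test.aspx")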

Processing the # character

According to the standard, you should insert a blank line before every User-agent directive. The # character designates a comment: everything from this character up to the first line break is disregarded.

Pages with addresses like https://example.com/page#part_1 aren't indexed separately: the robot crawls such a page at the https://example.com/page address. You can therefore specify the page address without the #fragment in a directive.

If you overlook this behavior and include the # character in a Disallow directive, you may block the entire site from indexing: for example, Disallow: /# is interpreted by the search engine as Disallow: /, that is, a complete crawl ban.
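So to block only a fragment-bearing page such as the one above, write its address without the fragment:

User-agent: Yandex
Disallow: /page
# blocks https://example.com/page; the #part_1 fragment doesn't need to be listed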

Examples of how directives are interpreted

User-agent: Yandex 
Allow: /
Disallow: /
# everything is allowed

User-agent: Yandex 
Allow: /$
Disallow: /
# everything is prohibited except the home page

User-agent: Yandex
Disallow: /private*html
# prohibits “/private*html”, 
# “/private/test.html”, “/private/html/test.aspx”, etc.

User-agent: Yandex
Disallow: /private$
# only prohibits “/private”

User-agent: *
Disallow: /

User-agent: Yandex
Allow: /
# since the Yandex robot 
# selects the entries that include “User-agent: Yandex”, 
# everything is allowed