Regular expressions

Regular expressions can be used to filter URL data in Yandex.Webmaster.

Expressions are parsed according to the RE2 syntax and the following rules:

  • The regular expression is applied to the entire URL of the page including the protocol and domain. For example, you can use the following regular expression: ^http://.
  • A regular expression is applied twice: to the URL with the www prefix and to the URL without it. The presence of the www prefix in the domain doesn't affect the result of the check.
  • The regular expression is applied to the decoded URL, in which URL codes (% sequences) are replaced with the decoded characters. Exception: the codes for the /, &, =, ?, and # characters aren't replaced (for example, %2F isn't replaced with /). Note that the + character is replaced with a space. For example, the regular expression text=слон will match, but text=%D1%81%D0%BB%D0%BE%D0%BD and text=%\w\w won't.
  • Cyrillic URLs aren't converted to Punycode. For example, the regular expression ^http://ввв\.сайт\.рф/ will match, but ^http://xn--b1aaa\.xn--80aswg\.xn--p1ai/ won't.
  • Some trailing characters are removed from the URL before the check: ?, #, &, as well as the period (.) at the end of the domain name. For example, the URLs http://example.com/?, http://example.com/#, and http://example.com/?var=1& are checked as http://example.com/, http://example.com/, and http://example.com/?var=1 respectively. If the user enters the URL http://example.com./, the regular expression \./$ won't match.
  • The regular expression check is case-sensitive.
  • Quantifiers are greedy by default: they match as many characters as possible.
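The normalization rules above can be approximated in a few lines. This is only a minimal sketch, not the service's actual implementation: the helper names `decode_url`, `www_variants`, and `url_matches` are hypothetical, Python's `re` module stands in for RE2, and the rule about the period after the domain is left out for brevity.

```python
import re

# Percent-codes that stay encoded, per the rules above: / & = ? #
KEEP_ENCODED = {"2F", "26", "3D", "3F", "23"}
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_url(url: str) -> str:
    """Decode %XX sequences (except the reserved ones) and turn + into a space."""
    out = bytearray()
    i = 0
    while i < len(url):
        m = _PCT.match(url, i)
        if m and m.group(1).upper() not in KEEP_ENCODED:
            out.append(int(m.group(1), 16))  # decoded byte, may be part of UTF-8
            i = m.end()
        elif url[i] == "+":
            out += b" "
            i += 1
        else:
            out += url[i].encode("utf-8")    # copied as-is (includes kept codes)
            i += 1
    return out.decode("utf-8", errors="replace")


def www_variants(url: str) -> list[str]:
    """Return the URL without and with the www prefix."""
    m = re.match(r"^(\w+://)(www\.)?(.*)$", url, re.DOTALL)
    if not m:
        return [url]
    scheme, _, rest = m.groups()
    return [scheme + rest, scheme + "www." + rest]


def url_matches(pattern: str, url: str) -> bool:
    """Check a pattern against the normalized URL, trying both www variants."""
    normalized = decode_url(url).rstrip("?#&")  # drop trailing ?, #, &
    return any(re.search(pattern, v) for v in www_variants(normalized))


encoded = "http://example.com/?text=%D1%81%D0%BB%D0%BE%D0%BD"
print(url_matches(r"text=слон", encoded))      # pattern sees the decoded text
print(url_matches(r"text=%\w\w", encoded))     # no % codes remain to match
print(url_matches(r"^http://www\.example\.com/", "http://example.com/page"))
```

The pattern is applied with `re.search`, on the assumption that an expression may match any part of the URL unless it is anchored with ^ or $.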

Regular expressions memo

In the table below, a, b, c, d, and e stand for any characters; n and m are positive integers.

Possible options
abc|de Matches one of the options: abc or de.
Classes of characters
[abc] or [a-c] Matches any (one) character of the list (or from the range).
[^abc] or [^a-c] Matches any (one) character except those listed (or those from the range).
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\s Matches a white-space character. Equivalent to [ \t\n\f\r].
\S Matches a non-white-space character. Equivalent to [^ \t\n\f\r].
\pL Matches any letter, including non-Latin (Unicode) letters.
\w Matches any Latin letter of any case, a digit, or the underscore character. When working with Unicode characters, use the \pL class instead of \w.
\W Matches any character other than a Latin letter of any case, a digit, or the underscore character. When working with Unicode characters, use the \PL class instead of \W.

Number of occurrences (quantifiers)
a* Matches the a character repeated 0 or more times (the longest possible sequence).
a+ Matches the a character repeated 1 or more times (the longest possible sequence).
a? Matches the a character repeated 0 or 1 time (the presence of the character is a priority).
a{n,m} Matches the a character repeated at least n times and not more than m times (the longest possible sequence).
a{n,} Matches the a character repeated at least n times (the longest possible sequence).
a{n} Matches the a character repeated n times.
a*? Matches the a character repeated 0 or more times (the shortest possible sequence).
a+? Matches the a character repeated 1 or more times (the shortest possible sequence).
a?? Matches the a character repeated 0 or 1 time (the absence of the character is a priority).
a{n,m}? Matches the a character repeated at least n times and not more than m times (the shortest possible sequence).
a{n,}? Matches the a character repeated at least n times (the shortest possible sequence).
Position in the string
^ Matches the beginning of a string.
$ Matches the end of a string.
\b Matches a word boundary: the position between an alphanumeric character (\w) and a non-alphanumeric character (\W).
\B Matches a non-word boundary. Defined through the \w and \W classes.

Escaping
\ A backslash before a special character ([ ] \ ^ $ . | ? * + { }) indicates that the character is not special and should be interpreted literally. Example: \$ matches the dollar sign.

\Q...\E All special characters between \Q and \E are interpreted as ordinary characters.
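A few entries from the table can be tried out as follows. This is an illustrative sketch only: it uses Python's `re` module, which is not RE2, so it is limited to constructs that behave the same way in both. Python's `re` has no \Q...\E, so `re.escape` is used here as a stand-in for literal matching. The sample URL is made up.

```python
import re

url = "http://example.com/catalog/item123/?id=42"

# Alternation: the first option that matches at a position is used.
option = re.search(r"catalog|blog", url).group()

# Character classes: \d+ collects runs of digits.
digits = re.findall(r"\d+", url)

# Greedy vs. lazy quantifiers: the longest vs. the shortest possible match.
greedy = re.search(r"example.*/", url).group()
lazy = re.search(r"example.*?/", url).group()

# Anchors: ^ and $ tie the match to the start and end of the URL.
starts_with_http = re.search(r"^http://", url) is not None
ends_with_id = re.search(r"\?id=\d+$", url) is not None

# Word boundaries: "item" is followed by a digit, so there is no \b after it.
whole_word_item = re.search(r"\bitem\b", url) is not None
whole_word_catalog = re.search(r"\bcatalog\b", url) is not None

# Escaping: an unescaped period matches any character, an escaped one does not.
loose = re.search(r"example.com", "http://exampleXcom/") is not None
strict = re.search(re.escape("example.com"), "http://exampleXcom/") is not None

print(option, digits, greedy, lazy)
print(starts_with_http, ends_with_id, whole_word_item, whole_word_catalog)
print(loose, strict)
```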