Regular expressions

You can use regular expressions when setting up goals of the Page view, Multi-step goal, and JavaScript event types, and in segments whose conditions involve URLs (for example, traffic sources).

Note

When setting up a “JavaScript event” goal, the regular expression must describe only the goal identifier, without the site's domain or protocol.

Example

To track clicks on a button whose identifier contains button or buy, specify the condition button|buy.
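Go's standard regexp package accepts RE2 syntax, so it is one way to try a condition locally before saving it. The sketch below checks the condition from the example against a few made-up identifiers. The identifiers and the name goalCondition are hypothetical, and the sketch does not reproduce how Metrica itself receives event identifiers.

```go
package main

import (
	"fmt"
	"regexp"
)

// goalCondition is the condition from the example. It is not anchored,
// so it matches any identifier that contains "button" or "buy".
var goalCondition = regexp.MustCompile(`button|buy`)

func main() {
	// Hypothetical element identifiers, for illustration only.
	for _, id := range []string{"buy-now", "main-button", "subscribe"} {
		fmt.Printf("%s: %v\n", id, goalCondition.MatchString(id))
	}
}
```

Because the expression is unanchored, an identifier such as main-button matches as well. Adding ^ and $ would require the whole identifier to be one of the variants.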

The expression is processed according to RE2 syntax and the following rules:

  • The regular expression is applied to the page’s full URL, including the protocol and domain. For example, you can use the regular expression ^http://.

  • The regular expression is applied twice: first to the original URL, and then to the same URL with the www prefix added (if it was missing) or removed (if it was present). This means the result does not depend on whether the domain includes the www prefix.

  • The regular expression is applied to the decoded URL, in which percent-encoded sequences (%XX) are replaced with the characters they represent. Exception: the codes for /, &, =, ?, and # are not decoded; for example, %2F is not replaced with /. Note that the plus sign (+) is decoded as a space. For example, the regular expression text=слон will match, but text=%D1%81%D0%BB%D0%BE%D0%BD (the percent-encoded form of слон) and text=%\w\w will not.

  • Cyrillic domain names are not converted to Punycode. For example, the regular expression ^http://ввв\.сайт\.рф/ will match, but ^http://xn--b1aaa\.xn--80aswg\.xn--p1ai/ will not.

  • Before the regular expression is checked, the characters ?, #, &, and . (dot) are removed from the end of the URL. For example, the URLs http://example.com/?, http://example.com/#, and http://example.com/?var=1& are compared as http://example.com/, http://example.com/, and http://example.com/?var=1, respectively. If a user opens the URL http://example.com./, the regular expression \./$ will not match.

  • By default, quantifiers are greedy: they match the longest possible string.

  • Matching is case-sensitive: uppercase and lowercase characters in the URL are treated as different characters.

Regular expression syntax

In the list below, a, b, c, d, and e stand for any characters, and n and m are non-negative integers (n ≤ m).

Alternative variants

abc|de

Matches one of the variants: abc or de.
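A short Go sketch of how alternation behaves in RE2. It shows that | has the lowest precedence, so anchors bind to the nearest variant rather than to the whole alternation. The parentheses are standard RE2 grouping (the list of constructs mentions ( ) only among the special characters), and the variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Unanchored alternation: matches if either variant occurs anywhere.
	alt = regexp.MustCompile(`abc|de`)
	// | has the lowest precedence, so this means (^abc)|(de$).
	loose = regexp.MustCompile(`^abc|de$`)
	// Parentheses limit the alternation: the whole string must be abc or de.
	whole = regexp.MustCompile(`^(abc|de)$`)
)

func main() {
	fmt.Println(alt.FindString("xxdexx"))
	fmt.Println(loose.MatchString("xxde"))
	fmt.Println(whole.MatchString("xxde"))
}
```

The loose form matches xxde because de$ succeeds on its own; the grouped form rejects it.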

Character classes

[abc] or [a-c]

Matches any single character from those listed (or from the specified range).

[^abc] or [^a-c]

Matches any single character except those listed (or outside the specified range).
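A minimal Go sketch of positive and negated character classes. The ^ and $ anchors force the whole test string to be a single character, which makes it visible that a class matches exactly one character. The variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// [a-c] is the same set as [abc]: exactly one of a, b, c.
	inRange = regexp.MustCompile(`^[a-c]$`)
	// [^a-c] is any single character outside that set.
	outOfRange = regexp.MustCompile(`^[^a-c]$`)
)

func main() {
	for _, s := range []string{"a", "c", "d", "ab"} {
		fmt.Printf("%q in=%v out=%v\n", s, inRange.MatchString(s), outOfRange.MatchString(s))
	}
}
```

The string "ab" matches neither expression, because each class stands for one character only.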

\d

Matches a digit. Equivalent to [0-9].

\D

Matches a non-digit. Equivalent to [^0-9].

\s

Matches a whitespace character (tab, newline, form feed, carriage return, or space). Equivalent to [\t\n\f\r ].

\S

Matches any character that is not a whitespace character. Equivalent to [^\t\n\f\r ].

\pL

Matches any Unicode letter (for example, a Latin or Cyrillic letter).

\w

Matches a Latin letter (uppercase or lowercase), a digit, or an underscore. Equivalent to [0-9A-Za-z_].

\w does not match Cyrillic or other non-Latin letters. To match Unicode letters, use the \pL class instead of \w.

\W

Matches any character that is not a Latin letter, a digit, or an underscore. Equivalent to [^0-9A-Za-z_].

To exclude Unicode letters rather than only Latin ones, use \PL (any character that is not a Unicode letter) instead of \W.
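The difference between the ASCII-only classes and \pL is easiest to see with Cyrillic text. This Go sketch relies on Go's regexp package following RE2 here (\w is ASCII-only, \s is [\t\n\f\r ]). The class [\pL\d_] is shown as one possible Unicode-aware replacement for \w, since \pL on its own excludes digits and the underscore. Variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	digits      = regexp.MustCompile(`\d+`)
	asciiWord   = regexp.MustCompile(`^\w+$`)       // Latin letters, digits, underscore only
	letters     = regexp.MustCompile(`^\pL+$`)      // any Unicode letters, no digits or _
	unicodeWord = regexp.MustCompile(`^[\pL\d_]+$`) // Unicode letters plus digits and _
	spaced      = regexp.MustCompile(`q=a\sb`)      // \s matches the space a decoded + becomes
)

func main() {
	fmt.Println(digits.FindString("id=2024&page=3"))
	fmt.Println(asciiWord.MatchString("слон"))
	fmt.Println(letters.MatchString("слон"))
	fmt.Println(letters.MatchString("page_2"))
	fmt.Println(unicodeWord.MatchString("слон_2"))
	fmt.Println(spaced.MatchString("q=a b"))
}
```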

Number of occurrences (quantifiers)

a*

Matches the character a repeated 0 or more times (greedy: the longest possible sequence is selected).

a+

Matches the character a repeated 1 or more times (greedy: the longest possible sequence is selected).

a?

Matches the character a 0 times or 1 time (greedy: matching a is preferred).

a{n,m}

Matches the character a repeated at least n and at most m times (greedy: the longest possible sequence is selected).

a{n,}

Matches the character a repeated at least n times (greedy: the longest possible sequence is selected).

a{n}

Matches the character a repeated exactly n times.
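A small Go sketch of the greedy quantifiers on URL-like strings. The paths and variable names are made up for illustration; each FindString call returns the leftmost match, extended as far as the quantifier allows.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	pageNum  = regexp.MustCompile(`/page\d{1,3}`) // at most three digits
	idValue  = regexp.MustCompile(`id=\d+`)       // as many digits as are present
	twoA     = regexp.MustCompile(`a{2}`)         // exactly two
	protocol = regexp.MustCompile(`https?://`)    // s? takes the "s" when present
)

func main() {
	fmt.Println(pageNum.FindString("/page12345"))
	fmt.Println(idValue.FindString("id=2024&x"))
	fmt.Println(twoA.FindString("caaat"))
	fmt.Println(protocol.FindString("https://example.com"))
}
```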

a*?

Matches the character a repeated 0 or more times (lazy: the shortest possible sequence is selected).

a+?

Matches the character a repeated 1 or more times (lazy: the shortest possible sequence is selected).

a??

Matches the character a 0 times or 1 time (lazy: skipping a is preferred).

a{n,m}?

Matches the character a repeated at least n and at most m times (lazy: the shortest possible sequence is selected).

a{n,}?

Matches the character a repeated at least n times (lazy: the shortest possible sequence is selected).
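Greedy and lazy quantifiers differ most visibly on query strings with several parameters. In this Go sketch, the greedy .+ runs to the last & it can reach, while the lazy .+? stops at the first one. The query string and variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	greedy  = regexp.MustCompile(`=.+&`)   // longest: up to the last &
	lazy    = regexp.MustCompile(`=.+?&`)  // shortest: up to the first &
	lazyMin = regexp.MustCompile(`a{2,}?`) // at least two, as few as possible
)

func main() {
	query := "a=1&b=2&c=3"
	fmt.Println(greedy.FindString(query))
	fmt.Println(lazy.FindString(query))
	fmt.Println(lazyMin.FindString("aaaa"))
}
```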

Position within the string

^

Matches the beginning of the string.

$

Matches the end of the string.

\b

Matches a word boundary: a position with a word character (\w) on one side and a non-word character (\W), or the start or end of the string, on the other.

\B

Matches any position that is not a word boundary (as defined for \b through the classes \w and \W).
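Because the expression is applied to the full URL, including the protocol, an expression anchored with ^ has to start with the protocol. This Go sketch shows that, along with $ and the word-boundary classes. The URLs and variable names are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	catalog   = regexp.MustCompile(`^https?://example\.com/catalog/`)
	pathOnly  = regexp.MustCompile(`^/catalog/`) // never matches a full URL
	htmlPage  = regexp.MustCompile(`\.html$`)
	wordBuy   = regexp.MustCompile(`\bbuy\b`) // "buy" as a separate word
	insideBuy = regexp.MustCompile(`\Bbuy`)   // "buy" preceded by a word character
)

func main() {
	fmt.Println(catalog.MatchString("https://example.com/catalog/tv"))
	fmt.Println(pathOnly.MatchString("https://example.com/catalog/tv"))
	fmt.Println(htmlPage.MatchString("http://example.com/a.html"))
	fmt.Println(htmlPage.MatchString("http://example.com/a.html?x=1"))
	fmt.Println(wordBuy.MatchString("/buy-now"), wordBuy.MatchString("/buyer"))
	fmt.Println(insideBuy.MatchString("/rebuy"))
}
```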

Escape sequences

\

A backslash before one of the special characters `[ ] \ ^ $ . | ? * + ( ) { }` means that the character is interpreted literally, not as a metacharacter.

Example: \$ matches a literal dollar sign.

\Q...\E

All special characters in the interval between \Q and \E are interpreted as regular characters.
