Regular expressions
Regular expressions can be used when defining “Page view”, “Multi-step goal”, and “JavaScript event” goals, as well as in segments with URL-based conditions (for example, traffic sources).
Note
When setting up a “JavaScript event” goal, the regular expression must contain only the identifier value (without any domain or website protocol).
Example
To track clicks on a particular button whose ID contains `button` or `buy`, you can specify the following condition: `button|buy`.
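The condition from the example can be tried out with Go's standard `regexp` package, which implements RE2 syntax. This is a minimal sketch that assumes the goal identifier is compared as a plain string; `matchesGoal` and the sample IDs are hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// goalPattern is the condition from the example: it matches any identifier
// that contains "button" or "buy" anywhere in it (the pattern is unanchored).
var goalPattern = regexp.MustCompile(`button|buy`)

// matchesGoal is a hypothetical helper that checks one identifier.
func matchesGoal(id string) bool {
	return goalPattern.MatchString(id)
}

func main() {
	// Hypothetical button IDs; "Buy" fails because matching is case-sensitive.
	for _, id := range []string{"buy-now", "main_button", "checkout", "Buy"} {
		fmt.Printf("%-12s %v\n", id, matchesGoal(id))
	}
}
```

Because the pattern is unanchored, an ID such as `buyer` also matches; a condition like `^(button|buy)$` would accept only those two exact IDs.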
The expression is processed according to RE2 syntax and the following rules:
- The regular expression is applied to the page's full URL, including the protocol and domain. For example, you can use the regular expression `^http://`.
- The regular expression is applied twice: first to the original URL, and then to the same URL with the `www` prefix added or removed. This means that the result does not depend on whether the `www` prefix is included in the domain.
- The regular expression is applied to the decoded URL, in which URL escape codes (`%` sequences) are replaced with the characters they encode. Exception: the codes for `/`, `&`, `=`, `?`, and `#` are not decoded; for example, `%2F` is not replaced with `/`. Keep in mind that the plus sign (`+`) is replaced with a space when decoding. For example, the regular expression `text=слон` ("elephant" in Russian) will match, but `text=%D1%81%D0%BB%D0%BE%D0%BD` (the same word, percent-encoded) and `text=%\w\w` will not.
- Cyrillic domain names are not converted to Punycode. For example, the regular expression `^http://ввв\.сайт\.рф/` will match, but `^http://xn--b1aaa\.xn--80aswg\.xn--p1ai/` will not.
- Before the regular expression is checked, characters such as `?`, `#`, `&`, and `.` (dot) are removed from the end of the URL. For example, the URLs `http://example.com/?`, `http://example.com/#`, and `http://example.com/?var=1&` are compared as `http://example.com/`, `http://example.com/`, and `http://example.com/?var=1`, respectively. If the user enters the URL `http://example.com./`, the regular expression `\./$` will not match.
- Quantifiers match the longest possible string.
- Matching is case-sensitive.
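The preprocessing rules above can be approximated in a short Go program, since Go's `regexp` package uses RE2 syntax. This is a sketch under stated assumptions, not the service's actual implementation: the helpers `decodeURL`, `normalize`, `variants`, and `matches` are hypothetical, the order of the steps is assumed, and dropping a trailing dot from the host name is inferred from the `http://example.com./` example rather than documented.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// decodeURL replaces %XX escapes with the bytes they encode and '+' with a
// space, but keeps the escapes for / & = ? # (for example %2F) encoded.
func decodeURL(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '+' {
			b.WriteByte(' ')
			continue
		}
		if s[i] == '%' && i+2 < len(s) {
			if v, err := hex.DecodeString(s[i+1 : i+3]); err == nil {
				if strings.IndexByte("/&=?#", v[0]) >= 0 {
					b.WriteString(s[i : i+3]) // reserved escape: leave as-is
				} else {
					b.WriteByte(v[0])
				}
				i += 2 // skip the two hex digits
				continue
			}
		}
		b.WriteByte(s[i]) // ordinary character or malformed % sequence
	}
	return b.String()
}

// normalize strips trailing ? # & . characters and (assumption) a trailing
// dot on the host name, so http://example.com./ becomes http://example.com/.
func normalize(u string) string {
	u = strings.TrimRight(u, "?#&.")
	i := strings.Index(u, "://")
	if i < 0 {
		return u
	}
	rest := u[i+3:]
	host, path := rest, ""
	if j := strings.IndexByte(rest, '/'); j >= 0 {
		host, path = rest[:j], rest[j:]
	}
	return u[:i+3] + strings.TrimRight(host, ".") + path
}

// variants returns the URL itself, then the same URL with "www." toggled.
func variants(u string) []string {
	i := strings.Index(u, "://")
	if i < 0 {
		return []string{u}
	}
	scheme, rest := u[:i+3], u[i+3:]
	if strings.HasPrefix(rest, "www.") {
		return []string{u, scheme + rest[len("www."):]}
	}
	return []string{u, scheme + "www." + rest}
}

// matches applies a case-sensitive RE2 pattern to every variant of the
// preprocessed URL. It panics if the pattern does not compile.
func matches(pattern, rawURL string) bool {
	re := regexp.MustCompile(pattern)
	for _, v := range variants(normalize(decodeURL(rawURL))) {
		if re.MatchString(v) {
			return true
		}
	}
	return false
}

func main() {
	encoded := "http://example.com/?text=%D1%81%D0%BB%D0%BE%D0%BD"
	fmt.Println(matches(`text=слон`, encoded))                   // decoded form matches
	fmt.Println(matches(`text=%\w\w`, encoded))                  // no % sequences remain
	fmt.Println(matches(`^http://www\.`, "http://example.com/")) // www variant is tried
	fmt.Println(matches(`\./$`, "http://example.com./"))         // trailing dot removed
}
```

Running the pattern against both the `www` and non-`www` variants is what makes the result independent of the prefix; everything else is string preparation done before RE2 sees the URL.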
Regular expression syntax
In the tables below, `a`, `b`, `c`, `d`, and `e` stand for any characters, and `n` and `m` are non-negative integers.
Alternative variants

| Expression | Description |
|---|---|
| `abc\|de` | Matches one of the variants: `abc` or `de`. |
Character classes

| Expression | Description |
|---|---|
| `[abc]` or `[a-c]` | Matches any single character from those listed (or from the specified range). |
| `[^abc]` or `[^a-c]` | Matches any single character except those listed (or outside the specified range). |
| `\d` | Matches a digit. Equivalent to `[0-9]`. |
| `\D` | Matches any character that is not a digit. Equivalent to `[^0-9]`. |
| `\s` | Matches a whitespace character. Equivalent to `[\t\n\f\r ]`. |
| `\S` | Matches any character that is not whitespace. Equivalent to `[^\t\n\f\r ]`. |
| `\pL` | Matches any Unicode letter. |
| `\w` | Matches an uppercase or lowercase Latin letter, digit, or underscore. Equivalent to `[0-9A-Za-z_]`. When working with Unicode characters, use the `\pL` class. |
| `\W` | Matches any character that is not an uppercase or lowercase Latin letter, digit, or underscore. Equivalent to `[^0-9A-Za-z_]`. When working with Unicode characters, use the `\PL` class (any character that is not a Unicode letter). |
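Number of occurrences (quantifiers)

| Expression | Description |
|---|---|
| `a*` | Matches the character `a` zero or more times. |
| `a+` | Matches the character `a` one or more times. |
| `a?` | Matches the character `a` zero or one time. |
| `a{n,m}` | Matches the character `a` from `n` to `m` times. |
| `a{n,}` | Matches the character `a` `n` or more times. |
| `a{n}` | Matches the character `a` exactly `n` times. |
| `a*?` | Matches the character `a` zero or more times, as few times as possible. |
| `a+?` | Matches the character `a` one or more times, as few times as possible. |
| `a??` | Matches the character `a` zero or one time, preferring zero. |
| `a{n,m}?` | Matches the character `a` from `n` to `m` times, as few times as possible. |
| `a{n,}?` | Matches the character `a` `n` or more times, as few times as possible. |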
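Position within the string

| Expression | Description |
|---|---|
| `^` | Matches the beginning of the string. |
| `$` | Matches the end of the string. |
| `\b` | Matches a word boundary: the position between a word character (`\w`) and a non-word character (`\W`), or between a word character and the beginning or end of the string. |
| `\B` | Matches the absence of a word boundary. Defined through the classes `\w` and `\W`. |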
Escape sequences

| Expression | Description |
|---|---|
| `\` | A backslash before one of the special characters `[ ] \ ^ $ . \| ? * + ( ) { }` means that the character is interpreted literally rather than as a metacharacter. Example: `\.` matches a literal dot. |
| `\Q...\E` | All special characters between `\Q` and `\E` are interpreted literally. |