Regex on domain and negation against language subfolders
Let's say the domains are:
- www.test.com
- www.test.com/en-gb
- www.test.com/cn-cn
These are language sites, the first being the main English site. In Google Analytics I want to set a filter that shows me the traffic of the first (US) domain only. For this, I think:
^\/(en-gb|cn-cn).*$
If I exclude with this Request URI filter pattern, I should get a view of the en-US domain only. However, I'm interested in understanding the regex better, so here is the test data and the code I'm trying out on http://www.regextester.com/
Regular expression:
^\/(en-gb|cn-cn).*$
Test string:
/cn-cn/about
/cn-cn/about/
/cn-cn
/cn-cn/about/test
/en-gb/
/en-gb
/en-gb-test/
/en-gb/aboutus/
/en-gb?q=1
/en-gb/?q=1
/about-us
/test?q=1
/aword/me/
/three
/about/en-gb/
/about/en-gb-test/
/test-yes/
/test/me/
/hello/world/
My questions:
If you try this out, you'll notice that /en-gb-test/ is matched by the regex. How do I avoid this?
Also, let's say I wanted to have a rule that negates the whole option. Rather than telling Google Analytics to "exclude", I'm curious how to write the opposite of the same rule. Basically, catch all URLs that are not in the /en-gb and /cn-cn sub-folders.
Thanks in advance!
You may stop the regex from matching en-gb-test by making sure there is a / or ? after the language code, or the end of the string:
^\/(en-gb|cn-cn)([\/?]|$)
See the regex demo. If you need the rest of the string, add .* after [\/?]: ^\/(en-gb|cn-cn)([\/?].*|$)
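The difference between the question's pattern and the one with the boundary group can be checked outside Google Analytics. The following is a minimal sketch in Python, assuming Python's re module as a stand-in for the GA engine (the two are not identical, but both accept these particular patterns). The helper name matched is made up for illustration.

```python
import re

# Patterns copied from the thread; Python's re accepts the escaped slash too.
ORIGINAL = re.compile(r"^\/(en-gb|cn-cn).*$")
FIXED = re.compile(r"^\/(en-gb|cn-cn)([\/?]|$)")

# Test paths from the question, one request URI per entry.
PATHS = [
    "/cn-cn/about", "/cn-cn/about/", "/cn-cn", "/cn-cn/about/test",
    "/en-gb/", "/en-gb", "/en-gb-test/", "/en-gb/aboutus/",
    "/en-gb?q=1", "/en-gb/?q=1", "/about-us", "/test?q=1",
    "/aword/me/", "/three", "/about/en-gb/", "/about/en-gb-test/",
    "/test-yes/", "/test/me/", "/hello/world/",
]

def matched(pattern, paths):
    """Return the paths that the pattern matches, in input order."""
    return [p for p in paths if pattern.search(p)]

original_hits = matched(ORIGINAL, PATHS)
fixed_hits = matched(FIXED, PATHS)

# Paths the original pattern catches but the fixed one rejects.
only_original = [p for p in original_hits if p not in fixed_hits]
print(only_original)  # ['/en-gb-test/']
```

Each path is tested on its own here, so no multiline flag is needed; in an online tester that takes all paths in one text box, the multiline option would have to be switched on for ^ and $ to apply per line.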
Details:
- ^ - start of string
- \/ - a / (note that you do not need to escape / in a GA regex)
- (en-gb|cn-cn) - a capturing group with 2 alternatives, either en-gb or cn-cn
- ([\/?]|$) - a capturing group with 2 alternatives: ? or /, or the end of the string.
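To see what each of the two capturing groups holds, here is a small sketch, again assuming Python's re module rather than the GA engine. The function name describe is hypothetical.

```python
import re

pattern = re.compile(r"^\/(en-gb|cn-cn)([\/?]|$)")

def describe(path):
    """Return (language folder, boundary) for a matching path, else None."""
    m = pattern.search(path)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(describe("/en-gb/aboutus/"))  # ('en-gb', '/')
print(describe("/cn-cn?q=1"))       # ('cn-cn', '?')
print(describe("/en-gb"))           # ('en-gb', '')
print(describe("/en-gb-test/"))     # None
```

The second group is an empty string when the end-of-string alternative is the one that matched.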
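In an RE2 regex, you cannot use lookaheads, which are crucial when you need to match something other than something else. In an engine that supports lookaheads, the negated rule would be ^(?!\/(en-gb|cn-cn)([\/?]|$)).* but this is not possible with RE2.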
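For comparison, the lookahead version does work in engines that support it. The sketch below assumes Python's re module (which is not RE2) and a shortened sample of the question's paths. It also shows that inverting the result of the positive rule in code selects the same paths, which is in effect what an "exclude" filter does.

```python
import re

# Negated rule from the answer; needs lookahead support (not available in RE2).
NEGATED = re.compile(r"^(?!\/(en-gb|cn-cn)([\/?]|$)).*")
# Positive rule, which avoids lookaheads entirely.
POSITIVE = re.compile(r"^\/(en-gb|cn-cn)([\/?]|$)")

paths = ["/en-gb/", "/cn-cn?q=1", "/en-gb-test/", "/about/en-gb/", "/three"]

# Keep paths the lookahead pattern accepts.
with_lookahead = [p for p in paths if NEGATED.search(p)]
# Keep paths the positive pattern rejects.
by_inversion = [p for p in paths if not POSITIVE.search(p)]

print(with_lookahead)                  # ['/en-gb-test/', '/about/en-gb/', '/three']
print(with_lookahead == by_inversion)  # True
```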