Regex on domain and negation against language subfolders


Let's say my domains are:

  • www.test.com
  • www.test.com/en-gb
  • www.test.com/cn-cn

These are language sites, the first being the main English site. In Google Analytics I want to set up a filter that shows me the traffic of the first (US) domain only. For this, I think I need:

^\/(en-gb|cn-cn).*$ 

If I exclude this Request URI filter pattern, I should get a view of the en-US domain only. However, I'm interested in understanding the regex better, so here are the test data and the code I am trying out on http://www.regextester.com/

Regular expression:

^\/(en-gb|cn-cn).*$ 

Test strings:

/cn-cn/about
/cn-cn/about/
/cn-cn
/cn-cn/about/test
/en-gb/
/en-gb
/en-gb-test/
/en-gb/aboutus/
/en-gb?q=1
/en-gb/?q=1
/about-us
/test?q=1
/aword/me/
/three
/about/en-gb/
/about/en-gb-test/
/test-yes/
/test/me/
/hello/world/

My questions:

  1. If you try this out, you'll notice that /en-gb-test/ is also matched by the regex. How do I avoid this?

  2. Also, let's say I wanted a rule that negates the whole option. Rather than telling Google Analytics to "exclude", I am curious how to write the opposite of the same rule: basically, one that catches all URLs that are not in the /en-gb and /cn-cn sub-folders.

Thanks in advance!

You may stop the regex from matching /en-gb-test/ by making sure that the language code is followed only by /, by ?, or by the end of the string:

^\/(en-gb|cn-cn)([\/?]|$) 

See the regex demo. If you need to match the rest of the string too, add .* after [\/?]: ^\/(en-gb|cn-cn)([\/?].*|$)

Details:

  • ^ - start of string
  • \/ - a literal / (note that you do not need to escape / in a GA regex)
  • (en-gb|cn-cn) - a capturing group with two alternatives: either en-gb or cn-cn
  • ([\/?]|$) - a capturing group with two alternatives: either a ? or / character, or the end of the string.
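To see the difference between the two patterns outside the online tester, here is a minimal sketch in Python. It assumes Python's `re` module as a stand-in engine (it is not the engine Google Analytics uses, but both patterns only use syntax the two have in common), and the names `ORIGINAL`, `FIXED` and `matched` are made up for this illustration.

```python
import re

# Pattern from the question: matches anything that merely starts with the prefix.
ORIGINAL = re.compile(r"^/(en-gb|cn-cn).*$")
# Pattern from the answer: the prefix must be followed by /, ? or end of string.
FIXED = re.compile(r"^/(en-gb|cn-cn)([/?]|$)")

# Test strings taken from the question.
paths = [
    "/cn-cn/about", "/cn-cn/about/", "/cn-cn", "/cn-cn/about/test",
    "/en-gb/", "/en-gb", "/en-gb-test/", "/en-gb/aboutus/",
    "/en-gb?q=1", "/en-gb/?q=1", "/about-us", "/test?q=1",
    "/aword/me/", "/three", "/about/en-gb/", "/about/en-gb-test/",
    "/test-yes/", "/test/me/", "/hello/world/",
]

def matched(pattern, items):
    """Return the items in which the pattern finds a match."""
    return [p for p in items if pattern.search(p)]

# Paths the original pattern catches but the fixed one does not.
only_original = [p for p in matched(ORIGINAL, paths) if not FIXED.search(p)]
print(only_original)  # ['/en-gb-test/']
```

The only test string that drops out is /en-gb-test/, which is the false positive from question 1; the real language subfolders are still matched.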

In RE2 regex, you cannot use lookaheads, which are crucial when you need to match something other than something else. So a pattern like ^(?!\/(en-gb|cn-cn)([\/?]|$)).* is not possible with RE2.
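For comparison only, here is a rough sketch of what that lookahead pattern would select in an engine that does support lookaheads. Python's `re` is used here as an assumption for illustration; it is a backtracking engine, not RE2, so this does not make the pattern usable where lookaheads are unavailable. The second list shows the alternative of keeping the positive pattern and inverting the result in code, which is the same idea as choosing "exclude" in the filter. The variable names are hypothetical.

```python
import re

# Negative lookahead version: works in Python's re, but not in RE2.
NEGATED = re.compile(r"^(?!/(en-gb|cn-cn)([/?]|$)).*")
# Positive pattern from the answer, to be inverted in code instead.
INCLUDED = re.compile(r"^/(en-gb|cn-cn)([/?]|$)")

# A few of the test strings from the question.
paths = [
    "/cn-cn/about", "/cn-cn", "/en-gb/", "/en-gb", "/en-gb-test/",
    "/en-gb?q=1", "/about-us", "/test?q=1", "/about/en-gb/", "/hello/world/",
]

# Option 1: let the lookahead reject the language subfolders.
by_lookahead = [p for p in paths if NEGATED.match(p)]
# Option 2: match the subfolders and keep everything else.
by_inversion = [p for p in paths if not INCLUDED.search(p)]

print(by_lookahead)
print(by_lookahead == by_inversion)
```

Both approaches keep /en-gb-test/ and /about/en-gb/, since neither sits inside the /en-gb or /cn-cn subfolders.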

