Elixir - Hashtag extraction using regex


Test string:

str = "#www #soulmusic #50_shades_of_blue # ##worldwideweb        #okie_dokkie #fr!ends #!alpacino #wonderfulride        #good#club #rhônealpes #trèsbon #øypålandet http://example.com/#comment        #moretags #www nobody #h3y!boy #email" 

This is what I tried:

String.split(str, ~r/\B(#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+)/, trim: true, include_captures: true)

But it does not exclude the hashtag inside the URL. This is what I receive:

["#www", " ", "#soulmusic", " ", "#50_shades_of_blue", " # #", "#worldwideweb", " ", "#okie_dokkie", " ", "#fr", "!ends #!alpacino ", "#wonderfulride", " ", "#good", "#club ", "#rhônealpes", " ", "#trèsbon", " ", "#øypålandet", " http://example.com/", "#comment", " ", "#moretags", " ", "#www", " nobody ", "#h3y", "!boy ", "#email"] 

What I am aiming to get:

["#www", "#soulmusic", "#50_shades_of_blue", "#worldwide", "#okie_dokkie", "#fr", "wonderfulride", "#good", "#rhônealpes", "#trèsbon", "#øypålandet", "#moretags", "#www",  "#h3y", "#email"] 

Any help on this is appreciated.

If you only need the matches, you're looking for Regex.scan/2:

iex(1)> str = "#www #soulmusic #50_shades_of_blue # ##worldwideweb 
...(1)>       #okie_dokkie #fr!ends #!alpacino #wonderfulride 
...(1)>       #good#club #rhônealpes #trèsbon #gøypålandet http://example.com/#comment 
...(1)>       #moretags #www nobody #email"
"#www #soulmusic #50_shades_of_blue # ##worldwideweb \n      #okie_dokkie #fr!ends #!alpacino #wonderfulride \n      #good#club #rhônealpes #trèsbon #gøypålandet http://example.com/#comment \n      #moretags #www nobody #email"
iex(2)> Regex.scan(~r/\B#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+/, str)
[["#www"], ["#soulmusic"], ["#50_shades_of_blue"], ["#worldwideweb"],
 ["#okie_dokkie"], ["#fr"], ["#wonderfulride"], ["#good"], ["#rhônealpes"],
 ["#trèsbon"], ["#gøypålandet"], ["#comment"], ["#moretags"], ["#www"],
 ["#email"]]
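The matches listed above are what a `\B` (not-a-word-boundary) assertion in front of `#` produces: "#club" is skipped because it directly follows a letter, while "#comment" is kept because it follows a "/". A small sketch of that behaviour, using a made-up input string and plain `\w` in place of the full character class:

```elixir
# Illustrative input, not taken from the original post.
sample = "a#b #c /#d"

# "a#b": a letter followed by "#" is a word boundary, so \B fails there.
# " #c" and "/#d": both neighbours of the position are non-word characters,
# so \B succeeds and the hashtag is matched.
matches = Regex.scan(~r/\B#\w+/, sample)
# matches == [["#c"], ["#d"]]
```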

This returns a list of lists. You can flatten it into a list of strings using Enum.concat/1:

iex(3)> Regex.scan(~r/\B#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+/, str) |> Enum.concat
["#www", "#soulmusic", "#50_shades_of_blue", "#worldwideweb", "#okie_dokkie",
 "#fr", "#wonderfulride", "#good", "#rhônealpes", "#trèsbon",
 "#gøypålandet", "#comment", "#moretags", "#www", "#email"]
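The flattened list still contains "#comment", which comes from the URL fragment the question wanted to exclude. One possible way to drop it, offered here as a sketch rather than as part of the original answer, is a negative lookbehind that rejects a `#` directly preceded by a word character or a `/`. The module name `HashtagExtractor` and the function `extract/1` are hypothetical, and the sketch assumes the `u` modifier so that `\p{L}` and `\p{N}` cover accented letters instead of listing ranges by hand:

```elixir
defmodule HashtagExtractor do
  # Hypothetical helper, not from the original answer.
  def extract(text) do
    # (?<![\w\/])     the "#" must not directly follow a word character or "/"
    # [\p{L}\p{N}_]+  one or more letters, digits, or underscores
    # u               enables Unicode-aware \w and \p{...} classes
    ~r/(?<![\w\/])#[\p{L}\p{N}_]+/u
    |> Regex.scan(text)
    |> List.flatten()
  end
end

HashtagExtractor.extract("#good#club http://example.com/#comment ##www #fr!ends #rhônealpes")
# → ["#good", "#www", "#fr", "#rhônealpes"]
```

This only filters out fragments that come right after a `/`. A URL such as `http://example.com/page#section` is already excluded because the `#` follows a letter.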
