Elixir - Hashtag extraction using regex
Test string:
str = "#www #soulmusic #50_shades_of_blue # ##worldwideweb #okie_dokkie #fr!ends #!alpacino #wonderfulride #good#club #rhônealpes #trèsbon #øypålandet http://example.com/#comment #moretags #www nobody #h3y!boy #email"
I tried this:
String.split(str, ~r/\B(#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+)/, trim: true, include_captures: true)
But it does not exclude the hashtag inside the URL, and it also returns the text between the hashtags. I receive:
["#www", " ", "#soulmusic", " ", "#50_shades_of_blue", " # #", "#worldwideweb", " ", "#okie_dokkie", " ", "#fr", "!ends #!alpacino ", "#wonderfulride", " ", "#good", "#club ", "#rhônealpes", " ", "#trèsbon", " ", "#øypålandet", " http://example.com/", "#comment", " ", "#moretags", " ", "#www", " nobody ", "#h3y", "!boy ", "#email"]
What I aim to get:
["#www", "#soulmusic", "#50_shades_of_blue", "#worldwideweb", "#okie_dokkie", "#fr", "#wonderfulride", "#good", "#rhônealpes", "#trèsbon", "#øypålandet", "#moretags", "#www", "#h3y", "#email"]
Any help is appreciated.
If you only need the matches, you're looking for Regex.scan/2:
iex(1)> str = "#www #soulmusic #50_shades_of_blue # ##worldwideweb
...(1)> #okie_dokkie #fr!ends #!alpacino #wonderfulride
...(1)> #good#club #rhônealpes #trèsbon #gøypålandet http://example.com/#comment
...(1)> #moretags #www nobody #email"
"#www #soulmusic #50_shades_of_blue # ##worldwideweb \n #okie_dokkie #fr!ends #!alpacino #wonderfulride \n #good#club #rhônealpes #trèsbon #gøypålandet http://example.com/#comment \n #moretags #www nobody #email"
iex(2)> Regex.scan(~r/\B#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+/, str)
[["#www"], ["#soulmusic"], ["#50_shades_of_blue"], ["#worldwideweb"], ["#okie_dokkie"], ["#fr"], ["#wonderfulride"], ["#good"], ["#rhônealpes"], ["#trèsbon"], ["#gøypålandet"], ["#comment"], ["#moretags"], ["#www"], ["#email"]]
Regex.scan/2 returns a list of lists, one inner list per match. You can flatten it into a list of strings with Enum.concat/1:
iex(3)> Regex.scan(~r/\B#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+/, str) |> Enum.concat()
["#www", "#soulmusic", "#50_shades_of_blue", "#worldwideweb", "#okie_dokkie", "#fr", "#wonderfulride", "#good", "#rhônealpes", "#trèsbon", "#gøypålandet", "#comment", "#moretags", "#www", "#email"]
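The scan output above still contains "#comment" from the URL, which the question wanted excluded. A non-word-boundary assertion (\B) in front of # explains why. Since # is itself a non-word character, \B# matches a # only at the start of the string or after another non-word character. That drops #club in #good#club, because it follows the letter d, but keeps #comment, which follows a /. The sketch below first checks that behaviour with a simplified ASCII-only class, then tries a different rule. That rule is an assumption inferred from the desired output in the question, not a general definition of a hashtag: a # starts a hashtag only at the start of the string or right after whitespace or another #. The negative lookbehind (?<![^\s#]) encodes that rule. The u modifier makes \s, \p{L} and \p{N} Unicode-aware, so accented letters are covered without hand-written ranges. The variable names are illustrative only.

```elixir
# Part 1: how \B# behaves (simplified ASCII-only class, for illustration).
# "#" is a non-word character, so \B# only matches a "#" at the start of the
# string or after another non-word character such as " ", "#" or "/".
ascii_tag = ~r/\B#[a-zA-Z0-9_]+/

# The second "#" follows the letter "d", so only "#good" is found.
good_club = Regex.scan(ascii_tag, "#good#club")

# This "#" follows "/", a non-word character, so "#comment" is still found.
in_url = Regex.scan(ascii_tag, "http://example.com/#comment")

# Part 2 (assumed rule): a "#" starts a hashtag only at the start of the
# string or right after whitespace or another "#". The lookbehind rejects a
# "#" preceded by any other character, such as "/" in a URL or "d" in "#good#club".
hashtag = ~r/(?<![^\s#])#[\p{L}\p{N}_]+/u

str =
  "#www #soulmusic #50_shades_of_blue # ##worldwideweb #okie_dokkie #fr!ends " <>
    "#!alpacino #wonderfulride #good#club #rhônealpes #trèsbon #øypålandet " <>
    "http://example.com/#comment #moretags #www nobody #h3y!boy #email"

# The pattern has no capture group, so every inner list holds one string
# and Enum.concat/1 flattens the result into a plain list of tags.
tags =
  hashtag
  |> Regex.scan(str)
  |> Enum.concat()

IO.inspect({good_club, in_url})
IO.inspect(tags)
```

With the test string from the question, tags should come out as ["#www", "#soulmusic", "#50_shades_of_blue", "#worldwideweb", "#okie_dokkie", "#fr", "#wonderfulride", "#good", "#rhônealpes", "#trèsbon", "#øypålandet", "#moretags", "#www", "#h3y", "#email"]. This list has no #comment, #club or #alpacino. If you keep a parenthesized group in the pattern, as in the String.split attempt, each inner list from Regex.scan also carries the group, and Enum.concat/1 would then produce duplicates. Passing capture: :all_but_first to Regex.scan/3 avoids that.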