Python word match
I have a list of URLs, and I'm trying to filter them using specific keywords (word1, word2) and a list of stop words [stop1, stop2, stop3]. Is there a way to filter the links without using many if conditions? I got the proper output when I used an if condition on each stop word, but that doesn't seem like a feasible option. The following is the brute-force method:
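    links = []
    for link in url:
        # `word1 or word2 in link` is always true for a non-empty word1,
        # so each keyword is tested explicitly
        if word1 in link or word2 in link:
            if stop1 not in link:
                if stop2 not in link:
                    if stop3 not in link:
                        links.append(link)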
Here are a couple of options to consider if you are in this situation.
You can use a list comprehension with the built-in any and all functions to filter the unwanted URLs out of your list:
    urls = ['http://somewebsite.tld/word',
            'http://somewebsite.tld/word1',
            'http://somewebsite.tld/word1/stop3',
            'http://somewebsite.tld/word2',
            'http://somewebsite.tld/word2/stop2',
            'http://somewebsite.tld/word3',
            'http://somewebsite.tld/stop3/word1',
            'http://somewebsite.tld/stop4/word1']

    includes = ['word1', 'word2']
    excludes = ['stop1', 'stop2', 'stop3']

    # Keep a URL if it contains any include word and none of the stop words
    filtered_url_list = [url for url in urls
                         if any(include in url for include in includes)
                         if all(exclude not in url for exclude in excludes)]
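To see what this kind of filter keeps, here is a minimal sketch that wraps the same any-based test in a function. The name filter_urls and the shortened sample_urls list are assumptions made for illustration, not part of the original answer.

```python
def filter_urls(urls, includes, excludes):
    """Keep URLs containing at least one include word and no exclude word."""
    return [
        url for url in urls
        if any(word in url for word in includes)
        and not any(stop in url for stop in excludes)
    ]


# Hypothetical sample data, a subset of the URLs used in the answer
sample_urls = [
    'http://somewebsite.tld/word',
    'http://somewebsite.tld/word1',
    'http://somewebsite.tld/word1/stop3',
    'http://somewebsite.tld/stop4/word1',
]

kept = filter_urls(sample_urls, ['word1', 'word2'], ['stop1', 'stop2', 'stop3'])
print(kept)
# ['http://somewebsite.tld/word1', 'http://somewebsite.tld/stop4/word1']
```

Note that these are plain substring tests, so 'word1' would also match a URL containing 'word10', and 'stop4' is kept because it is not in the stop list.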
Or you can make a function that takes one URL as an argument and returns True for the URLs you want to keep and False for the ones you don't, then pass that function along with the unfiltered list of URLs to the built-in filter function:
    def urlfilter(url):
        includes = ['word1', 'word2']
        excludes = ['stop1', 'stop2', 'stop3']
        for include in includes:
            if include in url:
                for exclude in excludes:
                    if exclude in url:
                        return False
                else:
                    # The inner loop finished without finding a stop word
                    return True
        # No include word matched
        return False

    urls = ['http://somewebsite.tld/word',
            'http://somewebsite.tld/word1',
            'http://somewebsite.tld/word1/stop3',
            'http://somewebsite.tld/word2',
            'http://somewebsite.tld/word2/stop2',
            'http://somewebsite.tld/word3',
            'http://somewebsite.tld/stop3/word1',
            'http://somewebsite.tld/stop4/word1']

    # In Python 3, filter() returns an iterator; list() materializes it
    filtered_url_list = list(filter(urlfilter, urls))
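If the word lists change between calls, one possible variation is to build the predicate from the lists instead of hard-coding them inside the function. This is only a sketch: make_url_filter, keep_url, and the candidates list are hypothetical names introduced here.

```python
def make_url_filter(includes, excludes):
    """Return a predicate suitable for the built-in filter()."""
    def keep(url):
        has_include = any(word in url for word in includes)
        has_exclude = any(stop in url for stop in excludes)
        return has_include and not has_exclude
    return keep


keep_url = make_url_filter(['word1', 'word2'], ['stop1', 'stop2', 'stop3'])

# Hypothetical sample data
candidates = [
    'http://somewebsite.tld/word2',
    'http://somewebsite.tld/word2/stop2',
    'http://somewebsite.tld/word3',
]

# list() turns the filter object into a concrete list
result = list(filter(keep_url, candidates))
print(result)
# ['http://somewebsite.tld/word2']
```

The predicate always returns a bool, so it behaves the same whether it is used with filter or inside a list comprehension.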