Python - Make variants based on alphabet-number junction in regex
I have a list of entries, some single words and some multi-word; an entry may or may not contain digit characters.
An example:
word_list=['word', 'kap1','another word', 'another-1 word', 'another word 1']
I want to identify the single-word entries in the list that have the form:
alphabets*junction*digit(s)
where the junction can be a space, a hyphen, or nothing. For example, in the list above, kap1 qualifies (and no other entry does). Having found such an entry, I want to create variants of it (based on the junction) and add them to the list.
So, for example, having found kap1, I want to add kap 1 and kap-1 to the list.
I was able to write an initial regex for identifying such entries:
word_numberregex=re.compile(r"^[a-zA-Z]+[ -]?\d+$")
But I wanted to know an algorithm for creating the variants, depending on the junction.
Using re, you can capture the letter and digit parts as groups and then rejoin them with each separator you want:
import re
word_list=['word', 'kap1','another word', 'another-1 word', 'another word 1']
p = r'([a-zA-Z]+)[- ]?([0-9]+)'
[re.sub(p, r'\1{}\2'.format(sep), w) for w in word_list if re.fullmatch(p, w) for sep in ['', ' ', '-']]
# ['kap1', 'kap 1', 'kap-1']
Or precompile the pattern:
p = re.compile(r'([a-zA-Z]+)[- ]?([0-9]+)')
[p.sub(r'\1{}\2'.format(sep), w) for w in word_list if p.fullmatch(w) for sep in ['', ' ', '-']]
# ['kap1', 'kap 1', 'kap-1']
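The question also asks for the variants to be added back to the list. Here is a minimal sketch of one way to do that, assuming that variants already present should not be duplicated and that the original order should be kept. The helper name add_junction_variants and the constants PATTERN and SEPARATORS are made up for illustration; they are not part of the original answer.

```python
import re

# Letters, an optional space/hyphen junction, then digits (same pattern as above).
PATTERN = re.compile(r'([a-zA-Z]+)[- ]?([0-9]+)')
SEPARATORS = ['', ' ', '-']  # assumed set of junctions: nothing, space, hyphen


def add_junction_variants(words):
    """Return a new list: the original entries plus any missing junction variants."""
    result = list(words)
    seen = set(words)
    for w in words:
        m = PATTERN.fullmatch(w)  # the whole entry must match, so multi-word entries are skipped
        if m is None:
            continue
        head, digits = m.groups()
        for sep in SEPARATORS:
            variant = head + sep + digits
            if variant not in seen:  # skip variants that are already in the list
                seen.add(variant)
                result.append(variant)
    return result


word_list = ['word', 'kap1', 'another word', 'another-1 word', 'another word 1']
print(add_junction_variants(word_list))
# ['word', 'kap1', 'another word', 'another-1 word', 'another word 1', 'kap 1', 'kap-1']
```

Building each variant from the captured groups instead of calling sub again avoids a second pass over the string, and the seen set keeps the list free of duplicates when the input already holds more than one form of the same entry.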