regex - Extract the trailing substring that starts with an alpha character, with or without following digits, in PHP
I'm trying to normalize a data series and found a problem in one of the fields.
The data, which is supposed to be normalized, should return a string of no more than 14 characters, composed of 2 alpha characters followed by digits.
But sometimes, among the 90 million items, a value has 1 or 2 additional characters at the end: an undetermined or sequential alpha character, followed by a number (or not).
Normalized values (aa + 000000000000):
ep0123456789 es123456 fr1234567890123
Incorrect values (aa + 00000000 + a) or (aa + 00000000 + a0):
ep1025364758a fr1920393874b1 ch172637488858a cn727363525252w2
a -> alpha
0 -> number (positive)
To extract the normalized values (aa00000000 as the code, a0 as the kind code) I use a rather convoluted piece of code. I think there must be a better algorithm.
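$pat = 'fr1920393874b1';
// Only look further if one of the last 2 characters is a letter
if (preg_match("/[a-z]/i", substr($pat, -2))) {
    $fail = substr($pat, -2);
    // Find the offset of the first letter within those 2 characters
    if (preg_match('/[a-zA-Z]+/', $fail, $match, PREG_OFFSET_CAPTURE)) {
        $kind = substr($fail, $match[0][1]); // b1
        $pat = str_replace($kind, '', $pat); // fr1920393874
    }
}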
So, you need 2 values out of the input string:
- the first 2 alpha chars and 1 or more digits after them
- the rest of the string
So, from fr1920393874b1, you want fr1920393874 and b1 as separate values.
It turns out you need to split the code from the rest of the string and get 2 values in the output.
Use the ^([a-zA-Z]{2}\d+)(.*) pattern:
$pat = 'fr1920393874b1';
if (preg_match('~^([a-zA-Z]{2}\d+)(.*)~', $pat, $m)) {
    // $m[1] is the code, $m[2] is the rest of the string (the kind)
    echo "val: " . $m[1] . "\nkind: " . $m[2];
}
See the PHP demo.
Details:
- ^ - start of string
- ([a-zA-Z]{2}\d+) - capturing group 1 ($m[1]): 2 ASCII letters and 1+ digits
- (.*) - capturing group 2 ($m[2]): 0+ chars other than line break chars, as many as possible (the rest of the line)
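To see how the two capturing groups behave on the sample values from the question, here is a minimal sketch. The `splitCode` helper and its null return for non-matching input are hypothetical names and choices made for illustration, not part of the original answer; it assumes PHP 7.1 or later for the nullable return type and short list syntax.

```php
<?php
// Hypothetical helper (an assumption for illustration): splits a value into
// the code (2 ASCII letters + digits) and whatever follows it (the kind).
function splitCode(string $value): ?array
{
    if (preg_match('~^([a-zA-Z]{2}\d+)(.*)~', $value, $m)) {
        return [$m[1], $m[2]]; // $m[2] is '' when nothing follows the digits
    }
    return null; // value does not start with 2 letters followed by digits
}

// Sample values taken from the question
$samples = ['ep0123456789', 'ep1025364758a', 'fr1920393874b1', 'cn727363525252w2'];
foreach ($samples as $sample) {
    [$code, $kind] = splitCode($sample);
    echo $code, ' | ', $kind === '' ? '(none)' : $kind, "\n";
}
```

For an already normalized value such as ep0123456789 the second group is an empty string, so the same call covers both the correct and the incorrect values.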