Python Regex Negative Lookbehind Match Without Fixed Width
Solution 1:
You may use a hack that consists in capturing php or python before the expected match: if that group is not empty (i.e. it matched), discard the current match; otherwise, the match is valid.
See the pattern:
import re
pattern = re.compile(r"(?:(php|python).*?)?((?:\d+\s?[^\W\d_]+\s?)?\d{2}\s?\d{3}\s?\w.+)")
The pattern contains 2 capturing groups:
- (?:(php|python).*?)? - the last ? makes this group optional; it matches and captures php or python into Group 1, and then matches 0+ chars, as few as possible.
- ((?:\d+\s?[^\W\d_]+\s?)?\d{2}\s?\d{3}\s?\w.+) - this is Group 2, which is basically your pattern with no redundant groups.
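To see what each group holds, here is a minimal sketch that prints both groups for two strings. The sample strings and the helper name inspect_groups are assumptions made for illustration; they are not taken from the original question.

```python
import re

pattern = re.compile(
    r"(?:(php|python).*?)?((?:\d+\s?[^\W\d_]+\s?)?\d{2}\s?\d{3}\s?\w.+)"
)

# Hypothetical sample strings, not taken from the original question.
samples = [
    "45 python 00222 sometext",
    "python 45 python 00222 sometext",
]

def inspect_groups(text):
    """Return (Group 1, Group 2) for the first match, or None if nothing matches."""
    m = pattern.search(text)
    return (m.group(1), m.group(2)) if m else None

for s in samples:
    print(repr(s), "->", inspect_groups(s))
```

In the first string nothing forbidden precedes the number, so Group 1 stays None; in the second, Group 1 holds "python", which is the signal to discard the match.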
If Group 1 matches, we need to return an empty result; otherwise, we return the Group 2 value:
def callback(v):
    m = pattern.search(v)
    # Keep the match only when Group 1 (the php/python prefix) did not match
    if m and not m.group(1):
        return m.group(2)
    return ""

aa["test"].apply(callback)
Result:
0    45 python 00222 sometext
1
2
3    45 python 50000 sm
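The same idea can be tried without pandas. The following is only a sketch under assumptions: the original DataFrame aa is not shown, so the rows list below is hypothetical input chosen to produce a result of the same shape as the one above.

```python
import re

pattern = re.compile(
    r"(?:(php|python).*?)?((?:\d+\s?[^\W\d_]+\s?)?\d{2}\s?\d{3}\s?\w.+)"
)

def callback(v):
    """Return the Group 2 text, or "" when php/python precedes the match."""
    m = pattern.search(v)
    if m and not m.group(1):
        return m.group(2)
    return ""

# Hypothetical rows standing in for the aa["test"] column.
rows = [
    "45 python 00222 sometext",
    "python 45 python 00222 sometext",
    "php 45 python 00222 sometext",
    "45 python 50000 sm",
]

results = [callback(r) for r in rows]
for i, r in enumerate(results):
    print(i, r)
```

Rows 1 and 2 start with a forbidden word, so the callback returns an empty string for them.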
Solution 2:
As a negative lookbehind must be of fixed length, you have to use a negative lookahead instead, anchored to the start of the string, that checks the part before the first digit.
It should include:
- A sequence of non-digits (possibly empty).
- Either of your "forbidden" strings.
This way, if the string to check contains python or php before the first digit, this lookahead will fail, preventing the string from further processing.
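Both points can be checked in isolation. The sketch below first shows that Python's re module rejects a variable-width lookbehind, then applies only the anchored lookahead guard described above. The helper names compiles and allowed, and the sample strings, are assumptions for illustration.

```python
import re

def compiles(regex_source):
    """Return True if re can compile the pattern, False if it raises re.error."""
    try:
        re.compile(regex_source)
        return True
    except re.error:
        return False

# A lookbehind containing .* has no fixed width, so re refuses to compile it.
print(compiles(r"(?<!python.*)\d+"))  # False
# A fixed-width lookbehind compiles fine.
print(compiles(r"(?<!php)\d+"))       # True

# Only the guard: no "python" or "php" anywhere before the first digit.
guard = re.compile(r"^(?!\D*(?:python|php))")

def allowed(text):
    """Return True when the guard lets the string through."""
    return guard.match(text) is not None

for s in ["45 python 00222 sometext", "python 45 x", "my php 45 x"]:
    print(repr(s), "->", allowed(s))
```

Note that a forbidden word appearing after the first digit, as in the first sample, does not trip the guard, because \D* cannot cross a digit.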
Because of the ^ anchor, the rest of the regex must first match a sequence of non-digits (whatever comes before the first digit sequence), and only then comes your regex.
So the regex to use is as follows:
^(?!\D*(?:python|php))\D*(\d+)\s?([^\W\d_]+)\s?(\d{2}\s?\d{3})\s?(\w+)
Details:
- ^(?! - Start of string and negative lookahead for:
  - \D* - A sequence of non-digits (may be empty).
  - (?:python|php) - Either of the "forbidden" strings, as a non-capturing group (no need to capture it).
- ) - End of negative lookahead.
- \D* - A sequence of non-digits (before what you want to match).
- (\d+)\s? - The first sequence of digits + optional space.
- ([^\W\d_]+)\s? - Some text No 1 + optional space.
- (\d{2}\s?\d{3})\s? - The second sequence of digits (with optional space in the middle) + optional space.
- (\w+) - Some text No 2.
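Put together, the regex can be exercised as in the sketch below. The helper name parse and the sample strings are assumptions for illustration only.

```python
import re

regex = re.compile(
    r"^(?!\D*(?:python|php))\D*(\d+)\s?([^\W\d_]+)\s?(\d{2}\s?\d{3})\s?(\w+)"
)

def parse(text):
    """Return the four captured groups, or None when the string is rejected."""
    m = regex.match(text)
    return m.groups() if m else None

# Hypothetical sample strings, not taken from the original question.
for s in [
    "45 python 00222 sometext",
    "python 45 python 00222 sometext",
    "abc 45 python 50 000 sm",
]:
    print(repr(s), "->", parse(s))
```

A string with python or php before the first digit yields None, so every non-None result is already a valid case.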
The advantage of my solution over the other is that you are free from checking whether the first group matched. Here you get only "positive" cases, which do not require any check.
For a working example see https://regex101.com/r/gl9nWx/1