
Python Positive-lookbehind Split Variable-width

I thought that I had set up the expression appropriately, but the split is not working as intended. c = re.compile(r'(?<=^\d\.\d{1,2})\s+'); for header in ['1.1 Introduction', '

Solution 1:

Lookbehinds in Python's re module cannot be of variable width, so your lookbehind is not valid.
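
To see the failure directly, here is a minimal sketch that tries to compile the pattern from the question next to a fixed-width variant. The helper name `compiles` is made up for illustration. On the CPython versions I have seen, the error message reads along the lines of "look-behind requires fixed-width pattern", but the sketch only checks that an error is raised.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

variable_width = r'(?<=^\d\.\d{1,2})\s+'  # pattern from the question: 3 or 4 characters wide
fixed_width = r'(?<=^\d\.\d{2})\s+'       # always 4 characters wide

print(compiles(variable_width))
print(compiles(fixed_width))
```

The second pattern compiles because `{2}` always matches exactly two digits; it would not split '1.1 Introduction', though, so it is not a fix by itself.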

You can use a capture group as a workaround:

import re

c = re.compile(r'(^\d\.\d{1,2})\s+')
for header in ['1.1 Introduction', '1.42 Appendix']:
    print(re.split(c, header)[1:])  # Remove the first element because it's empty

Output:

['1.1', 'Introduction']
['1.42', 'Appendix']
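
Why the first element is empty, and what happens when a header does not match, may be easier to see with the unsliced result. This is a small sketch; the 'Preface' header is an invented example, not one from the question.

```python
import re

c = re.compile(r'(^\d\.\d{1,2})\s+')

# The match starts at position 0, so the text before it is the empty string.
# The captured group is then inserted into the result by re.split.
print(re.split(c, '1.1 Introduction'))  # ['', '1.1', 'Introduction']

# No match: the header comes back as a single element,
# so slicing with [1:] would throw it away entirely.
print(re.split(c, 'Preface'))           # ['Preface']
```

If unnumbered headers can occur in your input, check the length of the result before slicing.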

Solution 2:

The error in your regex is the {1,2} part: lookbehinds need to be fixed-width, so quantifiers that can match a varying number of characters are not allowed inside them.

try this website to test your regex before you put it in code.

But in your case you don't need to use a regex at all.

Simply try this:

for header in ['1.1 Introduction', '1.42 Appendix']:
    print(header.split(' '))

result:

['1.1', 'Introduction']
['1.42', 'Appendix']

hope this helps.
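
One caveat with a plain split(' '): it splits on every single space, so a title with several words or a doubled space gets broken up. A possible variation, assuming the number is always the first whitespace-separated token, is to split on runs of whitespace at most once. The second header below is invented to show the difference.

```python
headers = ['1.1 Introduction', '1.42  Getting Started']  # second header is made up

for header in headers:
    # sep=None splits on runs of whitespace; maxsplit=1 keeps the title in one piece
    print(header.split(None, 1))

# For comparison, splitting on a single space breaks the second header apart:
print(headers[1].split(' '))  # ['1.42', '', 'Getting', 'Started']
```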


Solution 3:

My solution may look lame, but you are only checking for one or two digits after the dot, so you can use two lookbehinds, one for each width.

c = re.compile(r'(?:(?<=^\d\.\d\d)|(?<=^\d\.\d))\s+')
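
As a quick check, here is the two-lookbehind pattern applied to the same headers used in the other answers. Each alternative is fixed-width on its own (four and three characters), which is why the pattern compiles.

```python
import re

c = re.compile(r'(?:(?<=^\d\.\d\d)|(?<=^\d\.\d))\s+')

for header in ['1.1 Introduction', '1.42 Appendix']:
    # Nothing is captured, so the result needs no slicing
    print(c.split(header))
```

This should print ['1.1', 'Introduction'] and ['1.42', 'Appendix'].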
