How To Remove Spaces In Between Characters Without Removing All Spaces In A Dataframe?
Solution 1:
This is a tricky problem, but one approach that may get you most of the way there is to use negative and positive lookbehinds/lookaheads to encode a few basic rules.
The following example would likely work well enough given what you've described. It will incorrectly merge consecutive "real" words when both have been exploded into separated characters (for example, 'V e r y c a l m' becomes 'Verycalm'), but if that's rare this will probably be fine. You could add additional rules to cover more edge cases.
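import re
import pandas as pd

s = pd.Series(['V e r y calm', 'Keen and a n a l y t i c a l',
               'R a s h and careless', 'Always joyful'])
# Match spaces that follow a lone letter and precede another lone letter
# (or the final character of the string).
regex = re.compile(r'(?<![a-zA-Z]{2})(?<=[a-zA-Z]) +(?=[a-zA-Z] |.$)')
# regex=True is needed in pandas 2.0+, where str.replace defaults to literal matching.
s.str.replace(regex, '', regex=True)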
0              Very calm
1    Keen and analytical
2      Rash and careless
3          Always joyful
dtype: object
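Since the question is about a DataFrame, here is a minimal sketch of applying the same pattern to a single text column. The frame and the column names `id` and `trait` are hypothetical, chosen only for illustration; adapt them to your data.

```python
import re
import pandas as pd

# Hypothetical frame; the column names "id" and "trait" are assumptions.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "trait": ["V e r y calm", "Keen and a n a l y t i c a l",
              "R a s h and careless", "Always joyful"],
})

# Same pattern as in the Series example above.
pattern = re.compile(r"(?<![a-zA-Z]{2})(?<=[a-zA-Z]) +(?=[a-zA-Z] |.$)")

# Clean only the text column; the other columns are left untouched.
df["trait"] = df["trait"].str.replace(pattern, "", regex=True)
cleaned = df["trait"].tolist()
print(cleaned)
```

If several text columns need the same treatment, the same `.str.replace` call can be repeated per column in a loop over the column names.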
This regex effectively says:
Match a run of one or more spaces and remove it, but only if a single letter comes right before it. If two letters come before it (i.e., the end of a word of two or more letters), leave it alone. In addition, the run must be followed either by a single letter and another space, or by the last character of the string.
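To make the rules easier to follow, the same pattern can be written out piece by piece with `re.VERBOSE`. This is only a sketch using the standard `re` module on plain strings; the name `collapse` is a hypothetical helper, and the last call shows the known limitation with two exploded words in a row.

```python
import re

# Verbose form of the same pattern. In VERBOSE mode, literal spaces must be
# written inside a character class ([ ]) or they are ignored.
pattern = re.compile(r"""
    (?<![a-zA-Z]{2})     # not preceded by two letters (end of a normal word)
    (?<=[a-zA-Z])        # preceded by a letter
    [ ]+                 # the run of spaces to remove
    (?=[a-zA-Z][ ]|.$)   # followed by letter + space, or by the final character
""", re.VERBOSE)

def collapse(text):
    """Hypothetical helper: remove spaces between exploded characters."""
    return pattern.sub("", text)

print(collapse("V e r y calm"))                  # Very calm
print(collapse("Keen and a n a l y t i c a l"))  # Keen and analytical
print(collapse("V e r y c a l m"))               # Verycalm (two exploded words get merged)
```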