How To Remove Spaces In Between Characters Without Removing All Spaces In A Dataframe?

March 31, 2024

Let's say I have a dataframe like this:

ID  Name   Description
0   Manny  V e r y calm
1   Joey   Keen and a n a l y t i c a l
2   Lisa   R a s h and carel

Solution 1:

This is a tricky problem, but one approach that may get you most of the way there is to use negative and positive lookbehinds/lookaheads to encode a few basic rules.

The following example would likely work well enough given what you've described. It will incorrectly combine characters from consecutive "real" words that have been exploded into separated characters, but if that's rare this will probably be fine. You could add additional rules to cover more edge cases.

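import re
import pandas as pd

s = pd.Series(['V e  r y calm', 'Keen and a n a l y t i c a l',
               'R a s h and careless', 'Always joyful'])

# Remove runs of spaces that sit between isolated single letters.
regex = re.compile(r'(?<![a-zA-Z]{2})(?<=[a-zA-Z]) +(?=[a-zA-Z] |.$)')
# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching.
s.str.replace(regex, '', regex=True)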

0              Very calm
1    Keen and analytical
2      Rash and careless
3          Always joyful
dtype: object
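
The limitation mentioned above can be seen with the same pattern using only the standard `re` module. This is a minimal sketch; `collapse_spaced_letters` is a hypothetical helper name used for illustration, not something from pandas or the original answer.

```python
import re

# Same pattern as in the answer above.
regex = re.compile(r'(?<![a-zA-Z]{2})(?<=[a-zA-Z]) +(?=[a-zA-Z] |.$)')

def collapse_spaced_letters(text):
    """Hypothetical helper: remove spaces between isolated single letters."""
    return regex.sub('', text)

# Two exploded words in a row lose the word boundary between them.
print(collapse_spaced_letters('v e r y c a l m'))   # verycalm
# Genuine one-letter words next to each other are merged too.
print(collapse_spaced_letters('I a m calm'))        # Iam calm
# A one-letter word followed by a normal word is left alone.
print(collapse_spaced_letters('a calm day'))        # a calm day
```

For the dataframe in the question, the same pattern could presumably be applied to a single column, for example `df['Description'].str.replace(regex, '', regex=True)`, assuming the column is named `Description` as shown.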

This regex effectively says:

Match a run of one or more spaces and remove it, but only if the character just before it is a single, isolated letter. If the two characters before the run are both letters (that is, the run follows a word of two or more letters), leave it alone. In addition, the run must be followed either by a single letter and another space, or by the final character of the string.
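
To make each rule easier to read, the pattern can also be written with `re.VERBOSE`, one condition per line. This is a sketch of an equivalent form, not the answer's original code; the names `verbose_regex` and `samples` are only illustrative. Note that in verbose mode a literal space has to be written as `[ ]`, since bare whitespace in the pattern is ignored.

```python
import re

verbose_regex = re.compile(r"""
    (?<![a-zA-Z]{2})     # the two preceding characters are not both letters
    (?<=[a-zA-Z])        # the preceding character is a letter
    [ ]+                 # one or more spaces (the part that gets removed)
    (?=[a-zA-Z][ ]|.$)   # next: a letter plus a space, or the last character
""", re.VERBOSE)

samples = ['V e  r y calm', 'Keen and a n a l y t i c a l',
           'R a s h and careless', 'Always joyful']

# Apply the pattern to each sample string with an empty replacement.
cleaned = [verbose_regex.sub('', text) for text in samples]
print(cleaned)
```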
