Pandas: Concat Rows Of Strings Until Specific Characters
I have a one-column DataFrame. Its rows contain dialogue, and a single person's line of dialogue often spans multiple rows. At the end of each person's dialogue line is the same combination of characters…
Solution 1:
You can `join` all the values into one string and then `split` it on your delimiter to rebuild the DataFrame:
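```python
import pandas as pd

# Glue all rows into one string (no separator), then split on the delimiter.
df = pd.DataFrame(''.join(df.Words.values).split('&,,'), columns=['Words'])
```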
```
   Words
0  hello world!
1  I woke up this morning and made some eggs.They...
2
```
This leaves an empty value when the last row ends with `&,,` (splitting on a trailing delimiter produces an empty final piece), but it's easy to filter those rows out:
```python
df.loc[df.Words.ne('')]
```

```
   Words
0  hello world!
1  I woke up this morning and made some eggs.They...
```
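Because `''.join` inserts no separator, rows belonging to the same utterance run together (`eggs.They` above). The following is a self-contained sketch of the same join-and-split idea. It assumes the sample rows used in Solution 2 and assumes you want a single space between rows, so it joins with `' '` instead, then strips the whitespace left around each delimiter and drops the empty trailing piece:

```python
import pandas as pd

# Sample data (assumed; same rows as in Solution 2).
df = pd.DataFrame({'Words': ['hello world! &,,',
                             'I woke up this morning and made some eggs.',
                             'They tasted good. &,,']})

# Join with a space so "eggs." and "They" stay separated, then split on the delimiter.
pieces = ' '.join(df['Words']).split('&,,')

out = pd.DataFrame({'Words': pieces})
# Strip whitespace around each piece and drop the empty piece after the final '&,,'.
out['Words'] = out['Words'].str.strip()
out = out.loc[out['Words'].ne('')].reset_index(drop=True)
print(out)
```

The trade-off of this approach is that the whole column is materialized as one Python string, which is simple but holds everything in memory at once.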
Solution 2:
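You could use `df['Words'].str.endswith('&,,')` to find which rows end with `&,,`, shift that mask down by one row so that each `&,,` row closes its own group and the row after it starts a new one, and then use `cumsum` to generate the desired group numbers (stored below in the `row` column).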
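To make the grouping step concrete, here is a sketch that keeps each intermediate series on the sample data. It uses `shift(fill_value=False)` in place of `shift().fillna(0)` so the mask stays boolean; the names `ends` and `starts_new` are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Words': ['hello world! &,,',
              'I woke up this morning and made some eggs.',
              'They tasted good. &,,']}, index=[1, 2, 3])

ends = df['Words'].str.endswith('&,,')           # True where an utterance closes
starts_new = ends.shift(fill_value=False)        # True where the previous row closed one
df['row'] = starts_new.astype(int).cumsum() + 1  # running count of closings, 1-based

print(pd.DataFrame({'ends': ends, 'starts_new': starts_new, 'row': df['row']}))
```

Here `ends` is `[True, False, True]`, the shifted mask is `[False, True, False]`, and the group numbers come out as `[1, 2, 2]`, so the second and third rows share group 2.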
Once you have those group numbers, you can use `pd.pivot_table` to reshape the DataFrame into the desired form:
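```python
import sys
import pandas as pd

# Show full strings instead of truncating long cells with "..."
pd.options.display.max_colwidth = sys.maxsize

df = pd.DataFrame({
    'Words': ['hello world! &,,',
              'I woke up this morning and made some eggs.',
              'They tasted good. &,,']}, index=[1, 2, 3])

# True where a row closes an utterance; shifting down one row keeps the closing
# row in its group and makes the next row start a new one.
ends = df['Words'].str.endswith('&,,')
df['row'] = ends.shift(fill_value=False).astype(int).cumsum() + 1

# Join all rows that share a group number with a single space.
result = pd.pivot_table(df, index='row', values='Words', aggfunc=' '.join)
print(result)
```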
yields:

```
     Words
row
1    hello world! &,,
2    I woke up this morning and made some eggs. They tasted good. &,,
```
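`pivot_table` works here, but since the goal is only to concatenate strings per group, a plain `groupby(...).agg(' '.join)` performs the same reduction and returns a Series. This is a sketch on the same sample data; the final `str.replace` that removes the `&,,` marker is an optional extra step, assuming you don't want the delimiter in the result:

```python
import pandas as pd

df = pd.DataFrame({
    'Words': ['hello world! &,,',
              'I woke up this morning and made some eggs.',
              'They tasted good. &,,']}, index=[1, 2, 3])

# Group number: increments on the row after each '&,,'.
df['row'] = (df['Words'].str.endswith('&,,')
             .shift(fill_value=False)
             .astype(int)
             .cumsum() + 1)

# Concatenate each group's rows with a single space.
joined = df.groupby('row')['Words'].agg(' '.join)

# Optional: drop the delimiter and any surrounding whitespace.
cleaned = joined.str.replace('&,,', '', regex=False).str.strip()
print(cleaned)
```

`joined` keeps the `&,,` markers exactly as the `pivot_table` output does, while `cleaned` holds `'hello world!'` and `'I woke up this morning and made some eggs. They tasted good.'`.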