
Pandas: Concat Rows Of Strings Until Specific Characters

I have a one-column dataframe. The rows of that column contain dialogue that often spans multiple rows. At the end of each person's dialogue line is the same combination of characte

Solution 1:

You can join your values and split on your delimiter to recreate your dataframe:

# Join all rows into one string, then split on the '&,,' delimiter.
df = pd.DataFrame(
    ''.join(df.Words.values).split('&,,'),
    columns=['Words']
)

                                               Words
0                                      hello world!
1  I woke up this morning and made some eggs.They...
2

This can leave an empty string as the final row if the last row ends with &,,, but such rows are easy to filter out:

df.loc[df.Words.ne('')]

                                               Words
0                                      hello world!
1  I woke up this morning and made some eggs.They...
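
For reference, here is a minimal, self-contained sketch of the join, split, and filter steps above. The sample rows are borrowed from Solution 2 and are only an assumption about what the original frame looks like; the names parts and out are illustrative. Note that ''.join inserts no separator, which is why "eggs.They" runs together in the output.

```python
import pandas as pd

# Sample data assumed for illustration (same rows as used in Solution 2).
df = pd.DataFrame({
    'Words': ['hello world! &,,',
              'I woke up this morning and made some eggs.',
              'They tasted good. &,,']})

# Concatenate every row into one string, then split on the delimiter.
parts = ''.join(df['Words']).split('&,,')

# Rebuild the frame, drop the empty trailing piece, and renumber the index.
out = pd.DataFrame(parts, columns=['Words'])
out = out.loc[out['Words'].ne('')].reset_index(drop=True)
print(out)
```

If the original row breaks should become spaces, joining with ' ' instead of '' and then calling .str.strip() on the result is one option.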

Solution 2:

You could use df['Words'].str.endswith('&,,') to find which rows end with &,,, shift those flags down one row so that each new group starts on the row after a delimiter, and then use cumsum to generate the desired group numbers (stored below in the row column). Once you have those group numbers, you can use pd.pivot_table to reshape the DataFrame into the desired form:

import sys
import pandas as pd
pd.options.display.max_colwidth = sys.maxsize

df = pd.DataFrame({
   'Words': ['hello world! &,,',
             'I woke up this morning and made some eggs.',
             'They tasted good. &,,']}, index=[1, 2, 3])

# Shift the end-of-line flags down one row so each new group starts after a delimiter.
df['row'] = df['Words'].str.endswith('&,,').shift(fill_value=False).cumsum() + 1
result = pd.pivot_table(df, index='row', values='Words', aggfunc=' '.join)
print(result)

yields

                                                                Words
row                                                                  
1                                                    hello world! &,,
2    I woke up this morning and made some eggs. They tasted good. &,,
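
To make the grouping step easier to follow, the sketch below prints the intermediate group numbers and then aggregates with groupby rather than pd.pivot_table. This is a swapped-in alternative, not the answer's own code; the names ends, group, and merged, and the final step that strips the trailing &,,, are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Words': ['hello world! &,,',
              'I woke up this morning and made some eggs.',
              'They tasted good. &,,']}, index=[1, 2, 3])

# True where a row closes a line of dialogue.
ends = df['Words'].str.endswith('&,,')

# A new group starts on the row *after* each delimiter, so shift the flags
# down by one before taking the running total.
group = (ends.shift(fill_value=False).cumsum() + 1).rename('row')
print(group.tolist())  # [1, 2, 2]

# Join the rows of each group, then strip the delimiter (illustrative extra step).
merged = df['Words'].groupby(group).agg(' '.join)
merged = merged.str.replace('&,,', '', regex=False).str.strip()
print(merged.tolist())
```

Using groupby here avoids the reshape machinery of pivot_table when only one column is being aggregated; either approach relies on the same group numbers.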
