Python Pandas: Flatten With Arrays In Column

I have a pandas DataFrame in which one column contains arrays. I'd like to 'flatten' it by repeating the values of the other columns for each element of the arrays. I succeed to m

Solution 1:

Use numpy.repeat with str.len to build columns x and y, repeating each value once per element of the matching array in z. For z itself, flatten the arrays with itertools.chain.from_iterable:

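import pandas as pd
import numpy as np
from itertools import chain

# Number of elements in each row's z; x and y are repeated that many times
lens = toConvert.z.str.len()

df = pd.DataFrame({
    "x": np.repeat(toConvert.x.values, lens),
    "y": np.repeat(toConvert.y.values, lens),
    # chain.from_iterable joins the per-row arrays into one flat sequence
    "z": list(chain.from_iterable(toConvert.z))})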

print(df)
   x   y    z
0  1  10  101
1  1  10  102
2  1  10  103
3  2  20  201
4  2  20  202
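
For comparison, pandas 0.25 and later ship a built-in DataFrame.explode that should give the same result without numpy.repeat. This is only a sketch and not part of the original answer: the sample frame is rebuilt from the example data in this post, and the name flat is purely illustrative.

```python
import pandas as pd

# Sample frame, assumed to match the toConvert example used in this post.
toConvert = pd.DataFrame({
    "x": [1, 2],
    "y": [10, 20],
    "z": [(101, 102, 103), (201, 202)],
})

# explode() emits one row per element of z and repeats x and y alongside it.
# The original index labels are duplicated, so renumber the rows afterwards.
flat = toConvert.explode("z").reset_index(drop=True)

# The exploded column comes back with object dtype, so cast it explicitly.
flat["z"] = flat["z"].astype("int64")

print(flat)
```

Note that explode keeps the index label of the source row for every element, which is why the sketch resets the index to get rows numbered 0 to 4.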

Solution 2:

Here's a NumPy-based solution:

# On Python 3, map() returns an iterator, so wrap it in list() before passing it to repeat
np.column_stack((toConvert[['x','y']].values.\
     repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))

Sample run -

In [78]: toConvert
Out[78]: 
   x   y                z
0  1  10  (101, 102, 103)
1  2  20       (201, 202)

In [79]: np.column_stack((toConvert[['x','y']].values.\
    ...:      repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))
Out[79]: 
array([[  1,  10, 101],
       [  1,  10, 102],
       [  1,  10, 103],
       [  2,  20, 201],
       [  2,  20, 202]])
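
The NumPy approach returns a plain array, so the column names are lost. If a DataFrame is wanted, the array can be wrapped again, roughly as sketched below. The sample frame is an assumption rebuilt from the sample run, and the names lens, arr and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample frame, assumed to match the toConvert shown in the sample run.
toConvert = pd.DataFrame({
    "x": [1, 2],
    "y": [10, 20],
    "z": [(101, 102, 103), (201, 202)],
})

# Length of each tuple in z, as a plain integer array.
lens = toConvert["z"].str.len().to_numpy()

arr = np.column_stack((
    # Repeat each (x, y) row once per element of its z tuple.
    toConvert[["x", "y"]].values.repeat(lens, axis=0),
    # Concatenate all z tuples into a single 1-D array.
    np.hstack(list(toConvert["z"])),
))

# Restore the column names that the plain array does not carry.
result = pd.DataFrame(arr, columns=["x", "y", "z"])
print(result)
```

Because column_stack produces a single array, all three columns share one dtype. That is fine for the integer data here, but mixed-type frames may be better served by the approach in Solution 1.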
