
Pandas: Groupby Return Error After Pd Cuts

I have a dataframe with Age and Marital_Status columns. Age is an int and Marital_Status is a string with 8 unique values, e.g. Married, Single, etc. After dividing into age groups of interval 10

Solution 1:

The problem (maybe a bug?) is in DataFrame.groupby: the default parameter is observed=False, but here you need to work only with the categories that actually occur in the data:

observed : bool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
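
To see why this parameter matters after pd.cut, it may help to look at the categories directly. The following is a minimal sketch, not the asker's data: the `ages` series and the variable names are assumptions chosen for illustration. It shows that pd.cut keeps a category for every bin, including bins that no row falls into, and that observed controls whether groupby reports those empty bins.

```python
import pandas as pd

# Hypothetical data: ages 0..49, so nothing falls into the bins above 50
ages = pd.Series(range(50), name="Age")
age_group = pd.cut(ages, bins=[10, 20, 30, 40, 50, 60, 70, 80])

# The categorical carries one category per bin, whether or not it is used
n_categories = len(age_group.cat.categories)
# nunique() counts only the categories that actually appear (NaN excluded)
n_observed = age_group.nunique()
print(n_categories, n_observed)

# observed=False reports every category, filling empty ones with 0
counts_all = ages.groupby(age_group, observed=False).size()
# observed=True reports only the categories present in the data
counts_observed = ages.groupby(age_group, observed=True).size()
print(counts_all)
print(counts_observed)
```

Passing observed explicitly, as above, also keeps the behaviour the same regardless of which default the installed pandas version uses.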

Sample:

import pandas as pd

df = pd.DataFrame({'Marital_Status':['stat1'] * 50,
                   'Age': range(50)})

# pd.cut creates a categorical column with one category per bin
df['age_group'] = pd.cut(df.Age,bins=[10,20,30,40,50,60,70,80])
# print (df)

print (df.groupby(['age_group'], observed=True)['Marital_Status'].value_counts())
age_group  Marital_Status
(10, 20]   stat1             10
(20, 30]   stat1             10
(30, 40]   stat1             10
(40, 50]   stat1              9
Name: Marital_Status, dtype: int64

Alternatively, the difference is easier to see by comparing size() with and without observed=True:

print (df.groupby(['age_group', 'Marital_Status']).size())
age_group  Marital_Status
(10, 20]   stat1             10
(20, 30]   stat1             10
(30, 40]   stat1             10
(40, 50]   stat1              9
(50, 60]   stat1              0
(60, 70]   stat1              0
(70, 80]   stat1              0
dtype: int64

print (df.groupby(['age_group', 'Marital_Status'], observed=True).size())
age_group  Marital_Status
(10, 20]   stat1             10
(20, 30]   stat1             10
(30, 40]   stat1             10
(40, 50]   stat1              9
dtype: int64
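
Another option, not part of the answer above and offered only as a sketch, is to drop the unused categories from the column itself with Series.cat.remove_unused_categories(). The grouped result then has no empty bins even when observed=False. The names `pruned` and `sizes` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Marital_Status': ['stat1'] * 50,
                   'Age': range(50)})
df['age_group'] = pd.cut(df.Age, bins=[10, 20, 30, 40, 50, 60, 70, 80])

# Work on a copy so the original categorical column stays untouched
pruned = df.copy()
# Remove the bins that no row falls into from the categorical's categories
pruned['age_group'] = pruned['age_group'].cat.remove_unused_categories()

# Even with observed=False, only the remaining categories are reported
sizes = pruned.groupby(['age_group', 'Marital_Status'], observed=False).size()
print(sizes)
```

This changes the column rather than the groupby call, so it may suit cases where the empty bins are never wanted downstream; observed=True is the lighter fix when only one result is affected.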
