Pandas: Groupby Return Error After Pd Cuts
I have a DataFrame with Age and Marital_Status columns. Age is an int and Marital_Status is a string with 8 unique values, e.g. Married, Single, etc. After dividing into age groups of interval 10
Solution 1:
The problem (maybe a bug?) is in DataFrame.groupby: the default parameter is observed=False, but you need to work only with the categories that actually occur in the data:
observed : bool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
Sample:
import pandas as pd

df = pd.DataFrame({'Marital_Status': ['stat1'] * 50,
                   'Age': range(50)})
# Bin the ages into intervals of 10; the bins above 50 stay empty in this sample
df['age_group'] = pd.cut(df.Age, bins=[10,20,30,40,50,60,70,80])
# print (df)
print (df.groupby(['age_group'], observed=True)['Marital_Status'].value_counts())
age_group Marital_Status
(10, 20] stat1 10
(20, 30] stat1 10
(30, 40] stat1 10
(40, 50] stat1 9
Name: Marital_Status, dtype: int64
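To see what observed changes, the following minimal sketch counts the groups with both settings. It reuses the sample data from this answer; the variable names all_bins and seen_bins are made up for illustration, and the counts assume the same bins as above.

```python
import pandas as pd

# Same sample data as in the answer: ages 0..49, one marital status
df = pd.DataFrame({"Marital_Status": ["stat1"] * 50, "Age": range(50)})
df["age_group"] = pd.cut(df["Age"], bins=[10, 20, 30, 40, 50, 60, 70, 80])

# observed=False keeps every category created by pd.cut, even the empty ones
all_bins = df.groupby("age_group", observed=False).size()

# observed=True keeps only the categories that occur in the data
seen_bins = df.groupby("age_group", observed=True).size()

print(len(all_bins), len(seen_bins))

# The bins that exist as categories but contain no rows
empty_bins = all_bins[all_bins == 0].index.tolist()
print(empty_bins)
```

Ages 0 to 10 fall outside every bin, so pd.cut assigns them NaN and groupby leaves them out in both cases.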
An alternative solution with size() makes the difference easier to see:
print (df.groupby(['age_group', 'Marital_Status']).size())
age_group Marital_Status
(10, 20] stat1 10
(20, 30] stat1 10
(30, 40] stat1 10
(40, 50] stat1 9
(50, 60] stat1 0
(60, 70] stat1 0
(70, 80] stat1 0
dtype: int64
print (df.groupby(['age_group', 'Marital_Status'], observed=True).size())
age_group Marital_Status
(10, 20] stat1 10
(20, 30] stat1 10
(30, 40] stat1 10
(40, 50] stat1 9
dtype: int64
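A different route, not part of the original answer, is to drop the empty categories from the column itself before grouping, so that the result no longer depends on observed. This is only a sketch under that assumption, using Series.cat.remove_unused_categories on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"Marital_Status": ["stat1"] * 50, "Age": range(50)})
df["age_group"] = pd.cut(df["Age"], bins=[10, 20, 30, 40, 50, 60, 70, 80])

# Remove the intervals that no row falls into from the categorical dtype
df["age_group"] = df["age_group"].cat.remove_unused_categories()

# Even with observed=False only the remaining categories are reported
counts = df.groupby(["age_group", "Marital_Status"], observed=False).size()
print(counts)
```

This changes the column's categories permanently, so observed=True is the lighter option when the empty bins are still needed elsewhere.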