
After Groupby And Sum, How To Get The Max Value Rows In `pandas.DataFrame`?

Here is the df (updated with real data):

> TIMESTAMP OLTPSOURCE RNR RQDRECORD
> 20150425232836 0PU_IS_PS_44 REQU_51NHAJUV06IMMP16BVE57

Solution 1:

You can take the groupby result and compute the max over the first index level (level=0, or level='clsa' if you prefer); this returns the max count for each 'clsa' group. However, this loses the 'clsb' column, so merge the maxima back onto the grouped result after calling reset_index on both. The merge joins on the shared 'clsa' and 'count' columns, so only the rows that hold each group's maximum survive. You can then reorder the columns of the resulting df by selecting them with a list of column names:

In [149]:
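# Sum count per (clsa, clsb) pair
gp = df.groupby(['clsa','clsb']).sum()
# Max per clsa, merged back on (clsa, count) to recover clsb
result = gp.groupby(level=0).max().reset_index().merge(gp.reset_index())
result = result[['clsa','clsb','count']]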
result

Out[149]:
  clsa clsb  count
0    a   a1      9
1    b   b2      8
2    c   c2     10
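
As an alternative to the merge, the same rows can be picked with idxmax. The sketch below is only an illustration: the sample rows are made up (chosen so the sums match the output above), and only the column names 'clsa', 'clsb' and 'count' come from the answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above.
df = pd.DataFrame({
    "clsa": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "clsb": ["a1", "a1", "a2", "b1", "b2", "c1", "c2", "c2"],
    "count": [4, 5, 3, 2, 8, 1, 6, 4],
})

# Sum counts per (clsa, clsb) pair, keeping the keys as ordinary columns.
sums = df.groupby(["clsa", "clsb"], as_index=False)["count"].sum()

# For each clsa, look up the row label of the largest summed count,
# then select exactly those rows.
best = sums.loc[sums.groupby("clsa")["count"].idxmax()].reset_index(drop=True)
print(best)
```

One difference worth knowing: idxmax returns a single row per group even when two 'clsb' values tie for the maximum, whereas the merge approach keeps every tied row.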

Solution 2:

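import pandas as pd
import matplotlib.pyplot as plt

df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%Y%m%d%H%M%S')
# Sum RQDRECORD per (OLTPSOURCE, RNR) pair
df_gb = df.groupby(['OLTPSOURCE', 'RNR'], as_index=False)['RQDRECORD'].sum()
# first() keeps the first RNR per OLTPSOURCE, not necessarily the one with the largest sum
final = pd.merge(df[['TIMESTAMP', 'OLTPSOURCE', 'RNR']], df_gb.groupby('OLTPSOURCE', as_index=False).first(), on=['OLTPSOURCE', 'RNR'], how='right').sort_values('OLTPSOURCE')
final.plot(kind='bar')
plt.show()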


print(final)

             TIMESTAMP            OLTPSOURCE                             RNR  \
3  2015-06-23 21:52:02          0CO_OM_CCA_1  REQU_528XSXYWTK6FSJXDQY2ROQQ4Q   
2  2015-01-07 20:13:58       0EQUIPMENT_ATTR  REQU_50EVHXSDOITYUQLP4L8UXOBT6   
5  2015-06-26 18:55:31            0FI_AA_001  REQU_52BO3RJCOG4JGHEIIZMJP9V4A   
11 2015-04-17 18:49:16            0FI_AA_004  REQU_51KFWWT6PPTI5X44D3MWD7CYU   
6  2015-03-07 22:23:36       0FUNCT_LOC_ATTR  REQU_513JJ6I6ER5ZVW5CAJMVSKAJQ   
4  2015-07-15 14:41:39      0HRPOSITION_TEXT  REQU_52I9KQ1LN4ZWTNIP0N1R68NDY   
10 2015-01-02 16:21:40              0HR_PA_0  REQU_50CNUT7I9OXH2WSNLC4WTUZ7U   
13 2015-04-19 23:07:24          0PU_IS_PS_44  REQU_51LC5XX6VWEERAVHEFJ9K5A6I   
7  2015-06-30 16:34:19       0WBS_ELEMT_ATTR  REQU_52CUPVUFCY2DDOG6SPQ1XOYQ2   
8  2015-04-24 16:22:26  6DB_V_DGP_EXPORTDATA  REQU_51N1F5ZC8G3LW68E4TFXRGH9I   
0  2015-01-28 16:57:26              ZFI_DS41  REQU_50P1AABLYXE86KYE3O6EY390M   
12 2015-02-05 15:06:33              ZHR_DS09  REQU_50RFRYRADMA9QXB1PW4PRF5XM   
9  2015-06-17 14:37:20            ZRZMS_TEXT  REQU_5268R1YE6G1U7HUK971LX1FPM   
1  2015-07-01 14:42:53            ZZZJB_TEXT  REQU_52DV5FB812JCDXDVIV9P35DGM   

    RQDRECORD  
3           0  
2       14205  
5           0  
11          0  
6       13889  
4       25381  
10          0  
13      22528  
7           0  
8           0  
0        6925  
12       6667  
9           6  
1           2 
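
The merge above keeps whichever RNR first() returns for each OLTPSOURCE, which may not be the one with the largest RQDRECORD sum. If the goal is the maximum per OLTPSOURCE, one possible sketch is to sort the summed values in descending order and keep the first row per source. The rows and the SRC_*/REQ_* values below are invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the question.
df = pd.DataFrame({
    "OLTPSOURCE": ["SRC_A", "SRC_A", "SRC_A", "SRC_B", "SRC_B"],
    "RNR": ["REQ_1", "REQ_1", "REQ_2", "REQ_3", "REQ_4"],
    "RQDRECORD": [10, 5, 12, 7, 0],
})

# Sum RQDRECORD per (OLTPSOURCE, RNR) pair.
totals = df.groupby(["OLTPSOURCE", "RNR"], as_index=False)["RQDRECORD"].sum()

# Sort so the largest sums come first, then keep one row per OLTPSOURCE.
top = (
    totals.sort_values("RQDRECORD", ascending=False)
          .drop_duplicates("OLTPSOURCE")
          .sort_values("OLTPSOURCE")
          .reset_index(drop=True)
)
print(top)
```

Here REQ_1 wins for SRC_A because its two rows sum to 15, which beats the single REQ_2 row of 12.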
