python - How to use nlargest on multilevel pivot_table in pandas? -
i'm summing values in pivot table using pandas.
dfr = pd.dataframe({'a': [1,1,1,1,2,2,2,2], 'b': [1,2,2,3,1,2,2,2], 'c': [1,1,1,2,1,1,2,2], 'val':[1,1,1,1,1,1,1,1]}) dfr = dfr.pivot_table(values='val', index=['a', 'b', 'c'], aggfunc=np.sum) dfr
output:
a b c |val ------------|--- 1 1 1 |1 2 1 |2 3 2 |1 2 1 1 |1 2 1 |1 2 |2
the way need output show largest in each group a, this:
a b c |val ------------|--- 1 2 1 |2 2 2 2 |2
i've googled bit around , tried using nlargest() in different ways without being able produce result want. got ideas?
i think need groupby
+ nlargest
level a
:
dfr = dfr.pivot_table(values='val', index=['a', 'b', 'c'], aggfunc=np.sum) dfr = dfr.groupby(level='a')['val'].nlargest(1).reset_index(level=0, drop=true).reset_index() print (dfr) b c val 0 1 2 1 2 1 2 2 2 2
because if use pivot_table
levels lost:
dfr = dfr.pivot_table(values='val', index=['a', 'b', 'c'], aggfunc=np.sum).reset_index() dfr = dfr.pivot_table(values='val', index='a', aggfunc=lambda x: x.nlargest(1)) print (dfr) val 1 2 2 2
and if use levels return nlrgest levels (not want)
dfr = dfr.pivot_table(values='val', index=['a', 'b', 'c'], aggfunc=np.sum).reset_index() dfr = dfr.pivot_table(values='val', index=['a', 'b', 'c'], aggfunc=lambda x: x.nlargest(1)) print (dfr) val b c 1 1 1 1 2 1 2 3 2 1 2 1 1 1 2 1 1 2 2
Comments
Post a Comment