python - How to filter grouped by in Pandas -


i newbie in pandas. have following dataset. think of dataset departments('k1') , people('k2') of company.

dframe = pd.dataframe({'k1': ['x','x','y','y','z','z'],    ...:                         'k2': ['p1','p2','p3','p4','p5','p6'],    ...:                          'dataset1': np.random.randn(6)})    ...: 

if take mean grouped dept/'k1' following

   dataset1 k1 x   0.153825 y  -0.648500 z   1.133334 

if take mean grouped people/'k2', following

in [6]: dframe.groupby('k2').mean() out[6]:     dataset1 k2 p1  1.595455 p2 -1.287805 p3  0.211858 p4 -1.508859 p5  1.350336 p6  0.916332 

my question how can filter mean values grouped people/'k2' greater mean of dept/'k1' belongs. e.g. p1 mean value greater x mean value belongs

out[6]:     dataset1 k2 p1  1.595455 p3  0.211858 p5  1.350336 

sample (changed p6 p5):

np.random.seed(45) dframe = pd.dataframe({'k1': ['x','x','y','y','z','z'],                          'k2': ['p1','p2','p3','p4','p5','p5'],                             'dataset1': np.random.randn(6)})  print (dframe)    dataset1 k1  k2 0  0.026375  x  p1 1  0.260322  x  p2 2 -0.395146  y  p3 3 -0.204301  y  p4 4 -1.271633  z  p5 5 -2.596879  z  p5 

first create new column groupby , transform:

dframe['meank1'] = dframe.groupby('k1').transform('mean') print (dframe)    dataset1 k1  k2    meank1 0  0.026375  x  p1  0.143348 1  0.260322  x  p2  0.143348 2 -0.395146  y  p3 -0.299723 3 -0.204301  y  p4 -0.299723 4 -1.271633  z  p5 -1.934256 5 -2.596879  z  p5 -1.934256 

then aggregate agg mean , first, necessary add k1 column groupby avoid wrong output if same k2 in k1.

dframe = dframe.groupby(['k1','k2']).agg({'dataset1':'mean', 'meank1':'first'}) print (dframe)          meank1  dataset1 k1 k2                     x  p1  0.143348  0.026375    p2  0.143348  0.260322 y  p3 -0.299723 -0.395146    p4 -0.299723 -0.204301 z  p5 -1.934256 -1.934256 

last filter boolean indexing or query:

dframe = dframe.loc[dframe['meank1'] > dframe['dataset1'], ['dataset1']] #alternative sol #dframe = dframe.query('meank1 > dataset1')[['dataset1']] print (dframe)        dataset1 k1 k2           x  p1  0.026375 y  p3 -0.395146 

and if want remove first level of multiindex add reset_index:

dframe = dframe.reset_index(level=0, drop=true) print (dframe)     dataset1 k2           p1  0.026375 p3 -0.395146 

for column index use:

dframe = dframe.reset_index(level=0, drop=true).reset_index() print (dframe)    k2  dataset1 0  p1  0.026375 1  p3 -0.395146 

Comments

Popular posts from this blog

php - Vagrant up error - Uncaught Reflection Exception: Class DOMDocument does not exist -

vue.js - Create hooks for automated testing -

Add new key value to json node in java -