python - What's the idiomatic way to perform an aggregate and rename operation in pandas -


for example, how do following r data.table operation in pandas:

paths[,.( completed=sum(exists), missing=sum(not(exists)), total=.n, 'size (g)'=sum(sizemb)/1024), by=.(projectpath, pipelineid)] 

i.e. group projectpath , pipelineid, aggregate of columns using possibly custom functions, , rename resulting columns.

output should dataframe no hierarchical indexes, example:

                      projectpath pipelineid completed missing size (g) /data/pnl/projects/tracts/pnlpipe          0      2568       0 45.30824 /data/pnl/projects/tracts/pnlpipe          1      1299       0 62.69934 

you can use groupby.agg:

df.groupby(['projectpath', 'pipelineid']).agg({         'exists': {'completed': 'sum', 'missing': lambda x: (~x).sum(), 'total': 'size'},         'sizemb': {'size (g)': lambda x: x.sum()/1024}     }) 

sample run:

df = pd.dataframe({         'projectpath': [1,1,1,1,2,2,2,2],         'pipelineid': [1,1,2,2,1,1,2,2],         'exists': [true, false,true,true,false,false,true,false],         'sizemb': [120032,12234,223311,3223,11223,33445,3444,23321]     })  df1 = df.groupby(['projectpath', 'pipelineid']).agg({         'exists': {'completed': 'sum', 'missing': lambda x: (~x).sum(), 'total': 'size'},         'sizemb': {'size (g)': lambda x: x.sum()/1024}     }) ​ df1.columns = df1.columns.droplevel(0) ​ df1.reset_index() 

enter image description here


update: if want customize aggregation without using deprecated nested dictionary syntax, can use groupby.apply , return series object each group:

df.groupby(['projectpath', 'pipelineid']).apply(     lambda g: pd.series({             'completed': g.exists.sum(),             'missing': (~g.exists).sum(),             'total': g.exists.size,             'size (g)': g.sizemb.sum()/1024          }) ).reset_index() 

enter image description here


Comments

Popular posts from this blog

javascript - Create a stacked percentage column -

Optimising Firebase database by automatically overwriting data -

javascript - Angular UI-Grid customTemplate directive causing rows to load slowly/? -