python - What's the idiomatic way to perform an aggregate and rename operation in pandas -

January 15, 2014

for example, how do following r data.table operation in pandas:

paths[,.( completed=sum(exists), missing=sum(not(exists)), total=.n, 'size (g)'=sum(sizemb)/1024), by=.(projectpath, pipelineid)]

i.e. group projectpath , pipelineid, aggregate of columns using possibly custom functions, , rename resulting columns.

output should dataframe no hierarchical indexes, example:

                      projectpath pipelineid completed missing size (g) /data/pnl/projects/tracts/pnlpipe          0      2568       0 45.30824 /data/pnl/projects/tracts/pnlpipe          1      1299       0 62.69934

you can use groupby.agg:

df.groupby(['projectpath', 'pipelineid']).agg({         'exists': {'completed': 'sum', 'missing': lambda x: (~x).sum(), 'total': 'size'},         'sizemb': {'size (g)': lambda x: x.sum()/1024}     })

sample run:

df = pd.dataframe({         'projectpath': [1,1,1,1,2,2,2,2],         'pipelineid': [1,1,2,2,1,1,2,2],         'exists': [true, false,true,true,false,false,true,false],         'sizemb': [120032,12234,223311,3223,11223,33445,3444,23321]     })  df1 = df.groupby(['projectpath', 'pipelineid']).agg({         'exists': {'completed': 'sum', 'missing': lambda x: (~x).sum(), 'total': 'size'},         'sizemb': {'size (g)': lambda x: x.sum()/1024}     })  df1.columns = df1.columns.droplevel(0)  df1.reset_index()

update: if want customize aggregation without using deprecated nested dictionary syntax, can use groupby.apply , return series object each group:

df.groupby(['projectpath', 'pipelineid']).apply(     lambda g: pd.series({             'completed': g.exists.sum(),             'missing': (~g.exists).sum(),             'total': g.exists.size,             'size (g)': g.sizemb.sum()/1024          }) ).reset_index()

Search This Blog

Insert

python - What's the idiomatic way to perform an aggregate and rename operation in pandas -

Comments

Post a Comment

Popular posts from this blog

service - Android MediaPlayer calls onCompletion before it already finished -

javascript - Training Neural Network to play flappy bird with genetic algorithm - Why can't it learn? -

javascript - Create a stacked percentage column -