python - Multivariate Grouped Operation with pandas

February 15, 2013

I'm trying to make the switch from R's dplyr to pandas in Python. I've gone through several tutorials to learn the basics, but I'm stuck on one task: I'd like to use the agg method in groupby to perform an operation on more than one column. This is a trivial task in R, as the example below illustrates:

```r
library(dplyr)

df <- data.frame('id'=c(1, 1, 1, 2, 2, 2),
                 'a'=c(1, 2, 3, 4, 5, 6),
                 'b'=c(2, 4, 6, 8, 10, 12))

idgp <- group_by(df, id) %>%
  summarise(c = prod(b) / sum(a))

###
# output:
###

### > df
###   id a  b
### 1  1 1  2
### 2  1 2  4
### 3  1 3  6
### 4  2 4  8
### 5  2 5 10
### 6  2 6 12
###
### > idgp
### # A tibble: 2 x 2
###      id     c
###   <dbl> <dbl>
### 1     1     8
### 2     2    64
```

In this example, I'm grouping on the id column in df and creating an arbitrary new variable, c, based on both columns a and b. Is there a straightforward way to convert this example to Python using pandas?

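You can use groupby.apply. Each group is passed to the function as a sub-DataFrame, so both columns are available inside the lambda: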

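```python
import pandas as pd

# same data as the R example
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'a': [1, 2, 3, 4, 5, 6],
                   'b': [2, 4, 6, 8, 10, 12]})

out = df.groupby('id').apply(lambda x: x['b'].prod() / x['a'].sum()).reset_index(name='c')
print(out)
#    id     c
# 0   1   8.0
# 1   2  64.0
```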
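For reference, here is a self-contained sketch of the same apply approach with the lambda written out as a named function (ratio is just an illustrative name, not anything from pandas). Selecting [['a', 'b']] before apply is an addition of mine, not part of the answer above: it keeps the grouping column out of the sub-frames, since recent pandas releases emit a deprecation warning when apply operates on the grouping columns.

```python
import pandas as pd

# Same data as the R example's data.frame
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'a': [1, 2, 3, 4, 5, 6],
                   'b': [2, 4, 6, 8, 10, 12]})


def ratio(group):
    # `group` is the sub-DataFrame for one id, so both a and b are visible here
    return group['b'].prod() / group['a'].sum()


# Only the columns the function needs are passed to apply;
# the result is a Series indexed by id, turned back into a DataFrame.
out = df.groupby('id')[['a', 'b']].apply(ratio).reset_index(name='c')
print(out)
#    id     c
# 0   1   8.0
# 1   2  64.0
```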

Another solution is to compute groupby.prod and groupby.sum separately and divide the results with Series.div:

```python
g = df.groupby('id')
out = g['b'].prod().div(g['a'].sum()).reset_index(name='c')
print(out)
#    id     c
# 0   1   8.0
# 1   2  64.0
```
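Since the question asks about agg specifically: agg reduces each column on its own, so a formula that combines a and b cannot be expressed as one agg function. A sketch of a dplyr-like alternative, assuming pandas 0.25 or later (where named aggregation was introduced), is to reduce each column separately with agg and combine the results afterwards. The helper column names b_prod and a_sum are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'a': [1, 2, 3, 4, 5, 6],
                   'b': [2, 4, 6, 8, 10, 12]})

out = (
    df.groupby('id')
      # named aggregation: new_column=(source_column, reduction)
      .agg(b_prod=('b', 'prod'), a_sum=('a', 'sum'))
      # combine the two per-group reductions into the new variable
      .assign(c=lambda d: d['b_prod'] / d['a_sum'])
      # keep only c (like summarise) and move id back to a regular column
      .loc[:, ['c']]
      .reset_index()
)
print(out)
#    id     c
# 0   1   8.0
# 1   2  64.0
```

This is essentially the Series.div solution written as one chain, which may read more naturally coming from a dplyr pipe.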

This is the same as the following, which simply repeats the groupby instead of storing it in g:

```python
out = df.groupby('id')['b'].prod().div(df.groupby('id')['a'].sum()).reset_index(name='c')
print(out)
#    id     c
# 0   1   8.0
# 1   2  64.0
```
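Both forms work because Series.div aligns the two per-group Series on their index labels (the id values), not on position. A small check using the data from the question; the reversed denominator is only there to illustrate the alignment and is not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'a': [1, 2, 3, 4, 5, 6],
                   'b': [2, 4, 6, 8, 10, 12]})

# Reusing one GroupBy object...
g = df.groupby('id')
c1 = g['b'].prod().div(g['a'].sum())

# ...or grouping twice gives the same values.
c2 = df.groupby('id')['b'].prod().div(df.groupby('id')['a'].sum())

# div matches rows by id label, so even a denominator in reversed
# order still pairs each group's product with the right sum.
den_reversed = g['a'].sum().sort_index(ascending=False)
c3 = g['b'].prod().div(den_reversed)

print(c1.tolist(), c2.equals(c1), c3.sort_index().equals(c1))
# [8.0, 64.0] True True
```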
