python - Multivariate Grouped Operation with pandas
I'm trying to make the switch from R's dplyr to pandas in Python. I've gone through several tutorials to learn the basics, but I'm stuck on one task. I'd like to use the agg method in groupby to perform operations on more than one column. This is a trivial task in R, as the example below illustrates:
    library(dplyr)

    df <- data.frame('id'=c(1, 1, 1, 2, 2, 2),
                     'a'=c(1, 2, 3, 4, 5, 6),
                     'b'=c(2, 4, 6, 8, 10, 12))

    idgp <- group_by(df, id) %>%
      summarise(c = prod(b) / sum(a))

    ###
    # output:
    ###
    ### > df
    ###   id a  b
    ### 1  1 1  2
    ### 2  1 2  4
    ### 3  1 3  6
    ### 4  2 4  8
    ### 5  2 5 10
    ### 6  2 6 12
    ###
    ### > idgp
    ### # tibble: 2 x 2
    ###      id     c
    ###   <dbl> <dbl>
    ### 1     1     8
    ### 2     2    64
In this example, I'm grouping on the id column in df and creating an arbitrary new variable based on both columns a and b. Is there a straightforward way to convert this example to Python using pandas?
You can use groupby.apply:
    df = df.groupby('id').apply(lambda x: x['b'].prod() / x['a'].sum()).reset_index(name='c')
    print(df)

       id     c
    0   1   8.0
    1   2  64.0
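For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same groupby.apply idea. It assumes the pandas DataFrame mirrors the R data.frame from the question. The helper name prod_b_over_sum_a and the result name idgp are only illustrative, and selecting the a and b columns before apply is my own choice so that the grouping column is not handed to the function.

```python
import pandas as pd

# Assumed pandas equivalent of the R data.frame in the question
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'a': [1, 2, 3, 4, 5, 6],
    'b': [2, 4, 6, 8, 10, 12],
})

def prod_b_over_sum_a(group):
    # group is the sub-DataFrame (columns a and b) for a single id
    return group['b'].prod() / group['a'].sum()

# apply returns a Series indexed by id; reset_index turns it into id and c columns
idgp = (
    df.groupby('id')[['a', 'b']]
      .apply(prod_b_over_sum_a)
      .reset_index(name='c')
)
print(idgp)
```

Because the function receives the whole group, any expression that mixes several columns can go in its body, which is probably the closest match to dplyr's summarise.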
Another solution is to use groupby.prod and groupby.sum, then divide with Series.div:
    g = df.groupby('id')
    df = g['b'].prod().div(g['a'].sum()).reset_index(name='c')
    print(df)

       id     c
    0   1   8.0
    1   2  64.0
This is the same as:
    df = df.groupby('id')['b'].prod().div(df.groupby('id')['a'].sum()).reset_index(name='c')
    print(df)

       id     c
    0   1   8.0
    1   2  64.0
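Since the question mentions agg specifically, a variant using named aggregation may also be worth a look. This is only a sketch and not part of the solutions above: it assumes pandas 0.25 or later (where named aggregation is available), and the intermediate column names prod_b and sum_a, along with the result name idgp, are made up for illustration.

```python
import pandas as pd

# Assumed pandas equivalent of the R data.frame in the question
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'a': [1, 2, 3, 4, 5, 6],
    'b': [2, 4, 6, 8, 10, 12],
})

# Named aggregation: each keyword is an output column built from (input column, function)
agg = df.groupby('id').agg(prod_b=('b', 'prod'), sum_a=('a', 'sum'))

# Combine the per-group results, then keep only id and c
idgp = agg.assign(c=agg['prod_b'] / agg['sum_a'])[['c']].reset_index()
print(idgp)
```

Note that agg itself works one column at a time, so the division still has to happen after the aggregation step. When the combined expression cannot be split into per-column pieces like this, apply is likely the simpler route.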