python 2.7 - percentage bins based on predefined buckets

February 15, 2013

I have a series of numbers and want to know the percentage of those numbers that falls into each bucket defined in a DataFrame.

df['cuts'] holds the values 10, 20 and 50. Specifically, I want the percentage of the series that falls into the [0-10], (10-20] and (20-50] bins, appended to the df DataFrame.
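To make the bucket edges concrete, here is a minimal sketch using pd.cut on a few hand-picked values (hypothetical, chosen to sit on and around the edges). It assumes the buckets are right-closed, as the (10-20] and (20-50] notation suggests, and uses include_lowest=True so the first bucket is [0-10] rather than (0-10].

```python
import pandas as pd

# Bucket upper edges, as in df['cuts'] from the question
cuts = [10, 20, 50]

# Hypothetical values chosen to sit on and around the bucket edges
values = pd.Series([0, 5, 10, 11, 20, 21, 50])

# Edges [0, 10, 20, 50] give right-closed intervals (0, 10], (10, 20], (20, 50];
# include_lowest=True widens the first one so that 0 itself is counted.
# labels=cuts names each bucket by its upper edge.
buckets = pd.cut(values, bins=[0] + cuts, include_lowest=True, labels=cuts)

# Each value mapped to its bucket: 10, 10, 10, 20, 20, 50, 50
print(buckets.tolist())

# Share of values per bucket, in bucket order: 3/7, 2/7, 2/7
print(buckets.value_counts(normalize=True, sort=False))
```

This is one alternative to padding the edges with -1: the lower edge stays at 0 and only the first interval is closed on the left.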

I wrote the following code, but it feels improvised. Any improvements are appreciated.

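```python
# -1 as the lowest edge so that 0 falls into the first bin (-1, 10]
bin_cuts = [-1] + list(df['cuts'].values)
out = pd.cut(series, bins=bin_cuts)
df_pct_bins = pd.value_counts(out, normalize=True).reset_index()
# Split the bin labels (strings such as "(10, 20]") into lower and upper parts
df_pct_bins = pd.concat([df_pct_bins['index'].str.split(', ', expand=True),
                         df_pct_bins['cuts']], axis=1)
# Strip the trailing "]" so the upper edge can be matched against df['cuts']
df_pct_bins[1] = df_pct_bins[1].str[:-1].astype(str)
df['cuts'] = df['cuts'].astype(str)
df_pct_bins = pd.merge(df, df_pct_bins, left_on='cuts', right_on=1)
```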
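In pandas 0.20 and later, pd.cut labels the bins with Interval objects rather than strings like "(10, 20]", so the string splitting and the merge can be avoided by reading each interval's right edge. A minimal sketch, keeping the question's -1 lower edge and assuming the sample df and random series s used in the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cuts': [10, 20, 50]})
s = pd.Series(np.random.randint(50, size=1000))

# Same edges as in the question: (-1, 10], (10, 20], (20, 50]
bin_cuts = [-1] + df['cuts'].tolist()
pct = pd.cut(s, bins=bin_cuts).value_counts(normalize=True)

# pct is indexed by Interval objects; .right is each bin's upper edge,
# e.g. 10 for (-1, 10], which is exactly the value stored in df['cuts']
pct.index = [interval.right for interval in pct.index]

# Look up each cut's share directly, no string parsing or merge needed
df['pct'] = df['cuts'].map(pct)
print(df)
```

Because the lookup is done on numbers, df['cuts'] keeps its integer dtype instead of being converted to strings.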

Consider the sample data df and s:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(cuts=[10, 20, 50]))
s = pd.Series(np.random.randint(50, size=1000))
```

Option 1: np.searchsorted

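```python
c = df.cuts.values
# searchsorted maps each value of s to the index of its bucket's upper edge,
# so c[...] replaces every value with the cut it belongs to
df.assign(
    pct=df.cuts.map(
        pd.Series(c[np.searchsorted(c, s)]).value_counts(normalize=True)
    ))

# Output (random data, so the exact shares vary from run to run):
#    cuts    pct
# 0    10  0.216
# 1    20  0.206
# 2    50  0.578
```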
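To see why searchsorted reproduces right-closed buckets, here is a small sketch on hand-picked values (hypothetical). With the default side='left', np.searchsorted returns, for each value v, the first index i with v <= c[i]; c must be sorted in ascending order.

```python
import numpy as np

c = np.array([10, 20, 50])
values = np.array([0, 10, 11, 20, 21, 50])

# First index i with value <= c[i]: matches buckets (..., 10], (10, 20], (20, 50]
idx = np.searchsorted(c, values)
print(idx)     # [0 0 1 1 2 2]
print(c[idx])  # [10 10 20 20 50 50]

# A value above the last edge gets index len(c), which is out of range for c,
# so c[np.searchsorted(c, s)] would raise IndexError for such values
print(np.searchsorted(c, 51))  # 3
```

Values below 10 (including negatives) all land in the first bucket, so this approach has no lower bound; the upper bound must cover every value in s.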

Option 2: pd.cut

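```python
c = df.cuts.values
# -inf as the lowest edge gives bins (-inf, 10], (10, 20], (20, 50];
# labels=c names each bin by its upper edge so it can be mapped onto df.cuts
df.assign(
    pct=df.cuts.map(
        pd.cut(
            s,
            np.append(-np.inf, c),
            labels=c
        ).value_counts(normalize=True)
    ))

# Output (random data, so the exact shares vary from run to run):
#    cuts    pct
# 0    10  0.216
# 1    20  0.206
# 2    50  0.578
```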
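Option 2 can be wrapped in a small reusable function. The sketch below uses a hypothetical helper name, bucket_percentages, and hand-picked data. One behaviour worth knowing: pd.cut turns values above the last edge into NaN, and value_counts drops NaN by default, so the shares are computed relative to the in-range values only.

```python
import numpy as np
import pandas as pd


def bucket_percentages(values, cuts):
    """Hypothetical helper: share of `values` in each bucket
    (-inf, cuts[0]], (cuts[0], cuts[1]], ..., labelled by upper edge."""
    cuts = np.asarray(cuts)
    labelled = pd.cut(values, np.append(-np.inf, cuts), labels=cuts)
    # Values above cuts[-1] become NaN and are left out of the denominator
    return labelled.value_counts(normalize=True, sort=False)


s = pd.Series([1, 5, 15, 30, 45, 99])  # 99 is above the last edge (50)
df = pd.DataFrame({'cuts': [10, 20, 50]})
df['pct'] = df['cuts'].map(bucket_percentages(s, df['cuts']))
# pct is 0.4, 0.2, 0.4: shares of the 5 in-range values, 99 is ignored
print(df)
```

If out-of-range values should count toward the total, pass dropna=False to value_counts or compare the counts against len(values) instead.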
