Python 2.7 - Percentage bins based on predefined buckets
I have a series of numbers, and I want to know the % of numbers falling in every bucket of a dataframe.
df['cuts'] has the values 10, 20, 50. Specifically, I want to know the % of the series that falls in [0-10], (10-20], (20-50].
The percentage for each bin should be appended to the df dataframe.
I wrote the following code. I feel it can be improved. Any help is appreciated.
    bin_cuts = [-1] + list(df['cuts'].values)
    out = pd.cut(series, bins=bin_cuts)
    df_pct_bins = pd.value_counts(out, normalize=True).reset_index()
    df_pct_bins = pd.concat([df_pct_bins['index'].str.split(', ', expand=True), df_pct_bins['cuts']], axis=1)
    df_pct_bins[1] = df_pct_bins[1].str[:-1].astype(str)
    df['cuts'] = df['cuts'].astype(str)
    df_pct_bins = pd.merge(df, df_pct_bins, left_on='cuts', right_on=1)
Consider the sample data df and s:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(dict(cuts=[10, 20, 50]))
    s = pd.Series(np.random.randint(50, size=1000))
Option 1
np.searchsorted

    c = df.cuts.values
    # label each value in s with the upper edge of its bucket, then count shares
    df.assign(
        pct=df.cuts.map(
            pd.value_counts(
                c[np.searchsorted(c, s)],
                normalize=True
            )))

       cuts    pct
    0    10  0.216
    1    20  0.206
    2    50  0.578
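To see why indexing c with the result of np.searchsorted labels each value with its bucket's upper edge, here is a minimal sketch on a small hand-picked array. The values in vals are made up for illustration and are not part of the sample data above. With the default side='left', a value equal to an edge gets that edge's position, so the buckets are closed on the right.

```python
import numpy as np

c = np.array([10, 20, 50])                 # bucket upper edges, sorted ascending
vals = np.array([0, 10, 11, 20, 21, 49])   # illustrative values only

# Position of the first edge that is >= each value (default side='left')
pos = np.searchsorted(c, vals)

# Indexing the edges with those positions labels each value
# with the upper edge of the bucket it falls in
labels = c[pos]
```

Note that a value larger than the last edge would get position len(c), and c[pos] would then raise an IndexError, so this approach relies on every value being at most the largest cut.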
Option 2
pd.cut

    c = df.cuts.values
    # -inf as the lowest edge; label each bin with its upper edge
    df.assign(
        pct=df.cuts.map(
            pd.cut(
                s,
                np.append(-np.inf, c),
                labels=c
            ).value_counts(normalize=True)
        ))

       cuts    pct
    0    10  0.216
    1    20  0.206
    2    50  0.578
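One way to check the boundary handling of the pd.cut approach is to run it on a small series whose shares can be counted by hand. The sketch below uses made-up values (the names small and pct are only for illustration). Because pd.cut closes bins on the right by default, 10 and 20 should land in the buckets labelled 10 and 20.

```python
import numpy as np
import pandas as pd

cuts = [10, 20, 50]
small = pd.Series([0, 5, 10, 11, 20, 21, 35, 50])   # illustrative values only

# -inf as the lowest edge so every value <= 10 falls in the first bucket
binned = pd.cut(small, np.append(-np.inf, cuts), labels=cuts)

# Share of values per bucket, as a plain dict keyed by bucket upper edge
shares = binned.value_counts(normalize=True)
pct = dict(zip(shares.index.tolist(), shares.tolist()))
```

Here three of the eight values fall in each of the outer buckets and two in the middle one, so the shares should come out as 0.375, 0.25 and 0.375.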