python - Comparing a value from one dataframe with values from columns in another dataframe and getting the data from third column -
the title bit confusing i'll best explain problem here. have 2 pandas dataframes, , b:
>> print id | value 1 | 250 2 | 150 3 | 350 4 | 550 5 | 450 >> print b low | high | class 100 | 200 | 'a' 200 | 300 | 'b' 300 | 500 | 'a' 500 | 600 | 'c'
i want create new column called class in table contains class of value in accordance table b. here's result want:
>> print id | value | class 1 | 250 | 'b' 2 | 150 | 'a' 3 | 350 | 'a' 4 | 550 | 'c' 5 | 450 | 'a'
i have following code written sort of want:
a['class'] = pd.series() in range(len(a)): val = a['value'][i] cl = (b['class'][ (b['low'] <= val) \ (b['high'] >= val) ].iat[0]) a['class'].set_value(i,cl)
problem is, quick tables length of 10 or so, trying table size of 100,000+ both , b. there quicker way this, using function/attribute in pandas?
here's solution admittedly less elegant using series.searchsorted, runs super fast!
i pull data out pandas dataframes , convert them lists , use np.where populate variable called "aclass" conditions satified (in brute force loops). write "aclass" original data frame a.
the evaluation time 0.07489705 s, it's pretty fast, 200,000 data points!
# create 200,000 fake data points avalue = 100+600*np.random.random(200000) # assuming extracted avalue = np.array(a['value']) blow = [100,200,300,500] # assuming extracted b list(b['low']) bhigh = [200,300,500,600] # assuming extracted b list(b['high']) bclass = ['a','b','a','c'] # assuming extracted b list(b['class']) aclass = [[]]*len(avalue) # initialize aclass start_time = time.time() # timing execution in range(len(blow)): j in np.where((avalue>=blow[i]) & (avalue<=bhigh[i]))[0]: aclass[j]=bclass[i] # add class column original dataframe a['class'] = aclass print("--- %s seconds ---" % np.round(time.time() - start_time,decimals = 8))
Comments
Post a Comment