python - Comparing a value from one dataframe with values from columns in another dataframe and getting the data from third column -

May 15, 2014

the title bit confusing i'll best explain problem here. have 2 pandas dataframes, , b:

>> print  id | value  1 | 250  2 | 150  3 | 350  4 | 550  5 | 450  >> print b  low | high | class 100 | 200  | 'a'  200 | 300  | 'b'  300 | 500  | 'a'  500 | 600  | 'c'

i want create new column called class in table contains class of value in accordance table b. here's result want:

>> print  id | value | class  1 | 250   | 'b'  2 | 150   | 'a'  3 | 350   | 'a'  4 | 550   | 'c'  5 | 450   | 'a'

i have following code written sort of want:

a['class'] = pd.series() in range(len(a)):     val = a['value'][i]     cl = (b['class'][ (b['low'] <= val) \                       (b['high'] >= val) ].iat[0])     a['class'].set_value(i,cl)

problem is, quick tables length of 10 or so, trying table size of 100,000+ both , b. there quicker way this, using function/attribute in pandas?

here's solution admittedly less elegant using series.searchsorted, runs super fast!

i pull data out pandas dataframes , convert them lists , use np.where populate variable called "aclass" conditions satified (in brute force loops). write "aclass" original data frame a.

the evaluation time 0.07489705 s, it's pretty fast, 200,000 data points!

# create 200,000 fake data points avalue = 100+600*np.random.random(200000) # assuming extracted avalue = np.array(a['value'])  blow = [100,200,300,500] # assuming extracted b list(b['low']) bhigh = [200,300,500,600] # assuming extracted b list(b['high']) bclass = ['a','b','a','c'] # assuming extracted b list(b['class'])  aclass = [[]]*len(avalue) # initialize aclass  start_time = time.time() # timing execution in range(len(blow)):     j in np.where((avalue>=blow[i]) & (avalue<=bhigh[i]))[0]:         aclass[j]=bclass[i]  # add class column original dataframe a['class'] = aclass  print("--- %s seconds ---" % np.round(time.time() - start_time,decimals = 8))

Search This Blog

Insert

python - Comparing a value from one dataframe with values from columns in another dataframe and getting the data from third column -

Comments

Post a Comment

Popular posts from this blog

vue.js - Create hooks for automated testing -

php - Vagrant up error - Uncaught Reflection Exception: Class DOMDocument does not exist -

serial port - hub4com OVERRUN Error -