Python Pandas - Using to_sql to write large data frames in chunks
I'm using pandas' to_sql
function to write to MySQL, but it's timing out due to the large frame size (1M rows, 20 columns).
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.to_sql.html
Is there a more official way to chunk through the data and write rows in blocks? I've written my own code, which seems to work, but I'd prefer an official solution. Thanks!
import pandas as pd
import sqlalchemy

def write_to_db(engine, frame, table_name, chunk_size):
    start_index = 0
    end_index = chunk_size if chunk_size < len(frame) else len(frame)
    # Convert NaN values to None so MySQL stores them as NULL.
    frame = frame.where(pd.notnull(frame), None)
    if_exists_param = 'replace'
    while start_index != end_index:
        print("writing rows %s through %s" % (start_index, end_index))
        frame.iloc[start_index:end_index, :].to_sql(con=engine, name=table_name,
                                                    if_exists=if_exists_param)
        # Replace the table on the first chunk, then append subsequent chunks.
        if_exists_param = 'append'
        start_index = min(start_index + chunk_size, len(frame))
        end_index = min(end_index + chunk_size, len(frame))

engine = sqlalchemy.create_engine('mysql://...')  # database details omitted
write_to_db(engine, frame, 'retail_pendingcustomers', 20000)
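For what it's worth, the same chunked write can also be expressed with numpy.array_split, which removes the manual index bookkeeping. This is only a sketch under the same assumptions as above (a SQLAlchemy engine and an existing DataFrame `frame`), not an official pandas feature:

    import numpy as np

    def write_to_db_split(engine, frame, table_name, chunk_size):
        # Split the frame row-wise into roughly equal pieces of at most chunk_size rows.
        n_chunks = max(1, int(np.ceil(len(frame) / float(chunk_size))))
        for i, chunk in enumerate(np.array_split(frame, n_chunks)):
            # Replace the table on the first chunk, append on the rest.
            chunk.to_sql(con=engine, name=table_name,
                         if_exists='replace' if i == 0 else 'append')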
Update: this functionality has been merged into pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062
So starting from 0.15, you can specify a chunksize
argument to to_sql, e.g.:
df.to_sql('table', engine, chunksize=20000)
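As a self-contained illustration of that argument, here is a minimal sketch using an in-memory SQLite engine (the original question targets MySQL; the connection string, table name, and toy data below are assumptions for demonstration only). With chunksize set, pandas issues the INSERTs in blocks of 20000 rows rather than one large statement, which is what avoids the timeout described above:

    import pandas as pd
    import sqlalchemy

    # In-memory SQLite engine, assumed here purely for a runnable demo.
    engine = sqlalchemy.create_engine('sqlite://')

    # Toy frame standing in for the real 1M-row data set.
    df = pd.DataFrame({'a': range(100000), 'b': range(100000)})

    # Written in blocks of 20000 rows per the chunksize argument.
    df.to_sql('retail_pendingcustomers', engine, chunksize=20000, index=False)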