Python Pandas - Using to_sql to write large data frames in chunks
I'm using pandas' to_sql
function to write to MySQL, but it's timing out due to the large frame size (1M rows, 20 columns).
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.to_sql.html
Is there a more official way to chunk through the data and write rows in blocks? I've written my own code, which seems to work, but I'd prefer an official solution. Thanks!
import pandas as pd
import sqlalchemy

def write_to_db(engine, frame, table_name, chunk_size):
    start_index = 0
    end_index = chunk_size if chunk_size < len(frame) else len(frame)
    # Convert NaN values to None so MySQL stores them as NULL.
    frame = frame.where(pd.notnull(frame), None)
    if_exists_param = 'replace'
    while start_index != end_index:
        print("writing rows %s through %s" % (start_index, end_index))
        frame.iloc[start_index:end_index, :].to_sql(con=engine, name=table_name,
                                                    if_exists=if_exists_param)
        # Replace the table on the first chunk, then append subsequent chunks.
        if_exists_param = 'append'
        start_index = min(start_index + chunk_size, len(frame))
        end_index = min(end_index + chunk_size, len(frame))

engine = sqlalchemy.create_engine('mysql://...')  # database details omitted
write_to_db(engine, frame, 'retail_pendingcustomers', 20000)
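For what it's worth, the same chunked write can also be expressed with numpy.array_split, which removes the manual index bookkeeping. This is only a sketch under the same assumptions as above (a SQLAlchemy engine and an existing DataFrame `frame`), not an official pandas feature:

    import numpy as np

    def write_to_db_split(engine, frame, table_name, chunk_size):
        # Split the frame row-wise into roughly equal pieces of at most chunk_size rows.
        n_chunks = max(1, int(np.ceil(len(frame) / float(chunk_size))))
        for i, chunk in enumerate(np.array_split(frame, n_chunks)):
            # Replace the table on the first chunk, append on the rest.
            chunk.to_sql(con=engine, name=table_name,
                         if_exists='replace' if i == 0 else 'append')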
Update: this functionality has been merged into pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062
So starting from 0.15, you can specify a chunksize
argument to to_sql, e.g.:
df.to_sql('table', engine, chunksize=20000)
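As a self-contained illustration of that argument, here is a minimal sketch using an in-memory SQLite engine (the original question targets MySQL; the connection string, table name, and toy data below are assumptions for demonstration only). With chunksize set, pandas issues the INSERTs in blocks of 20000 rows rather than one large statement, which is what avoids the timeout described above:

    import pandas as pd
    import sqlalchemy

    # In-memory SQLite engine, assumed here purely for a runnable demo.
    engine = sqlalchemy.create_engine('sqlite://')

    # Toy frame standing in for the real 1M-row data set.
    df = pd.DataFrame({'a': range(100000), 'b': range(100000)})

    # Written in blocks of 20000 rows per the chunksize argument.
    df.to_sql('retail_pendingcustomers', engine, chunksize=20000, index=False)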