python - Dask: DataFrame taking forever to compute
I created a Dask DataFrame from a pandas DataFrame with ~50k rows and 5 columns:
ddf = dd.from_pandas(df, npartitions=32)
I then add a bunch of columns (~30) to the Dask DataFrame and try to turn it back into a pandas DataFrame:
data = ddf.compute(get=dask.multiprocessing.get)
I looked at the docs, and if you don't specify num_workers, it defaults to using all cores. I'm on a 64-core EC2 instance, and the above line has already taken minutes without finishing...
Any idea how to speed this up, or what I'm doing incorrectly?
Thanks!