Dask: Is It Safe To Pickle A Dataframe For Later Use?
I have a database-like object containing many dask dataframes. I would like to work with the data, save it, and reload it the next day to continue the analysis. Therefore, I tried to pickle the dataframes. Is that safe to do?
Solution 1:
Generally speaking, yes, it is safe. However, there are a few caveats (a minimal sketch follows the list below):
- If your dask.dataframe contains custom functions, such as with df.apply(lambda x: x), then the internal function will not be picklable with the standard pickle module. It will, however, still be serializable with cloudpickle.
- If your dask.dataframe contains references to files that are only valid on your local computer then, while it will still be serializable, the deserialized version on another machine may no longer be useful.
- If your dask.dataframe contains dask.distributed Future objects, such as would occur if you use Executor.persist on a cluster, then these are not currently serializable.
- I recommend using a dask version >= 0.11.0.
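A minimal sketch of the round trip, assuming dask, pandas, and cloudpickle are installed. The small in-memory frame is invented for illustration; because it holds no local file references or Future objects, the caveats above do not bite:

import pickle

import cloudpickle
import dask.dataframe as dd
import pandas as pd

# Build a small dask dataframe from an in-memory pandas frame.
pdf = pd.DataFrame({"x": range(10), "y": range(10)})
df = dd.from_pandas(pdf, npartitions=2)

# Plain pickle works while the graph holds only ordinary dask operations.
restored = pickle.loads(pickle.dumps(df))
assert restored.x.sum().compute() == df.x.sum().compute()

# A graph that captures a lambda may defeat the stdlib pickle,
# but cloudpickle can serialize it.
with_lambda = df.x.map(lambda v: v + 1, meta=("x", "int64"))
restored2 = cloudpickle.loads(cloudpickle.dumps(with_lambda))
assert restored2.sum().compute() == with_lambda.sum().compute()

# Saving for the next day is the same operation against a file on disk;
# cloudpickle emits a standard pickle stream, so pickle.loads can read it.
with open("df.pkl", "wb") as f:
    f.write(cloudpickle.dumps(df))
with open("df.pkl", "rb") as f:
    df_tomorrow = pickle.loads(f.read())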