How To Do Row Processing And Item Assignment In Dask
Similar unanswered question: "Row by row processing of a Dask DataFrame". I'm working with dataframes that are millions of rows long, and so now I'm trying to have all dataframe operations…
Solution 1:
Dask DataFrame does not support efficient row-by-row iteration or item assignment to individual rows. In general, these workflows rarely scale well, and they are also quite slow in pandas itself.
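For reference, here is a minimal sketch of the row-by-row pattern being discussed, written in plain pandas. It assumes the same small two-column frame used in the example further down. Every row passes through the Python interpreter and gets its own `.loc` assignment. That is why this style is slow on large frames and has no efficient Dask equivalent.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 2, 1]})

# Row-by-row processing: one Python-level iteration and one item assignment per row.
looped = df.copy()
looped['z'] = 0
for i, row in df.iterrows():
    # Keep x when it is larger than y, otherwise take y.
    looped.loc[i, 'z'] = row['x'] if row['x'] > row['y'] else row['y']

print(looped)
```

It produces the same `z` column as the `Series.where` call in the example, but it needs one interpreter round trip per row rather than a single whole-column operation.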
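Instead, you might consider a vectorized operation such as the Series.where method, which keeps a value where the condition holds and substitutes another value elsewhere. Here is a minimal example: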
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 2, 1]})
In [3]: import dask.dataframe as dd
In [4]: ddf = dd.from_pandas(df, npartitions=2)
In [5]: ddf['z'] = ddf.x.where(ddf.x > ddf.y, ddf.y)
In [6]: ddf.compute()
Out[6]:
   x  y  z
0  1  3  3
1  2  2  2
2  3  1  3