
How To Do Row Processing And Item Assignment In Dask

Similar unanswered question: Row by row processing of a Dask DataFrame. I'm working with dataframes that are millions of rows long, and so now I'm trying to have all dataframe opera

Solution 1:

Dask DataFrame does not support efficient row-by-row iteration or item assignment. In general these workflows rarely scale well, and they are quite slow even in pandas itself.

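Instead, you might consider a column-wise operation such as the Series.where method, which keeps a value where a condition holds and substitutes another value where it does not. Here is a minimal example: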

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 2, 1]})

In [3]: import dask.dataframe as dd

In [4]: ddf = dd.from_pandas(df, npartitions=2)

In [5]: ddf['z'] = ddf.x.where(ddf.x > ddf.y, ddf.y)

In [6]: ddf.compute()
Out[6]:
   x  y  z
0  1  3  3
1  2  2  2
2  3  1  3
