
Replace Values In Pandas Column When N Number Of NaNs Exist In Another Column

I have the following pandas dataframe:

2018-05-25 0.000381 0.264318 land 2018-05-25
2018-05-26 0.000000 0.264447 land 2018-05-26
2018-05-27 0.000000 0.264791 Na

Solution 1:

Here's an approach that keeps a running count of consecutive nulls in column 3 and sets column 2 to NaN once that count reaches n:

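n = 3
# mask of rows where column 3 is null
x = df[3].isnull()
# running count of consecutive NaNs; it restarts at 0 on every non-null row
# (ffill() replaces the deprecated fillna(method='pad'))
se = x.cumsum() - x.cumsum().where(~x).ffill().fillna(0)

# blank column 2 wherever the current run of NaNs has reached n
df.loc[se >= n, 2] = np.nan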

             0         1         2     3           4
0   2018-05-25  0.000381  0.264318  land  2018-05-25
1   2018-05-26  0.000000  0.264447  land  2018-05-26
2   2018-05-27  0.000000  0.264791   NaN         NaT
3   2018-05-28  0.000000  0.265253   NaN         NaT
4   2018-05-29  0.000000       NaN   NaN         NaT
5   2018-05-30  0.000000  0.266066  land  2018-05-30
6   2018-05-31  0.000000  0.266150   NaN         NaT
7   2018-06-01  0.000000  0.265816   NaN         NaT
8   2018-06-02  0.000000  0.264892  land  2018-06-02
9   2018-06-03  0.000000  0.263191   NaN         NaT
10  2018-06-04  0.000000  0.260508  land  2018-06-04
11  2018-06-05  0.000000  0.256619   NaN         NaT
12  2018-06-06  0.000000  0.251286   NaN         NaT
13  2018-06-07  0.000000       NaN   NaN         NaT
14  2018-06-08  0.000000       NaN   NaN         NaT
15  2018-06-09  0.000000  0.223932  land  2018-06-09
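
To see how the counter restarts, here is a minimal self-contained sketch on made-up data. The column names value and flag and the sample numbers are assumptions for illustration (they stand in for columns 2 and 3 above), and the intermediate names running, baseline and streak are just labels for the pieces of the one-line expression.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'flag' plays the role of column 3, 'value' of column 2.
df = pd.DataFrame({
    "value": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    "flag": ["land", np.nan, np.nan, np.nan, "land", np.nan, np.nan],
})

n = 3
is_null = df["flag"].isnull()

# Total number of NaNs seen so far, never reset.
running = is_null.cumsum()

# The same total, but frozen at the most recent non-null row and carried
# forward across the NaN run that follows it (0 before any non-null row).
baseline = running.where(~is_null).ffill().fillna(0)

# Difference = length of the current run of consecutive NaNs.
streak = (running - baseline).astype(int)

df.loc[streak >= n, "value"] = np.nan

print(streak.tolist())
print(df)
```

Only the row where the run length reaches 3 loses its value; the trailing run of two NaNs is left alone because it never reaches n.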

Solution 2:

Edit: a more versatile approach that works for any threshold of consecutive NaNs:

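threshold = 3
# True where d has a value; mask.cumsum() then gives each non-null row and the NaNs after it one group id
mask = df.d.notna()
# count the NaN rows within each group and blank c once the count reaches the threshold
df.loc[(~mask).groupby(mask.cumsum()).transform('cumsum') >= threshold, 'c'] = np.nan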
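
The grouping trick can be wrapped up and tried on a small frame. This is only a sketch: the helper name blank_after_nan_run, its parameters, and the sample data are assumptions for illustration, not part of the original answer. It casts the inverted mask to int before the grouped cumulative sum so the count is an ordinary integer column.

```python
import numpy as np
import pandas as pd


def blank_after_nan_run(df, watch, target, threshold):
    """Hypothetical helper: set `target` to NaN on rows where `watch`
    has been NaN for at least `threshold` consecutive rows."""
    out = df.copy()
    present = out[watch].notna()
    # Each non-null row bumps the id, so a non-null row and the NaN rows
    # that follow it share one group.
    run_id = present.cumsum()
    # Cumulative count of NaN rows inside each group = current run length.
    streak = (~present).astype(int).groupby(run_id).cumsum()
    out.loc[streak >= threshold, target] = np.nan
    return out


# Made-up data: a run of four NaNs, then a run of two.
df = pd.DataFrame({
    "c": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "d": ["land", np.nan, np.nan, np.nan, np.nan, "land", np.nan, np.nan],
})

result = blank_after_nan_run(df, watch="d", target="c", threshold=3)
print(result)
```

The helper works on a copy, so the input frame is left unchanged; drop the copy if you want the in-place behaviour of the one-liner.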

For a fixed threshold of 3, you can simply check whether the current row and the two rows before it (via shift) are all null. I named your columns a-e:

df.loc[df.d.isnull() & df.d.shift().isnull() & df.d.shift(2).isnull(), 'c'] = np.nan

# Result:

             a         b         c     d           e
0   2018-05-25  0.000381  0.264318  land  2018-05-25
1   2018-05-26  0.000000  0.264447  land  2018-05-26
2   2018-05-27  0.000000  0.264791   NaN         NaT
3   2018-05-28  0.000000  0.265253   NaN         NaT
4   2018-05-29  0.000000       NaN   NaN         NaT
5   2018-05-30  0.000000  0.266066  land  2018-05-30
6   2018-05-31  0.000000  0.266150   NaN         NaT
7   2018-06-01  0.000000  0.265816   NaN         NaT
8   2018-06-02  0.000000  0.264892  land  2018-06-02
9   2018-06-03  0.000000  0.263191   NaN         NaT
10  2018-06-04  0.000000  0.260508  land  2018-06-04
11  2018-06-05  0.000000  0.256619   NaN         NaT
12  2018-06-06  0.000000  0.251286   NaN         NaT
13  2018-06-07  0.000000       NaN   NaN         NaT
14  2018-06-08  0.000000       NaN   NaN         NaT
15  2018-06-09  0.000000  0.223932  land  2018-06-09
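
The shift check can also be written for an arbitrary run length by looping over the shifts. The sketch below is one way to do that, with a hypothetical helper name nan_run_mask and made-up data. One thing to be aware of: shift() fills the rows it vacates with NaN, so if column d starts with NaN, the chained isnull() check above would treat the missing rows before the start as nulls. Shifting the boolean mask with fill_value=False avoids that.

```python
import numpy as np
import pandas as pd


def nan_run_mask(s, n):
    """Hypothetical helper: True where a row and the n-1 rows before it
    are all NaN in Series `s`."""
    is_null = s.isna()
    mask = is_null.copy()
    for k in range(1, n):
        # fill_value=False: positions before the start of the Series
        # must not count as NaN rows.
        mask = mask & is_null.shift(k, fill_value=False)
    return mask


# Made-up data that starts with a short NaN run, then has a run of three.
df = pd.DataFrame({
    "c": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "d": [np.nan, np.nan, "land", np.nan, np.nan, np.nan],
})

mask = nan_run_mask(df["d"], 3)
df.loc[mask, "c"] = np.nan
print(df)
```

Here only the last row is blanked; the two leading NaNs do not qualify because fewer than three real rows are null at that point.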
