
How To Count Longest Uninterrupted Sequence In Pandas

Let's say I have pd.Series like below s = pd.Series([False, True, False,True,True,True,False, False]) 0 False 1 True 2 False 3 True 4 True 5 True 6 Fa

Solution 1:

Option 1 Use the series itself to mask the cumulative sum of its negation, then use value_counts

(~s).cumsum()[s].value_counts().max()

3

explanation

  1. (~s).cumsum() is a pretty standard way to produce distinct True/False groups

    0    1
    1    1
    2    2
    3    2
    4    2
    5    2
    6    3
    7    4
    dtype: int64
  2. But you can see that the group we care about is represented by the 2s and there are four of them. That's because the group is initiated by the first False (which becomes True with (~s)). Therefore, we mask this cumulative sum with the boolean mask we started with.

    (~s).cumsum()[s]

    1    1
    3    2
    4    2
    5    2
    dtype: int64
    
  3. Now we see the three 2s pop out and we just have to use a method to extract them. I used value_counts and max.
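
The three steps above can be traced in one short script. This is only a sketch: the helper name longest_true_run and its fallback of 0 for a series with no True values are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

s = pd.Series([False, True, False, True, True, True, False, False])

# Step 1: every False bumps the counter, so each run of True shares
# the group id of the False that precedes it.
groups = (~s).cumsum()

# Step 2: keep only the positions where s itself is True.
true_groups = groups[s]

# Step 3: count how often each group id occurs and take the largest count.
longest = true_groups.value_counts().max()
print(longest)  # 3


def longest_true_run(series):
    """Hypothetical helper: length of the longest run of True, 0 if there is none."""
    masked = (~series).cumsum()[series]
    return int(masked.value_counts().max()) if len(masked) else 0
```

The explicit length check is there because value_counts().max() on an empty selection does not return a usable count.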


Option 2 Use factorize and bincount

a = s.values
b = pd.factorize((~a).cumsum())[0]
np.bincount(b[a]).max()

3

explanation This is similar to the explanation for option 1. The main difference is in how I found the max. I use pd.factorize to tokenize the values into integers ranging from 0 to one less than the number of unique values. Given the actual values we had in (~a).cumsum() we didn't strictly need this part. I used it because it's a general purpose tool that could be used on arbitrary group names.

After pd.factorize I use those integer values in np.bincount which accumulates the total number of times each integer is used. Then take the maximum.
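
To make the intermediate arrays visible, here is the same pipeline split into named steps. The variable names are made up for this sketch; the calls are the ones used in option 2.

```python
import numpy as np
import pandas as pd

s = pd.Series([False, True, False, True, True, True, False, False])
a = s.values

group_ids = (~a).cumsum()            # [1 1 2 2 2 2 3 4]
codes = pd.factorize(group_ids)[0]   # relabelled from 0: [0 0 1 1 1 1 2 3]
true_codes = codes[a]                # codes at the True positions: [0 1 1 1]
counts = np.bincount(true_codes)     # occurrences of each code: [1 3]
longest = counts.max()
print(longest)  # 3
```

One caveat worth keeping in mind: if the series contains no True at all, true_codes is empty and calling .max() on the empty bincount result raises a ValueError, so that case would need its own guard.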


Option 3 As stated in the explanation of option 2, this also works:

a = s.values
np.bincount((~a).cumsum()[a]).max()

3

Solution 2:

I think this could work

pd.Series(s.index[~s].values).diff().max()-1
Out[57]: 3.0

Outside pandas, we can also fall back on Python's itertools.groupby

from itertools import groupby
max([len(list(group)) for key, group in groupby(s.tolist())])
Out[73]: 3

Update: to count only the runs of True, filter the run lengths by their keys with compress:

from itertools import compress
max(list(compress([len(list(group)) for key, group in groupby(s.tolist())], [key for key, group in groupby(s.tolist())])))
Out[84]: 3
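
A single-pass variant of the same idea, as a sketch: filter on the group key inside one generator instead of calling groupby twice. The default=0 fallback and the second example list t are assumptions added here to show why the filter matters.

```python
from itertools import groupby

import pandas as pd

s = pd.Series([False, True, False, True, True, True, False, False])

# groupby yields (value, run) pairs for consecutive equal values;
# keep only the runs whose value is True and measure their lengths.
longest = max(
    (sum(1 for _ in run) for value, run in groupby(s.tolist()) if value),
    default=0,  # assumption: report 0 when there is no True at all
)
print(longest)  # 3

# Without the filter, a long run of False is counted too.
t = [True, False, False, False]
unfiltered = max(len(list(run)) for _, run in groupby(t))
filtered = max((len(list(run)) for value, run in groupby(t) if value), default=0)
print(unfiltered, filtered)  # 3 1
```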

Solution 3:

You can use (inspired by @piRSquared answer):

s.groupby((~s).cumsum()).sum().max()
Out[513]: 3.0

Another option is to use a lambda function:

s.to_frame().apply(lambda x: s.loc[x.name:].idxmin() - x.name, axis=1).max()
Out[429]: 3

Solution 4:

Edit: As piRSquared mentioned, my previous solution needs a False added at both the beginning and the end of the series. piRSquared kindly gave an answer based on that.

(np.diff(np.flatnonzero(np.append(True, np.append(~s.values, True)))) - 1).max()

My original attempt was

(np.diff(s.where(~s).dropna().index.values) - 1).max()

(This will not give the correct answer if the longest run of True starts at the beginning or ends at the end of the series, as pointed out by piRSquared. Please use the solution above given by piRSquared. The original attempt is kept only for the explanation.)

Explanation:

This finds the indices of the False parts and by finding the gaps between the indices of False, we can know the longest True.

  • s.where(~s).dropna().index.values finds all the indices of False

    array([0, 2, 6, 7])

  • We know that the Trues live between the Falses, so np.diff gives the gaps between these indices.

    array([2, 4, 1])

  • Subtract 1 at the end, because a gap of n between two False indices holds n - 1 Trues.

  • Take the maximum of the result.
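
The edge case mentioned above can be checked directly. The sketch below wraps both versions in functions; the function names and the edge series are made up for illustration, and the unpadded version assumes the default 0..n-1 integer index.

```python
import numpy as np
import pandas as pd


def longest_true_run_padded(series):
    """Padded variant: behave as if a False sat before and after the series."""
    is_false = np.concatenate(([True], ~series.to_numpy(), [True]))
    false_positions = np.flatnonzero(is_false)
    return int((np.diff(false_positions) - 1).max())


def longest_true_run_unpadded(series):
    """Original attempt: only looks at gaps between existing False labels."""
    false_index = series.index[~series].to_numpy()
    return int((np.diff(false_index) - 1).max())


s = pd.Series([False, True, False, True, True, True, False, False])
edge = pd.Series([True, True, True, False, True, False])  # longest run starts at index 0

print(longest_true_run_padded(s), longest_true_run_unpadded(s))        # 3 3
print(longest_true_run_padded(edge), longest_true_run_unpadded(edge))  # 3 1
```

On edge the unpadded version misses the leading run of three because no False precedes it, which is exactly what the padding repairs.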

Solution 5:

Your code was actually very close. It becomes perfect with a minor fix:

count = 0
maxCount = 0
for item in s:
    if item:
        count += 1
        if count > maxCount:
            maxCount = count
    else:
        count = 0
print(maxCount)
