How To Count Longest Uninterrupted Sequence In Pandas
Solution 1:
Option 1
Use the series itself to mask the cumulative sum of its negation, then use value_counts:
(~s).cumsum()[s].value_counts().max()
3
explanation
(~s).cumsum() is a pretty standard way to produce distinct True/False groups:
0    1
1    1
2    2
3    2
4    2
5    2
6    3
7    4
dtype: int64
But you can see that the group we care about is represented by the 2s, and there are four of them. That's because the group is initiated by the first False (which becomes True with (~s)). Therefore, we mask this cumulative sum with the boolean mask we started with.
(~s).cumsum()[s]
1    1
3    2
4    2
5    2
dtype: int64
Now we see the three 2s pop out, and we just have to use a method to extract them. I used value_counts and max.
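The steps above can be traced on a small example. This is only a sketch: the series s below is an assumption, reconstructed from the printed output in this answer rather than taken from the original question.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in the answer.
s = pd.Series([False, True, False, True, True, True, False, False])

groups = (~s).cumsum()                  # the label goes up by one at every False
masked = groups[s]                      # keep the labels only where s is True
longest = masked.value_counts().max()   # size of the most frequent label

print(groups.tolist())   # [1, 1, 2, 2, 2, 2, 3, 4]
print(masked.tolist())   # [1, 2, 2, 2]
print(longest)           # 3
```

Masking matters because each group label also covers the False that started the group, so counting the unmasked labels would overstate every run by one.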
Option 2
Use factorize and bincount
a = s.values
b = pd.factorize((~a).cumsum())[0]
np.bincount(b[a]).max()
3
explanation
The explanation is similar to the one for option 1. The main difference is in how I found the max. I use pd.factorize to tokenize the values into integers ranging from 0 to one less than the number of unique values. Given the actual values we had in (~a).cumsum(), we didn't strictly need this part. I used it because it's a general purpose tool that could be used on arbitrary group names.
After pd.factorize, I use those integer values in np.bincount, which accumulates the total number of times each integer is used. Then take the maximum.
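To make the two steps concrete, here is a sketch showing the intermediate arrays. The sample series is an assumption reconstructed from the output in option 1, and the names codes, uniques and counts are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed sample data, reconstructed from the output shown in option 1.
s = pd.Series([False, True, False, True, True, True, False, False])
a = s.to_numpy()

# factorize maps each distinct label to a code 0, 1, 2, ...
codes, uniques = pd.factorize((~a).cumsum())

# bincount counts how often each code appears among the True positions
counts = np.bincount(codes[a])

print(codes.tolist())    # [0, 0, 1, 1, 1, 1, 2, 3]
print(uniques.tolist())  # [1, 2, 3, 4]
print(counts.tolist())   # [1, 3]
print(counts.max())      # 3
```

np.bincount only accepts non-negative integers, which is why factorize is the safe choice when the group labels might be strings or other arbitrary values.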
Option 3
As stated in the explanation of option 2, this also works:
a = s.values
np.bincount((~a).cumsum()[a]).max()
3
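If this is going to be reused, it may be worth wrapping it in a function. The helper below is hypothetical (the name longest_true_run_np is not from the answer) and adds one guard the one-liner lacks: calling max on an empty bincount result raises an error when the series contains no True at all.

```python
import numpy as np
import pandas as pd

def longest_true_run_np(s):
    """Hypothetical helper: length of the longest run of True in a boolean Series."""
    a = s.to_numpy(dtype=bool)
    if not a.any():
        return 0  # no True at all, so there is no run to measure
    labels = (~a).cumsum()  # non-negative run labels, safe for bincount
    return int(np.bincount(labels[a]).max())

print(longest_true_run_np(pd.Series([False, True, False, True, True, True, False, False])))  # 3
print(longest_true_run_np(pd.Series([True, True, True])))  # 3
print(longest_true_run_np(pd.Series([False, False])))      # 0
```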
Solution 2:
I think this could work
pd.Series(s.index[~s].values).diff().max()-1
Out[57]: 3.0
Also, outside pandas, we can fall back to Python's itertools.groupby:
from itertools import groupby
max([len(list(group)) for key, group in groupby(s.tolist())])
Out[73]: 3
Update: the groupby version above also counts runs of False, so use compress to keep only the True runs:
from itertools import compress
max(list(compress([len(list(group)) for key, group in groupby(s.tolist())], [key for key, group in groupby(s.tolist())])))
Out[84]: 3
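As a sketch of the same idea in a single pass, the filter can go inside the generator instead of running groupby twice. The function name longest_true_run_py is hypothetical, and the second list is an invented input chosen to show why filtering on the key matters.

```python
from itertools import groupby

def longest_true_run_py(values):
    """Hypothetical helper: longest run of truthy values in an iterable."""
    # Keep only the groups whose key is True; default=0 covers "no True at all".
    return max((sum(1 for _ in group) for key, group in groupby(values) if key),
               default=0)

values = [False, True, False, True, True, True, False, False]
print(longest_true_run_py(values))  # 3

# Invented input: the longest run overall is made of False values.
tricky = [False, False, False, False, True]
print(max(len(list(g)) for _, g in groupby(tricky)))  # 4 (counts the False run)
print(longest_true_run_py(tricky))                    # 1
```

With a pandas Series, pass s.tolist() as in the answer above.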
Solution 3:
You can use (inspired by @piRSquared's answer):
s.groupby((~s).cumsum()).sum().max()
Out[513]: 3.0
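A short sketch of why summing works here: every group starts with a False, which contributes 0 to the sum, so the sum of each group equals the number of True values in it. The sample series is an assumption reconstructed from the output in Solution 1.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in Solution 1.
s = pd.Series([False, True, False, True, True, True, False, False])

# One row per group label; True counts as 1 and False as 0.
run_lengths = s.groupby((~s).cumsum()).sum()

print([int(x) for x in run_lengths])  # [1, 3, 0, 0]
print(int(run_lengths.max()))         # 3
```

The casts to int are only there so the printed values look the same whether pandas returns the sums as integers or as floats.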
Another option is to use a lambda function:
s.to_frame().apply(lambda x: s.loc[x.name:].idxmin() - x.name, axis=1).max()
Out[429]: 3
Solution 4:
Edit: As piRSquared mentioned, my previous solution needs a False appended at the beginning and at the end of the series. piRSquared kindly gave an answer based on that.
(np.diff(np.flatnonzero(np.append(True, np.append(~s.values, True)))) - 1).max()
My original attempt was:
(np.diff(s.where(~s).dropna().index.values) - 1).max()
(This will not give the correct answer if the longest run of True starts at the beginning or ends at the end of the series, as pointed out by piRSquared. Please use the solution above given by piRSquared. The original remains only for explanation.)
Explanation:
This finds the indices of the False values; the gaps between those indices tell us the length of the longest run of True.
s.where(s == False).dropna().index.values finds all the indices of False:
array([0, 2, 6, 7])
We know that the Trues live between the Falses. Thus, we can use np.diff to find the gaps between these indices:
array([2, 4, 1])
Subtract 1 at the end, since the Trues lie strictly between these indices, then take the maximum of the differences.
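The edge case mentioned in the edit can be seen in a small sketch. The series below is invented for illustration: its longest run of True starts at index 0, so there is no False to its left and the unpadded version has no gap to measure.

```python
import numpy as np
import pandas as pd

# Invented example: the longest run of True starts at the very beginning.
s = pd.Series([True, True, True, False, True])
not_s = ~s.to_numpy()

# Unpadded: a single False index, so np.diff has nothing to compare.
false_idx = np.flatnonzero(not_s)
print(np.diff(false_idx).size)  # 0, so calling .max() on it would raise

# Padded: a sentinel on each side closes the runs at both ends.
padded = np.concatenate(([True], not_s, [True]))
gaps = np.diff(np.flatnonzero(padded)) - 1
print(gaps.tolist())  # [3, 1]
print(gaps.max())     # 3
```

The sentinels shift every index by one, but only the differences are used, so the run lengths are unaffected.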
Solution 5:
Your code was actually very close. It becomes perfect with a minor fix:
count = 0
maxCount = 0
for item in s:
    if item:
        count += 1
        if count > maxCount:
            maxCount = count
    else:
        count = 0
print(maxCount)