How To Count Longest Uninterrupted Sequence In Pandas
Solution 1:
Option 1
Use the series itself to mask the cumulative sum of its negation, then use value_counts:
(~s).cumsum()[s].value_counts().max()
3
explanation
(~s).cumsum() is a pretty standard way to produce distinct True/False groups:
0    1
1    1
2    2
3    2
4    2
5    2
6    3
7    4
dtype: int64
But you can see that the group we care about is represented by the 2s, and there are four of them. That's because each group is started by a False (which becomes True with (~s)), so the False at index 2 shares group id 2 with the three Trues after it. Therefore, we mask this cumulative sum with the boolean mask we started with:
(~s).cumsum()[s]
1    1
3    2
4    2
5    2
dtype: int64
Now we see the three 2s pop out, and we just have to use a method to extract them. I used value_counts and max.
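A self-contained version of Option 1, run on an example series reconstructed from the printed output above (the question's exact series is an assumption). longest_true_run is a hypothetical helper name, and the empty check is an addition so that a series with no True at all returns 0 instead of NaN.

```python
import pandas as pd


def longest_true_run(s: pd.Series) -> int:
    """Length of the longest run of consecutive True values in a boolean Series.

    Hypothetical helper wrapping Option 1; returns 0 when s contains no True.
    """
    # Every False bumps the counter, so a False and the Trues after it share an id.
    groups = (~s).cumsum()
    # Keep only the ids sitting on True values, then count how often each id occurs.
    counts = groups[s].value_counts()
    # value_counts() is empty when there is no True; .max() would then give NaN.
    return int(counts.max()) if not counts.empty else 0


# Example series reconstructed from the outputs shown above (assumption).
s = pd.Series([False, True, False, True, True, True, False, False])
print((~s).cumsum().tolist())     # [1, 1, 2, 2, 2, 2, 3, 4]
print((~s).cumsum()[s].tolist())  # [1, 2, 2, 2]
print(longest_true_run(s))        # 3
print(longest_true_run(pd.Series([False, False])))  # 0
```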
Option 2
Use factorize and bincount
a = s.values
b = pd.factorize((~a).cumsum())[0]
np.bincount(b[a]).max()
3
explanation
This is a similar explanation as for option 1. The main difference is in how I found the max. I use pd.factorize to tokenize the values into integers ranging from 0 to the number of unique values minus 1. Given the actual values we had in (~a).cumsum(), we didn't strictly need this part, because they are already non-negative integers, which is what np.bincount expects. I used it because it's a general-purpose tool that could be used on arbitrary group names.
After pd.factorize, I pass those integer codes to np.bincount, which counts how many times each integer occurs, and then take the maximum.
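The factorize and bincount steps of Option 2 can be inspected one at a time. This sketch assumes the same reconstructed example series; the string labels at the end (run-a, run-b) are made up to show why factorize helps when group names are not already non-negative integers.

```python
import numpy as np
import pandas as pd

a = np.array([False, True, False, True, True, True, False, False])

# Group ids from the cumulative sum of the negation
group_ids = (~a).cumsum()
print(group_ids)          # [1 1 2 2 2 2 3 4]

# pd.factorize returns (codes, uniques); codes run from 0 to len(uniques) - 1
codes, uniques = pd.factorize(group_ids)
print(codes)              # [0 0 1 1 1 1 2 3]

# np.bincount counts occurrences of each non-negative integer in its input
counts = np.bincount(codes[a])
print(counts)             # [1 3]
print(counts.max())       # 3

# Hypothetical string labels: np.bincount cannot take these, but their codes work
labels = np.array(["run-a", "run-a", "run-b", "run-b", "run-b"], dtype=object)
label_codes, _ = pd.factorize(labels)
print(np.bincount(label_codes))  # [2 3]
```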
Option 3
As stated in the explanation of option 2, skipping pd.factorize also works:
a = s.values
np.bincount((~a).cumsum()[a]).max()
3
Solution 2:
I think this could work
pd.Series(s.index[~s].values).diff().max()-1
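Out[57]: 3.0
(This measures the gaps between the False positions, so, like the original trial in Solution 4, it misses a run of True at the very start or end of the series.)
Also, outside pandas, we can go back to Python's itertools.groupby: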
from itertools import groupby
max([len(list(group)) for key, group in groupby(s.tolist())])
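Out[73]: 3
Update: the version above measures runs of both True and False (here the longest False run is only 2, so it still prints 3). Keeping only the groups whose key is True fixes that: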
from itertools import compress
max(list(compress([len(list(group)) for key, group in groupby(s.tolist())], [key for key, group in groupby(s.tolist())])))
Out[84]: 3
Solution 3:
You can use (inspired by @piRSquared's answer):
s.groupby((~s).cumsum()).sum().max()
Out[513]: 3.0
Another option is to use a lambda function:
s.to_frame().apply(lambda x: s.loc[x.name:].idxmin() - x.name, axis=1).max()
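Out[429]: 3
(This subtracts index labels, so it assumes a default integer index, and a run of True at the very end is counted as 0 because there is no later False for idxmin to find.)
Solution 4: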
Edit: As piRSquared mentioned, my previous solution needs the series padded with a False at the beginning and at the end. piRSquared kindly gave an answer based on that:
(np.diff(np.flatnonzero(np.append(True, np.append(~s.values, True)))) - 1).max()
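To see why the padding works, here is the one-liner above broken into steps. The series is made-up example data with runs of True touching both ends, which is exactly the case the unpadded version gets wrong.

```python
import numpy as np
import pandas as pd

# Made-up example: a run of 2 at the start and a run of 3 at the end
s = pd.Series([True, True, False, True, False, True, True, True])

# Pad the negation with True on both sides, i.e. pretend s has a False
# sentinel before its first element and after its last one.
padded = np.append(True, np.append(~s.values, True))

# Positions of the False values of s (plus the two sentinels), shifted by one
false_pos = np.flatnonzero(padded)
print(false_pos)   # [0 3 5 9]

# Gap between consecutive Falses, minus one, is the length of the True run between them
runs = np.diff(false_pos) - 1
print(runs)        # [2 1 3]
print(runs.max())  # 3
```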
My original trial was:
(np.diff(s.where(~s).dropna().index.values) - 1).max()
(As piRSquared pointed out, this does not give the correct answer if the longest run of True starts at the beginning or ends at the end of the series. Please use piRSquared's solution above; this version remains only for the explanation.)
Explanation:
This finds the indices of the False values; the gaps between consecutive False indices give the length of each run of True, and the largest gap gives the longest one.
s.where(s == False).dropna().index.values finds all the indices of False:
array([0, 2, 6, 7])
We know that the Trues live between the Falses, so we can use np.diff to find the gaps between these indices:
array([2, 4, 1])
Subtract 1 at the end, because the Trues lie strictly between these indices (giving 1, 3, 0), and then take the maximum: 3.
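The walk-through above, and the edge case it misses, can be checked side by side. This is a sketch on the reconstructed example series plus a made-up series whose longest run reaches the end; gap_method and padded_method are hypothetical names. gap_method reads False positions with boolean indexing on the index, which matches the where/dropna version for a default integer index, and it raises if there are fewer than two False values.

```python
import numpy as np
import pandas as pd


def gap_method(s: pd.Series) -> int:
    # Original trial: labels of the False values (assumes a default integer index)
    false_idx = s.index[~s.values].to_numpy()
    # Gaps between consecutive Falses, minus 1, are the True runs strictly between them
    return int((np.diff(false_idx) - 1).max())


def padded_method(s: pd.Series) -> int:
    # piRSquared's fix: add a False sentinel before and after the series
    padded = np.append(True, np.append(~s.values, True))
    return int((np.diff(np.flatnonzero(padded)) - 1).max())


s = pd.Series([False, True, False, True, True, True, False, False])
print(np.diff(s.index[~s.values].to_numpy()))  # [2 4 1]
print(gap_method(s), padded_method(s))          # 3 3

# A run of True that reaches the end has no closing False, so the gap method misses it
t = pd.Series([False, True, False, True, True, True])
print(gap_method(t), padded_method(t))          # 1 3
```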
Solution 5:
Your code was actually very close. It becomes perfect with a minor fix:
count = 0
maxCount = 0
for item in s:
    if item:
        count += 1
        if count > maxCount:
            maxCount = count
    else:
        count = 0
print(maxCount)
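The same loop wrapped in a reusable function, as a sketch; longest_true_streak is a hypothetical name, and the example series is the one reconstructed from the outputs earlier on this page. Iterating over a pandas Series yields its values, so the function works on a Series as well as on a plain list.

```python
import pandas as pd


def longest_true_streak(values) -> int:
    """Hypothetical helper: the loop from Solution 5 wrapped in a function."""
    count = 0      # length of the current run of True
    max_count = 0  # longest run seen so far
    for item in values:
        if item:
            count += 1
            max_count = max(max_count, count)
        else:
            count = 0  # a False ends the current run
    return max_count


s = pd.Series([False, True, False, True, True, True, False, False])
print(longest_true_streak(s))           # 3
print(longest_true_streak([True] * 5))  # 5
print(longest_true_streak([]))          # 0
```

The loop makes a single pass and needs no padding for runs at either end, but it runs in pure Python, so the vectorized answers above will usually be faster on large series.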