Is Searchsorted Faster Than Get_loc To Find Label Location In A Dataframe Index?
I need to find the integer location for a label in a Pandas index. I know I can use get_loc method, but then I discovered searchsorted. Just wondering if I should use the latter fo
Solution 1:
It depends on your use case. Using @ayhan's example:
With get_loc, there is a big upfront cost of creating the hash table on the first lookup.
In [22]: idx = pd.Index(['R{0:07d}'.format(i) for i in range(10**7)])
In [23]: to_search = np.random.choice(idx, 10**5, replace=False)
In [24]: %time idx.get_loc(to_search[0])
Wall time: 1.57 s
But subsequent lookups may be faster (not guaranteed; it depends on the data), because the hash table is already built.
In [9]: %%time
   ...: for i in to_search:
   ...:     idx.get_loc(i)
Wall time: 200 ms
In [10]: %%time
   ...: for i in to_search:
   ...:     np.searchsorted(idx, i)
Wall time: 486 ms
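Both timings above loop in Python and look up one label per call. As a minimal sketch, assuming a unique, sorted index (much smaller here than in the timings, and with a seeded generator that is not part of the original example), the whole array of labels can instead be passed in one call:

```python
import numpy as np
import pandas as pd

# Small stand-in for the index used in the timings above.
idx = pd.Index(['R{0:07d}'.format(i) for i in range(10**4)])
rng = np.random.default_rng(0)
to_search = rng.choice(idx.to_numpy(), 100, replace=False)

# Hash-based lookup of all labels at once; missing labels come back as -1.
pos_hash = idx.get_indexer(to_search)

# Binary search of all labels at once; only meaningful because idx is sorted.
pos_sorted = idx.searchsorted(to_search)
```

On a sorted, unique index where every label is present, the two arrays of positions should match. Which one is faster will again depend on the data, so it is worth timing both on your own index.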
Also, as Jeff noted, get_loc is guaranteed to always work, whereas searchsorted requires the index to be monotonic (and doesn't check that it is).
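To illustrate the monotonicity caveat, here is a small sketch with made-up four-label indexes (the labels and variable names are for illustration only). It also shows a related difference: searchsorted returns an insertion point even for a label that is not in the index, while get_loc raises KeyError.

```python
import pandas as pd

sorted_idx = pd.Index(['a', 'b', 'c', 'd'])
shuffled_idx = pd.Index(['c', 'a', 'd', 'b'])

# On a sorted index, both methods give the same position.
loc_sorted_hash = sorted_idx.get_loc('c')
loc_sorted_bisect = sorted_idx.searchsorted('c')

# On an unsorted index, get_loc still finds the label ...
loc_hash = shuffled_idx.get_loc('b')
# ... but searchsorted runs its binary search anyway, without any warning,
# and the position it returns does not point at 'b'.
loc_bisect = shuffled_idx.searchsorted('b')

# A cheap guard to check before trusting searchsorted.
can_bisect = shuffled_idx.is_monotonic_increasing

# For a label that is absent, searchsorted returns where it would be inserted,
# while get_loc raises KeyError.
missing_pos = sorted_idx.searchsorted('bb')
try:
    sorted_idx.get_loc('bb')
    raised = False
except KeyError:
    raised = True
```

If you do use searchsorted, checking is_monotonic_increasing first and verifying that the label at the returned position equals the one you searched for avoids both pitfalls.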