
Is searchsorted Faster Than get_loc To Find Label Location In A DataFrame Index?

I need to find the integer location of a label in a pandas index. I know I can use the get_loc method, but then I discovered searchsorted. Just wondering if I should use the latter fo

Solution 1:

It depends on your use case. The timings below use @ayhan's example.

With get_loc there is a big upfront cost of creating the hash table on the first lookup.

In [22]: idx = pd.Index(['R{0:07d}'.format(i) for i in range(10**7)])
In [23]: to_search = np.random.choice(idx, 10**5, replace=False)
In [24]: %time idx.get_loc(to_search[0])
Wall time: 1.57 s
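
To see the one-time cost for yourself, here is a minimal sketch that times the first and second get_loc call on a fresh index. The helper name timed_get_loc and the smaller index size (10**5 rather than 10**7) are my own choices for illustration, and the actual timings will vary by machine and pandas version.

```python
import time

import pandas as pd

# Smaller index than in the answer so the sketch runs quickly (assumption).
idx = pd.Index(['R{0:07d}'.format(i) for i in range(10**5)])


def timed_get_loc(index, label):
    """Hypothetical helper: return (position, elapsed seconds) for one get_loc call."""
    start = time.perf_counter()
    pos = index.get_loc(label)
    return pos, time.perf_counter() - start


# The first call has to build the lookup table; the second call reuses it.
first_pos, first_secs = timed_get_loc(idx, 'R0000042')
second_pos, second_secs = timed_get_loc(idx, 'R0000042')

print(f"first lookup:  {first_secs:.6f} s")
print(f"second lookup: {second_secs:.6f} s")
```

Both calls return the same position; only the elapsed time should differ.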

Subsequent lookups, however, may be faster (this is not guaranteed and depends on the data):

In [9]: %%time
   ...: for i in to_search:
   ...:     idx.get_loc(i)
Wall time: 200 ms

In [10]: %%time
    ...: for i in to_search:
    ...:     np.searchsorted(idx, i)
Wall time: 486 ms
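
If you want to repeat the comparison outside IPython, something along these lines should work. It is only a sketch: the index is smaller than in the timings above, the seed and sample size are arbitrary, and it uses timeit instead of %%time. It also checks that both methods agree on a sorted, unique index before timing them.

```python
import timeit

import numpy as np
import pandas as pd

# Sorted, unique string index (smaller than 10**7 so this finishes quickly).
idx = pd.Index(['R{0:07d}'.format(i) for i in range(10**5)])
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
to_search = rng.choice(idx.to_numpy(), 1000, replace=False)

# On a sorted, unique index both approaches should give the same positions.
hash_pos = [idx.get_loc(label) for label in to_search]
sorted_pos = [int(idx.searchsorted(label)) for label in to_search]
print("same positions:", hash_pos == sorted_pos)

# Time repeated lookups; the hash table already exists at this point.
t_hash = timeit.timeit(lambda: [idx.get_loc(l) for l in to_search], number=3)
t_sorted = timeit.timeit(lambda: [idx.searchsorted(l) for l in to_search], number=3)
print(f"get_loc:      {t_hash:.4f} s")
print(f"searchsorted: {t_sorted:.4f} s")
```

Which one wins will depend on the index size, the dtype, and how many lookups you do, so measure on your own data.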

Also, as Jeff noted, get_loc is guaranteed to always work, whereas searchsorted requires a monotonic (sorted) index and does not check for it.
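
Since searchsorted does not verify its precondition, one option is to wrap it in a small guard. The function below is a hypothetical helper (sorted_loc is not part of pandas) that checks monotonicity first and also confirms that the label found at the returned position really matches, because searchsorted returns an insertion point rather than raising for a missing label.

```python
import pandas as pd


def sorted_loc(index, label):
    """Hypothetical helper: searchsorted-based lookup with explicit checks."""
    # searchsorted assumes a sorted index and does not check, so check here.
    if not index.is_monotonic_increasing:
        raise ValueError("index is not sorted; use get_loc instead")
    pos = int(index.searchsorted(label))
    # searchsorted returns an insertion point, so verify it is a real hit.
    if pos == len(index) or index[pos] != label:
        raise KeyError(label)
    return pos


sorted_idx = pd.Index(['a', 'b', 'c'])
unsorted_idx = pd.Index(['c', 'a', 'b'])

print(sorted_loc(sorted_idx, 'b'))   # position via binary search
print(unsorted_idx.get_loc('a'))     # get_loc works without sorting
```

Note that the is_monotonic_increasing check adds its own cost, so this guard trades some speed for safety.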
