What's The Inverse Of The Quantile Function On A Pandas Series?

September 08, 2024

The quantile function gives us the quantile of a given pandas Series s, e.g. s.quantile(0.9) is 4.2. Is there an inverse function (i.e. the cumulative distribution) which finds the

Solution 1:

I had the same question as you did! I found an easy way of getting the inverse of quantile using scipy.

# libs required
from scipy import stats
import pandas as pd
import numpy as np

# generate random data with a fixed seed (to be reproducible)
np.random.seed(seed=1)
df = pd.DataFrame(np.random.uniform(0,1,(10)), columns=['a'])

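# quantile function (select the column first so the result is a scalar)
x = df['a'].quantile(0.5)

# inverse of quantile (returned as a percentage between 0 and 100)
stats.percentileofscore(df['a'], x)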
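One thing worth checking with this approach is the scale and the tie rule. The sketch below assumes you want the result on the same 0 to 1 scale that quantile() accepts, and it uses the kind='weak' option of percentileofscore, which counts values less than or equal to the score. The variable names pct and cdf are only for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# ten distinct random values with a fixed seed
rng = np.random.default_rng(1)
s = pd.Series(rng.uniform(0, 1, 10))

# the median of 10 distinct values lies between the 5th and 6th sorted values
x = s.quantile(0.5)

# kind='weak' counts values <= x, which matches Pr[s <= x];
# the result is a percentage, so divide by 100 to get back to the 0-1 scale
pct = stats.percentileofscore(s.to_numpy(), x, kind='weak')
cdf = pct / 100.0
```

Here cdf comes back as 0.5, since exactly five of the ten values are at or below the median.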

Solution 2:

Sorting can be expensive. If you are looking for a single value, I'd guess you'd be better off computing it with:

s = pd.Series(np.random.uniform(size=1000))
(s < 0.7).astype(int).mean()  # =0.7ish

The astype(int) cast is not actually needed: the mean of a boolean Series is already the fraction of True values, so (s < 0.7).mean() gives the same result.
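If you need the fraction for many values rather than one, the trade-off flips and sorting once can pay off. A minimal sketch, assuming NumPy's searchsorted is acceptable here; the names sorted_vals, queries and fractions are made up for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(size=1000))

# sort once, then answer every query with a binary search
sorted_vals = np.sort(s.to_numpy())
queries = np.array([0.1, 0.5, 0.7])

# side='left' returns the number of values strictly below each query,
# so dividing by the length reproduces (s < q).mean() for each q
fractions = np.searchsorted(sorted_vals, queries, side='left') / len(sorted_vals)
```

Use side='right' instead if you want the "less than or equal" version.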

Solution 3:

Mathematically speaking, you're trying to find the CDF, i.e. the probability of s being smaller than or equal to a given value q:

F(q)= Pr[s <= q]

With numpy, this is a one-liner:

np.mean(s.to_numpy() <= q)
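Whether you compare with <= or < only matters when the data contains ties at q. A small worked example with made-up values shows the difference; the names weak and strict are just labels for this sketch.

```python
import numpy as np
import pandas as pd

# toy data with a tie at 2
s = pd.Series([1, 2, 2, 3, 4])
q = 2

# F(q) = Pr[s <= q]: three of the five values are <= 2
weak = np.mean(s.to_numpy() <= q)

# strict variant Pr[s < q]: only one of the five values is < 2
strict = np.mean(s.to_numpy() < q)
```

Here weak is 0.6 and strict is 0.2. For continuous data without repeated values the two agree except exactly at the data points.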

Solution 4:

There's no 1-liner that I know of, but you can achieve this with scipy:

import pandas as pd
import numpy as np
from scipy.interpolate import interp1d

# set up a sample dataframe
df = pd.DataFrame(np.random.uniform(0,1,(11)), columns=['a'])
# sort it by the desired series and calculate the percentile
sdf = df.sort_values('a').reset_index()
sdf['b'] = sdf.index / float(len(sdf) - 1)
# setup the interpolator using the value as the index
interp = interp1d(sdf['a'], sdf['b'])

# a is the value, b is the percentile
>>> sdf
    index         a    b
0      10  0.030469  0.0
1       3  0.144445  0.1
2       4  0.304763  0.2
3       1  0.359589  0.3
4       7  0.385524  0.4
5       5  0.538959  0.5
6       8  0.642845  0.6
7       6  0.667710  0.7
8       9  0.733504  0.8
9       2  0.905646  0.9
10      0  0.961936  1.0

Now we can see that the two functions are inverses of each other.

>>> df['a'].quantile(0.57)
0.61167933268395969
>>> interp(0.61167933268395969)
array(0.57)
>>> interp(df['a'].quantile(0.43))
array(0.43)

interp can also take in a list, a numpy array, or a pandas Series, any array-like really!
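If you'd rather not pull in scipy for this, the same idea can be sketched with np.interp. This is an alternative, not the code from the answer above, and it assumes the values are distinct so the sorted array is strictly increasing. The helper name inverse_quantile is made up for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
s = pd.Series(rng.uniform(0, 1, 11))

# sorted values play the role of column 'a'
sorted_vals = np.sort(s.to_numpy())
# evenly spaced percentiles 0, 1/(n-1), ..., 1 play the role of column 'b'
grid = np.linspace(0.0, 1.0, len(sorted_vals))

def inverse_quantile(x):
    """Map a value (or array of values) back to its percentile by linear interpolation."""
    return np.interp(x, sorted_vals, grid)

# round trip: quantile -> value -> percentile
roundtrip = inverse_quantile(s.quantile(0.57))
```

One difference to be aware of: np.interp clamps values outside the data range to 0 or 1, whereas interp1d with its default settings raises a ValueError for them.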

Solution 5:

Just came across the same problem. Here's my two cents.

def inverse_percentile(arr, num):
    arr = sorted(arr)
    i_arr = [i for i, x in enumerate(arr) if x > num]

    return i_arr[0] / len(arr) if len(i_arr) > 0 else 1
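The list comprehension above scans the whole sorted list to find the first element greater than num. As a sketch of the same logic using the standard library's bisect module (the function name inverse_percentile_bisect is hypothetical, chosen for this example):

```python
from bisect import bisect_right

def inverse_percentile_bisect(arr, num):
    """Fraction of values in arr that are <= num."""
    ordered = sorted(arr)
    # bisect_right returns the index of the first element greater than num,
    # or len(ordered) when no element is greater
    return bisect_right(ordered, num) / len(ordered)

result = inverse_percentile_bisect([3, 1, 2, 4], 2)
```

With the list [3, 1, 2, 4] and num = 2, two of the four values are at or below 2, so result is 0.5. When no element is greater than num it returns 1.0, matching the else branch above.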
