What's The Inverse Of The Quantile Function On A Pandas Series?
The quantile function gives us the quantile of a given pandas series s. E.g. s.quantile(0.9) is 4.2. Is there an inverse function (i.e. cumulative distribution) which finds the
Solution 1:
I had the same question as you did! I found an easy way of getting the inverse of quantile using scipy.
#libs required
from scipy import stats
import pandas as pd
import numpy as np
#generate random data with same seed (to be reproducible)
np.random.seed(seed=1)
df = pd.DataFrame(np.random.uniform(0,1,(10)), columns=['a'])
#quantile function
x = df['a'].quantile(0.5)
#inverse of quantile (returned as a percentage between 0 and 100)
stats.percentileofscore(df['a'],x)
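One thing to watch with this approach: percentileofscore returns a percentage, and its kind argument decides how ties are counted. Here is a minimal sketch on a small hand-made list (the data and variable names are only for illustration) comparing three of the kinds and converting the result to a fraction comparable to the argument of s.quantile:

```python
from scipy import stats

data = [1, 2, 3, 3, 4]

# percentileofscore returns a percentage in [0, 100], not a fraction
rank = stats.percentileofscore(data, 3)                    # default kind='rank'
weak = stats.percentileofscore(data, 3, kind='weak')       # share of values <= 3
strict = stats.percentileofscore(data, 3, kind='strict')   # share of values < 3

# divide by 100 to compare with the argument of Series.quantile
weak_fraction = weak / 100.0
```

kind='weak' corresponds to the usual CDF definition Pr[s <= q], so it is probably the one to pick if you want a cumulative distribution.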
Solution 2:
Sorting can be expensive; if you are looking for a single value, I'd guess you'd be better off computing it with:
s = pd.Series(np.random.uniform(size=1000))
( s < 0.7 ).astype(int).mean() # =0.7ish
There's probably a way to avoid the int(bool) shenanigan.
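On the int cast: as far as I can tell, the mean of a boolean Series already gives the fraction of True values, so the astype(int) step can be dropped. A small sketch to check this, assuming a recent pandas and NumPy (the seed and variable names are arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
s = pd.Series(rng.uniform(size=1000))

# fraction of values below 0.7, with and without the explicit cast
with_cast = (s < 0.7).astype(int).mean()
without_cast = (s < 0.7).mean()
```

Both expressions count the True entries and divide by the length of the series.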
Solution 3:
Mathematically speaking, you're trying to find the CDF, i.e. the probability of s being smaller than or equal to a value (or quantile) q:
F(q) = Pr[s <= q]
One can use numpy and try this one-line code:
np.mean(s.to_numpy() <= q)
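If you need F(q) for many values of q at once, one option is to sort once and use np.searchsorted. This is a sketch rather than a drop-in replacement: the helper name ecdf is made up here, and it assumes NaNs should simply be dropped.

```python
import numpy as np
import pandas as pd

def ecdf(s, q):
    """Return Pr[s <= q] for a scalar or an array of q values (hypothetical helper)."""
    values = np.sort(s.dropna().to_numpy())
    # side='right' gives the number of elements that are <= q
    counts = np.searchsorted(values, q, side='right')
    return counts / len(values)

s = pd.Series([1.0, 2.0, 2.0, 3.0, 5.0])
single = ecdf(s, 2.0)             # three of the five values are <= 2.0
many = ecdf(s, [0.0, 2.5, 5.0])   # evaluated for several q in one call
```

For a single q the one-liner above is simpler; the sorted version only pays off when the same series is queried repeatedly.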
Solution 4:
There's no 1-liner that I know of, but you can achieve this with scipy:
import pandas as pd
import numpy as np
from scipy.interpolate import interp1d
# set up a sample dataframe
df = pd.DataFrame(np.random.uniform(0,1,(11)), columns=['a'])
# sort it by the desired series and calculate the percentile
sdf = df.sort_values('a').reset_index()
sdf['b'] = sdf.index / float(len(sdf) - 1)
# setup the interpolator using the value as the index
interp = interp1d(sdf['a'], sdf['b'])
# a is the value, b is the percentile
>>> sdf
    index         a    b
0      10  0.030469  0.0
1       3  0.144445  0.1
2       4  0.304763  0.2
3       1  0.359589  0.3
4       7  0.385524  0.4
5       5  0.538959  0.5
6       8  0.642845  0.6
7       6  0.667710  0.7
8       9  0.733504  0.8
9       2  0.905646  0.9
10      0  0.961936  1.0
Now we can see that the two functions are inverses of each other.
>>> df['a'].quantile(0.57)
0.61167933268395969
>>> interp(0.61167933268395969)
array(0.57)
>>> interp(df['a'].quantile(0.43))
array(0.43)
interp can also take a list, a numpy array, or a pandas Series; any array-like really!
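The same idea can be sketched without scipy by swapping in np.interp. Note this is a substitute for interp1d, not the code from the answer: it assumes the values are distinct, and unlike interp1d's default behaviour it clamps inputs outside the data range to 0 or 1 instead of raising. The helper name inverse_quantile is made up for the example.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
s = pd.Series(np.random.uniform(0, 1, 11))

# sorted values paired with evenly spaced ranks in [0, 1],
# mirroring the linear interpolation Series.quantile uses by default
values = np.sort(s.to_numpy())
ranks = np.linspace(0.0, 1.0, len(values))

def inverse_quantile(x):
    """Map a value (or array of values) back to its quantile level."""
    return np.interp(x, values, ranks)

p = inverse_quantile(s.quantile(0.57))   # round trip, should be close to 0.57
```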
Solution 5:
Just came across the same problem. Here's my two cents.
def inverse_percentile(arr, num):
    arr = sorted(arr)
    i_arr = [i for i, x in enumerate(arr) if x > num]
    return i_arr[0] / len(arr) if len(i_arr) > 0 else 1
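A possible variant of the function above, sketched with the standard library's bisect module: after sorting, the index of the first element greater than num can be found by binary search instead of a full scan. It assumes Python 3 (true division) and a non-empty input; the sample data is arbitrary.

```python
from bisect import bisect_right

def inverse_percentile(arr, num):
    """Fraction of elements in arr that are <= num."""
    ordered = sorted(arr)
    # bisect_right returns the index of the first element greater than num,
    # which equals the count of elements <= num
    return bisect_right(ordered, num) / len(ordered)

data = [3, 1, 4, 1, 5, 9, 2, 6]
frac = inverse_percentile(data, 4)   # five of the eight values are <= 4
```

The sort still dominates the cost, so this mainly helps if you keep the sorted list around and query it several times.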