Pipeline Imputer Error "input Contains Nan"

I am trying to create a pipeline to help me process some data by imputing the mean, scaling the data, and then fitting a regressor. I am having some trouble with the Imputer, and

Solution 1:

Try removing the PtagPrSKU row.

The problem

The PtagPrSKU row inserts an empty cell in each column, and pandas loads those empty cells as NaN. Directly after the column names you should have only the data values. The easy way to get there is to load the data with pandas and pass skiprows so that this row is skipped.

The file that I used for this example can be found here (link).

The following works fine for me.

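# SimpleImputer replaces the old sklearn.preprocessing.Imputer (removed in scikit-learn 0.22)
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
import pandas as pd


def buildit(df):
    imp = SimpleImputer()  # default strategy fills missing values with the column mean
    scl = StandardScaler()
    clf = RandomForestRegressor()
    pipeline = Pipeline([('imputer', imp), ('scaler', scl), ('clf', clf)])
    X = df[['OverallHeight-ToptoBottom', 'OverallDepth-FronttoBack']]
    y = df['OverallWidth-SidetoSide']
    # The last step is a regressor with no transform(), so fit() is used instead of fit_transform()
    pipeline.fit(X, y)

    return pipeline


# skiprows=[1] skips the PtagPrSKU row directly below the header
df = pd.read_excel('t.xlsx', skiprows=[1])
print(df)
model = buildit(df)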
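
To see the effect of the extra row without the original spreadsheet, here is a minimal sketch that uses a small in-memory CSV in place of t.xlsx. The column names and values are made up for illustration, and read_csv stands in for read_excel on the assumption that the extra row sits directly under the header.

```python
import io
import pandas as pd

# Hypothetical stand-in for the spreadsheet: a header row, then a
# "PtagPrSKU" row whose remaining cells are empty, then the real data.
raw = (
    "Height,Depth,Width\n"
    "PtagPrSKU,,\n"
    "10,20,30\n"
    "11,21,31\n"
)

# Without skiprows, the empty cells of the extra row are read as NaN.
with_extra_row = pd.read_csv(io.StringIO(raw))
print(with_extra_row.isna().sum())

# skiprows=[1] skips file line 1 (0-based), the row right after the header.
clean = pd.read_csv(io.StringIO(raw), skiprows=[1])
print(clean.isna().sum())
```

Checking isna().sum() before fitting is a quick way to confirm whether any NaN values are still reaching the pipeline.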

Solution 2:

Change your missing-value marker from np.nan to something else (maybe 0 or a very large number). I had the same issue and this worked for me.
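
One possible reading of this suggestion, sketched under assumptions: replace NaN in the data with a sentinel value and tell the imputer to treat that sentinel as missing. The sentinel -999.0 and the toy DataFrame are hypothetical, and SimpleImputer is used because it is the current replacement for Imputer.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

SENTINEL = -999.0  # hypothetical marker; it must never occur as a real value

# Toy data with one missing entry per column (made up for illustration).
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Replace NaN with the sentinel, then have the imputer treat it as missing.
marked = df.fillna(SENTINEL)
imputer = SimpleImputer(missing_values=SENTINEL, strategy="mean")
filled = imputer.fit_transform(marked)
print(filled)
```

Note that a marker such as 0 is only safe if 0 can never be a genuine measurement; otherwise real values would be imputed away.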
