Pipeline Imputer Error "Input contains NaN"
I am trying to create a pipeline to help me process some data by imputing the mean, scaling the data, and then fitting a regressor. I am having some trouble with the Imputer, and
Solution 1:
Try removing the PtagPrSKU row, so that the column names are followed directly by their values.
The easiest way to do this is to pass skiprows to pandas when loading the data.
The following works fine for me.
The problem
The PtagPrSKU row contributes an empty cell to every column, and pandas reads each empty cell as NaN (this is the problem).
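To see the effect without the spreadsheet, here is a minimal sketch that uses a small in-memory CSV as a stand-in. The SKU column name and all the values are made up for illustration; only the three measurement column names and the PtagPrSKU label come from the question. My guess is that the NaN in the target column is what raises the error, since the imputer only fills the feature columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the spreadsheet: a header line, a filler row that
# only carries a label in the first column, then the real values.
raw = io.StringIO(
    "SKU,OverallHeight-ToptoBottom,OverallDepth-FronttoBack,OverallWidth-SidetoSide\n"
    "PtagPrSKU,,,\n"
    "A1,30.0,20.0,40.0\n"
    "A2,32.0,,42.0\n"
    "A3,28.0,18.0,38.0\n"
)

# Loaded as-is, the filler row becomes a data row full of NaN.
with_filler = pd.read_csv(raw)

# skiprows=[1] drops the second line of the file (line index 1), i.e. the filler row.
raw.seek(0)
without_filler = pd.read_csv(raw, skiprows=[1])

target = "OverallWidth-SidetoSide"
print(with_filler[target].isna().sum())     # missing targets with the filler row
print(without_filler[target].isna().sum())  # missing targets after skipping it
```

The genuinely missing depth value for A2 is still there after skipping the row, and that is the kind of gap the imputer is meant to fill.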
The file that I used for the code below can be found here (link).
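# Note: Imputer was removed from sklearn.preprocessing in scikit-learn 0.22;
# newer releases provide sklearn.impute.SimpleImputer instead.
from sklearn.preprocessing import Imputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
import pandas as pd

def buildit(df):
    imp = Imputer()                # fills missing feature values with the column mean
    scl = StandardScaler()
    clf = RandomForestRegressor()
    pipeline = Pipeline([('imputer', imp), ('scaler', scl), ('clf', clf)])
    # fit_transform needs the final step to implement transform, which
    # RandomForestRegressor no longer does in current releases; use fit and predict there.
    clf_x = pipeline.fit_transform(df[['OverallHeight-ToptoBottom', 'OverallDepth-FronttoBack']], df['OverallWidth-SidetoSide'])
    return clf_x

# skiprows=[1] skips the PtagPrSKU row that sits right below the header
df = pd.read_excel('t.xlsx', skiprows=[1])
print(df)
buildit(df)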
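For readers on a current scikit-learn release, a rough equivalent might look like the sketch below. It assumes SimpleImputer from sklearn.impute in place of Imputer, and it calls fit and predict instead of fit_transform. The build_and_fit helper and the small DataFrame are hypothetical and only there so the snippet runs without t.xlsx.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["OverallHeight-ToptoBottom", "OverallDepth-FronttoBack"]
TARGET = "OverallWidth-SidetoSide"


def build_and_fit(df):
    """Hypothetical helper: impute the mean, scale, then fit a regressor."""
    pipeline = Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
        ("reg", RandomForestRegressor(n_estimators=10, random_state=0)),
    ])
    # The imputer only touches the features, so drop rows with a missing target.
    df = df.dropna(subset=[TARGET])
    pipeline.fit(df[FEATURES], df[TARGET])
    return pipeline


# Made-up data with one gap in each feature column.
df = pd.DataFrame({
    FEATURES[0]: [30.0, 32.0, 28.0, np.nan],
    FEATURES[1]: [20.0, np.nan, 18.0, 19.0],
    TARGET: [40.0, 42.0, 38.0, 39.0],
})

model = build_and_fit(df)
predictions = model.predict(df[FEATURES])  # NaN features are imputed inside the pipeline
print(predictions.shape)
```

With the real spreadsheet you would pass the DataFrame returned by pd.read_excel('t.xlsx', skiprows=[1]) instead of the made-up one.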
Solution 2:
Change your missing value identifier from np.nan to something else (maybe 0 or a very large number). I had the same issue and this worked for me.
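One possible reading of this suggestion, sketched for a current scikit-learn release: replace NaN with a sentinel value and tell the imputer to treat that sentinel as missing through the missing_values parameter of SimpleImputer. The sentinel -999.0, the column names and the data are assumptions made for illustration; the sentinel must be a value that can never occur in the real measurements.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

SENTINEL = -999.0  # hypothetical placeholder, must never be a real measurement

# Made-up feature table with one gap per column.
X = pd.DataFrame({
    "height": [30.0, np.nan, 28.0],
    "depth": [20.0, 22.0, np.nan],
})

# Step 1: swap NaN for the sentinel so no NaN is left in the input.
X_marked = X.fillna(SENTINEL)

# Step 2: the imputer treats the sentinel as "missing" and fills in the
# column mean computed from the remaining values.
imputer = SimpleImputer(missing_values=SENTINEL, strategy="mean")
X_filled = imputer.fit_transform(X_marked)
print(X_filled)
```

If you replace NaN with 0 or a large number and skip the imputation step, the placeholder is treated as a real measurement by the scaler and the regressor, so the second step matters.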