Keep Csv Feature Labels For Lda Pca
Solution 1:
PCA neither discards nor retains individual features, and the resulting components don't map back to the original features either. (Given features x, y, and z and an n_components=2 param, neither of the two resulting components corresponds to any one of x, y, or z; each component is a linear combination of all three.) If you want to retain the feature names through dimensionality reduction, you might want to explore feature selection approaches instead (sklearn has a whole section for this).
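To make the "components don't map to features" point concrete, here is a minimal sketch: each PCA component carries a loading for every original feature, so no component belongs to a single column. The feature names and random data below are illustrative, not from the question.

```python
# Sketch: PCA components are linear combinations of ALL input features,
# so no single component corresponds to one original column.
# The feature names (x, y, z) and random data are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 samples, 3 features: x, y, z
feature_names = ["x", "y", "z"]

pca = PCA(n_components=2).fit(X)

# Each row of components_ holds the loading (weight) of every original
# feature in that component -- generally all nonzero.
for i, row in enumerate(pca.components_):
    weights = ", ".join(f"{name}={w:+.2f}" for name, w in zip(feature_names, row))
    print(f"component {i}: {weights}")
```

Printing the loadings this way at least lets you interpret each component in terms of the named features, even though no component is any one feature.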
Chuck Ivan is correct that an encoder or vectorizer is called for before you can do PCA. I like his OrdinalEncoder suggestion, but you may also consider the sklearn text utilities on this list: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_extraction.text
Solution 2:
PCA works by solving an optimization problem that requires your features to be numeric, but the code in the question tries to perform PCA on non-numeric data. You will need to factorize (encode) the strings into numbers first; sklearn.preprocessing.OrdinalEncoder and sklearn.preprocessing.OneHotEncoder both handle that.
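A minimal sketch of that encode-then-reduce pipeline, using OrdinalEncoder; the column values below are made up for illustration.

```python
# Sketch: encode string columns to numeric category codes with
# OrdinalEncoder, then run PCA on the result.
# The column values are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import OrdinalEncoder

X_raw = np.array([
    ["red",   "small"],
    ["green", "large"],
    ["blue",  "small"],
    ["red",   "large"],
])

enc = OrdinalEncoder()
X_num = enc.fit_transform(X_raw)   # strings -> float category codes

pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X_num)
print(X_reduced.shape)             # (4, 1)
```

Note that OrdinalEncoder imposes an arbitrary order on the categories, which PCA will treat as meaningful distances; OneHotEncoder avoids that at the cost of more columns.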
Charles Landau's feature extraction solution looks very relevant to the question.