One Hot Encoding Giving Same Number For Different Words In Keras
Why am I getting the same result for different words?
import keras
keras.__version__
'1.0.0'
import theano
theano.__version__
'0.8.1'
from keras.preprocessing.text import one_hot
on
Solution 1:
Uniqueness is not guaranteed by Keras's one_hot encoding: two different words can be assigned the same number.
Solution 2:
From the Keras source code, you can see that each word is hashed, reduced modulo n - 1, and shifted by 1, where n is the output dimension (43, in your case):
def one_hot(text, n,
            filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
            lower=True,
            split=' '):
    seq = text_to_word_sequence(text,
                                filters=filters,
                                lower=lower,
                                split=split)
    return [(abs(hash(w)) % (n - 1) + 1) for w in seq]
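So a collision is very likely: with n = 43 there are only 42 possible indices, which means any text with more than 42 distinct words is guaranteed to contain one, and collisions tend to show up well before that.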
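To see the collisions without installing Keras, here is a minimal sketch that copies only the hashing formula from the snippet above. The names one_hot_sketch and prob_all_distinct are made up for illustration and are not part of Keras, and the probability estimate assumes the hash spreads words uniformly over the n - 1 slots. Note that Python 3 randomizes string hashes per process, so the exact indices differ between runs.

```python
def one_hot_sketch(words, n):
    """Hypothetical stand-in for the quoted formula: index in 1..n-1 per word."""
    return [abs(hash(w)) % (n - 1) + 1 for w in words]


def prob_all_distinct(k, n):
    """Chance that k distinct words get k distinct indices,
    assuming (as a simplification) a uniform hash over n - 1 slots."""
    slots = n - 1
    p = 1.0
    for i in range(k):
        p *= max(slots - i, 0) / float(slots)
    return p


n = 43
words = ["word%d" % i for i in range(100)]  # 100 distinct made-up words
indices = one_hot_sketch(words, n)

# At most n - 1 = 42 different indices exist, so 100 words must share some.
print(len(set(words)), "distinct words ->", len(set(indices)), "distinct indices")
print("P(no collision among 10 words) ~ %.2f" % prob_all_distinct(10, n))
```

If every word needs its own number, a larger n makes collisions rarer but never rules them out; an explicit word-to-index dictionary built from the vocabulary is the usual way to get unique indices.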