One Hot Encoding Giving Same Number For Different Words In Keras

Why am I getting the same result for different words?

import keras
keras.__version__
'1.0.0'
import theano
theano.__version__
'0.8.1'
from keras.preprocessing.text import one_hot
on

Solution 1:

Uniqueness is not guaranteed by one_hot: two different words can be mapped to the same index.

See the Keras documentation for one_hot.
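If you need a distinct index for every word, one option is to build an explicit vocabulary instead of hashing. The sketch below is plain Python and only illustrates the idea; `build_vocabulary` and `encode` are hypothetical helper names, not Keras functions, and the whitespace tokenization is a simplifying assumption.

```python
def build_vocabulary(texts):
    """Assign each distinct lowercased word its own index, starting at 1."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode(text, vocab):
    """Replace each word by its vocabulary index (the word must be known)."""
    return [vocab[word] for word in text.lower().split()]


texts = ["the cat sat", "the dog sat"]
vocab = build_vocabulary(texts)
encoded = [encode(t, vocab) for t in texts]
print(vocab)
print(encoded)
```

Because indices are handed out in order of first appearance, two different words can never share one. Keras also provides a Tokenizer class in keras.preprocessing.text that builds a word index along these lines.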

Solution 2:

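From the Keras source code, you can see that each word is hashed and the hash is reduced modulo n - 1, where n is the output dimension (43, in your case), so every word lands on one of only n - 1 possible indices: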

def one_hot(text, n,
        filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
        lower=True,
        split=' '):
    seq = text_to_word_sequence(text,
                            filters=filters,
                            lower=lower,
                            split=split)
    return [(abs(hash(w)) % (n - 1) + 1) for w in seq]

So a collision is very likely: any two words whose hashes agree modulo n - 1 receive the same index.
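To see how quickly collisions appear, here is a small sketch that mimics the modulo step without importing Keras. It makes a few assumptions: `stable_hash` is a hypothetical stand-in that uses MD5 so the result is the same on every run (Python 3 randomizes the built-in hash() for strings per process), and the word list is made up for illustration.

```python
import hashlib


def stable_hash(word):
    """Deterministic stand-in for hash(): the MD5 digest read as an integer."""
    return int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)


def hashed_indices(words, n):
    """Map each word into 1..n-1, mirroring the hash % (n - 1) + 1 step."""
    return [stable_hash(w) % (n - 1) + 1 for w in words]


def collision_probability(k, slots):
    """Chance that k uniformly hashed words do not all get distinct slots."""
    p_unique = 1.0
    for i in range(k):
        p_unique *= max(slots - i, 0) / slots
    return 1.0 - p_unique


n = 43
words = ["word%d" % i for i in range(n)]  # 43 distinct words, only 42 slots
indices = hashed_indices(words, n)

print(len(set(words)), "distinct words ->", len(set(indices)), "distinct indices")
print("P(collision) for 10 words:", collision_probability(10, n - 1))
```

With n = 43 there are only 42 possible indices, so 43 distinct words must share at least one index no matter which hash function is used. Choosing a much larger n lowers the chance of a collision but does not remove it.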
