Unicodedecodeerror: 'utf8' Codec Can't Decode Byte 0xea

December 01, 2023 Post a Comment

I have a CSV file that I'm uploading via an HTML form to a Python API The API looks like this: @app.route('/add_candidates_to_db', methods=['GET','POST']) def add_candidates():

Solution 1:

Your data is not UTF-8, it contains errors. You say that you are generating the data, so the ideal solution is to generate better data.

Unfortunately, sometimes we are unable to get high-quality data, or we have servers that give us garbage and we have to sort it out. For these situations, we can use less strict error handling when decoding text.

Instead of:

file.read().decode('UTF8')

You can use:

file.read().decode('UTF8', 'replace')

This will make it so that any “garbage” characters (anything which is not correctly encoded as UTF-8) will get replaced with U+FFFD, which looks like this:

�

You say that your file has the Í character, but you are probably viewing the file using an encoding other than UTF-8. Is your file supposed to contain Í, or is it just mojibake? Maybe you can figure out what the character is supposed to be, and from that, you can figure out what encoding your data uses if it's not UTF-8.

Solution 2:

It seems that your file is not encoded in utf8. You can try reading the file with all the encodings that Python understand and check which lets you read the entire content of the file. Try this script:

from codecs importopen

encodings = [
    "ascii",
    "big5",
    "big5hkscs",
    "cp037",
    "cp424",
    "cp437",
    "cp500",
    "cp720",
    "cp737",
    "cp775",
    "cp850",
    "cp852",
    "cp855",
    "cp856",
    "cp857",
    "cp858",
    "cp860",
    "cp861",
    "cp862",
    "cp863",
    "cp864",
    "cp865",
    "cp866",
    "cp869",
    "cp874",
    "cp875",
    "cp932",
    "cp949",
    "cp950",
    "cp1006",
    "cp1026",
    "cp1140",
    "cp1250",
    "cp1251",
    "cp1252",
    "cp1253",
    "cp1254",
    "cp1255",
    "cp1256",
    "cp1257",
    "cp1258",
    "euc_jp",
    "euc_jis_2004",
    "euc_jisx0213",
    "euc_kr",
    "gb2312",
    "gbk",
    "gb18030",
    "hz",
    "iso2022_jp",
    "iso2022_jp_1",
    "iso2022_jp_2",
    "iso2022_jp_2004",
    "iso2022_jp_3",
    "iso2022_jp_ext",
    "iso2022_kr",
    "latin_1",
    "iso8859_2",
    "iso8859_3",
    "iso8859_4",
    "iso8859_5",
    "iso8859_6",
    "iso8859_7",
    "iso8859_8",
    "iso8859_9",
    "iso8859_10",
    "iso8859_13",
    "iso8859_14",
    "iso8859_15",
    "iso8859_16",
    "johab",
    "koi8_r",
    "koi8_u",
    "mac_cyrillic",
    "mac_greek",
    "mac_iceland",
    "mac_latin2",
    "mac_roman",
    "mac_turkish",
    "ptcp154",
    "shift_jis",
    "shift_jis_2004",
    "shift_jisx0213",
    "utf_32",
    "utf_32_be",
    "utf_32_le",
    "utf_16",
    "utf_16_be",
    "utf_16_le",
    "utf_7",
    "utf_8",
    "utf_8_sig",
]

for encoding in encodings:
    try:
        withopen(file, encoding=encoding) as f:
            f.read()
        print('Seemingly working encoding: {}'.format(encoding))
    except:
        pass

where file is again the filename of your file.

Learn Python Programming

Unicodedecodeerror: 'utf8' Codec Can't Decode Byte 0xea

Solution 1:

Solution 2:

Post a Comment for "Unicodedecodeerror: 'utf8' Codec Can't Decode Byte 0xea"