
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfa' in position 42: ordinal not in range(128)

def main():
    client = ##client_here
    db = client.brazil
    rio_bus = client.tweets
    result_cursor = db.tweets.find()
    first = result_cursor[0]
    ordered_fieldnames =

Solution 1:

str(x[k]).encode('utf-8') is the problem.

In Python 2, str(x[k]) converts a Unicode string to a byte string using the default ascii codec, which fails as soon as the string contains a non-ASCII character:

>>> x = u'résumé'
>>> str(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

Non-Unicode values, such as booleans, are converted to byte strings by str(). Because only Unicode strings can be encoded, Python 2 then implicitly decodes that byte string back to a Unicode string with the default ascii codec before calling .encode(). This usually doesn't cause an error, because most non-Unicode objects have an ASCII representation. Here's an example where a custom object returns a non-ASCII str() representation:

>>> class Test(object):
...  def __str__(self):
...    return 'r\xc3\xa9sum\xc3\xa9'
...
>>> x=Test()
>>> str(x)
'r\xc3\xa9sum\xc3\xa9'
>>> str(x).encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Note that this was a decode error rather than an encode error: the implicit ascii decode failed before .encode() ever ran.
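To make the hidden step visible, here is a minimal sketch that spells out the decode-then-encode round trip by hand. The function name explicit_ascii_round_trip is made up for illustration, and the sketch assumes the default encoding is ascii, as it is on a stock Python 2 install.

```python
raw = b'r\xc3\xa9sum\xc3\xa9'  # UTF-8 bytes for u'résumé'


def explicit_ascii_round_trip(data):
    """Spell out what Python 2 does implicitly for byte_string.encode('utf-8')."""
    # Step 1: decode the bytes with the default (ascii) codec.
    # Step 2: encode the resulting Unicode string as UTF-8.
    return data.decode('ascii').encode('utf-8')


try:
    explicit_ascii_round_trip(raw)
    outcome = 'ok'
except UnicodeDecodeError as exc:
    # Step 1 fails on the first non-ASCII byte (0xc3 at index 1).
    outcome = 'decode error at position %d' % exc.start

# Decoding with the codec the bytes were actually written in succeeds.
fixed = raw.decode('utf-8').encode('utf-8')
```

Naming the real codec in the decode step is what removes the error; the encode step was never the part that failed.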

If str() is only there to coerce booleans to a string, coerce the value to a Unicode string instead:

unicode(x[k]).encode('utf-8')

Non-Unicode values will be converted to Unicode strings, which can then be correctly encoded, but Unicode strings will remain unchanged, so they will also be encoded correctly.

>>> x = True
>>> unicode(x)
u'True'
>>> unicode(x).encode('utf8')
'True'
>>> x = u'résumé'
>>> unicode(x).encode('utf8')
'r\xc3\xa9sum\xc3\xa9'
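The same coercion can be wrapped in a small helper and applied to a whole row before it is written out. This is only a sketch: to_utf8, text_type and the field names in row are hypothetical, not part of the original code. It also makes one extra assumption that plain unicode(x[k]) does not: values that are already byte strings are passed through untouched, on the assumption that they are UTF-8.

```python
import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
# Only the selected branch is evaluated, so this also runs on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_utf8(value):
    """Return value as a UTF-8 encoded byte string (hypothetical helper)."""
    if isinstance(value, bytes):
        # Assumption: existing byte strings are already UTF-8, so pass them on.
        return value
    # Booleans, numbers, etc. become text first; text itself is left unchanged.
    return text_type(value).encode('utf-8')


# Hypothetical row with mixed value types, similar to a document from a cursor.
row = {'text': u'r\xe9sum\xe9', 'retweeted': True, 'count': 42}
encoded = dict((key, to_utf8(val)) for key, val in row.items())
```

Keeping the conversion in one place means every value takes the same path, so a stray str() call cannot reintroduce the ascii codec.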

P.S. Python 3 does not do implicit encode/decode between byte and Unicode strings and makes these errors easier to spot.
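As a rough illustration of that difference, the sketch below assumes a Python 3 interpreter and shows that the original expression works there, while mixing the two types fails immediately with an error that names the real problem. The variable names are illustrative only.

```python
import sys

assert sys.version_info[0] >= 3, 'this sketch targets Python 3'

text = 'r\u00e9sum\u00e9'    # str is Unicode text in Python 3
data = text.encode('utf-8')  # bytes

# str() on text is a no-op, so the original expression now works as intended.
same = str(text).encode('utf-8') == data

errors = []
try:
    data.encode('utf-8')     # bytes has no .encode(); nothing is decoded for you
except AttributeError as exc:
    errors.append(type(exc).__name__)

try:
    text + data              # mixing str and bytes is rejected outright
except TypeError as exc:
    errors.append(type(exc).__name__)
```

Both failures appear at the line that mixes the types, not later when a non-ASCII character happens to show up in the data.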

