TLDR: Write and Read Unicode from files with Python 2 and 3

While chatting with my friend Mário Sérgio about problems that come up when you migrate a codebase between Python versions, I came up with the idea of writing this TLDR.

I hope it helps anyone working with text that contains Unicode characters outside the ASCII table, such as accented letters, non-Roman scripts, and emoji.
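To make "outside the ASCII table" concrete, here is a minimal sketch (the helper non_ascii_chars and the sample string are hypothetical, written just for illustration). ASCII only covers code points 0 to 127, so any character above that needs a real encoding such as UTF-8, which spends more than one byte on it. The printed byte counts assume Python 3; narrow Python 2 builds split the emoji into two surrogate code units.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def non_ascii_chars(text):
    """Hypothetical helper: return (character, code point, UTF-8 byte count)
    for every character that does not fit in the ASCII table."""
    return [(ch, ord(ch), len(ch.encode("utf-8")))
            for ch in text
            if ord(ch) > 127]  # ASCII only covers code points 0-127


# Accented letters plus an emoji (U+1F40D, written as an escape).
sample = "Mário Sérgio \U0001F40D"

for ch, code_point, n_bytes in non_ascii_chars(sample):
    print("U+{:04X} needs {} bytes in UTF-8".format(code_point, n_bytes))

# Plain ASCII text has no such characters.
assert non_ascii_chars("hello") == []
```

Each accented letter takes two bytes in UTF-8 and the emoji takes four, which is exactly the kind of data that breaks code assuming one byte per character.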

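In Python 2, the built-in str type holds raw bytes rather than text, and Python silently converts between str and unicode using the ASCII codec. That blurs the line between bytes and text, so it is easy to forget to encode or decode data correctly when dealing with input and output. That kind of mistake can cause runtime errors like this: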