Reading a text file with Unicode characters – Python 3

Question

I am trying to read a text file which has Unicode escape sequences (\u) and other tags (\n, \u) in the text. Here is an example: (u'B9781437714227000962', u'Definition\u2014Human papillomavirus (HPV)\u2013related proliferation of the vaginal mucosa that leads to extensive, full-thickness loss of matura…

Accepted Answer

To remove the Unicode escape sequences (or better, to translate them) in Python 3:

    a.encode('utf-8').decode('unicode_escape')

The decode step translates the Unicode escape sequences into the corresponding Unicode characters. Unfortunately, this kind of unescaping does not work directly on strings, so you need to encode the string to bytes first and then decode it.

But as pointed out in the comments on the question, you have a serialized document. Try to deserialize it with the correct tools, and the Unicode unescaping will be handled automatically.
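Both suggestions can be sketched as follows. This is a minimal illustration, not a definitive solution: the sample strings are shortened, hypothetical stand-ins for the data in the question, and the second approach assumes each record in the file is a Python literal (a tuple of u'...' strings), which the question does not confirm. Also note that the unicode_escape codec interprets its input bytes as Latin-1, so the first approach is only safe when the text is plain ASCII apart from the escape sequences.

```python
import ast

# Hypothetical sample: the escapes are literal backslash-u sequences in the
# text, not real characters (hence the raw string).
raw = r"Definition\u2014Human papillomavirus (HPV)\u2013related proliferation"

# Approach 1: encode to bytes, then let the unicode_escape codec translate
# the escape sequences. Only safe for ASCII text plus escapes, because the
# codec reads the bytes as Latin-1.
unescaped = raw.encode("utf-8").decode("unicode_escape")

# Approach 2: assuming the record is a Python literal, parse it safely with
# ast.literal_eval; the parser handles the unescaping itself.
# (Shortened, hypothetical record; the u'' prefix is valid in Python 3.3+.)
serialized = r"(u'B9781437714227000962', u'Definition\u2014Human papillomavirus (HPV)')"
doc_id, text = ast.literal_eval(serialized)

print(unescaped)
print(doc_id)
print(text)
```

If the file turns out to be in another serialization format (JSON, for example), the matching parser for that format would be the tool to use instead of ast.literal_eval.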

Answer