Reading Unicode CSV Files in Python

The Python 2 csv module doesn’t support Unicode input. That’s a who-cares most days of most years, but if you suddenly need to import some CSV data that contains letters with little squiggles over them, you’re pretty bummed out.

I’ve had to write CSV reading and writing code from scratch before, in Java. CSV may be the most ridiculously terrible file format I’ve ever had the displeasure of being forced to care about, but parsing it isn’t very difficult. So I spiked a parser in Python that handles Unicode.

It’s not complete. It’s probably not even close. I know it doesn’t handle multi-line cells, for one thing. It certainly doesn’t have the bells and whistles that the official module has… but it worked for me. It might work for you too.
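
For a sense of what the core of a parser like this involves, here’s a minimal sketch of splitting one line of already-decoded text into cells. It is not the code from the download: `parse_csv_line` and its parameters are names made up for illustration, and it assumes the caller has already decoded the bytes into Unicode text. It handles quoted cells and doubled quotes, and like the parser described above it gives up on multi-line cells.

```python
def parse_csv_line(line, delimiter=",", quote='"'):
    """Split one line of decoded text into a list of cell strings.

    Hypothetical helper for illustration only. Handles quoted cells and
    doubled quotes; does NOT handle cells that span multiple lines.
    """
    cells = []
    current = []          # characters of the cell being built
    in_quotes = False
    line = line.rstrip("\r\n")
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    current.append(quote)  # doubled quote is a literal quote
                    i += 1                 # skip the second quote
                else:
                    in_quotes = False      # closing quote
            else:
                current.append(ch)         # delimiters are literal inside quotes
        elif ch == quote:
            in_quotes = True               # opening quote
        elif ch == delimiter:
            cells.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    cells.append("".join(current))         # the last cell has no trailing delimiter
    return cells
```

Calling `parse_csv_line(u'a,"b,c",d')` gives three cells: a, then b,c (comma intact), then d. Working on decoded text rather than raw bytes is the design choice that matters here, since it means a multi-byte character can never be mistaken for a delimiter or a quote.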

If it doesn’t, let me know. I’d be happy to bang on it a bit more to cure what ails it.

Download it here:

Pretty straightforward to play with. Stand up the reader, iterate over the rows. Rows are returned as lists of strings.

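import os

# open() doesn't expand '~' on its own; expanduser() resolves it
f = open(os.path.expanduser('~/myfile.csv'), 'rU')
line_reader = DudeUrGettinACSV(f)
for row in line_reader:
    print row  # each row is a list of strings
f.close()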
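
If you happen to be on Python 3 rather than Python 2, the standard csv module works on decoded text, so the same "iterate over rows, get lists of strings" pattern is available without a custom parser. Here’s a rough sketch of that alternative. The `read_unicode_csv` name and the UTF-8 default are assumptions for illustration, not part of the download.

```python
import csv
import io


def read_unicode_csv(path, encoding="utf-8"):
    """Return every row of the CSV file at `path` as a list of strings.

    Hypothetical helper; assumes Python 3 and a file encoded as `encoding`.
    """
    # newline="" lets the csv module handle line endings itself,
    # as the csv documentation recommends for files passed to csv.reader
    with io.open(path, "r", encoding=encoding, newline="") as f:
        return [row for row in csv.reader(f)]
```

Usage looks much like the reader above: `for row in read_unicode_csv(path): print(row)`. If the file isn’t actually UTF-8, pass the right `encoding` or the decode will fail.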
