I'm serializing some data into a pickle file. Unfortunately, the structure of the data might change, so I keep a static VERSION number in the code and increment it whenever the data structure changes. In that case the data in the pickle file is invalid and should be discarded.

Therefore I tried to save a tuple consisting of the data and a version number. But restoring it from pickle raises a UnicodeDecodeError:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

I wonder how you would include a version number? Embedding it in the file path is an option, but much more complicated. Here's my code:

#%% Create a dataframe

import pandas as pd
values = {'Latitude': {0: 47.021503365600005,
  1: 47.021503365600005,
  2: 47.021503365600005,
  3: 47.021503365600005,
  4: 47.021503365600005,
  5: 47.021503365600005},
 'Longitude': {0: 15.481974060399999,
  1: 15.481974060399999,
  2: 15.481974060399999,
  3: 15.481974060399999,
  4: 15.481974060399999,
  5: 15.481974060399999}}

df = pd.DataFrame(values)
df.head()

#%% Save the dataframe including a version number

import pickle
VERSION = 1

file_path = 'tmp.p'
with open(file_path, 'wb') as f:
    pickle.dump((df, VERSION), f)

#%% Load the dataframe including the original version number

try:
    with open(file_path, 'r') as f:
        df, version = pickle.load(f)
except ValueError as ex:
    print (ex)
    version = -1

#%% Compare version numbers

if version != VERSION:
    print ('Versions do not match')

2 Answers

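The problem is the mode you used to open the file for the read operation. For writing you use wb (write in binary mode), but for reading you use r (read in text mode, because the b was omitted). Open the file in binary mode for reading as well:

with open(file_path, 'rb') as f:
    df, version = pickle.load(f)

In Python 3 this matters on every platform: text mode tries to decode the pickle bytes as text, which is what raises the UnicodeDecodeError. In Python 2 the difference mainly shows up on Windows.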

See here for more details: https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
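
A minimal sketch of the corrected round trip, assuming Python 3. A plain dict stands in for the DataFrame so the example runs without pandas, and the helper names save_versioned and load_versioned are hypothetical, not part of the original code:

```python
import os
import pickle
import tempfile

VERSION = 1

def save_versioned(obj, path, version=VERSION):
    """Pickle (obj, version) as one tuple; the file is opened in binary mode."""
    with open(path, 'wb') as f:
        pickle.dump((obj, version), f)

def load_versioned(path, expected=VERSION):
    """Return the stored object, or None if the file is unreadable or stale."""
    try:
        with open(path, 'rb') as f:  # 'rb', not 'r'
            obj, version = pickle.load(f)
    except (OSError, ValueError, TypeError, EOFError, pickle.UnpicklingError):
        return None
    return obj if version == expected else None

# Stand-in data (assumption: any picklable object behaves the same way)
path = os.path.join(tempfile.mkdtemp(), 'tmp.p')
data = {'Latitude': 47.0215, 'Longitude': 15.4820}

save_versioned(data, path)
print(load_versioned(path))              # the dict round-trips
print(load_versioned(path, expected=2))  # None, because the versions differ
```

Returning None for stale data keeps the caller simple, but it would be ambiguous if None were itself a valid payload; raising an exception is an equally reasonable choice.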

1 Comment

Right, just found the problem. I was saving in binary mode but reading in non-binary mode. See this post too.

If you really want to store your object using pickle, you can store the version and the pickled object together as one row of a csv file like this:

import base64, csv, pickle

with open('my_file.csv', 'w', newline='') as fd:
    writer = csv.writer(fd)
    # pickle.dumps returns bytes in Python 3, so base64-encode them into text for the csv field
    writer.writerow([version_number, base64.b64encode(pickle.dumps(obj)).decode('ascii')])

You will only have one file (not two, as you put in the comment), i.e. the csv file. Here obj stands for the object you want to store. pickle.dumps returns the pickled object as bytes (a str in Python 2), while pickle.loads loads the object back from those bytes, compare https://docs.python.org/3/library/pickle.html#pickle.dumps and https://docs.python.org/3/library/pickle.html#pickle.loads

Then you read the data like this:

with open('my_file.csv', newline='') as fd:
    reader = csv.reader(fd)
    row = next(reader)  # first row; row[0] comes back as a string
    fd_class = get_fd_class_by_version(row[0])
    obj = pickle.loads(base64.b64decode(row[1]))

Here get_fd_class_by_version is a kind of factory that returns the class depending on the version you stored.
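
If the goal is only to discard stale data, the version can also live in the same pickle file as its own record ahead of the payload, since each pickle.load call reads exactly one pickled object from the file. A sketch under that assumption, for Python 3; the function names dump_with_header and load_if_current are hypothetical:

```python
import os
import pickle
import tempfile

VERSION = 1

def dump_with_header(obj, path, version=VERSION):
    """Write the version as the first pickle record and the payload as the second."""
    with open(path, 'wb') as f:
        pickle.dump(version, f)
        pickle.dump(obj, f)

def load_if_current(path, expected=VERSION):
    """Read the version first; unpickle the payload only if the version matches."""
    with open(path, 'rb') as f:
        version = pickle.load(f)
        if version != expected:
            return None  # stale file: the payload is never unpickled
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), 'data.p')
dump_with_header([1, 2, 3], path)
print(load_if_current(path))              # [1, 2, 3]
print(load_if_current(path, expected=2))  # None
```

Compared with pickling a (data, version) tuple, this avoids unpickling an outdated payload at all, which helps when the old data refers to classes that have since changed.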

3 Comments

Then I end up having two files. The data and the meta-data. I really wonder why I'm getting the exception on deserialization since a tuple can be pickled.
Please post the full traceback to get help with your error message.
There is no trace. Anyway, you can just copy my code and execute it for reproducing the error.
