15. Pickling C Extension Types
If you need to support pickling the specialised types in your C extension, you need to implement some special methods.
This example shows how to provide pickle support for the custom2.Custom type described in the C extension tutorial in the Python documentation.
15.1. Pickle Version Control
Since the whole point of pickle is persistence, pickled objects can hang around for a long time in databases, file systems, shelve data and so on.
It is entirely possible that, by the time an object is un-pickled, your C extension has moved on and things become awkward.
It is strongly recommended that you add some form of version control to your pickled objects. In this example I just have a single integer version number, which I write into the pickled object. If the number does not match on unpickling, then I raise an exception. When I change the type API I would, judiciously, change this version number.
Clearly more sophisticated strategies are possible, such as supporting older versions of the pickled object in some way, but this will do for now.
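One way to support older versions is to keep a table of small upgrade functions keyed by version and apply them in order before the state is used. This is sketched here in pure Python rather than C so that it is easy to try. Everything in the sketch is hypothetical and invented for illustration: the function names, the `_UPGRADES` table, and the "version 0" layout, which is assumed to be a single `"name"` string written before the version key existed.

```python
# Hypothetical sketch: upgrading older pickled state to the current version.
PICKLE_VERSION_KEY = "_pickle_version"
PICKLE_VERSION = 1


def _upgrade_0_to_1(state):
    """Hypothetical version 0 stored one 'name'; version 1 splits it."""
    first, _, last = state.pop("name").partition(" ")
    state.update(first=first, last=last, number=0)
    state[PICKLE_VERSION_KEY] = 1
    return state


# Maps an old version number to the function that upgrades it by one step.
_UPGRADES = {0: _upgrade_0_to_1}


def upgrade_state(state):
    """Return a copy of state upgraded to PICKLE_VERSION, or raise ValueError."""
    state = dict(state)  # never mutate the caller's dict
    # Assumption: pickles written before versioning existed have no key.
    version = state.get(PICKLE_VERSION_KEY, 0)
    while version != PICKLE_VERSION:
        if version not in _UPGRADES:
            raise ValueError(
                f"Pickle version mismatch. Got version {version} "
                f"but expected version {PICKLE_VERSION}."
            )
        state = _UPGRADES[version](state)
        version = state[PICKLE_VERSION_KEY]
    return state
```

In `__setstate__`, `upgrade_state()` would take the place of the strict equality check. Each step only needs to know about the format immediately before it.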
We add some simple pickle version information to the C extension:
static const char* PICKLE_VERSION_KEY = "_pickle_version";
static int PICKLE_VERSION = 1;
Now we can implement __getstate__ and __setstate__. Think of these as symmetric operations. First, __getstate__.
15.2. Implementing __getstate__
When the object is pickled, __getstate__ is expected to return a dictionary of the internal state of the Custom object.
Note that a Custom object has two Python objects (first and last) and a C integer (number) that needs to be converted to a Python object.
We also need to add the version information.
Here is the C implementation:
/* Pickle the object */
static PyObject *
Custom___getstate__(CustomObject *self, PyObject *Py_UNUSED(ignored)) {
    /* Build a new dict. "s" is a C string key, "O" is a Python object (a new
     * strong reference is taken) and "i" is a C int converted to a Python int. */
    PyObject *ret = Py_BuildValue("{sOsOsisi}",
                                  "first", self->first,
                                  "last", self->last,
                                  "number", self->number,
                                  PICKLE_VERSION_KEY, PICKLE_VERSION);
    return ret;
}
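To make the result of `Py_BuildValue("{sOsOsisi}", ...)` concrete, here is a pure-Python analogue of the same method. `PyCustom` is a hypothetical stand-in for `custom2.Custom` with the same three attributes. The dictionary it returns has the same keys and value types as the one the C function builds.

```python
PICKLE_VERSION_KEY = "_pickle_version"
PICKLE_VERSION = 1


class PyCustom:
    """Hypothetical pure-Python stand-in for custom2.Custom."""

    def __init__(self, first="", last="", number=0):
        self.first = first
        self.last = last
        self.number = number

    def __getstate__(self):
        # Same keys and value types as Py_BuildValue("{sOsOsisi}", ...) in C.
        return {
            "first": self.first,
            "last": self.last,
            "number": self.number,
            PICKLE_VERSION_KEY: PICKLE_VERSION,
        }


print(PyCustom("FIRST", "LAST", 11).__getstate__())
```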
15.3. Implementing __setstate__
The implementation of __setstate__ un-pickles the object.
This is a little more complicated because there is quite a lot of error checking going on.
We are being passed an arbitrary Python object and need to check that:
It is a Python dictionary.
It has a version key and the version value is one that we can deal with.
It has the required keys and values to populate our Custom object.
Note that by the time __setstate__ is called, our __new__ method (Custom_new()) has already created self. __init__ is not called during unpickling.
Before setting any member value we need to release (Py_DECREF) the existing value set by Custom_new(), otherwise we will have a memory leak.
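You can check which special methods run during unpickling with a small pure-Python class. With protocol 2 or higher, `pickle.loads` recreates the object by calling `cls.__new__(cls)` and then passes the state dictionary to `__setstate__`. `Tracer` and its `calls` list are hypothetical names used only to record the order of calls.

```python
import pickle


class Tracer:
    """Hypothetical class that records which special methods are called."""

    calls = []  # class-level log shared by all instances

    def __new__(cls, *args):
        Tracer.calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, value=0):
        Tracer.calls.append("__init__")
        self.value = value

    def __getstate__(self):
        return {"value": self.value}

    def __setstate__(self, state):
        Tracer.calls.append("__setstate__")
        self.value = state["value"]


data = pickle.dumps(Tracer(42))
Tracer.calls.clear()  # forget the calls made while building the original
restored = pickle.loads(data)
print(Tracer.calls)    # ['__new__', '__setstate__']: __init__ is skipped
print(restored.value)  # 42
```

This is why the C `__setstate__` must cope with whatever `Custom_new()` left in the members, rather than with anything `Custom_init()` would have set.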
15.3.1. Error Checking
/* Un-pickle the object */
static PyObject *
Custom___setstate__(CustomObject *self, PyObject *state) {
    /* Error check. */
    if (!PyDict_CheckExact(state)) {
        PyErr_SetString(PyExc_ValueError, "Pickled object is not a dict.");
        return NULL;
    }
    /* Version check. */
    /* Borrowed reference but no need to increment as we create a C long
     * from it. */
    PyObject *temp = PyDict_GetItemString(state, PICKLE_VERSION_KEY);
    if (temp == NULL) {
        /* PyDict_GetItemString does not set any error state so we have to. */
        PyErr_Format(PyExc_KeyError, "No \"%s\" in pickled dict.",
                     PICKLE_VERSION_KEY);
        return NULL;
    }
    long pickle_version = PyLong_AsLong(temp);
    if (pickle_version == -1 && PyErr_Occurred()) {
        /* The version value could not be converted to a C long. */
        return NULL;
    }
    if (pickle_version != PICKLE_VERSION) {
        PyErr_Format(PyExc_ValueError,
                     "Pickle version mismatch. Got version %ld but expected version %d.",
                     pickle_version, PICKLE_VERSION);
        return NULL;
    }
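As a reference when writing or testing the C version, here are roughly the same three checks in pure Python. `check_state` is a hypothetical helper, not part of `custom2`. It raises the same exception types and messages as the C code.

```python
PICKLE_VERSION_KEY = "_pickle_version"
PICKLE_VERSION = 1


def check_state(state):
    """Hypothetical helper mirroring the C error checks in __setstate__."""
    # The C code uses PyDict_CheckExact(), so dict subclasses are rejected too.
    if type(state) is not dict:
        raise ValueError("Pickled object is not a dict.")
    if PICKLE_VERSION_KEY not in state:
        raise KeyError(f'No "{PICKLE_VERSION_KEY}" in pickled dict.')
    version = state[PICKLE_VERSION_KEY]
    if version != PICKLE_VERSION:
        raise ValueError(
            f"Pickle version mismatch. Got version {version} "
            f"but expected version {PICKLE_VERSION}."
        )
```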
15.3.2. Set the first Member
/* NOTE: Custom_new() will have been invoked so self->first and self->last
 * will have been allocated so we have to release them. Py_XDECREF is used
 * in case a member has been deleted and is NULL. */
Py_XDECREF(self->first);
self->first = PyDict_GetItemString(state, "first"); /* Borrowed reference. */
if (self->first == NULL) {
    /* PyDict_GetItemString does not set any error state so we have to. */
    PyErr_SetString(PyExc_KeyError, "No \"first\" in pickled dict.");
    return NULL;
}
/* Increment the borrowed reference for our instance of it. */
Py_INCREF(self->first);
15.3.3. Set the last Member
/* Similar to self->first above. */
Py_XDECREF(self->last);
self->last = PyDict_GetItemString(state, "last"); /* Borrowed reference. */
if (self->last == NULL) {
    /* PyDict_GetItemString does not set any error state so we have to. */
    PyErr_SetString(PyExc_KeyError, "No \"last\" in pickled dict.");
    return NULL;
}
Py_INCREF(self->last);
15.3.4. Set the number Member
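number is a C fundamental type (an int), so the code is slightly different. PyLong_AsLong() returns -1 and sets an exception if the conversion fails, and that must be checked. Returning None with an exception set would raise SystemError. The value must also fit in a C int before it is stored:
/* Borrowed reference but no need to incref as we create a C long from it. */
PyObject *number = PyDict_GetItemString(state, "number");
if (number == NULL) {
    /* PyDict_GetItemString does not set any error state so we have to. */
    PyErr_SetString(PyExc_KeyError, "No \"number\" in pickled dict.");
    return NULL;
}
long value = PyLong_AsLong(number);
if (value == -1 && PyErr_Occurred()) {
    /* Not convertible: propagate the exception rather than returning None. */
    return NULL;
}
if (value < INT_MIN || value > INT_MAX) {
    PyErr_SetString(PyExc_OverflowError, "\"number\" does not fit in a C int.");
    return NULL;
}
self->number = (int) value;
And we are done.
Py_RETURN_NONE;
}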
15.3.5. __setstate__ in Full
/* Un-pickle the object */
static PyObject *
Custom___setstate__(CustomObject *self, PyObject *state) {
    /* Error check. */
    if (!PyDict_CheckExact(state)) {
        PyErr_SetString(PyExc_ValueError, "Pickled object is not a dict.");
        return NULL;
    }
    /* Version check. */
    /* Borrowed reference but no need to increment as we create a C long
     * from it. */
    PyObject *temp = PyDict_GetItemString(state, PICKLE_VERSION_KEY);
    if (temp == NULL) {
        /* PyDict_GetItemString does not set any error state so we have to. */
        PyErr_Format(PyExc_KeyError, "No \"%s\" in pickled dict.",
                     PICKLE_VERSION_KEY);
        return NULL;
    }
    long pickle_version = PyLong_AsLong(temp);
    if (pickle_version == -1 && PyErr_Occurred()) {
        /* The version value could not be converted to a C long. */
        return NULL;
    }
    if (pickle_version != PICKLE_VERSION) {
        PyErr_Format(PyExc_ValueError,
                     "Pickle version mismatch. Got version %ld but expected version %d.",
                     pickle_version, PICKLE_VERSION);
        return NULL;
    }
    /* NOTE: Custom_new() will have been invoked so self->first and self->last
     * will have been allocated so we have to release them. Py_XDECREF is used
     * in case a member has been deleted and is NULL. */
    Py_XDECREF(self->first);
    self->first = PyDict_GetItemString(state, "first"); /* Borrowed reference. */
    if (self->first == NULL) {
        /* PyDict_GetItemString does not set any error state so we have to. */
        PyErr_SetString(PyExc_KeyError, "No \"first\" in pickled dict.");
        return NULL;
    }
    /* Increment the borrowed reference for our instance of it. */
    Py_INCREF(self->first);
    /* Similar to self->first above. */
    Py_XDECREF(self->last);
    self->last = PyDict_GetItemString(state, "last"); /* Borrowed reference. */
    if (self->last == NULL) {
        /* PyDict_GetItemString does not set any error state so we have to. */
        PyErr_SetString(PyExc_KeyError, "No \"last\" in pickled dict.");
        return NULL;
    }
    Py_INCREF(self->last);
    /* Borrowed reference but no need to incref as we create a C long from it. */
    PyObject *number = PyDict_GetItemString(state, "number");
    if (number == NULL) {
        /* PyDict_GetItemString does not set any error state so we have to. */
        PyErr_SetString(PyExc_KeyError, "No \"number\" in pickled dict.");
        return NULL;
    }
    long value = PyLong_AsLong(number);
    if (value == -1 && PyErr_Occurred()) {
        /* Not convertible: propagate the exception rather than returning None. */
        return NULL;
    }
    if (value < INT_MIN || value > INT_MAX) {
        PyErr_SetString(PyExc_OverflowError, "\"number\" does not fit in a C int.");
        return NULL;
    }
    self->number = (int) value;
    Py_RETURN_NONE;
}
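For comparison, here is a pure-Python analogue of the complete `__setstate__`, together with a round trip through `pickle`. `PyCustom` is a hypothetical stand-in for `custom2.Custom`. One design difference is deliberate: this version validates every field before assigning any of them, so a malformed state dictionary leaves the object unchanged. In the C code above, by contrast, a missing `"last"` key is only discovered after `first` has been replaced and the old `last` released.

```python
import pickle

PICKLE_VERSION_KEY = "_pickle_version"
PICKLE_VERSION = 1


class PyCustom:
    """Hypothetical pure-Python stand-in for custom2.Custom."""

    def __init__(self, first="", last="", number=0):
        self.first = first
        self.last = last
        self.number = number

    def name(self):
        return f"{self.first} {self.last}"

    def __getstate__(self):
        return {
            "first": self.first,
            "last": self.last,
            "number": self.number,
            PICKLE_VERSION_KEY: PICKLE_VERSION,
        }

    def __setstate__(self, state):
        # Type and version checks mirror the C code.
        if type(state) is not dict:
            raise ValueError("Pickled object is not a dict.")
        if PICKLE_VERSION_KEY not in state:
            raise KeyError(f'No "{PICKLE_VERSION_KEY}" in pickled dict.')
        version = state[PICKLE_VERSION_KEY]
        if version != PICKLE_VERSION:
            raise ValueError(
                f"Pickle version mismatch. Got version {version} "
                f"but expected version {PICKLE_VERSION}."
            )
        # Validate every field before assigning any of them...
        for key in ("first", "last", "number"):
            if key not in state:
                raise KeyError(f'No "{key}" in pickled dict.')
        if not isinstance(state["number"], int):
            raise TypeError("number must be an int")
        # ...so that a bad state never leaves a half-updated object.
        self.first = state["first"]
        self.last = state["last"]
        self.number = state["number"]


original = PyCustom("FIRST", "LAST", 11)
result = pickle.loads(pickle.dumps(original))
print(result is original)  # False
print(result.first, result.last, result.number, result.name())  # FIRST LAST 11 FIRST LAST
```

The same "look everything up, then assign" ordering could be used in C. It would mean holding the borrowed references in local variables until all lookups have succeeded.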
15.4. Add the Special Methods
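Now we need to add these two special methods to the methods table. __getstate__ takes no arguments (METH_NOARGS) and __setstate__ takes exactly one, the state (METH_O). The table now looks like this: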
static PyMethodDef Custom_methods[] = {
    {"name", (PyCFunction) Custom_name, METH_NOARGS,
     "Return the name, combining the first and last name"
    },
    {"__getstate__", (PyCFunction) Custom___getstate__, METH_NOARGS,
     "Pickle the Custom object"
    },
    {"__setstate__", (PyCFunction) Custom___setstate__, METH_O,
     "Un-pickle the Custom object"
    },
    {NULL}  /* Sentinel */
};
15.5. Pickling a custom2.Custom Object
We can test this with some Python code that pickles one custom2.Custom object and then creates another custom2.Custom object from that pickle:
import pickle
import custom2
original = custom2.Custom('FIRST', 'LAST', 11)
print(f'original is {original} @ 0x{id(original):x}')
print(f'original first: {original.first} last: {original.last} number: {original.number} name: {original.name()}')
pickled_value = pickle.dumps(original)
print(f'Pickled original is {pickled_value}')
result = pickle.loads(pickled_value)
print(f'result is {result} @ 0x{id(result):x}')
print(f'result first: {result.first} last: {result.last} number: {result.number} name: {result.name()}')
$ python main.py
original is <custom2.Custom object at 0x1049e6810> @ 0x1049e6810
original first: FIRST last: LAST number: 11 name: FIRST LAST
Pickled original is b'\x80\x04\x95[\x00\x00\x00\x00\x00\x00\x00\x8c\x07custom2\x94\x8c\x06Custom\x94\x93\x94)\x81\x94}\x94(\x8c\x05first\x94\x8c\x05FIRST\x94\x8c\x04last\x94\x8c\x04LAST\x94\x8c\x06number\x94K\x0b\x8c\x0f_pickle_version\x94K\x01ub.'
result is <custom2.Custom object at 0x1049252d0> @ 0x1049252d0
result first: FIRST last: LAST number: 11 name: FIRST LAST
So we have pickled one object and recreated a different, but equivalent, instance from that pickle, which is what we set out to do.
15.6. The Pickled Object in Detail
If you are curious about the contents of the pickled object, the Python standard library provides the pickletools module, which allows you to inspect it. So if we run this code:
import pickle
import pickletools
import custom2
original = custom2.Custom('FIRST', 'LAST', 11)
pickled_value = pickle.dumps(original)
print(f'Pickled original is {pickled_value}')
# NOTE: Here we are adding annotations.
pickletools.dis(pickled_value, annotate=1)
The output will be something like this:
Pickled original is b'\x80\x04\x95[\x00\x00\x00\x00\x00\x00\x00\x8c\x07custom2\x94\x8c\x06Custom\x94\x93\x94)\x81\x94}\x94(\x8c\x05first\x94\x8c\x05FIRST\x94\x8c\x04last\x94\x8c\x04LAST\x94\x8c\x06number\x94K\x0b\x8c\x0f_pickle_version\x94K\x01ub.'
0: \x80 PROTO 4 Protocol version indicator.
2: \x95 FRAME 91 Indicate the beginning of a new frame.
11: \x8c SHORT_BINUNICODE 'custom2' Push a Python Unicode string object.
20: \x94 MEMOIZE (as 0) Store the stack top into the memo. The stack is not popped.
21: \x8c SHORT_BINUNICODE 'Custom' Push a Python Unicode string object.
29: \x94 MEMOIZE (as 1) Store the stack top into the memo. The stack is not popped.
30: \x93 STACK_GLOBAL Push a global object (module.attr) on the stack.
31: \x94 MEMOIZE (as 2) Store the stack top into the memo. The stack is not popped.
32: ) EMPTY_TUPLE Push an empty tuple.
33: \x81 NEWOBJ Build an object instance.
34: \x94 MEMOIZE (as 3) Store the stack top into the memo. The stack is not popped.
35: } EMPTY_DICT Push an empty dict.
36: \x94 MEMOIZE (as 4) Store the stack top into the memo. The stack is not popped.
37: ( MARK Push markobject onto the stack.
38: \x8c SHORT_BINUNICODE 'first' Push a Python Unicode string object.
45: \x94 MEMOIZE (as 5) Store the stack top into the memo. The stack is not popped.
46: \x8c SHORT_BINUNICODE 'FIRST' Push a Python Unicode string object.
53: \x94 MEMOIZE (as 6) Store the stack top into the memo. The stack is not popped.
54: \x8c SHORT_BINUNICODE 'last' Push a Python Unicode string object.
60: \x94 MEMOIZE (as 7) Store the stack top into the memo. The stack is not popped.
61: \x8c SHORT_BINUNICODE 'LAST' Push a Python Unicode string object.
67: \x94 MEMOIZE (as 8) Store the stack top into the memo. The stack is not popped.
68: \x8c SHORT_BINUNICODE 'number' Push a Python Unicode string object.
76: \x94 MEMOIZE (as 9) Store the stack top into the memo. The stack is not popped.
77: K BININT1 11 Push a one-byte unsigned integer.
79: \x8c SHORT_BINUNICODE '_pickle_version' Push a Python Unicode string object.
96: \x94 MEMOIZE (as 10) Store the stack top into the memo. The stack is not popped.
97: K BININT1 1 Push a one-byte unsigned integer.
99: u SETITEMS (MARK at 37) Add an arbitrary number of key+value pairs to an existing dict.
100: b BUILD Finish building an object, via __setstate__ or dict update.
101: . STOP Stop the unpickling machine.
highest protocol among opcodes = 4
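`pickletools.dis()` is designed for reading. To inspect a pickle programmatically, `pickletools.genops()` yields `(opcode, argument, position)` triples instead. The sketch below uses a hypothetical pure-Python class `PyCustom` rather than `custom2.Custom`, so it runs without the C extension. It checks for the same `NEWOBJ` and `BUILD` opcodes seen in the listing above. `BUILD` is the opcode that hands the state to `__setstate__`.

```python
import pickle
import pickletools


class PyCustom:
    """Hypothetical pure-Python stand-in for custom2.Custom."""

    def __init__(self, first="", last="", number=0):
        self.first, self.last, self.number = first, last, number

    def __getstate__(self):
        return {"first": self.first, "last": self.last,
                "number": self.number, "_pickle_version": 1}

    def __setstate__(self, state):
        self.first = state["first"]
        self.last = state["last"]
        self.number = state["number"]


data = pickle.dumps(PyCustom("FIRST", "LAST", 11), protocol=4)
# genops() yields (OpcodeInfo, decoded argument, byte offset) for each opcode.
names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(names[-2:])         # ['BUILD', 'STOP']
print("NEWOBJ" in names)  # True
```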
15.7. Pickling Objects with External State
This is just a simple example. If your object relies on external state, such as open files, databases and the like, you need to be careful and knowledgeable about your state management. There is some useful information here: Handling Stateful Objects
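The usual approach is to leave the external resource out of the state and re-acquire it on unpickling. The pure-Python sketch below applies this to an open file. `LineReader` and its attributes are hypothetical. A C extension would do the equivalent work in its own `__getstate__` and `__setstate__`.

```python
import os
import pickle
import tempfile


class LineReader:
    """Hypothetical reader that remembers how many lines it has consumed."""

    def __init__(self, filename):
        self.filename = filename
        self.lineno = 0
        self.file = open(filename, encoding="utf-8")

    def readline(self):
        self.lineno += 1
        return self.file.readline().rstrip("\n")

    def __getstate__(self):
        # Open files cannot be pickled, so store only the filename and position.
        return {"filename": self.filename, "lineno": self.lineno}

    def __setstate__(self, state):
        self.filename = state["filename"]
        self.lineno = state["lineno"]
        # Re-acquire the external resource and skip the lines already read.
        self.file = open(self.filename, encoding="utf-8")
        for _ in range(self.lineno):
            self.file.readline()


# Usage: write a small file, read one line, then pickle the reader.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("one\ntwo\nthree\n")
reader = LineReader(tmp.name)
first_line = reader.readline()
clone = pickle.loads(pickle.dumps(reader))
next_line = clone.readline()  # the clone resumes after the first line
reader.file.close()
clone.file.close()
os.remove(tmp.name)
```

The restored object's file is a new handle, not the original one. Anything that cannot be recreated from the stored state (for example an uncommitted database transaction) cannot be restored this way.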
15.8. References
Python API documentation for __setstate__
Python API documentation for __getstate__
Useful documentation for Handling Stateful Objects
Python pickle module
Python shelve module