On Python, Mutability, Copy and Deepcopy

I’ve just been hit by a very interesting problem in a project I work on. I needed to extend some code my colleague wrote. I did it, but when I added tests, I discovered that the single test I added passed when run on its own, yet kept failing when run in a group with the pre-existing tests. Moreover, when I started running those tests in random order, a different number of them passed on each run. Clearly there was some unwanted dependency hidden. After some time I found the cause: wrong management of mutable objects (or, to put it simply, copying dictionaries). Oh, and my laziness, expressed by copying existing code rather than using test fixtures. Simple as it may seem, the issue made me read a thing or two about the way Python manages objects, so I’d like to share it.

In [1]: d1 = {'a': 1}

In [2]: d2 = d1

In [3]: id(d1)
Out[3]: 4574363584

In [4]: id(d2)
Out[4]: 4574363584

In [5]: d2.pop('a')
Out[5]: 1

In [6]: d1
Out[6]: {}

In [7]: d2
Out[7]: {}

The call to id() returns the identity of an object, which in CPython is its memory address, according to the help message:

Help on built-in function id in module builtins:

id(obj, /)
    Return the identity of an object.

    This is guaranteed to be unique among simultaneously existing objects.
    (CPython uses the object's memory address.)
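Comparing id() values by eye works, but the `is` operator performs the same identity check directly. A minimal sketch (the name d3 is introduced here only for illustration):

```python
# Two names bound to the same dictionary object.
d1 = {'a': 1}
d2 = d1

# `is` compares identity, which is the same check as comparing id() values.
same_object = d2 is d1
same_id = id(d1) == id(d2)

# An equal but separately created dictionary is a different object.
d3 = {'a': 1}
equal_value = d3 == d1
distinct_object = d3 is not d1

print(same_object, same_id, equal_value, distinct_object)  # True True True True
```

Equality (==) compares contents, while identity (is) tells us whether a change made through one name will show up under the other.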

So what we see above means that d2 = d1 does not copy anything: it binds a second name to the same dictionary, so a change made through d2 is visible through d1 and vice versa. What can we do about it? Well, we can call the copy method.

In [9]: d1 = {'a': 1}

In [10]: d2 = d1.copy()

In [11]: id(d1)
Out[11]: 4574701760

In [12]: id(d2)
Out[12]: 4574684032

In [13]: d1.pop('a')
Out[13]: 1

In [14]: d1
Out[14]: {}

In [15]: d2
Out[15]: {'a': 1}

The ids of d1 and d2 differ, so we can modify the former without touching the latter. Cool, right? But let’s consider a nested dictionary.

In [16]: d1 = {'a': {'b': 1}}

In [17]: d2 = d1.copy()

In [18]: id(d1)
Out[18]: 4574542272

In [19]: id(d2)
Out[19]: 4574729024

In [20]: id(d1['a'])
Out[20]: 4575070016

In [21]: id(d2['a'])
Out[21]: 4575070016

The copy() method did create a new top-level dictionary, but the nested objects are not copied: both dictionaries hold references to the same inner object (this is a shallow copy). Can we solve this? Yes, by using the deepcopy function.

In [22]: from copy import deepcopy

In [23]: d1 = {'a': {'b': 1}}

In [24]: d2 = deepcopy(d1)

In [25]: id(d1['a'])
Out[25]: 4575190080

In [26]: id(d2['a'])
Out[26]: 4575038784

In [27]: d1['a'].pop('b')
Out[27]: 1

In [28]: d2
Out[28]: {'a': {'b': 1}}

In [29]: d1
Out[29]: {'a': {}}
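To see both behaviours side by side, here is a small sketch that mutates the nested dictionary once and checks which copies notice. The names original, shallow and deep are made up for this example.

```python
from copy import deepcopy

original = {'a': {'b': 1}}
shallow = original.copy()   # new outer dict, same inner dict
deep = deepcopy(original)   # new outer dict and new inner dict

# Mutate the nested dictionary through the original.
original['a']['c'] = 2

print(shallow)  # {'a': {'b': 1, 'c': 2}}
print(deep)     # {'a': {'b': 1}}

# Top-level changes do not leak, even for the shallow copy.
original['x'] = 0
print('x' in shallow)  # False
```

So a shallow copy is enough as long as only the top level is modified; once nested values are mutated, only the deep copy stays independent.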

It’s also worth noting that deepcopy is a bit more intelligent than simply making recursive copies:

Two problems often exist with deep copy operations that don't exist with shallow copy operations:

 a) recursive objects (compound objects that, directly or indirectly, contain a reference to themselves) may cause a recursive loop

 b) because deep copy copies *everything* it may copy too much, e.g. administrative data structures that should be shared even between copies

Python's deep copy operation avoids these problems by:

 a) keeping a table of objects already copied during the current copying pass

 b) letting user-defined classes override the copying operation or the set of components copied
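Both points can be tried out in a few lines. The sketch below first deep-copies a dictionary that contains itself, then defines a hypothetical Job class (invented for this example, not taken from any library) whose __deepcopy__ method keeps one attribute shared between copies.

```python
from copy import deepcopy

# a) A dictionary that contains a reference to itself.
loop = {'name': 'loop'}
loop['self'] = loop

loop_copy = deepcopy(loop)  # terminates thanks to the table of copied objects
print(loop_copy['self'] is loop_copy)  # True: the cycle is reproduced in the copy
print(loop_copy is loop)               # False


# b) A hypothetical class that keeps one attribute shared between copies.
class Job:
    def __init__(self, params, registry):
        self.params = params      # should be copied
        self.registry = registry  # should stay shared

    def __deepcopy__(self, memo):
        # Deep-copy params (passing the memo along), reuse the same registry.
        return Job(deepcopy(self.params, memo), self.registry)


registry = {}
job = Job({'retries': [1, 2]}, registry)
job_copy = deepcopy(job)

print(job_copy.params is job.params)      # False
print(job_copy.registry is job.registry)  # True
```

The memo argument is the table of already copied objects mentioned in the quote; handing it to nested deepcopy calls keeps the bookkeeping consistent within one copying pass.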

Read more on this in the documentation of the copy module.
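Coming back to the failing tests from the beginning: the sketch below is a hypothetical reconstruction of that kind of setup, not the actual project code. DEFAULT_CONFIG, make_config and both test functions are invented names. Each test gets its own deep copy of the shared data, so the order in which the tests run no longer matters.

```python
from copy import deepcopy

# Hypothetical shared test data, defined once at module level.
DEFAULT_CONFIG = {'db': {'host': 'localhost', 'port': 5432}}


def make_config():
    """Return a fresh, independent copy of the shared data for each test."""
    return deepcopy(DEFAULT_CONFIG)


def test_removes_port():
    config = make_config()
    config['db'].pop('port')  # mutates only this test's private copy
    assert 'port' not in config['db']


def test_has_port():
    config = make_config()
    assert config['db']['port'] == 5432


# Run the mutating test first: the shared data stays untouched.
test_removes_port()
test_has_port()
print(DEFAULT_CONFIG)  # {'db': {'host': 'localhost', 'port': 5432}}
```

Had make_config returned DEFAULT_CONFIG itself, or DEFAULT_CONFIG.copy(), the pop in the first test would have removed the port for every test that ran after it. With a test framework, a helper like make_config would be a natural candidate for a fixture.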