I’ve just been hit by a very interesting problem in a project I work on. I needed to extend some code my colleague wrote. I did it, but when I added tests, I discovered that the single test I added passed when run on its own, yet kept failing when run together with the pre-existing tests. Moreover, when I started running those tests in random order, a different number of them passed on each run. Clearly there was some unwanted hidden dependency. After some time I found the cause: wrong management of mutable objects (or, to put it simply, of copying dictionaries). Oh, and my laziness, expressed by copying existing code rather than using test fixtures. Simple as it may seem, the issue made me read a thing or two about the way Python manages objects, so I’d like to share it.
In [1]: d1 = {'a': 1}
In [2]: d2 = d1
In [3]: id(d1)
Out[3]: 4574363584
In [4]: id(d2)
Out[4]: 4574363584
In [5]: d2.pop('a')
Out[5]: 1
In [6]: d1
Out[6]: {}
In [7]: d2
Out[7]: {}
The call to id() returns the identity of an object, which in CPython is the object's memory address, according to the help message:
Help on built-in function id in module builtins:

id(obj, /)
    Return the identity of an object.

    This is guaranteed to be unique among simultaneously existing objects.
    (CPython uses the object's memory address.)
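So what we see above means that d2 = d1 does not create a new dictionary; it only binds a second name to the same object. Modifying the dictionary through d2 therefore affects d1 as well (and vice versa). What can we do about it? Well, we can call the copy() method.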
In [9]: d1 = {'a': 1}
In [10]: d2 = d1.copy()
In [11]: id(d1)
Out[11]: 4574701760
In [12]: id(d2)
Out[12]: 4574684032
In [13]: d1.pop('a')
Out[13]: 1
In [14]: d1
Out[14]: {}
In [15]: d2
Out[15]: {'a': 1}
The ids of d1 and d2 differ, so we can modify the former without touching the latter. Cool, right? But let’s consider a nested dictionary.
In [16]: d1 = {'a': {'b': 1}}
In [17]: d2 = d1.copy()
In [18]: id(d1)
Out[18]: 4574542272
In [19]: id(d2)
Out[19]: 4574729024
In [20]: id(d1['a'])
Out[20]: 4575070016
In [21]: id(d2['a'])
Out[21]: 4575070016
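The copy() method did copy the top level of the dictionary, but the nested dictionary was not duplicated: d1['a'] and d2['a'] still refer to the same object. In other words, copy() makes a shallow copy. Can we solve this? Yes, by using the deepcopy function from the copy module.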
In [22]: from copy import deepcopy
In [23]: d1 = {'a': {'b': 1}}
In [24]: d2 = deepcopy(d1)
In [25]: id(d1['a'])
Out[25]: 4575190080
In [26]: id(d2['a'])
Out[26]: 4575038784
In [27]: d1['a'].pop('b')
Out[27]: 1
In [28]: d2
Out[28]: {'a': {'b': 1}}
In [29]: d1
Out[29]: {'a': {}}
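To connect this back to the failing tests from the introduction, here is a minimal sketch of how a shallow copy of shared test data can leak state from one test into another, and how a deep copy avoids it. The names BASE_PAYLOAD, shallow_payload and fresh_payload are made up for illustration; they are not taken from the actual project.

```python
from copy import deepcopy

# Hypothetical test data shared by several tests (illustrative only).
BASE_PAYLOAD = {"user": {"name": "alice", "roles": ["admin"]}}


def shallow_payload():
    # Shallow copy: the nested "user" dict is still shared with BASE_PAYLOAD.
    return BASE_PAYLOAD.copy()


def fresh_payload():
    # Deep copy: every nested container is duplicated.
    return deepcopy(BASE_PAYLOAD)


# A "test" that mutates a shallow copy also changes the shared data.
leaky = shallow_payload()
leaky["user"]["roles"].append("guest")
leaked = "guest" in BASE_PAYLOAD["user"]["roles"]

# Undo the damage so the second half starts from a clean state.
BASE_PAYLOAD["user"]["roles"].remove("guest")

# A "test" that mutates a deep copy leaves the shared data alone.
isolated = fresh_payload()
isolated["user"]["roles"].append("guest")
still_clean = "guest" not in BASE_PAYLOAD["user"]["roles"]

print(leaked, still_clean)
```

In a real test suite, a helper like fresh_payload would typically live in a test fixture, so that every test gets its own independent copy regardless of the order in which the tests run.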
It’s also worth noting that deepcopy is a bit more intelligent than simply making recursive copies, as the documentation of the copy module explains:
Two problems often exist with deep copy operations that don't exist with shallow copy operations:

 a) recursive objects (compound objects that, directly or indirectly, contain a reference to themselves) may cause a recursive loop
 b) because deep copy copies *everything* it may copy too much, e.g. administrative data structures that should be shared even between copies

Python's deep copy operation avoids these problems by:

 a) keeping a table of objects already copied during the current copying pass
 b) letting user-defined classes override the copying operation or the set of components copied
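Both points can be tried out in a few lines. The sketch below first deep-copies a dictionary that contains itself, and then defines a class that overrides copying through the __deepcopy__ hook. The Job class and its attributes are hypothetical, invented only to have something that should stay shared between copies.

```python
from copy import deepcopy

# a) A recursive object: the dictionary contains a reference to itself.
node = {"name": "root"}
node["self"] = node

# This terminates because deepcopy remembers objects it has already copied.
clone = deepcopy(node)
clone_is_new = clone is not node
cycle_preserved = clone["self"] is clone  # the cycle points at the copy


# b) A hypothetical class that keeps one attribute shared between copies.
class Job:
    def __init__(self, params, registry):
        self.params = params      # per-instance data, should be copied
        self.registry = registry  # "administrative" data, should stay shared

    def __deepcopy__(self, memo):
        # Deep-copy params (passing the memo table along),
        # but reuse the very same registry object.
        return Job(deepcopy(self.params, memo), self.registry)


registry = {"jobs": []}
job = Job({"retries": {"max": 3}}, registry)
job_copy = deepcopy(job)

params_copied = job_copy.params is not job.params
nested_copied = job_copy.params["retries"] is not job.params["retries"]
registry_shared = job_copy.registry is registry

print(clone_is_new, cycle_preserved, params_copied, nested_copied, registry_shared)
```

The memo argument is the table of already copied objects mentioned in point a); passing it on to nested deepcopy calls keeps that bookkeeping consistent within one copying pass.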