Explanation of how persistency can be achieved inside ERP5.
Persistency¶
A persistent object is an object which, once instantiated and set as a property
of another persistent object, will be stored persistently in the Zope object
database (ZODB) with its own OID, a unique persistent object identifier.
Simple example¶
from persistent import Persistent

class Foo(Persistent):
    def __init__(self):
        self.my_property = 1
This class, if instantiated and put as a property on an existing persistent
object (on an "empty" ZODB it will be the special root object, in an empty Zope
site it will probably be the Application object), will be stored in the ZODB
under a newly assigned OID.
But there is a non-persistent object which has also been stored: the int value
held by the my_property attribute. Another one is the very name of that property,
the "my_property" string. Both are serialized as part of the Foo instance, without
an OID of their own.
If the int value is changed, what will be stored in the ZODB is a new complete
serialization of the Foo instance, with the same OID. The change is detected
because assigning the attribute calls __setattr__, which is defined on Persistent.
This also explains why modifying a mutable property value in place (appending to
a list, for example) will not cause the container object to be saved in the ZODB:
__setattr__ is not called, so the change is not detected.
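The detection rule can be sketched without the ZODB. The code below does not use the real Persistent class: TrackedObject and its changed flag are hypothetical stand-ins that only mimic the __setattr__ hook, to show why an in-place mutation goes unnoticed while an assignment is caught.

```python
class TrackedObject:
    """Toy stand-in for Persistent (hypothetical, for illustration only)."""

    def __init__(self):
        # Bypass our own hook so construction does not count as a change.
        object.__setattr__(self, "changed", False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != "changed":
            # Any attribute assignment flags the object as modified.
            object.__setattr__(self, "changed", True)


class Foo(TrackedObject):
    def __init__(self):
        super().__init__()
        self.my_property = 1
        self.my_list = []
        self.changed = False  # pretend the initial state was just saved


foo = Foo()

foo.my_list.append(42)  # in-place mutation: __setattr__ is never called
mutation_detected = foo.changed

foo.my_property = 2  # assignment: goes through __setattr__
assignment_detected = foo.changed

print(mutation_detected, assignment_detected)
```

With the real library, the equivalent flag is the _p_changed attribute of the persistent object. The usual remedies for an undetected in-place change are to reassign the attribute or to set _p_changed = True on the containing persistent object.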
Complex example¶
Based on workflow history handling.
Consider the following line, which traverses several objects.
foo_document.workflow_history['edit_workflow'][-1]['date']
Dissection of classes traversed by that line:
foo_document -> Document
workflow_history -> PersistentMapping
['edit_workflow'] -> tuple
[-1] -> dict
['date'] -> DateTime
Document is any document type, which inherits from Persistent.
PersistentMapping is a standard Zope type inheriting from Persistent.
tuple and dict are standard Python types which do not inherit from Persistent.
DateTime is a standard Zope type which does not inherit from Persistent.
This means that:
- Modifying foo_document properties (which involves calling
  foo_document.__setattr__ at some point) will cause foo_document to be saved.
  This does not include serializing workflow_history, because workflow_history
  is also persistent and saved separately from foo_document.
- Modifying workflow_history (same remark as above) will cause workflow_history
  to be saved. This does not modify foo_document in any way, so foo_document is
  not serialized again. But it does include serializing all the non-persistent
  subobjects of workflow_history.
- Modifying the tuple is impossible because it is an immutable type, but
  replacing it with another tuple is possible, so we fall into the "modifying
  workflow_history" case. As the ZODB keeps the history of previous object
  versions until packed, increasing the tuple length one by one will cause the
  ZODB size to grow quadratically: 1 + 2 + 3 + 4...
- Modifying the dict will cause nothing to happen at storage level, because it
  is mutable and not persistent: the change is not detected.
- Modifying the DateTime is just the same as modifying the dict: nothing will
  happen at storage level.
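The cost of growing the tuple one entry at a time can be estimated with a small sketch. It assumes that the standard pickle module is a fair stand-in for ZODB serialization and that every version is kept until the database is packed. The entry contents and the bytes_written helper are made up for illustration. Since each version is written in full, the total after n additions is 1 + 2 + ... + n = n(n+1)/2 entries.

```python
import pickle


def bytes_written(n_entries):
    """Return (total bytes over all versions, bytes of the last version)."""
    history = ()
    total = 0
    for step in range(n_entries):
        # A tuple cannot grow in place: build a new, longer tuple.
        history = history + ({"action": "edit", "step": step},)
        # Each new version is serialized in full and kept until packing.
        total += len(pickle.dumps(history))
    return total, len(pickle.dumps(history))


total_100, final_100 = bytes_written(100)
total_200, final_200 = bytes_written(200)

# Doubling the number of entries roughly doubles the final size,
# but roughly quadruples the total written.
print(total_100, final_100)
print(total_200, final_200)
```

The exact byte counts depend on the pickle protocol. The point is only the ratio between the two runs.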
When to inherit from Persistent¶
When implementing a class whose instances will often be modified (like the tuple
in the above example), you should make it persistent, to avoid impacting the
container at each change.
Note that some containers handle this impact better than others. An example is
BTreeFolder2, because it avoids being entirely re-serialized when a subobject is
added, deleted or modified (if the modification implies calling __setattr__ on
the BTreeFolder2).
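To get a feeling for the difference, the sketch below compares how many bytes must be written again after changing one entry, depending on whether the entries live inside the container's record or each in its own record. It is a rough model that uses pickle as a stand-in for ZODB serialization. The container of 1000 counters is hypothetical.

```python
import pickle

# Hypothetical container holding 1000 small entries.
entries = [{"id": i, "counter": 0} for i in range(1000)]

# Change a single entry.
entries[500]["counter"] += 1

# Non-persistent entries are serialized inside the container's record,
# so the whole container has to be written again.
rewritten_inline = len(pickle.dumps(entries))

# If each entry were persistent, it would have its own record,
# and only that record would have to be written again.
rewritten_separate = len(pickle.dumps(entries[500]))

print(rewritten_inline, rewritten_separate)
```

The gap widens with the number of entries, which is why frequently modified subobjects are good candidates for their own record.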
When not to inherit from Persistent¶
When implementing a class whose instances will be and stay small (only reading
the pickle from the ZODB can tell you whether the object is small) compared to
the size of the ZODB object header (which is basically the class name). Making
such a class persistent would hurt information density: the ZODB would contain
more object header data than actual object payload.
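A rough comparison of header and payload sizes, under loud assumptions: the header is modelled as the pickled module and class name, and both names (my_product.documents, TinyCounter) are hypothetical. Real records should be measured by reading the actual pickle from the ZODB.

```python
import pickle

# Hypothetical class location, standing in for the object header.
class_reference = ("my_product.documents", "TinyCounter")

# The whole state of a very small object.
state = {"count": 1}

header_size = len(pickle.dumps(class_reference))
payload_size = len(pickle.dumps(state))

# For such a small object, the header outweighs the payload.
print(header_size, payload_size)
```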
Tools¶
You should first take a look at Zope's standard tools (in the bin directory of
your Zope installation) related to persistency. What you can get from each tool:
Note: accuracy has not been checked. Feel free to comment on the entries.
- analyze.py
transaction overhead (Data.fs size minus record size), record overhead
(record size minus object size), old object size impact (pack gain
estimation), list of used classes with size distribution.
- fsdump.py (simply uses the ZODB product tool with the same name)
list of transactions, list of objects in each transaction, with class, OID and
size.
- fstail.py
user, description, and size of a few last transactions
- netspace.py
display the size of objects including their subobjects
- space.py
number of instances of each used class with total size
You might also want to check treenalyzer.py, which we developed (based on
netspace.py described above). It displays the size of objects including their
subobjects, a hexadecimal dump of persistent objects, and a hexadecimal dump of
individual non-persistent properties, with statistics similar to those of
space.py described above.
z3c.zodbbrowser is a Zope 3 product which is actually a stand-alone GTK ZODB
browser. It will show you more objects than the ZMI, but with few low-level
details (no file offsets or sizes, for example). Definitely worth trying for a
quick object lookup, but maybe not enough.