.. _really-complicated-documents:

Really Complicated Documents
****************************

The library supports arbitrarily structured objects through the type expression feature. Expressions can include sequences, collections and nested messages, covering most application persistence requirements.

Type expressions can also include the :ref:`PointerTo` type. This is the mechanism by which applications can store and recover `graph` objects, such as linked lists and trees. Even something as convoluted as an AST (abstract syntax tree) for a regular expression can be stored and recovered.

The Python memory model is a handle-based model; there is no explicit support for pointers as there is in languages such as C++ and Go. However, handles can be managed in a way that simulates pointer behaviour, and this is what the library's :ref:`PointerTo` type is for.

Value Versus Pointer
====================

To understand how the library can store and recover complex graph objects, the next few paragraphs demonstrate the use of :ref:`PointerTo` and how it manifests in the stored representations. Consider the following Python session::

    >>> import ansar.encode as ar
    >>> f = ar.File('flag', bool)
    >>> b = False
    >>> f.store(b)

This produces perhaps the simplest possible application object:

.. literalinclude:: include/bool.json
   :language: json

Now make a single change to the type expression::

    >>> import ansar.encode as ar
    >>> f = ar.File('pointer-to-flag', ar.PointerTo(bool))
    >>> b = False
    >>> f.store(b)

The stored representation shows the effect of that change:

.. literalinclude:: include/pointer-to-bool.json
   :language: json

The ``value`` has changed from ``false`` to a rather long string starting with ``1100``, and that same string appears in the ``pointer`` list, paired with the ``false`` value. There is no obvious difference when recovering the representation::

    >>> r, _ = f.recover()
    >>> r
    False

.. note::

   The strings used as identities for pointers are rather long. This is a necessity brought about by the inclusion of polymorphic capability (:ref:`going-incognito`). Suffice to say that the implementation is quite detailed and can result in pointers from multiple inbound encodings being merged into a single outbound encoding. The long string guarantees uniqueness of entries.

Many Pointers To The Same Value
-------------------------------

The next step is to demonstrate how the management of :ref:`PointerTo` objects can be used to share a common object. An example of sharing a value in Python looks like this::

    >>> b = True
    >>> a = [b, b, b, b]

Creating a stored representation looks like this::

    >>> f = ar.File('array-of-pointer', ar.ArrayOf(ar.PointerTo(bool), 4))
    >>> f.store(a)

The representation records the single instance of a ``bool`` and the multiple references to that instance:

.. literalinclude:: include/array-of-pointer.json
   :language: json

After recovering this representation, a few careful Python expressions expose the true reason for the existence of :ref:`PointerTo`::

    >>> r, _ = f.recover()
    >>> r
    [True, True, True, True]
    >>> id(r[0]) == id(r[2])
    True
    >>> id(r[0]) == id(r[3])
    True

Comparing the ``id()`` values of the array elements shows that the recovered array ``r`` is populated with a single handle, referring to a common ``bool`` value.

Exploiting this ability to manage Python handles leads to much more significant constructions, such as linked lists. The following sections show how such lists can be built, stored and recovered.

A Linked List
=============

Consider the following class declaration:

.. literalinclude:: include/linked-list.py
   :language: python

A ``Node`` contains a single member ``next`` that refers to the following instance of a ``Node``, presumably until there is a ``None`` value. This Python session demonstrates the construction and persistence of a linked list of ``Node`` objects::

    >>> next = None
    >>> for i in range(4):
    ...     c = Node(next)
    ...     next = c
    ...
    >>> c
    <__main__.Node object at 0x7fa0395507f0>
    >>> c.next
    <__main__.Node object at 0x7fa039550760>
    >>> c.next.next
    <__main__.Node object at 0x7fa0395506a0>
    >>> c.next.next.next
    <__main__.Node object at 0x7fa039550670>
    >>> c.next.next.next.next
    >>> f = ar.File('linked-list', ar.PointerTo(Node))
    >>> f.store(c)
    >>> r, _ = f.recover()
    >>> r
    <__main__.Node object at 0x7fa039550df0>
    >>> r.next
    <__main__.Node object at 0x7fa0395508e0>
    >>> r.next.next
    <__main__.Node object at 0x7fa039550f70>
    >>> r.next.next.next
    <__main__.Node object at 0x7fa039550e80>
    >>> r.next.next.next.next
    >>>

The ``for`` loop creates the linked list in reverse order: the first ``Node`` object created ends up as the last of the 4 ``Node`` objects in the list. The list passes through :py:meth:`~ansar.encode.file.File.store` and :py:meth:`~ansar.encode.file.File.recover` operations. The recovered list is examined in the same manner as the original, verifying that all links are in place and that the list is properly terminated (i.e. the final ``next`` member is set to the value ``None``).

The stored representation looks like this:

.. literalinclude:: include/linked-list.json
   :language: json

.. note::

   The encoding machinery omits any member with a value of ``None``. Because a ``Node`` has only one member, the final ``Node`` is represented by an empty JSON object, ``{}``.

A Loop Of Links
===============

A single additional line of Python links the final ``Node`` back to the start of the list, creating an endless loop. The library detects "cycles" or "circular references" and uses back-patching to complete complex constructions of this nature. This ability is necessary for the proper handling of graph objects::

    >>> next = None
    >>> for i in range(4):
    ...     c = Node(next)
    ...     next = c
    ...
    >>> c
    <__main__.Node object at 0x7fa0395506a0>
    >>> c.next.next.next.next
    >>> c.next.next.next.next = c
    >>> c.next.next.next.next
    <__main__.Node object at 0x7fa0395506a0>
    >>> c.next.next.next.next.next.next.next.next
    <__main__.Node object at 0x7fa0395506a0>
    >>> f = ar.File('linked-loop', ar.PointerTo(Node))
    >>> f.store(c)
    >>> r, _ = f.recover()
    >>> r
    <__main__.Node object at 0x7fa039555370>
    >>> r.next.next.next.next
    <__main__.Node object at 0x7fa039555370>
    >>>

Assigning the start ``Node`` to the fourth link closes the loop. The loop is then passed through a store and recover process. The start object and the final link of the recovered loop are shown to have the same address.

The stored representation looks like:

.. literalinclude:: include/linked-loop.json
   :language: json

A Map Of Loops
==============

Linking can be combined with the structured types. Here is the construction of a map of loops::

    >>> f = ar.File('map-loop', ar.MapOf(str, ar.PointerTo(Node)))
    >>> m = {}
    >>> def loop():
    ...     next = None
    ...     for x in range(4):
    ...         n = Node(next)
    ...         next = n
    ...     next.next.next.next.next = next
    ...     return next
    ...
    >>> m['bjorn'] = loop()
    >>> m['sven'] = loop()
    >>> m['hilda'] = loop()
    >>> m['freya'] = loop()
    >>> bjorn = m['bjorn']
    >>> bjorn
    <__main__.Node object at 0x7fa039555430>
    >>> bjorn.next.next.next.next
    <__main__.Node object at 0x7fa039555430>
    >>> bjorn.next.next.next.next.next.next.next.next
    <__main__.Node object at 0x7fa039555430>
    >>> f.store(m)
    >>> r, _ = f.recover()
    >>> bjorn = r['bjorn']
    >>> bjorn
    <__main__.Node object at 0x7fa039555f40>
    >>> bjorn.next.next.next.next
    <__main__.Node object at 0x7fa039555f40>
    >>>

A ``dict`` is populated with named loops. This creates a total of 16 ``Node`` objects divided across 4 loops. This can all be traced through the following representation:

.. literalinclude:: include/map-loop.json
   :language: json

.. warning::

   The entries in the :ref:`MapOf` must be :ref:`PointerTo(Node)`, `not` plain ``Node``.

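
The figure of 16 ``Node`` objects across 4 loops can be checked without touching the stored file, by walking each loop and collecting object identities. The following is a minimal sketch. The ``Node`` class here is a stand-in written for this example (it assumes a single ``next`` member, as the sessions above suggest), and ``loop_members`` is a hypothetical helper, not part of the library:

.. code-block:: python

    class Node(object):
        """Stand-in for the Node class used in the sessions (assumed shape)."""
        def __init__(self, next=None):
            self.next = next

    def loop():
        """Build a 4-node list, then link the last node back to the first."""
        head = None
        for _ in range(4):
            head = Node(head)
        head.next.next.next.next = head
        return head

    def loop_members(start):
        """Return the distinct nodes reachable from start, in visit order."""
        seen = set()
        members = []
        n = start
        # Stop at the end of a list (None) or on returning to a visited node.
        while n is not None and id(n) not in seen:
            seen.add(id(n))
            members.append(n)
            n = n.next
        return members

    m = {name: loop() for name in ('bjorn', 'sven', 'hilda', 'freya')}
    per_loop = {name: len(loop_members(head)) for name, head in m.items()}
    total = sum(per_loop.values())

The same helper could be applied to a recovered map to confirm that each loop still closes after the expected number of links.
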

A Document With Everything
==========================

This section combines all the constructs presented in the preceding sections. The result is a single application object that includes significant structuring and graph object elements. The techniques demonstrated should serve as the basis for most data persistence requirements. The class declaration looks like this:

.. literalinclude:: class-and-store/doc_every.py
   :language: python

Construction and verification of a recovered ``Doc`` looks like::

    >>> f = ar.File('doc', Doc)
    >>> d = Doc()
    >>> def links():
    ...     next = None
    ...     for x in range(4):
    ...         n = Node(next)
    ...         next = n
    ...     return next
    ...
    >>> def loop():
    ...     next = links()
    ...     next.next.next.next.next = next
    ...     return next
    ...
    >>> flag = True
    >>> d.flag = flag
    >>> d.pointer_to_flag = flag
    >>> d.array_to_flag = [flag, flag, flag, flag]
    >>> d.linked_list = links()
    >>> d.loop = loop()
    >>> m = {}
    >>> m['bjorn'] = loop()
    >>> m['sven'] = loop()
    >>> m['hilda'] = loop()
    >>> m['freya'] = loop()
    >>> d.map_of_loops = m
    >>>
    >>> f.store(d)
    >>> r, _ = f.recover()
    >>> bjorn = r.map_of_loops['bjorn']
    >>> bjorn
    <__main__.Node object at 0x7f73f3521490>
    >>> bjorn.next.next.next.next
    <__main__.Node object at 0x7f73f3521490>
    >>> bjorn.next.next.next.next.next.next.next.next
    <__main__.Node object at 0x7f73f3521490>

The complete representation appears below:

.. literalinclude:: class-and-store/doc_every.json
   :language: json
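
The pairing of a ``value`` with a ``pointer`` list, and the back-patching mentioned earlier, can be approximated in plain Python. The following is a simplified sketch of the general technique, not the library's implementation or its exact format. The names ``Node``, ``flatten`` and ``rebuild`` are hypothetical, the ``Node`` class is an assumed stand-in with a single ``next`` member, and short integer strings stand in for the long identity strings:

.. code-block:: python

    import json

    class Node(object):
        """Stand-in for the Node class used in the sessions (assumed shape)."""
        def __init__(self, next=None):
            self.next = next

    def flatten(start):
        """Encode a chain or loop of nodes as a value plus a pointer table."""
        ids = {}      # id(node) -> identity string
        order = []    # nodes in the order they were first seen
        n = start
        while n is not None and id(n) not in ids:
            ids[id(n)] = str(len(ids) + 1)
            order.append(n)
            n = n.next
        table = []
        for n in order:
            members = {}
            if n.next is not None:    # members set to None are omitted
                members['next'] = ids[id(n.next)]
            table.append([ids[id(n)], members])
        value = ids[id(start)] if start is not None else None
        return {'value': value, 'pointer': table}

    def rebuild(encoded):
        """Recreate the nodes, then back-patch the links between them."""
        # Pass 1: create every node with its link left empty.
        nodes = {ident: Node() for ident, _ in encoded['pointer']}
        # Pass 2: every target now exists, so links (including cycles) can be set.
        for ident, members in encoded['pointer']:
            ref = members.get('next')
            if ref is not None:
                nodes[ident].next = nodes[ref]
        return nodes.get(encoded['value'])

    head = None
    for _ in range(4):
        head = Node(head)
    head.next.next.next.next = head    # close the loop

    text = json.dumps(flatten(head))   # the table is plain JSON data
    copy = rebuild(json.loads(text))

Creating all nodes before assigning any links is what allows the final link of a loop to refer back to a node that appears earlier in the table. In this sketch, a single unlinked ``Node`` flattens to a table entry with an empty ``{}``, mirroring the omission of ``None`` members noted for the linked list.
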