.. _more-about-types:

More About Types
****************

This section looks more closely at what the library can provide when data requirements become more demanding. Perhaps a configuration file needs to contain a list of recently opened files, or a job scheduler needs to maintain a table of performance metrics.

A series of changes is made to the ``Job`` introduced in the previous section. These changes involve progressively more complex types. The full series introduces enough information to cover most application data requirements.

.. _a-refresh-cycle:

A Refresh Cycle
===============

The starting point is the original declaration:

.. literalinclude:: class-that-stores/job_basic.py
   :language: python

Members of the class are listed below along with their respective Python types:

* ``unique_id`` - uuid.UUID
* ``title`` - str
* ``priority`` - int
* ``service`` - str
* ``body`` - bytes

The ``Job`` class is *complete* in that no further type information is required by the library. The library can ``store()`` and ``recover()`` far more complex application data, but doing so requires an entirely different approach to the declaration of type information.

Remembering When
================

Recording the moment a job was presented is likely to be useful. Adding a ``created`` member looks like this:

.. literalinclude:: class-that-stores/job_created.py
   :language: python

This new class declaration is also complete and can immediately be put to work. New application features and functionality can be based on the accurate storage and recovery of the creation time, which is great. However, a peek at the stored materials is disappointing:

.. literalinclude:: stored-by-class/job_created.json
   :language: json

The ``created`` member appears as a floating-point value. This might be functionally valid but the stored representation is useless as a time. It cannot be checked by eye or sensibly modified with a text editor. Passing additional type information at registration time fixes this issue:

.. code-block:: python

   ar.bind(Job, object_schema={'created': ar.ClockTime})

There is now an *explicit* description of the type of information that will be carried around in the ``created`` member. That description is provided to the ``bind`` registration function in a *schema*, and it removes the need to assign a default value. In fact, longer-term considerations mean it is best to use ``None`` as the default value once a proper schema declaration is present. Readability is restored in the file contents:

.. literalinclude:: stored-by-class/job_created_clocktime.json
   :language: json

The Python ``float`` type is used both for mathematical purposes and for the marking of time. The library understands this overloading and provides a mechanism for users to be specific about their intentions. Note that only the *representation* changes; the application continues to use the ``float`` type. The library automatically performs the necessary conversions during encoding and decoding.

.. note::

   Most applications working with times will use ``datetime`` runtime values rather than ``float``. These are registered using the ``WorldTime`` type rather than ``ClockTime``. ``ClockTime`` is retained for its convenience in certain situations. More detailed information relating to time values can be found :ref:`here`.

Declaring A Full Schema
=======================

A properly declared application type now looks like:

.. literalinclude:: class-that-stores/job_none.py
   :language: python

Each member of the application type is now associated with a library type, e.g. ``ar.ClockTime`` and ``ar.Integer4``. An application type with a full schema enjoys a variety of benefits:

* readability of *explicit* vs *implicit* type information,
* ability to disambiguate in scenarios such as ``float`` vs time,
* ability to describe more complex types,
* and use of ``None`` as a default value.

A surprise awaits in the stored representation:

.. literalinclude:: stored-by-class/job_none.json
   :language: json

The expected ``value`` object has been reduced to the empty object - where have ``created``, ``unique_id`` and their siblings gone? The empty object is the direct consequence of an internal encoding rule - members with a value of ``None`` are omitted from stored representations. This is a simple optimization, reducing the size of the resulting files. More importantly, it is part of informal version management. Further information on versioning can be found :ref:`here`.

Names For Runtime Numbers
=========================

Where operational numbers need a readable representation, an :ref:`Enumeration` is used to map the numbers to strings during storage and the strings back to numbers during recovery. Declaration looks like this:

.. literalinclude:: class-that-stores/job_transport.py
   :language: python

A simple example of usage appears below::

   >>> f = ar.File('job', Job)
   >>> j = Job()
   >>> f.store(j)
   >>>
   >>> r, _ = f.recover()
   >>> r.transport
   3

Lastly, the stored representation looks like this:

.. literalinclude:: stored-by-class/job_transport.json
   :language: json

Sequences And Collections
=========================

A notification feature is added to the job processing machinery. It is decided that notifications will take the form of emails sent to configured parties. Each job needs a list of zero or more email addresses:

.. literalinclude:: class-that-stores/job_who.py
   :language: python

Given the updated job creation::

   Job(created=ar.clock_now(), title='the quick',
       who=['tom.pirate@black.ship',
            'gerard.diplomat@ivory.tower',
            'aswan.swami@nowhere.everywhere'])

The resulting file content looks like this:

.. literalinclude:: stored-by-class/job_who.json
   :language: json

A job now includes a list of email addresses. The list can hold zero or more addresses but there is always a list present, i.e. the ``j.who`` member is always an instance of a Python ``list``.

.. note::

   A good example of an application type similar to ``Job``, but with all the bells and whistles, can be found :ref:`here`.

Use of the :py:func:`~ansar.encode.message.default_vector` library function guarantees a valid default value for the member, but also touches on a :ref:`wider issue`.

There is a small collection of functions such as :py:func:`~ansar.encode.convert.clock_now`. These are shorthands that can make use of this library much cleaner. They can also be considered syntactic sugar that extends the library dependency into further areas of code.

The ``object_schema`` parameter is a ``dict`` of names and :ref:`type expressions`, and :ref:`VectorOf` is one example of a type expression. This is the full set of supported sequences and collections:

* ``ArrayOf`` - a fixed-length sequence
* ``VectorOf`` - a variable-length sequence
* ``DequeOf`` - a double-ended queue
* ``SetOf`` - a unique collection
* ``MapOf`` - an associative array

The library makes all necessary conversions between portable representations and the Python type best suited to a particular concept, i.e. ``ArrayOf`` and ``VectorOf`` map to ``list``, ``DequeOf`` to ``deque``, ``SetOf`` to ``set`` and ``MapOf`` to ``dict``.

There is no explicit support for concepts such as a double-ended queue in JSON, or any other known encoding. For this reason most sequences and collections appear as lists in representations. Example usage appears below:

.. code-block:: python

   'recent_work': ar.DequeOf(ar.UUID),
   'times_square': ar.ArrayOf(ar.ArrayOf(ar.ClockTime,2),2),
   'time_periods': ar.DequeOf(ar.TimeSpan),
   'priority_queue': ar.MapOf(int, ar.VectorOf(Job)),

Given the previously listed type expressions and the following values:

.. code-block:: python

   recent_work=ar.deque([
       uuid.uuid4(),
       uuid.uuid4(),
       uuid.uuid4(),
       uuid.uuid4(),
   ]),
   times_square=[
       [ar.clock_now(),ar.clock_now()],
       [ar.clock_now(),ar.clock_now()]
   ],
   time_periods=ar.deque([
       ar.clock_span(hours=1,minutes=2,seconds=3),
       ar.clock_span(hours=8),
       ar.clock_span(minutes=10),
       ar.clock_span(seconds=0, milliseconds=12, microseconds=500),
       ar.clock_span(days=1),
   ]),
   priority_queue={
       100: [Job(title='a'), Job(title='b')],
       10: [],
       1: [Job(title='!')],
   }

The resulting JSON file contents will look like this:

.. literalinclude:: stored-by-class/job_sequences_collections.json
   :language: json

Those Fixed-Size Arrays
=======================

All the containers have convenient functions to help with the proper initialization of members inside registered classes:

* :py:func:`~ansar.encode.message.default_vector`
* :py:func:`~ansar.encode.message.default_set`
* :py:func:`~ansar.encode.message.default_map`
* :py:func:`~ansar.encode.message.default_deque`

That is, all except arrays. The fundamental reason for the difference is that default instances of vectors, sets, maps and deques are *empty*. Additional information is required (i.e. the type of the elements and the size) to construct the *content* of an array. The fact that the elements of the array could themselves involve further nested structure hints at the true nature of the situation.

A special function is provided that accepts any valid type expression - including expressions involving arrays - and returns a default instance of that type. The following example constructs a 4-by-2 array of integers::

   >>> import ansar.encode as ar
   >>> t = ar.ArrayOf(ar.ArrayOf(int,2),4)
   >>> a = ar.make(t)
   >>> a
   [[None, None], [None, None], [None, None], [None, None]]

The :py:func:`~ansar.encode.message.make` function returns the simplest conforming instance of the specified type. For the simple types such as ``int`` that value is always ``None``.
This behaviour is a deliberate part of the informal version management and can help applications identify what version of materials is being loaded.

A revised version of the ``Job`` declaration appears below:

.. literalinclude:: class-that-stores/job_who_make.py
   :language: python

The ``who`` member is now initialized with a fixed-size array consistent with the type information registered with the ``Job`` class:

.. literalinclude:: stored-by-class/job_who_make.json
   :language: json

This demonstrates the use of the :py:func:`~ansar.encode.message.make` function. The ``who`` member is now a fixed-size array of email addresses, perhaps supporting the notion of primary and secondary points of contact. In a default instance of a ``Job``, neither of these contacts is set.

For demonstration purposes, the ``make`` function was also used to initialize the ``created`` and ``unique_id`` members. As mentioned previously, the default value for these types is ``None``, resulting in their omission from the representation.

.. _automated-construction:

Automated Construction
======================

If an object is extremely complex, or there is little perceived benefit in manual, per-member initialization, the :py:func:`~ansar.encode.message.make_self` function exists to automate that process. It assumes that all members have been initialized to ``None`` and uses the supplied schema to satisfy the :ref:`guaranteed structure` rule. A revised version of the ``Job`` declaration appears below:

.. literalinclude:: class-that-stores/job_who_make_self.py
   :language: python

The output is the same as in the previous section, at the cost of additional CPU cycles during object construction:

.. literalinclude:: stored-by-class/job_who_make_self.json
   :language: json

Polymorphic Persistence
=======================

The library supports the concept of polymorphism through the :ref:`Any` type.
Rather than declaring that a file contains a specific type::

   >>> f = ar.File('job', Job)
   >>> j = Job()
   >>> f.store(j)
   >>> r, _ = f.recover()
   >>> r
   <__main__.Job object at 0x7f75deaf0750>

A file can be declared in this way::

   >>> f = ar.File('job', ar.Any)
   >>> j = Job()
   >>> f.store(j)
   >>> r, _ = f.recover()
   >>> r
   <__main__.Job object at 0x7f75deaf0750>

To the application the behaviour has not changed. If a second class declaration is introduced:

.. literalinclude:: class-that-stores/maintenance_job.py
   :language: python

A fresh sequence of operations illustrates the new possibilities::

   >>> f = ar.File('job', ar.Any)
   >>> j = Job()
   >>> s = Schedule()
   >>> f.store(j)
   >>> r, _ = f.recover()
   >>> r
   <__main__.Job object at 0x7f14419af650>
   >>> f.store(s)
   >>> r, _ = f.recover()
   >>> r
   <__main__.Schedule object at 0x7f14419afe50>

A file declared as polymorphic using the :ref:`Any` type can contain an instance of any registered class, such as ``Job`` or ``Schedule``. This ability avoids having to represent unrelated concepts - e.g. ``Job`` and ``Schedule`` objects - within a single class. It is also extensible, in that further object types can be added without disturbing much of the existing code.

.. note::

   The recovery of polymorphic representations also integrates seamlessly with version support. Refer to :ref:`versions-upgrading-and-migration`.

How This Works
--------------

Polymorphism is an important capability. However, there can be some misunderstandings about the scope of what it can do. The simplest way to avoid any confusion is to show a sample of the stored materials:

.. literalinclude:: stored-by-class/maintenance_job_any.json
   :language: json

The JSON ``value`` has changed from an `object` to a `list` that contains a `string`, an `object` and an empty `list`. The string appears to identify the type of the object. That is exactly the case.
The first element is the name of the declared class, in a form curated by the library, and the second element is exactly what normally appears as the ``value``. This explains why a file stored using ``ar.File('job', Job)`` cannot be recovered using ``ar.File('job', ar.Any)``. A polymorphic "recover" operation cannot recover *anything*; it must be presented with materials created by a polymorphic "store".

.. note::

   The third element is an internal table required to track the movement of pointer materials. With no pointers in sight, the table is empty.

.. _going-incognito:

Going Incognito
---------------

As a part of polymorphism, the library includes special handling of unknown types. During a store operation the library compiles the identifying string and tags the representation with that name. During a recovery operation the library `decompiles` that tag to the actual Python class object.

Any failure to decompile is simply a case of the application not knowing the named type - this is either a problem on the development side (i.e. a bug) or an operations problem. An example of the latter is where a file originating from a different system is presented to an unsuspecting application. The library detects these scenarios, and during a recovery `folds` the materials into an :py:class:`~ansar.encode.message.Incognito` object.

The outcome is that recovery operations of unknown types do not fail in the normal sense. They produce an instance of a special class::

   def work_on_job(j):
       if isinstance(j, Job):
           # Process the one-off job.
           return True
       elif isinstance(j, Schedule):
           # Process repeating job.
           return True
       elif isinstance(j, ar.Incognito):
           # Unknown (unregistered) type; record its name.
           log(j.type_name)
           return False
       # Registered type that this function does not handle yet.
       software_error()

This implementation of ``work_on_job`` logs the name of any unregistered class. If the job is not an instance of anything tested for by the function, it calls the ``software_error`` function.
In that case the class of the job is known to the application (i.e. it has been registered) but the function has not yet been updated to perform the related work.

.. note::

   For those who are curious or need an explanation for something they have observed, the :py:class:`~ansar.encode.message.Incognito` object never appears in stored materials. The object is *un-folded* during the store process in a manner matching the prior *folding*. Effectively this allows representations of unknown type to pass through the application without change.
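
The tagged layout and the folding of unknown types can be sketched without the library. The following is a minimal, standard-library illustration and not the library's implementation. The ``REGISTRY`` table, the ``register``, ``encode_any`` and ``decode_any`` helpers, the ``value`` attribute and the use of the bare class name as the tag are all assumptions made for the example; the library curates its own form of the class name. Only the three-element layout, the omission of ``None`` members and the ``type_name`` attribute are taken from the text above.

```python
import json

# Hypothetical registry of known classes, keyed by tag (an assumption).
REGISTRY = {}

def register(cls):
    """Record a class under its bare name so it can be found at recovery."""
    REGISTRY[cls.__name__] = cls
    return cls

class Incognito:
    """Stand-in that carries the tag and raw value of an unknown type."""
    def __init__(self, type_name, value):
        self.type_name = type_name
        self.value = value

@register
class Job:
    def __init__(self, title=None, priority=None):
        self.title = title
        self.priority = priority

def encode_any(obj):
    """Produce the [tag, value, pointer-table] layout as JSON text."""
    if isinstance(obj, Incognito):
        # Un-fold: emit the original tag and value untouched.
        tag, value = obj.type_name, obj.value
    else:
        tag = type(obj).__name__
        # Members holding None are left out of the representation.
        value = {k: v for k, v in vars(obj).items() if v is not None}
    return json.dumps([tag, value, []])

def decode_any(text):
    """Rebuild a registered class, or fold unknown materials."""
    tag, value, _pointers = json.loads(text)
    cls = REGISTRY.get(tag)
    if cls is None:
        return Incognito(tag, value)
    return cls(**value)

stored = encode_any(Job(title='the quick', priority=10))
job = decode_any(stored)

# Materials from another system, tagged with a class never registered here.
foreign = '["Schedule", {"period": 3600}, []]'
unknown = decode_any(foreign)
passed_through = encode_any(unknown)
```

In this sketch a representation tagged ``Schedule`` survives a recover and store cycle unchanged, even though no ``Schedule`` class is ever registered, which mirrors the pass-through behaviour described in the note above.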