.. _design-goals-and-best-practise:

Design Goals And Best Practice
******************************

General information about design goals, and what constitutes good usage of the library, appears here.

.. _the-single-most-important-thing:

The Single Most Important Thing
===============================

This library works with true application types. Store operations place representations of operational objects in system files and recovery operations result in objects ready for action.

Using this library to read a temporary object that is then transferred into operational objects in a member-by-member fashion is perhaps consistent with the use of other libraries. However, there are several reasons to *not* use this library in that way.

Most importantly, it creates an additional translation layer between the operational objects and the library. This undoes a lot of hard work to locate all such translations *inside the library*. Whenever the objects handled by the library change (e.g. moves, adds and deletes of members), any translation layer is likely to require an update. Across a substantial application that represents a *self-inflicted maintenance headache*.

There are a few exceptions to this rule, e.g. applications working with non-UTC timezones, but generally the presence of pre-store or post-recover translations is either a misuse of the library or due to a limitation of the library. The maintainer is automatically interested in any instance of the latter.

.. _dates-times-and-zones:

Dates, Times And Zones
======================

The types associated with time values appear below:

+---------------+----------------+----------------------------------------------+
| Python        | Ansar          | Notes                                        |
+===============+================+==============================================+
| ``float``     | ``ClockTime``  | An *epoch* time value.                       |
+---------------+----------------+----------------------------------------------+
| ``float``     | ``TimeSpan``   | The difference between two ``ClockTimes``.   |
+---------------+----------------+----------------------------------------------+
| ``datetime``  | ``WorldTime``  | A formal date and time value.                |
+---------------+----------------+----------------------------------------------+
| ``timedelta`` | ``TimeDelta``  | The difference between two ``WorldTimes``.   |
+---------------+----------------+----------------------------------------------+

The library supports two styles of time value: ``float`` values that record the number of seconds since an epoch (probably January 1, 1970) and ``datetime`` objects that hold explicit year, month (etc.) values. In general, applications will use ``datetime`` and ``timedelta`` values but the epoch-associated ``float`` capabilities are retained for those specific scenarios where the full complexities of daylight saving, war-time adjustments and leap seconds can be avoided.

The concept of timezones manifests as the ``tzinfo`` parameter and attribute of the ``datetime`` class. The associated ``tzinfo`` class is an abstract base class with multiple concrete implementations. The standard library provides the minimal implementation as ``datetime.timezone`` but there are several others including ``dateutil.tz.tzfile`` and ``zoneinfo.ZoneInfo``. These different implementations provide a variety of options and behaviours such as interpretation of IANA names, Windows zone names and GNU TZ values.

.. note::

   There are significant differences in the efficacy of the different implementations, which might be construed as either limitations or bugs (there are definitely bugs). Suffice to say, time processing involving timezones is truly complex. As a general rule the implementations have improved over time, i.e. from ``datetime``, through ``pytz``, ``dateutil`` and most recently, ``zoneinfo``.

There is no mechanism by which the library can support the encoding of every concrete implementation of the ``tzinfo`` base class. For technical reasons beyond the scope of this document it is not currently possible.

To provide timezone capability, the library allows instances of ``datetime.timezone`` for the ``tzinfo`` attribute. Assigning a value of any other type will result in an exception being raised during encoding. Of course, a value of ``None`` is accepted, i.e. "naive" ``datetimes``.

Applications required to manage ``datetime`` objects with a variety of timezones, say selected by a user from the set of IANA names, must implement *pre-store* and *post-recover* code. In addition, the selected IANA names must be stored independently of the ``datetime`` objects.

Pre-store code will convert all the ``datetime`` objects to the UTC timezone using methods such as ``datetime.astimezone``. Note that the UTC timezone is available as ``datetime.timezone.utc``, or ``ar.UTC`` (with the appropriate ``import`` statements):

.. code-block:: python

    # Storage location.
    # Assume a Meeting class that stores a datetime and IANA name.
    meeting_file = ar.File('meetings', ar.VectorOf(Meeting))

    meetings = [
        Meeting(.., 'Australia/Broken_Hill'),
        Meeting(.., 'Pacific/Marquesas'),
        Meeting(.., 'Europe/Sofia'),
        ..
    ]
    ..
    # Pre-store.
    as_utc = [Meeting(m.when.astimezone(ar.UTC), m.iana) for m in meetings]

    # Actual store.
    meeting_file.store(as_utc)

Post-recover code will convert all the ``datetime`` objects back to their original timezone:

.. code-block:: python

    from dateutil.tz import gettz

    # Actual recover.
    as_utc, _ = meeting_file.recover()

    # Post-recover.
    # Uses the dateutil.tz.gettz function to convert the IANA name
    # into a tzinfo object.
    meetings = [Meeting(m.when.astimezone(gettz(m.iana)), m.iana) for m in as_utc]

This departure from the standard encode-decode cycle is unfortunate, but a direct consequence of the design of the ``datetime`` module.
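
The conversion steps described above can be tried without the library at all. The following is a minimal sketch using only the standard library; the ``Meeting`` dataclass and the ``pre_store`` and ``post_recover`` helpers are hypothetical stand-ins, ``zoneinfo.ZoneInfo`` is used in place of ``dateutil.tz.gettz``, and it assumes Python 3.9 or later with a timezone database available on the host:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


@dataclass
class Meeting:
    """Hypothetical stand-in: a datetime plus the IANA name it was entered in."""
    when: datetime
    iana: str


def pre_store(meetings):
    # Convert every datetime to UTC so that tzinfo is a plain datetime.timezone.
    return [Meeting(m.when.astimezone(timezone.utc), m.iana) for m in meetings]


def post_recover(as_utc):
    # Restore each datetime to the zone named by its independently stored IANA name.
    return [Meeting(m.when.astimezone(ZoneInfo(m.iana)), m.iana) for m in as_utc]


original = [
    Meeting(datetime(2024, 3, 6, 19, 0, 30, tzinfo=ZoneInfo('Pacific/Marquesas')),
            'Pacific/Marquesas'),
]
stored = pre_store(original)       # What would be handed to a store operation.
restored = post_recover(stored)    # What the application works with afterwards.
```

The round trip preserves the instant in time, and the restored value carries the original local offset again. Only the UTC form, together with the IANA name as ordinary text, would ever reach an encoding.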

A ``WorldTime`` conforms to ISO 8601 and looks like this::

    2008-07-01T19:01:37

A ``ClockTime`` uses the same representation::

    2010-06-06T06:46:01.866233

Recovering a ``ClockTime`` produces a ``float`` value appropriate to the host system time. A ``ClockTime`` implements the concept of a particular time on a particular calendar day - that being a different ``float`` value for every timezone in the world.

A ``TimeSpan`` is a relative value or *delta*, the difference between two ``ClockTime`` values. A few example representations appear below:

* ``1h2m3s``
* ``8h``
* ``10m``
* ``0.0125s``
* ``1d``

These are readable text representations of *float* values. After recovery the resulting value can be added to any Python time value and that will have the expected effect, i.e. adding a recovered ``TimeSpan`` of ``2m`` will add ``120.0`` to the Python time value (noting that true time intervals are not nearly as simple as two time values and a bit of arithmetic)::

    >>> t = ar.text_to_clock('2012-03-06T19:00:30')
    >>> t
    1331013630.0
    >>> d = ar.text_to_span('2m')
    >>> d
    120.0
    >>> t += d
    >>> ar.clock_to_text(t)
    '2012-03-06T19:02:30'

Functions such as ``text_to_clock``, ``text_to_span`` and ``clock_to_text`` are the functions used internally, during store and recover operations.

.. _pushing-back-against-complexity:

Pushing Back Against Complexity
===============================

The technical domain referred to as application persistence or serialization has many pitfalls and caveats. One of the background concerns is the seemingly unstoppable increase in code complexity - the longer an application exists and the more it immerses itself in serialization the more complex it can get. There are reasons to suggest that this is more than the general increase in complexity that manifests in all evolving software. Having to deal with *change* (i.e. version management) is one such reason.

Guaranteed Structure
--------------------

One way to push back against complexity is to aim for *guaranteed structure* wherever possible and only retreat from that position in an explicit fashion. What this means is that wherever there is a sequence, collection or nested object, there is benefit in preventing assignment of ``None``. In the case of the ``who`` member of the ``Job`` object (from :ref:`examples` appearing throughout this documentation), this avoids the pre-check needed before every access to that member, e.g. ``len(j.who)`` will raise a ``TypeError`` if ``who`` is currently set to ``None``.

This might seem trivial in isolation but it affects every use of that member throughout a codebase. Multiply that by every "structural" member and you have a real contributor to overall complexity.

The declaration of ``'who': ar.VectorOf(ar.Unicode),`` is an expression of "guaranteed structure" and must be backed up by code such as ``self.who = who or ar.default_vector()`` in the ``__init__`` method. Both the encoding and decoding processes will raise a ``ValueError`` exception if a ``None`` value is detected where structure is expected.

The downside to "guaranteed structure" is that ``None`` is the standard way of representing "not set". This works nicely for non-structured types such as numbers and UUIDs but creates a conflict when dealing with types such as vectors. The desire to reduce coding complexity must compete with any hard requirements that would normally be implemented using ``None``.

Optional Structure
------------------

The declaration of ``'who': ar.PointerTo(ar.VectorOf(ar.Unicode)),`` is an expression of *optional structure* and it allows the value ``None`` to flow through encoding and decoding processes successfully. There is no actual difference at coding level, i.e. ``who = j.who`` will still assign the member value to the variable, and that value will either be a Python list or ``None``.
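
The practical difference between the two styles shows up at every point of use. The following is a minimal sketch in plain Python, with no library calls; ``GuaranteedJob``, ``OptionalJob`` and the two counting functions are hypothetical names, and an empty list stands in for ``ar.default_vector()``:

```python
class GuaranteedJob:
    """Guaranteed structure: the who member is always a list."""
    def __init__(self, who=None):
        # Stand-in for "who or ar.default_vector()".
        self.who = who or []


class OptionalJob:
    """Optional structure: the who member may be None, meaning "not set"."""
    def __init__(self, who=None):
        self.who = who


def head_count(job):
    # Safe only where structure is guaranteed.
    return len(job.who)


def optional_head_count(job):
    # Optional structure needs a pre-check before every access.
    if job.who is None:
        return 0
    return len(job.who)


guaranteed = GuaranteedJob()
optional = OptionalJob()
```

Calling ``head_count(optional)`` raises a ``TypeError``, which is the failure that guaranteed structure is intended to rule out across a codebase.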

Further information on the :ref:`PointerTo` facility can be found :ref:`here`.

.. _three-class-exemplars:

Three Class Exemplars
=====================

This section provides three different styles of declaration and initialization of application types. The different styles are a response to what developers might wish to do, versus what is possible within the capabilities of the library.

.. _no-schema-required:

No Schema Required
------------------

The following class demonstrates the extent of what can be achieved without schemas. There is a member for every type that can be automatically inferred during the registration process. Several library types resolve to the same Python type. For this reason, library types such as ``ClockTime`` do not make an appearance as the Python ``float`` type is assumed to be a floating point, mathematical value, i.e. a ``Float8``. All the structured types - such as vectors and sets - require explicit type declaration, and therefore cannot appear within this style of application type.

.. literalinclude:: class-exemplar/no_schema_required.py
   :language: python

All members *must* be initialized to an instance of the proper Python type. A representation of the default instance looks like:

.. literalinclude:: class-exemplar/no_schema_required.json
   :language: python

.. _default-values-required:

Default Values Required
-----------------------

Declaration of an application type should start with something like below, and evolve towards the style appearing in the next section, as necessary. There is a member for nearly every supported type including the structured types. A few types that are internal to the library are omitted for clarity:

.. literalinclude:: class-exemplar/default_values_required.py
   :language: python

All non-structural members are initialized with ``None``, including types such as times, enumerations and pointers.
Structural members *must* be initialized with a sensible default value, most conveniently provided by one of the library functions.

All ``None`` values are, of course, omitted from the representation except where structure dictates that a slot must exist, i.e. in an array. After loading of a ``DefaultValuesRequired`` instance it is guaranteed that ``dvr.an_array[7]`` will exist, i.e. the expression will not result in a ``TypeError`` (where ``an_array`` is ``None``) or an ``IndexError`` (where ``an_array`` is too short):

.. literalinclude:: class-exemplar/default_values_required.json
   :language: python

.. note::

   The :py:func:`~ansar.encode.message.make_self` library function can be used to automate the construction of complex objects. Refer to :ref:`here` for an example.

.. _structure-as-optionals:

Structure As Optionals
----------------------

Where it is desirable that structural members may be "not set", the ``PointerTo`` library type is used to relax the guarantees of the previous section:

.. literalinclude:: class-exemplar/structure_as_optionals.py
   :language: python

All members are initialized to ``None`` and this results in the default representation as the empty JSON object:

.. literalinclude:: class-exemplar/structure_as_optionals.json
   :language: python

.. _an-assessment-of-versioning:

An Assessment Of Versioning
===========================

Versioning within the field of software engineering is not a science. There are certainly successful implementations of versioning but they solve different problems (document vs network API versioning) and use different technologies and "standards" (e.g. semantic vs calendar version tags).

The Ansar library automates the version stamping and version detection aspects of versioning. It is up to the application to exploit the version information supplied at every recovery site, to implement *version support*. And all of this must exist within a world where developers may be pre-committed to their own flavour of versioning.

This library acknowledges three different approaches to versioning and includes features and behaviours that attempt to integrate, tolerate or (gracefully) reject all of them:

1) It can be essentially ignored, or
2) it can be tackled in a lightweight, informal manner, or
3) it can be under full, explicit control.

Each approach may be appropriate to a situation. Figuring out the best approach is mostly about accepting a related level of functional failures when loading "older" encodings. Production environments will obviously lean towards full control with the goal of seamless behaviour in those circumstances.

This library supports the third approach to versioning. But before diving into the coding of any version support, there are two further concepts that should be clarified. One concerns exactly *what* should be versioned (e.g. application types) and the other is about ensuring complete and effective version tags. Proper explanation can be found at the :ref:`end of this assessment`.

Ignore It All And Carry On
--------------------------

In this approach, application types can be modified at will. Applications simply accept the failures and other sub-optimal behaviours that can arise when attempting to load older encodings.

This can be appropriate during prototyping work, in informal operational environments and where the loss of older encodings is acceptable or recoverable. Where an application is incapable of loading an encoding in the normal manner it may be possible to manually modify the encoding (i.e. just "fix it"!) using an editor.

The appeal of this approach is code simplicity. Application types and the code around those types always reflect the latest operational model, and *only* that latest model.

The downsides to this approach are significant. Attempts to load older encodings can produce a variety of undesirable behaviours including complete rejection, or successful loading but with incomplete content or unexpected values.

The library is somewhat tolerant of mismatches between inbound encodings and the current definitions of application types. Where an inbound array is too long the surplus elements are discarded and where the array is too short the library appends default values. In addition, members that didn't exist in older encodings assume default values.

Perhaps most importantly, this approach can leave an operational site (e.g. a cloud deployment, a user's desktop, or a remote monitoring station) compromised and with no automated path back to an operational status.

Boxing Clever
-------------

The second approach is a common default in that it "just evolves". Without deliberate thought and without support from the development tooling, this is the style of version management that most often results.

There is a more considered approach to the changing of application types. Members are added and deleted, but in the latter case they are not necessarily removed from the type declaration. For the first time the membership of application types is an accumulation of past and present members. On storing of an application type, the latest code may omit the initialization of older members. On recovery the code may use the absence of older members to infer the version of the contents.

With care and discipline this approach can have some value. It is "lightweight" in that no additional declaration or tooling is required. Decoding operations have more scope to deal with past and present materials because the names and types of older members are still recorded inside the application types. It is possible to load both old and new encodings without any problems.

On the downside the lack of formality means that there are no official *version tags*, e.g. the lack of a member value might indicate an older version but there is no unique identity that can be used in codepath selections. It's also true that this style of coding does not scale well.
It becomes harder and harder to "add another version" and almost impossible to "remove the oldest version". What seemed like a pragmatic decision in the earlier part of an application's lifetime can be exposed as shortsighted when the application goes on to enjoy commercial success and wider adoption.

This approach may suit a circumstance where there are only a few application types involved (perhaps just the one document type) and/or the types are unlikely to experience much change.

Explicit Version Histories
--------------------------

The third approach is where a version tag travels alongside every instance of encoded application data. The data is "stamped" with a tag during the creation of a portable representation. The tag is subsequently extracted during decoding and made available to the loading application. The application is expected to use the tag for codepath selection and this forms the basis of all version support work.

Application types are again an accumulation of past and present members but now there are unique identities associated with each step in the evolution of an application type. Versioning is now explicit rather than inferred from the set of members presented in an encoding.

Encodings are improved in that only those members appropriate to a version are included. This is an explicit variation of the behaviour in the previous style of versioning, where members with a ``None`` value were omitted. This approach goes even further; it looks for members that should be omitted and checks that they do indeed have a ``None`` value. Any value other than ``None`` causes an exception.

When a versioned application encounters an encoding created by an unversioned application a special tag is synthesized to distinguish these encodings. Rather than the normal ``"1.7"`` style of tag the loading application will receive the empty string, i.e. ``""``.
This value can be used in codepath selection to implement seamless version support across versioned *and* unversioned operation. Shifting to version management does not prevent the application from loading older encodings.

.. _what-to-tag:

What To Tag
-----------

This library implements the version tagging of application types, i.e. registered types such as ``Job`` are assigned a version history. Each entry in a version history has a tag and this is the value that emerges as the version tag on decoding operations such as :py:meth:`~ansar.encode.file.File.recover`. This works well with code such as the following:

.. code-block:: python

    f = ar.File('job', Job)
    j = Job()
    f.store(j)
    ..
    r, v = f.recover()

The value ``v`` will contain the version tag from the latest entry in the version history of ``Job``. Now consider this different use of a ``File``:

.. code-block:: python

    f = ar.File('job', ar.VectorOf(Job))

This highlights the fact that a ``File`` works with a *type expression*. The first code example works because registered application types are instances of type expressions. This is convenient but also raises a question: what is the content of ``v`` after complex data (e.g. a vector of ``Jobs``) is recovered?

For the purposes of stamping outbound encodings and comparison of inbound encodings, the library evaluates the associated type expression and derives the *effective type*. It is the effective type that is used for all version-related activities. Evaluation involves a traversal of the type expression, as if it were an upside-down tree of branches and leaves. It is looking for the lowest, non-structural type, or *terminal leaf*.

The ``Job`` expression and the ``VectorOf(Job)`` expression evaluate to the same effective type, i.e. ``Job``. This is because evaluation of ``VectorOf`` returns the type of its content. It does this in a recursive manner, where ``VectorOf(VectorOf(Job))`` will still evaluate to the ``Job`` class for all version-related activities.
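
The evaluation described above can be pictured as a short descent through the type expression. The following is a minimal sketch of the idea only, not the library's implementation; the ``VectorOf`` and ``Job`` classes here are simplified stand-ins and ``effective_type`` is a hypothetical name:

```python
class VectorOf:
    """Simplified stand-in for a structural type expression (not the library class)."""
    def __init__(self, element):
        self.element = element


class Job:
    """Simplified stand-in for a registered application type."""


def effective_type(expression):
    # Descend through structural wrappers until the terminal leaf is reached.
    while isinstance(expression, VectorOf):
        expression = expression.element
    return expression


flat = effective_type(Job)
single = effective_type(VectorOf(Job))
nested = effective_type(VectorOf(VectorOf(Job)))
```

All three calls arrive at the same class, which is why a ``File`` created around a vector of ``Jobs`` reports the version history of ``Job``.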

Evaluation behaves in a similar manner for all the containers and sequences, e.g. the effective type for ``DequeOf(Job)`` is ``Job``. Again, this is a recursive behaviour so that the effective type for ``DequeOf(VectorOf(Job))`` is still ``Job``. Where the terminal leaf is not an application type, such as with ``VectorOf(str)``, evaluation returns a ``None`` and versioning is effectively disabled for that type expression.

There is a twist in the evaluation of an associative array or ``MapOf``. While ``MapOf(str, Job)`` evaluates to ``Job`` this also means that ``MapOf(CompositeKey, Job)`` (where ``CompositeKey`` is a hashable and registered application type) raises a problem. Changes to ``CompositeKey`` will not be reflected in the version information returned by a ``recover()`` operation. A similar issue exists with ``ArrayOf(Job, 8)``. Changes to the dimension will not be reflected in the versioning.

This is a weakness in the "effective type" approach to versioning. Other approaches were considered but the intellectual overhead inflicted on the developer and the additional costs during encoding and decoding could not be justified. The workaround is to place the associative array or fixed-size array inside an application type.

.. _versioning-of-a-complex-type:

Versioning Of A Complex Type
----------------------------

Consider the following application types:

.. literalinclude:: class-and-store/reachable_types.py
   :language: python

A ``Document`` contains a title and zero or more ``Section`` objects. Both application types are using full schemas and version histories - version management is active. During decoding operations, version support will be based on ``Document`` (i.e. it is the effective type):

.. code-block:: python

    f = ar.File(open_file.file_name, Document)
    ..
    r, v = f.recover()

The ``v`` value will contain either ``"0.0"``, ``""`` or ``None`` depending on the scenario, and will be used for codepath selection.
As the ``Document`` history changes, codepath selection is extended to include further values of ``v``.

Schemas for an application type like ``Document`` can refer to zero or more application types. They may appear as the type of a member or within complex structural declarations, e.g. ``VectorOf(Section)``. The top level type is known as a *document* and any types referred to in the document schema are known as *reachables*. The process that determines the reachable types for a given document is recursive, i.e. it is an accumulation of all the reachables found in the document schema *and* all their reachables, etc.

A real difficulty arises when the ``Section`` type changes. The version history is updated but at runtime, right when it would be expected, there is no change to the value of ``v``; version management machinery is still focused on ``Document``.

To resolve this situation there needs to be a check between the previous configuration of version histories and the latest configuration. And to do that there needs to be a record of what the previous configuration was.

Applications must maintain a module that registers those application types (i.e. documents) that need full version management. It will import the ``Ansar`` library and the application modules declaring the messages to be registered. The module will look like this:

.. code-block:: python

    import ansar.encode as ar
    import document
    import settings

    ar.released_document(document.Document)
    ar.released_document(settings.Settings)

This code scans the specified types, compiling sets of reachable types and their respective versions. There is a utility provided by the library that can load this same module and compare the compiled information to a previously saved image of the same information. The following command line compares the current version information with the information saved in the named file:

.. code-block::

    $ ansar-releasing -a check src/module.py application.release

The ``-a`` flag requests a listing of all the issues detected. The contents of the ``application.release`` file look like this:

.. code-block::

    $ cat application.release
    document.Document:document.Document/0.2,document.Section/0.7,document.Paragraph/0.3,document.TableOfContents/0.9,document.Index/0.8
    settings.Settings:settings.Settings/0.11,settings.Device/0.14,settings.NetworkLocation/0.3

There are four possible outcomes from ``ansar-releasing``:

1. The release file does not exist
''''''''''''''''''''''''''''''''''

There has been no previous release. The current version information is used to create a new file and the utility returns a success code.

2. The release file exists and version information has not changed
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Current version information is compared to the saved image and no difference can be detected. The utility returns a success code.

3. The saved file exists and there have been valid changes to version information
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Current version information is compared to the saved image and differences are detected. The set of differences is determined to be valid. The utility returns a success code.

4. The saved file exists and there have been incomplete changes to version information
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Current version information is compared to the saved image and differences are detected. The set of differences is determined to be incomplete. The current software should not be deployed into production environments as the potential to write successful version support is compromised. The utility prints details of what it found and returns an error code.
Most often the resolution is to add a line to a document history (as specified by the utility) and repeat the build process.

Without the influence of ``ansar-releasing`` a sequence of version changes might look like this:

+---------------------+-----+-----+-----+-----+-----+
| Type                | A   | B   | C   | D   | E   |
+=====================+=====+=====+=====+=====+=====+
| ``Document``        | 2.5 | 2.5 | 2.5 | 2.5 | 2.6 |
+---------------------+-----+-----+-----+-----+-----+
| ``Section``         | 1.7 | 1.8 | 1.8 | 1.8 | 1.8 |
+---------------------+-----+-----+-----+-----+-----+
| ``TableOfContents`` | 2.3 | 2.3 | 2.3 | 2.4 | 2.4 |
+---------------------+-----+-----+-----+-----+-----+
| ``Index``           | 1.8 | 1.8 | 1.9 | 1.9 | 1.9 |
+---------------------+-----+-----+-----+-----+-----+

Release **B** includes a change to ``Section``, **C** to ``Index`` and **D** to the ``TableOfContents``. Through these changes to reachables, the version of ``Document`` remains constant. Release **E** records a change to the document itself. For applications recovering instances of a ``Document``, there is no detectable difference between releases **A** through to **D**.

With the checks imposed by ``ansar-releasing`` it changes to this:

+---------------------+-----+-----+-----+-----+-----+
| Type                | A   | B   | C   | D   | E   |
+=====================+=====+=====+=====+=====+=====+
| ``Document``        | 2.5 | 2.6 | 2.7 | 2.8 | 2.9 |
+---------------------+-----+-----+-----+-----+-----+
| ``Section``         | 1.7 | 1.8 | 1.8 | 1.8 | 1.8 |
+---------------------+-----+-----+-----+-----+-----+
| ``TableOfContents`` | 2.3 | 2.3 | 2.3 | 2.4 | 2.4 |
+---------------------+-----+-----+-----+-----+-----+
| ``Index``           | 1.8 | 1.8 | 1.9 | 1.9 | 1.9 |
+---------------------+-----+-----+-----+-----+-----+

Now the version of ``Document`` also changes with every change to one of its reachables.
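
The difference between the two tables can be expressed as a simple check. The following is a sketch of the idea only and makes no claim about how ``ansar-releasing`` is implemented; the ``incomplete_changes`` function and the dictionary layout are assumptions made for illustration, with the tags taken from releases **A** and **B** above:

```python
def incomplete_changes(previous, current):
    """Return documents whose reachables changed while the document tag did not."""
    flagged = []
    for document, now in current.items():
        before = previous.get(document)
        if before is None:
            continue    # A new document; there is nothing to compare against.
        if now == before:
            continue    # No change anywhere in this document.
        if now[document] == before[document]:
            flagged.append(document)    # Something changed but the tag stood still.
    return flagged


release_a = {'Document': {'Document': '2.5', 'Section': '1.7',
                          'TableOfContents': '2.3', 'Index': '1.8'}}

# Release B from the first table: Section changes, Document does not.
unchecked_b = {'Document': {'Document': '2.5', 'Section': '1.8',
                            'TableOfContents': '2.3', 'Index': '1.8'}}

# Release B from the second table: Document changes along with Section.
checked_b = {'Document': {'Document': '2.6', 'Section': '1.8',
                          'TableOfContents': '2.3', 'Index': '1.8'}}

problems_without_check = incomplete_changes(release_a, unchecked_b)
problems_with_check = incomplete_changes(release_a, checked_b)
```

The first comparison flags ``Document`` as needing a new history entry, while the second finds nothing to report.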

A third number can be appended to the version tag and used to explicitly represent changes to reachables rather than a change to the type that the tag belongs to.

+---------------------+-----+-------+-------+-------+-----+
| Type                | A   | B     | C     | D     | E   |
+=====================+=====+=======+=======+=======+=====+
| ``Document``        | 2.5 | 2.5.1 | 2.5.2 | 2.5.3 | 2.6 |
+---------------------+-----+-------+-------+-------+-----+
| ``Section``         | 1.7 | 1.8   | 1.8   | 1.8   | 1.8 |
+---------------------+-----+-------+-------+-------+-----+
| ``TableOfContents`` | 2.3 | 2.3   | 2.3   | 2.4   | 2.4 |
+---------------------+-----+-------+-------+-------+-----+
| ``Index``           | 1.8 | 1.8   | 1.9   | 1.9   | 1.9 |
+---------------------+-----+-------+-------+-------+-----+

Adopting this numbering option produces tags that are visibly different for changes to reachables, i.e. the tags are longer and the major-minor combination for ``Document`` remains unchanged through releases **B**, **C** and **D**. It also helps to keep minor numbers small. The first number in the sequence is always ``1`` to be distinct from the default value of zero assigned to a short-form tag, i.e. for comparison purposes ``2.5`` is equivalent to ``2.5.0``.

Once the checks are satisfied, the final step in a build process is to overwrite the previous version information with the latest. The following command line makes that happen:

.. code-block::

    $ ansar-releasing set src/module.py application.release

In practice the ``ansar-releasing`` command is used as part of an automated build process and the ``application.release`` file is added to the source repository. This establishes a development environment with proper version management. Application developers can be assured that there are distinct versions of documents, as needed for proper implementation of version support.
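
The comparison rule for short-form and long-form tags, where ``2.5`` is equivalent to ``2.5.0``, can be sketched in a few lines of plain Python. The ``tag_key`` function is a hypothetical helper and not part of the library, and it assumes a purely numeric tag (i.e. not the empty string or ``None``):

```python
def tag_key(tag):
    """Convert a tag such as '2.5' or '2.5.1' into a comparable tuple of integers."""
    parts = [int(p) for p in tag.split('.')]
    # A short-form tag has an implied third number of zero.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


short_form = tag_key('2.5')
long_form = tag_key('2.5.0')
ordered = sorted(['2.6', '2.5.2', '2.5', '2.5.1', '2.5.3'], key=tag_key)
```

Sorting with this key places the tags in the order of releases **A** to **E** from the table above, which is the property an application needs when selecting a codepath by tag.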