.. _versions-upgrading-and-migration:

Versions, Upgrading And Migration
*********************************

Implementing version support involves an additional parameter during type registration (i.e. a ``version_history`` is passed to ``bind()``) and there is a formal registration of those types to be passed to an encode operation (e.g. the ``Job`` class is passed to :py:meth:`~ansar.encode.release.released_document`). Lastly, the version tags provided at recovery sites must be used for codepath selection, which is where the actual "support" begins.

There are two reasons for the :py:meth:`~ansar.encode.release.released_document` registration. Firstly, not all application types will be the focus of an encode operation and the requirements imposed on version-managed types are rigorous. Imposing these requirements on all types would be heavy-handed and tricky where types are shared by multiple applications.

Secondly, there is a need to maintain integrity of version-managed types that contain *nested* application types. An application document type may contain further types for sections, indexes and spelling dictionaries. What happens when one of those nested types changes and the document does not? Without :ref:`extra attention` there would be situations where the document changes but this is not detected during recovery and there is no potential for the application to respond in a proper fashion.

The versioning solution offered by this library exists in a world of software development that has evolved many different approaches. For the context of this solution refer to this :ref:`assessment` of the situation. The remainder of this section is focused on full version control.

In The Beginning
================

This is a repeat of the journey described in :ref:`more-about-types` except this time the changes made to ``Job`` will be tracked using version management. Consider the original declaration:

.. literalinclude:: class-and-store/received_job_basic.py
   :language: python

The stored representation looks like:

.. literalinclude:: class-and-store/received_job_basic.json
   :language: json

There is no evidence of anything suggesting version management - yet. This absence of any versioning detail is deliberate and is the proper representation of an unversioned message. All encoding-decoding activity associated with the ``Job`` type at this point is referred to as "unversioned".

Consider a store-and-recover cycle of a default ``Job``::

    >>> f = ar.File('job', Job)
    >>> j = Job()
    >>> f.store(j)
    >>> r, v = f.recover()
    >>> print(r)
    <__main__.Job object at 0x7fc300dd7150>
    >>> print(v)
    None

The :py:meth:`~ansar.encode.file.File.recover` method has returned ``None`` as the version tag. This is the result of a preprocessing of the version situation. It tells the application that the recovered object is at the same version as the reading application. In this case it actually means that both the encoding and decoding operations were *unversioned*.

This preprocessing simplifies the code around the decoding site as there is no need to compare against an explicit value. A ``None`` version tag always indicates that no further version-related processing is needed.

Everything Changes
==================

The first change to ``Job`` was to record the creation time. This change, along with a `version history`, is shown here:

.. literalinclude:: class-and-store/received_job_created_version.py
   :language: python

The simple presence of a version history *activates version management*. All subsequent encodings of jobs have a version tag added to the representation. The tag used comes from the end of the ``JOB_HISTORY``.

This is what happens when that application recovers the stored job created previously::

    >>> f = ar.File('job', Job)
    >>> r, v = f.recover()
    >>> print(r)
    <__main__.Job object at 0x7f3dd9197b10>
    >>> print('"%s"' % (v,))
    ""

The library is notifying the application that the stored job is at a different version to the version in use by the application. It is returning the empty string - a *special tag* indicating that the recovered job is unversioned.

Storing a representation of the latest ``Job`` declaration produces the following:

.. literalinclude:: class-and-store/received_job_created_version.json
   :language: json

Recovering this latest, versioned job results in a ``None`` version tag, indicating that the encoded materials and the decoding application are at the same version.

Application Type Histories
==========================

A history is a sequence of changes, where each change looks like this:

    [`tag`, `operation(s)`, `note`],

A `tag` is a string like ``0.14`` or ``2.4`` that includes a major and minor number separated by a dot. The minor number is assumed to increment on each subsequent change except where the major number increments and the minor number returns to zero.

The second column describes the significant operations in a "machine-readable" form. Accepted operations are ``Added``, ``Moved`` and ``Deleted``. These are classes defined in the library that accept one or two names as arguments. Any name ``Added`` is considered unavailable to the application before the associated version tag and any name ``Deleted`` is unavailable from the associated tag onward. ``Moved`` operations are a shorthand for a delete-add pair.

An `operation` of ``None`` can be used to represent change in the software that processes the application data, rather than the application data itself. An `operation` can also be a list of operation objects.

A `note` is a brief explanation of the change.

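
The tag rules described above can be pictured in plain Python. This is a minimal sketch and not the library's implementation: ``parse_tag`` and ``check_history`` are hypothetical helper names, and the stand-in history uses strings where a real history would hold the library's operation objects. Tags are compared as integer pairs because ``0.14`` must sort after ``0.2``, which a string or float comparison would get wrong.

.. code-block:: python

   def parse_tag(tag):
       """Split a 'major.minor' tag into a pair of integers."""
       major, minor = tag.split('.')
       return int(major), int(minor)


   def check_history(history):
       """Return the latest (major, minor), or raise ValueError on a bad step."""
       previous = None
       for tag, _operation, _note in history:
           current = parse_tag(tag)
           if previous is not None:
               # Either the minor number increments, or the major number
               # increments and the minor number returns to zero.
               next_minor = (previous[0], previous[1] + 1)
               next_major = (previous[0] + 1, 0)
               if current not in (next_minor, next_major):
                   raise ValueError('tag "%s" does not follow "%d.%d"'
                                    % ((tag,) + previous))
           previous = current
       return previous


   # Stand-in history; strings replace the library's operation objects.
   HISTORY = (
       ('0.0', 'Added created', 'Added timestamp'),
       ('0.1', 'Added who', 'Added the list of email addresses'),
       ('1.0', None, 'Brave new world'),
   )

   latest = check_history(HISTORY)

A check of this kind could run once at import time, so that a mistyped tag is caught before any encoding takes place.
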

Maintaining A Good History
--------------------------

Type histories are intended to capture a *sliding window* of changes. As the history becomes long and the older entries become less relevant it will become appropriate to delete them. Whenever a line of history is deleted, any names appearing in ``Deleted`` operations can also be removed from the class declaration. This includes the ``previous`` name on a ``Moved`` operation.

A name cannot be ``Deleted`` and then ``Added`` within the known history, as this would imply that the same name can be used with (potentially) different types. Values that need to change type are ``Moved`` to a new name and type. Names *can* be ``Added`` and then ``Deleted`` within the known history.

All names appearing in the changes must also appear in the current class declaration, whether that name was ``Added``, ``Moved`` or ``Deleted``. Whether a name is actually available is dependent on a runtime version tag - a value provided at each decoding point (e.g. ``d, v = f.recover()``), or taken from the current type history at each encoding point (e.g. the default behaviour of ``f.store()``), or passed explicitly from the application to the encoding operation. The latter gives an application the ability to save old versions of documents, e.g. at the request of a user.

The class may contain members that do not appear in the changes, e.g. when the history no longer includes the entry that ``Added`` the relevant name.

Quality Encodings From Bags Of Data
-----------------------------------

Application types are necessarily an accumulation of members. There are members that are no longer used by the latest code. There are also members that have only just been added - older code does not reference these members as they do not exist. Or rather, *didn't* exist at the time the older code was written. Supporting multiple versions of application types requires this accumulation of members.
Without the continued presence of those "deleted" members there is no potential to decode older encodings. The decoding process would have nowhere to store the inbound values and old code would be referencing members that no longer exist. Application types become bags of past and present members.

In versioned operation the library emits encodings containing only those members currently available, i.e. ``Deleted`` members are not included. The encodings are an expression of what the application type would look like if it wasn't carrying historical baggage. This is not only more "correct", it also results in encodings that are smaller and more readable. In unversioned operation everything that appears in an application type is reflected in encodings (except members with ``None`` values).

To assist with the detection of programming errors, the library checks the value of any trimmed member. Any value other than ``None`` is considered a programming error and an exception is raised. Conceptually the code is trying to make use of a member that would not have existed at the relevant time.

Full implementation of member trimming requires that all sub-objects are also trimmed, i.e. a document might contain sections, paragraphs, a table of contents and an index. There is a collection of application types that are aggregated to create the top-level document.

.. note::

   Application types that include further nested types are difficult for version management to track. Refer to :ref:`here` for what is required to properly track complex types.

A Typical Change
================

Making one more change demonstrates a more typical change, i.e. a change not involving unversioned materials. A list of email addresses is added below:

.. literalinclude:: class-and-store/received_job_who_version.py
   :language: python

This produces the stored representation:

.. literalinclude:: class-and-store/received_job_who_version.json
   :language: json

The representation reflects the updated version history. Given the following set of test files:

* ``job``, an instance of the original class
* ``job-created``, an instance with the ``created`` member added
* ``job-who``, an instance with the ``who`` member added

recovery produces these version tags:

>>> f = ar.File('job', Job)
>>> r, v = f.recover()
>>> print(r)
<__main__.Job object at 0x7fc3ef7884d0>
>>> print('"%s"' % (v,))
""

>>> f = ar.File('job-created', Job)
>>> r, v = f.recover()
>>> print(v)
0.0

>>> f = ar.File('job-who', Job)
>>> r, v = f.recover()
>>> print(v)
None

The last recovered version tag is again ``None``, reflecting the fact that the stored version and the application version are the same.

Dropping Old Versions
---------------------

During the recovery of an encoding, the library extracts the version information and compares it to the current application version information. If the minor number in the version tag is older than the oldest value in the application version history, the recovery process rejects the input and raises an exception. The encoding is considered to be *unsupported*. This acknowledges that the representation appears valid in all other ways but the executing application is no longer maintaining that area of code.

.. code-block:: python
   :emphasize-lines: 2

   JOB_HISTORY = (
       ('0.0', ar.Added('created'), 'Added timestamp'),
       ('0.1', ar.Added('who'), 'Added the list of email addresses'),
       ('0.2', ar.Added('unique_id'), 'Added accounting'),
       ('0.3', ar.Deleted('priority'), 'Deleted the priority number'),
       ('0.4', ar.Added('permissions'), 'Added authorization values'),
   )

By deleting the first entry, highlighted above, the ``"0.0"`` version immediately becomes unsupported. Any encounter with encodings at this version will result in exceptions. At the same time all code specific to this version can be retired from the application.

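
The rejection rule can be pictured as a comparison against the first entry of the history. The sketch below is an illustration of the rule as stated in this section, not the library's code: ``parse_tag`` and ``is_unsupported`` are hypothetical names, the histories use strings in place of operation objects, and only tagged encodings that share the major number of the history are considered.

.. code-block:: python

   def parse_tag(tag):
       """Split a 'major.minor' tag into a pair of integers."""
       major, minor = tag.split('.')
       return int(major), int(minor)


   def is_unsupported(tag, history):
       """True if the tag is older than the oldest entry still in the history."""
       oldest_major, oldest_minor = parse_tag(history[0][0])
       major, minor = parse_tag(tag)
       return major == oldest_major and minor < oldest_minor


   # Stand-in histories; strings replace the library's operation objects.
   FULL = (
       ('0.0', 'Added created', 'Added timestamp'),
       ('0.1', 'Added who', 'Added the list of email addresses'),
       ('0.2', 'Added unique_id', 'Added accounting'),
   )
   TRIMMED = FULL[1:]      # The '0.0' entry has been deleted.

   before = is_unsupported('0.0', FULL)
   after = is_unsupported('0.0', TRIMMED)

Here ``before`` is ``False`` and ``after`` is ``True``: the same stored encoding moves from supported to unsupported purely as a result of trimming the history.
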
When housekeeping work finally catches up with the ``"0.3"`` version and the relevant version-description pair is deleted from the history, the ``priority`` parameter and member can also be deleted from the ``Job`` class. Any attempt to recover encodings at that version (or before) will result in a version exception.

A Brave New World
-----------------

The major version number is used to signal a complete reset. The major number is incremented by one and the minor number returns to zero. The version history contains the lone entry:

.. code-block:: python

   JOB_HISTORY = (
       ('1.0', None, 'Brave new world'),
   )

The class is now effectively at an initial version. Recovery of any encoding tagged with the previous major number, e.g. ``"0.24"``, results in a rejection by the library. Such an encoding is considered *inappropriate*, to distinguish it from *unsupported*. An exception is raised.

Moving to a new major number is likely to reflect significant technical changes in the application - a shift to new tools and/or architecture. Perhaps a re-targeting from customer premises deployment to the cloud. There may be commercial considerations involved. For whatever reason the application wants to continue using the name in the class declaration (e.g. ``Job``) but it is starting a new ecosystem of software and stored representations, and is not offering any integration with the previous ecosystem.

Resuming The Journey
====================

Actual version support begins with the object and version tag returned by one of the two ``recover()`` methods. This section looks at how an application might respond to these values. The goal is seamless operation in a mixed-version world, but there are at least two different ways that this can be achieved.

A First Attempt
---------------

The ``Job`` class has been through two changes, giving a total of three versions (including unversioned encodings) that might be encountered by an application tasked with processing these objects:

.. code-block:: python

   JOB_HISTORY = (
       ('0.0', ar.Added('created'), 'Added timestamp'),
       ('0.1', ar.Added('who'), 'Added the list of email addresses'),
   )

Full support implies three different codepaths:

.. code-block:: python

   j, v = f.recover()
   if v is None:       # 0.1, the current version
       pass
   elif v == '0.0':    # Has created, lacks who
       j.who.append(DEFAULT_EMAIL)
   elif v == '':       # Unversioned, lacks both
       j.created = DEFAULT_CREATED
       j.who.append(DEFAULT_EMAIL)
   else:
       not_supported()

A version value of ``None`` is ignored - no specific support processing is required. Otherwise a series of conditionals arranges for a patching of the job object, according to the detected version. Default values are assigned to those members that did not exist in the respective version of a ``Job``. If the version remains unrecognised by the application there is a call to an error routine.

This implementation of version support meets the primary requirement, i.e. seamless processing of mixed versions. A tacit decision was made to promote or *upgrade* older versions to something matching the current version. The actual processing of the job can then begin without regard to the original version.

A different coding style looks long-winded but brings an advantage:

.. code-block:: python

   j, v = f.recover()
   if v is None:
       pass
   elif v == '0.0':
       j = Job(created=j.created, unique_id=j.unique_id, title=j.title,
               priority=j.priority, service=j.service, body=j.body,
               who=[DEFAULT_EMAIL])
   elif v == '':
       j = Job(created=DEFAULT_CREATED, unique_id=j.unique_id, title=j.title,
               priority=j.priority, service=j.service, body=j.body,
               who=[DEFAULT_EMAIL])
   else:
       not_supported()

Where an upgrade is required an entirely new ``Job`` is constructed from the information provided by each older version, plus defaults as necessary. The result, courtesy of the ``__init__`` function, is a properly constructed and up-to-date instance of a ``Job``.

A further issue with both of these coding styles is that they are *inline*. Over time the application will accumulate more than one call to :py:meth:`~ansar.encode.file.File.recover`, and each call site then needs the same version handling.

An Upgrade Plan
---------------

A more forward-thinking style of coding is to move all the version-related activities to a dedicated function and call it something sensible like ``upgrade``. This change in approach cleans up the call site:

.. code-block:: python

   j, v = f.recover(upgrade=upgrade)

It also works well with the ``dict`` comprehensions associated with :py:class:`~ansar.encode.folder.Folder` objects:

.. code-block:: python

   jobs = {k: j for k, j, _ in f.recover(upgrade=upgrade)}

The version returned by ``recover()`` is always ``None`` and can be safely ignored as any related issue has been addressed by the ``upgrade`` function. The function looks like this:

.. code-block:: python

   def upgrade(r, v):
       # r is the recovered object, v is its version tag.
       if v == '0.0':
           return Job(created=r.created, unique_id=r.unique_id, title=r.title,
                      priority=r.priority, service=r.service, body=r.body,
                      who=[DEFAULT_EMAIL])
       elif v == '':
           return Job(created=DEFAULT_CREATED, unique_id=r.unique_id, title=r.title,
                      priority=r.priority, service=r.service, body=r.body,
                      who=[DEFAULT_EMAIL])
       ar.cannot_upgrade(r, v)

The ``upgrade`` function is only called where appropriate, i.e. where the detected version is not equal to the current application version. This means there is no need to include ``None`` as one of the codepaths. This function either returns a new object using the contents of an older object, or it raises an exception. The library function :py:func:`~ansar.encode.version.cannot_upgrade` raises a ``ValueError`` with a helpful diagnostic.

Injecting Runtime Values Into Upgrades
--------------------------------------

``DEFAULT_CREATED`` and ``DEFAULT_EMAIL`` are assigned to the relevant members when the recovered version lacks those particular values.
In the previous code fragment these were hardcoded constants and there will likely be scenarios where runtime values are needed. The optimal approach is a matter of design and preference. A few suggestions follow.

Values may be computed just prior to the recovery site, or even perhaps just prior to the upgrade:

.. code-block:: python

   who = [get_who()]
   jobs = {}
   for k, j, v in f.recover():
       hi = get_hi(j)
       lo = get_lo(j)
       j = upgrade(j, v, who=who, hi=hi, lo=lo)
       jobs[k] = j

The implication is that ``get_hi`` and ``get_lo`` return values that are based on other values present in each ``Job``. The ``who`` value does not share that dependency and can be calculated ahead of time, prior to the ``for`` loop. These different runtime values are gathered together by the call to ``upgrade`` and become available for population of ``Job`` members.

Another arrangement elicits a slightly different behaviour:

.. code-block:: python

   who = None
   # ..
   global who
   who = who or [get_who()]
   jobs = {}
   for k, j, v in f.recover():
       hi = get_hi(j)
       lo = get_lo(j)
       j = upgrade(j, v, who=who, hi=hi, lo=lo)
       jobs[k] = j

Because ``[get_who()]`` is a non-empty list, the ``get_who`` function is called only on the first visit to this code site; later visits reuse the cached ``who`` value.

Any runtime values that have no special connection to particulars of a ``recover`` site can be located with the ``upgrade`` function. Similar options exist with respect to placement of the different code elements:

.. code-block:: python

   who = None

   def upgrade(r, v, hi=100, lo=10):
       global who
       who = who or [get_who()]    # Computed once, then cached
       if v == '0.0':
           return Job(created=r.created, unique_id=r.unique_id, title=r.title,
                      priority=r.priority, service=r.service, body=r.body,
                      who=who)
       elif v == '':
           return Job(created=DEFAULT_CREATED, unique_id=r.unique_id, title=r.title,
                      priority=r.priority, service=r.service, body=r.body,
                      who=who)
       ar.cannot_upgrade(r, v)

Migration - Reducing The Upgrade Workload
-----------------------------------------

The goal is seamless operation in a mixed-version world and the ``upgrade`` facility ticks that box. An application with a solid implementation of ``upgrade`` can focus its attentions entirely on the current version of ``Job``.

Any application that is repeatedly upgrading the same stored representations is missing an opportunity to avoid the overhead of all but the initial upgrade. This can happen with an application configuration file, or a job scheduler polling a folder for work.

Adding a single parameter arranges for the runtime migration of an older configuration file:

.. code-block:: python

   j, v = f.recover(upgrade=upgrade, migrate=True)

The same facility is available on ``dict`` comprehensions based on ``Folder`` objects:

.. code-block:: python

   jobs = {k: j for k, j, _ in f.recover(upgrade=upgrade, migrate=True)}

Design of the ``recover()`` methods and the ``upgrade`` and ``migrate`` parameters was tilted towards convenient use of ``list`` and ``dict`` comprehensions. The price of that convenience is less freedom in how runtime data may be injected into the upgrade process. Where the method-based approach is too constrained, a small function provides another option:

.. code-block:: python

   def migrate(f, upgrade, *args, **kwargs):
       r, v = f.recover()
       a = upgrade(r, v, *args, **kwargs)
       if id(a) != id(r):      # A new object means an upgrade took place
           f.store(a)
       return a

This exact function is available within the library - :py:func:`~ansar.encode.version.migrate`.

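
The store-only-on-change behaviour of this function can be exercised without the library. The following is a minimal sketch under stated assumptions: ``MemoryFile`` is a hypothetical in-memory stand-in for a ``File``, a plain ``dict`` stands in for a ``Job``, and the stand-in ``upgrade`` returns the same object when the tag is ``None`` so that ``migrate`` detects no change.

.. code-block:: python

   class MemoryFile:
       """Hypothetical stand-in for a File; holds one object and its tag."""

       def __init__(self, value, version):
           self.value = value
           self.version = version
           self.stores = 0

       def recover(self):
           return self.value, self.version

       def store(self, value):
           self.value = value
           self.version = None     # Stored at the current version
           self.stores += 1


   def migrate(f, upgrade, *args, **kwargs):
       r, v = f.recover()
       a = upgrade(r, v, *args, **kwargs)
       if id(a) != id(r):
           f.store(a)
       return a


   def upgrade(r, v, who=None):
       # A dict stands in for a Job in this sketch.
       if v is None:
           return r                # Already current; same object returned
       if v == '0.0':
           return dict(r, who=who or [])
       raise ValueError('cannot upgrade from version "%s"' % (v,))


   f = MemoryFile({'title': 'backup'}, '0.0')
   first = migrate(f, upgrade, who=['ops@example.com'])
   second = migrate(f, upgrade)

The first call upgrades and stores the job, while the second call finds it already current and stores nothing, so ``f.stores`` remains at 1. The keyword argument ``who`` shows runtime information being forwarded through ``migrate`` to ``upgrade``.
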
Migration is performed on a ``File`` object rather than an instance of application data (e.g. a ``Job``) due to the need to ``store()`` any changes. The simplest possible use is therefore:

.. code-block:: python

   f = ar.File('job', Job)
   j = ar.migrate(f, upgrade)

Usage involving a ``Folder`` might look like:

.. code-block:: python

   def get_jobs(spool):
       jobs = {}
       for f in spool.each():
           j = ar.migrate(f, upgrade)
           k = spool.key(j)
           jobs[k] = j
       return jobs

.. note::

   Runtime values for the ``upgrade`` function have been omitted for clarity. The ``args`` and ``kwargs`` parameters allow the ``migrate`` function to forward runtime information on to ``upgrade`` where needed.

Automated Migration
-------------------

Software that applies the ``migrate`` style of version support, rather than the ``upgrade`` style, brings an *auto data migration* behaviour to every software release process. Wherever the new software goes, the files it works with are brought up-to-date with respect to the latest application types.