More About Types

This section looks more closely at what the library can provide when data requirements become more demanding. Perhaps a configuration file needs to contain a list of recently opened files, or a job scheduler needs to maintain a table of performance metrics.

A series of changes is made to the Job class introduced in the previous section. These changes involve progressively more complex types. The full series introduces enough information to cover most application data requirements.

A Refresh Cycle

The starting point is the original declaration:

import uuid
import ansar.encode as ar

class Job(object):
    def __init__(self, unique_id=None, title='watchdog', priority=10, service='noop', body=b''):
        self.unique_id = unique_id or uuid.uuid4()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body

ar.bind(Job)

Members of the class are listed below along with their respective Python types:

  • unique_id - uuid.UUID

  • title - str

  • priority - int

  • service - str

  • body - bytes

The Job class is complete in that no further type information is required by the library. The library can store() and recover() far more complex application data, but to do so requires an entirely different approach to declaration of type information.

Remembering When

Recording the moment a job was presented is likely to be useful. Adding a created member looks like this:

import time
import uuid
import ansar.encode as ar

class Job(object):
    def __init__(self, created=None, unique_id=None,
        title='watchdog', priority=10, service='noop', body=b''):
        self.created = created or time.time()
        self.unique_id = unique_id or uuid.uuid4()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body

ar.bind(Job)

This new class declaration is also complete and can immediately be put to work. New application features and functionality can be based on the accurate storage and recovery of the creation time, which is great. However, a peek at the stored materials is disappointing:

{
    "value": {
        "body": "",
        "created": 1666667754.7545347,
        "priority": 10,
        "service": "noop",
        "title": "watchdog",
        "unique_id": "bdcb0d8d-d5a2-4021-a182-5b57c53d59e8"
    }
}

The created member appears as a floating-point value. This might be functionally valid but the stored representation is useless as a time. It cannot be checked visually or modified with a text editor.

Passing additional type information at registration time fixes this issue:

ar.bind(Job, object_schema={'created': ar.ClockTime})

There is now an explicit description of the type of information that will be carried around in the created member. That description is being provided to the bind registration function in a schema and it obviates the need to assign a default value. In fact there are longer term considerations that mean it is best to use None as the default value, once a proper schema declaration is present.

Readability is restored in the file contents:

{
    "value": {
        "body": "",
        "created": "2022-10-25T16:15:54.78812",
        "priority": 10,
        "service": "noop",
        "title": "watchdog",
        "unique_id": "2cbc9901-d167-48dd-8275-0505ef093053"
    }
}

The Python float type is used for both mathematical purposes and for the marking of time. The library understands this overloading and provides a mechanism for users to be specific about their intentions. Note that it’s only the representation that changes and the application continues to use the float type. The library automatically performs the necessary conversions during encoding and decoding.
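The kind of conversion involved can be pictured with a small standard-library sketch. This is not the library's implementation: the helper names clock_to_text and text_to_clock are hypothetical, and UTC is assumed purely to keep the output repeatable, whereas the library may follow a different time zone convention.

```python
from datetime import datetime, timezone

# Hypothetical helpers; they only illustrate the float <-> text round trip.
ISO_FORMAT = '%Y-%m-%dT%H:%M:%S.%f'

def clock_to_text(t):
    """Render an epoch float as a readable timestamp (UTC assumed)."""
    return datetime.fromtimestamp(t, tz=timezone.utc).strftime(ISO_FORMAT)

def text_to_clock(s):
    """Parse the readable timestamp back into an epoch float (UTC assumed)."""
    d = datetime.strptime(s, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return d.timestamp()

created = 1666667754.5           # what the application holds at runtime
text = clock_to_text(created)    # what would appear in the stored file
restored = text_to_clock(text)   # what the application gets back
```

The application-side value stays a float throughout; only the stored text changes form.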

Note

Most applications working with times will use datetime runtime values rather than float. These are registered using the WorldTime type rather than ClockTime. ClockTime is retained for its convenience in certain situations. More detailed information relating to time values can be found here.

Declaring A Full Schema

A properly declared application type now looks like:

import ansar.encode as ar

class Job(object):
    def __init__(self, created=None, unique_id=None, title=None,
            priority=None, service=None, body=None):
        self.created = created
        self.unique_id = unique_id
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body

JOB_SCHEMA = {
    'created': ar.ClockTime,
    'unique_id': ar.UUID,
    'title': ar.Unicode,
    'priority': ar.Integer4,
    'service': ar.Unicode,
    'body': ar.String,
}

ar.bind(Job, object_schema=JOB_SCHEMA)

Each member of the application type is now associated with a library type, e.g. ar.ClockTime and ar.Integer4. An application type with a full schema enjoys a variety of benefits:

  • readability of explicit vs implicit type information,

  • ability to disambiguate in scenarios such as float vs time,

  • ability to describe more complex types,

  • and use of None as a default value.

A surprise awaits in the stored representation:

{
    "value": {}
}

The expected value object has been reduced to the empty object - where have created, unique_id and their siblings gone? The empty object is the direct consequence of an internal encoding rule - members with a value of None are omitted from stored representations. This is a simple optimization, reducing the size of resulting files. More importantly this is part of informal version management. Further information on versioning can be found here.
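The omission rule can be imitated in a few lines of standard-library Python. This is only a sketch of the idea, not the library's encoder; encode_members and to_text are hypothetical names and the Job class is cut down to three members.

```python
import json

class Job(object):
    def __init__(self, created=None, title=None, priority=None):
        self.created = created
        self.title = title
        self.priority = priority

def encode_members(obj):
    """Collect members for storage, skipping any whose value is None."""
    return {k: v for k, v in vars(obj).items() if v is not None}

def to_text(obj):
    """Wrap the surviving members in a "value" object, as in the samples."""
    return json.dumps({'value': encode_members(obj)}, sort_keys=True)

empty = to_text(Job())                    # every member is None
partial = to_text(Job(title='watchdog'))  # one member is set
```

A member that is absent from a file simply stays at its default of None after recovery, which is what lets older and newer files coexist.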

Names For Runtime Numbers

Where there are operational numbers that need a readable representation, an Enumeration is used to map the numbers to strings during storage and from strings to numbers during recovery. Declaration looks like this:

import ansar.encode as ar

ModeOfTransport = ar.Enumeration(CAR=1, TRUCK=2, MOTORCYCLE=3, BOAT=100, SCOOTER=10, SKATEBOARD=11)

class Job(object):
    def __init__(self, created=None, unique_id=None, title=None,
            priority=None, service=None, body=None,
            transport=None):
        self.created = created
        self.unique_id = unique_id
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.transport = transport or ModeOfTransport.MOTORCYCLE

JOB_SCHEMA = {
    'created': ar.ClockTime,
    'unique_id': ar.UUID,
    'title': ar.Unicode,
    'priority': ar.Integer4,
    'service': ar.Unicode,
    'body': ar.String,
    'transport': ModeOfTransport,
}

ar.bind(Job, object_schema=JOB_SCHEMA)

A simple example of usage appears below:

>>> f = ar.File('job', Job)
>>> j = Job()
>>> f.store(j)
>>>
>>> r, _ = f.recover()
>>> r.transport
3

Lastly, the stored representation looks like this:

{
    "value": {
        "transport": "MOTORCYCLE"
    }
}
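The two-way mapping can be pictured as a pair of lookup tables. The sketch below uses plain dicts and hypothetical helper names (number_to_name, name_to_number); it illustrates the concept only and makes no claim about how Enumeration is built.

```python
# Hypothetical stand-in for an enumeration: names on one side, numbers on the other.
MODE_OF_TRANSPORT = {
    'CAR': 1, 'TRUCK': 2, 'MOTORCYCLE': 3,
    'BOAT': 100, 'SCOOTER': 10, 'SKATEBOARD': 11,
}
NAME_OF_NUMBER = {number: name for name, number in MODE_OF_TRANSPORT.items()}

def number_to_name(n):
    """Storage direction: the runtime number becomes a readable string."""
    return NAME_OF_NUMBER[n]

def name_to_number(s):
    """Recovery direction: the stored string becomes the runtime number."""
    return MODE_OF_TRANSPORT[s]

stored = number_to_name(3)
recovered = name_to_number(stored)
```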

Sequences And Collections

A notification feature is added to the job processing machinery. It is decided that notifications will be in the form of emails sent to configured parties. Each job needs a list of zero or more email addresses:

import ansar.encode as ar

class Job(object):
    def __init__(self, created=None, unique_id=None,
        title=None, priority=None,
        service=None, body=None, who=None):
        self.created = created
        self.unique_id = unique_id
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.who = who or ar.default_vector()

JOB_SCHEMA = {
    'created': ar.ClockTime,
    'unique_id': ar.UUID,
    'title': ar.Unicode,
    'priority': ar.Integer4,
    'service': ar.Unicode,
    'body': ar.String,
    'who': ar.VectorOf(ar.Unicode),
}

ar.bind(Job, object_schema=JOB_SCHEMA)

Given the updated job creation:

Job(created=ar.clock_now(),
    title='the quick',
    who=['tom.pirate@black.ship', 'gerard.diplomat@ivory.tower', 'aswan.swami@nowhere.everywhere'])

The resulting file content looks like this:

{
    "value": {
        "created": "2022-10-25T16:15:55.094565",
        "title": "the quick",
        "who": [
            "tom.pirate@black.ship",
            "gerard.diplomat@ivory.tower",
            "aswan.swami@nowhere.everywhere"
        ]
    }
}

A job now includes a list of email addresses. The list can be zero or more addresses but there is always a list present, i.e. the j.who member is always an instance of a Python list.

Note

A good example of an application type similar to Job, but with all the bells and whistles, can be found here. Use of the default_vector() library function guarantees a valid default value for the member, but also touches on a wider issue.

There is a small collection of functions such as clock_now(). These are shorthands that can make use of this library much cleaner. They can also be considered syntactic sugar that extends the library dependency into further areas of code.

The object_schema parameter is a dict of names and type expressions and the VectorOf is one example of a type expression. This is the full set of supported sequences and collections:

  • ArrayOf - a fixed-length sequence

  • VectorOf - a variable-length sequence

  • DequeOf - a double-ended queue

  • SetOf - a unique collection

  • MapOf - an associative array

The library makes all necessary conversions between portable representations and the Python type best suited to a particular concept, i.e. ArrayOf and VectorOf map to list, DequeOf to deque, SetOf to set and MapOf to dict. There is no explicit support for concepts such as a double-ended queue in JSON, or any other known encoding. For this reason most sequences and collections appear as lists in representations.
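A rough standard-library sketch shows why lists are the common denominator. The helper names are hypothetical and this is not the library's code. Storing a map as a list of key and value pairs is an assumption, chosen because it matches the layout of priority_queue in the sample file in this section.

```python
import json
from collections import deque

def deque_to_list(d):
    return list(d)

def set_to_list(s):
    return sorted(s)          # sorted only to make the output repeatable

def map_to_list(m):
    return [[k, v] for k, v in m.items()]

def list_to_map(pairs):
    return {k: v for k, v in pairs}

recent = deque(['a', 'b', 'c'])
tags = {'red', 'blue'}
queue = {100: ['a', 'b'], 10: [], 1: ['!']}

text = json.dumps({
    'recent_work': deque_to_list(recent),
    'tags': set_to_list(tags),
    'priority_queue': map_to_list(queue),
})

# Recovery rebuilds the Python type best suited to each concept.
loaded = json.loads(text)
recovered_recent = deque(loaded['recent_work'])
recovered_tags = set(loaded['tags'])
recovered_queue = list_to_map(loaded['priority_queue'])
```

One benefit of the pair layout is that integer keys survive the round trip; a JSON object would force every key to be a string.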

Example usage appears below:

'recent_work': ar.DequeOf(ar.UUID),
'times_square': ar.ArrayOf(ar.ArrayOf(ar.ClockTime,2),2),
'time_periods': ar.DequeOf(ar.TimeSpan),
'priority_queue': ar.MapOf(int, ar.VectorOf(Job)),

Given the previously listed type expressions and the following values:

recent_work=ar.deque([
    uuid.uuid4(),
    uuid.uuid4(),
    uuid.uuid4(),
    uuid.uuid4(),
]),
times_square=[
    [ar.clock_now(), ar.clock_now()],
    [ar.clock_now(), ar.clock_now()]
],
time_periods=ar.deque([
    ar.clock_span(hours=1, minutes=2, seconds=3),
    ar.clock_span(hours=8),
    ar.clock_span(minutes=10),
    ar.clock_span(seconds=0, milliseconds=12, microseconds=500),
    ar.clock_span(days=1),
]),
priority_queue={
    100: [Job(title='a'), Job(title='b')],
    10: [],
    1: [Job(title='!')],
}

The resulting JSON file contents will look like this:

{
    "value": {
        "priority_queue": [
            [
                100,
                [
                    {
                        "body": "",
                        "created": "2022-10-25T16:15:55.024847",
                        "priority": 10,
                        "service": "noop",
                        "title": "a",
                        "unique_id": "266354ad-e59a-4351-a3b1-3b9acbe0702b"
                    },
                    {
                        "body": "",
                        "created": "2022-10-25T16:15:55.024848",
                        "priority": 10,
                        "service": "noop",
                        "title": "b",
                        "unique_id": "266354ad-e59a-4351-a3b1-3b9acbe0702b"
                    }
                ]
            ],
            [
                10,
                []
            ],
            [
                1,
                [
                    {
                        "body": "",
                        "created": "2022-10-25T16:15:55.024849",
                        "priority": 10,
                        "service": "noop",
                        "title": "!",
                        "unique_id": "266354ad-e59a-4351-a3b1-3b9acbe0702b"
                    }
                ]
            ]
        ],
        "recent_work": [
            "daa4978c-4b39-4cf3-a2bd-894de8e7a2c0",
            "3925bab4-04ae-4712-bbbf-2fca9da8bd8d",
            "605226d2-cb05-408e-a15c-926dae09db9e",
            "06b1875e-81dc-4cc3-847a-888c5c492e68"
        ],
        "time_periods": [
            "1h2m3s",
            "8h",
            "10m",
            "0.0125s",
            "1d"
        ],
        "times_square": [
            [
                "2022-10-25T16:15:55.024843",
                "2022-10-25T16:15:55.024843"
            ],
            [
                "2022-10-25T16:15:55.024843",
                "2022-10-25T16:15:55.024843"
            ]
        ]
    }
}

Those Fixed-Size Arrays

All the containers have convenient functions, such as default_vector(), to help with the proper initialization of members inside registered classes.

That is, except for arrays. The fundamental reason for that difference is that default instances of vectors, sets, maps and deques are empty. Additional information is required (i.e. the type of the elements and the size) to construct the content of an array. The fact that the elements of the array could themselves involve further nested structure hints at the true nature of the situation.

A special function is provided that accepts any valid type expression - including expressions involving arrays - and returns a default instance of that type. The following example constructs a 4-by-2 array of integers:

>>> import ansar.encode as ar
>>> t = ar.ArrayOf(ar.ArrayOf(int,2),4)
>>> a = ar.make(t)
>>> a
[[None, None], [None, None], [None, None], [None, None]]

The make() function returns the simplest, conforming instance of the specified type. For the simple types such as int that value is always None. This behaviour is a deliberate part of the informal version management and can help applications identify what version of materials is being loaded.
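The recursive nature of the task can be sketched without the library. FixedArray, Vector and make_default below are hypothetical stand-ins for type expressions and for a make-like function; they are assumptions made for illustration and say nothing about the library's internals.

```python
class FixedArray(object):
    """Hypothetical type expression: a fixed-length sequence."""
    def __init__(self, element, size):
        self.element = element
        self.size = size

class Vector(object):
    """Hypothetical type expression: a variable-length sequence."""
    def __init__(self, element):
        self.element = element

def make_default(t):
    """Return the simplest value that conforms to the type expression."""
    if isinstance(t, FixedArray):
        # Each slot needs its own default, which may itself be an array.
        return [make_default(t.element) for _ in range(t.size)]
    if isinstance(t, Vector):
        return []
    return None               # simple types such as int default to None

grid = make_default(FixedArray(FixedArray(int, 2), 4))
```

Building each row with its own recursive call matters: writing to grid[0][0] must not change any other row.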

A revised version of the Job declaration appears below:

import ansar.encode as ar

class Job(object):
    def __init__(self, created=None, unique_id=None,
        title=None, priority=None,
        service=None, body=None, who=None):
        self.created = created or ar.make(ar.ClockTime)
        self.unique_id = unique_id or ar.make(ar.UUID)
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.who = who or ar.make(ar.ArrayOf(ar.Unicode,2))

JOB_SCHEMA = {
    'created': ar.ClockTime,
    'unique_id': ar.UUID,
    'title': ar.Unicode,
    'priority': ar.Integer4,
    'service': ar.Unicode,
    'body': ar.String,
    'who': ar.ArrayOf(ar.Unicode,2),
}

ar.bind(Job, object_schema=JOB_SCHEMA)

The who member is now initialized with a fixed-size array consistent with the type information registered with the Job class:

{
    "value": {
        "who": [
            null,
            null
        ]
    }
}

This demonstrates the use of the make() function. The who member is now a fixed-size array of email addresses, perhaps supporting the notion of primary and secondary points of contact. In a default instance of a Job, neither of these contacts is set. For demonstration purposes, the make() function was also used to initialize the created and unique_id members. As mentioned previously the default value for these types is None, resulting in their omission from the representation.

Automated Construction

If an object is extremely complex or there is little perceived benefit in manual, per-member initialization, the make_self() function exists to automate that process. It assumes that all members have been initialized to None and uses the supplied schema to complete the guaranteed structure rule.

A revised version of the Job declaration appears below:

import ansar.encode as ar

class Job(object):
    def __init__(self, created=None, unique_id=None,
        title=None, priority=None,
        service=None, body=None, who=None):
        self.created = created
        self.unique_id = unique_id
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.who = who
        ar.make_self(self, JOB_SCHEMA)

JOB_SCHEMA = {
    'created': ar.ClockTime,
    'unique_id': ar.UUID,
    'title': ar.Unicode,
    'priority': ar.Integer4,
    'service': ar.Unicode,
    'body': ar.String,
    'who': ar.ArrayOf(ar.Unicode,2),
}

ar.bind(Job, object_schema=JOB_SCHEMA)

The output is the same as in the previous section, at the cost of additional CPU cycles during object construction:

{
    "value": {
        "who": [
            null,
            null
        ]
    }
}
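The idea of schema-driven construction can be sketched as a loop over the schema that replaces any member still set to None. The names fill_defaults, make_default and FixedArray are hypothetical, and the schema is reduced to two members; the library function may behave differently in detail.

```python
class FixedArray(object):
    """Hypothetical type expression: a fixed-length sequence."""
    def __init__(self, element, size):
        self.element = element
        self.size = size

def make_default(t):
    """Simplest conforming value: nested lists for arrays, None otherwise."""
    if isinstance(t, FixedArray):
        return [make_default(t.element) for _ in range(t.size)]
    return None

def fill_defaults(obj, schema):
    """Give every member that is still None the default for its declared type."""
    for name, t in schema.items():
        if getattr(obj, name, None) is None:
            setattr(obj, name, make_default(t))

SCHEMA = {'title': str, 'who': FixedArray(str, 2)}

class Job(object):
    def __init__(self, title=None, who=None):
        self.title = title
        self.who = who
        fill_defaults(self, SCHEMA)   # SCHEMA is looked up when a Job is constructed

default_job = Job()
explicit_job = Job(who=['a@b.c', 'd@e.f'])
```

Values passed by the caller are left alone; only the members that arrive as None are touched.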

Polymorphic Persistence

The library supports the concept of polymorphism through the Any type. Rather than declaring that a file contains a specific type:

>>> f = ar.File('job', Job)
>>> j = Job()
>>> f.store(j)
>>> r, _ = f.recover()
>>> r
<__main__.Job object at 0x7f75deaf0750>

A file can be declared in this way:

>>> f = ar.File('job', ar.Any)
>>> j = Job()
>>> f.store(j)
>>> r, _ = f.recover()
>>> r
<__main__.Job object at 0x7f75deaf0750>

To the application the behaviour has not changed. If a second class declaration is introduced:

import ansar.encode as ar

DOW_TYPE = ar.ArrayOf(bool, 7)

class Schedule(object):
    def __init__(self, area=None, tod=None, dow=None):
        self.area = area
        self.tod = tod or ar.default_vector()
        self.dow = dow or ar.make(DOW_TYPE)

SCHEDULE_SCHEMA = {
    'area': ar.Integer4,
    'tod': ar.VectorOf(ar.ClockTime),
    'dow': DOW_TYPE,
}

ar.bind(Schedule, object_schema=SCHEDULE_SCHEMA)

A fresh sequence of operations illustrates the new possibilities:

>>> f = ar.File('job', ar.Any)
>>> j = Job()
>>> s = Schedule()
>>> f.store(j)
>>> r, _ = f.recover()
>>> r
<__main__.Job object at 0x7f14419af650>
>>> f.store(s)
>>> r, _ = f.recover()
>>> r
<__main__.Schedule object at 0x7f14419afe50>

A file declared as polymorphic using the Any type can contain an instance of any registered class, such as Job or Schedule. This ability avoids having to represent unrelated concepts, e.g. Job and Schedule objects, within a single class. It’s also extensible in that further object types can be added without disturbing much of the existing code.

Note

The recovery of polymorphic representations also integrates seamlessly with version support. Refer to Versions, Upgrading And Migration.

How This Works

Polymorphism is an important capability. However, there can be some misunderstandings about the scope of what it can do. The simplest way to avoid any confusion is to show a sample of the stored materials:

{
    "value": [
        "__main__.Schedule",
        {
            "area": 0,
            "dow": [
                null,
                null,
                null,
                null,
                null,
                null,
                null
            ],
            "tod": []
        },
        []
    ]
}

The JSON value has changed from an object to a list that contains a string, an object and an empty list. The string appears to identify the type of the object.

That is exactly the case. The first element is the name of the declared class in a form curated by the library and the second element is exactly what normally appears as the value. This explains why a file created using ar.File('job', Job) cannot be recovered by a file declared using ar.File('job', ar.Any).

A polymorphic “recover” operation cannot recover just any file; it must be presented with materials created by a polymorphic “store”.

Note

The third element is an internal table required to track the movement of pointer materials. With no pointers in sight, the table is empty.
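The tagging scheme can be imitated with a registry of class names. The sketch below is an illustration under assumptions: REGISTRY, register, store_any and recover_any are hypothetical names, the tag is simply the module and class name, and the third element is always written as an empty list.

```python
import json

REGISTRY = {}   # hypothetical table of tag -> class

def tag_of(c):
    return c.__module__ + '.' + c.__name__

def register(c):
    REGISTRY[tag_of(c)] = c
    return c

@register
class Job(object):
    def __init__(self, title=None):
        self.title = title

@register
class Schedule(object):
    def __init__(self, area=None):
        self.area = area

def store_any(obj):
    """Write [tag, members, table]; the tag records which class was stored."""
    members = {k: v for k, v in vars(obj).items() if v is not None}
    return json.dumps({'value': [tag_of(type(obj)), members, []]})

def recover_any(text):
    """Read the tag first, then rebuild an instance of the named class."""
    tag, members, _ = json.loads(text)['value']
    obj = REGISTRY[tag]()
    for name, value in members.items():
        setattr(obj, name, value)
    return obj

r = recover_any(store_any(Schedule(area=3)))
```

A file written without the tag has an object where recover_any expects a three-element list, which is why the two kinds of file are not interchangeable.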

Going Incognito

As a part of polymorphism, the library includes special handling of unknown types. During a store operation the library compiles the identifying string and tags the representation with that name. During a recovery operation the library decompiles that tag to the actual Python class object.

Any failure to decompile is simply a case of the application not knowing the named type - this is either a problem on the development side (i.e. a bug) or an operations problem. An example of the latter is where a file originating from a different system is presented to an unsuspecting application.

The library detects these scenarios, and during a recovery folds the materials into an Incognito object. The outcome is that recovery operations of unknown types do not fail in the normal sense. They produce an instance of a special class:

def work_on_job(j):
    if isinstance(j, Job):
        # Process the one-off job.
        return True
    elif isinstance(j, Schedule):
        # Process repeating job.
        return True
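    elif isinstance(j, ar.Incognito):
        # Unknown (unregistered) type; record its name and decline the work.
        log(j.type_name)
        return False
    # Registered type that this function does not handle yet.
    software_error()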

This implementation of work_on_job logs the name of any unregistered class. If the job is not an instance of anything tested for by the function, it calls the software_error function. In that case the class of the job is known within the application (i.e. it has been registered) but the function has not yet been updated to perform the related work.

Note

For those who are curious or need an explanation for something they have observed, the Incognito object never appears in stored materials. The object is un-folded during the store process in a manner matching the prior folding. Effectively this allows representations of unknown type to pass through the application without change.
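The folding and un-folding can be sketched with a stand-in class. Everything here is an assumption made for illustration: the local Incognito class, its members attribute, the empty REGISTRY and the store_any and recover_any helpers are hypothetical, with only the type_name attribute taken from the example above.

```python
import json

class Incognito(object):
    """Stand-in holder for materials whose type is not known locally."""
    def __init__(self, type_name, members):
        self.type_name = type_name
        self.members = members

REGISTRY = {}   # hypothetical table of tag -> class; empty, so every tag is unknown

def recover_any(text):
    tag, members, _ = json.loads(text)['value']
    c = REGISTRY.get(tag)
    if c is None:
        return Incognito(tag, members)      # fold: keep the tag and raw members
    obj = c()
    for name, value in members.items():
        setattr(obj, name, value)
    return obj

def store_any(obj):
    if isinstance(obj, Incognito):
        tag, members = obj.type_name, obj.members   # un-fold: restore both as-is
    else:
        tag = type(obj).__module__ + '.' + type(obj).__name__
        members = {k: v for k, v in vars(obj).items() if v is not None}
    return json.dumps({'value': [tag, members, []]}, sort_keys=True)

foreign = '{"value": ["other.system.Invoice", {"total": 12}, []]}'
r = recover_any(foreign)
passed_through = store_any(r)
```

In this sketch the text that leaves the application decodes to the same structure that came in, even though the application never understood the type.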