More About Types
This section looks more closely at what the library can provide when data requirements become more demanding. Perhaps a configuration file needs to contain a list of recently opened files, or a job scheduler needs to maintain a table of performance metrics.
A series of changes is made to the Job class introduced in the previous section. The changes involve progressively more complex types, and the full series introduces enough information to cover most application data requirements.
A Refresh Cycle
The starting point is the original declaration:
import uuid
import ansar.encode as ar

class Job(object):
    def __init__(self, unique_id=None, title='watchdog', priority=10, service='noop', body=b''):
        self.unique_id = unique_id or uuid.uuid4()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body

ar.bind(Job)
Members of the class are listed below along with their respective Python types:
unique_id - uuid.UUID
title - str
priority - int
service - str
body - bytes
The Job class is complete in that the library requires no further type information. The library can store() and recover() far more complex application data, but doing so requires an entirely different approach to declaring type information.
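One way to picture why no further type information is needed is to read the runtime type of each member from a default-constructed instance. The member list above follows directly from doing that. Whether the library works exactly this way is an assumption, and `member_types` is a hypothetical helper rather than part of ansar.encode.

```python
import uuid


class Job(object):
    def __init__(self, unique_id=None, title='watchdog', priority=10, service='noop', body=b''):
        self.unique_id = unique_id or uuid.uuid4()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body


def member_types(cls):
    """Hypothetical helper: map each member name to the runtime type of its default value.

    Assumes the class can be constructed with no arguments, as Job can.
    """
    default_instance = cls()
    return {name: type(value) for name, value in vars(default_instance).items()}


print(member_types(Job))
```

Under this scheme, a member whose default is None carries no usable type at all. This is consistent with the later point that a full schema is what allows None as a default value.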
Remembering When
Recording the moment a job was presented is likely to be useful. Adding a created member looks like this:
import time
import uuid
import ansar.encode as ar

class Job(object):
    def __init__(self, created=None, unique_id=None,
                 title='watchdog', priority=10, service='noop', body=b''):
        self.created = created or time.time()
        self.unique_id = unique_id or uuid.uuid4()
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body

ar.bind(Job)
This new class declaration is also complete and can immediately be put to work. New application features and functionality can be based on the accurate storage and recovery of the creation time, which is great. However, a peek at the stored materials is disappointing:
{
"value": {
"body": "",
"created": 1666667754.7545347,
"priority": 10,
"service": "noop",
"title": "watchdog",
"unique_id": "bdcb0d8d-d5a2-4021-a182-5b57c53d59e8"
}
}
The created member appears as a floating-point value. This might be functionally valid, but the stored representation is useless as a time. It cannot be checked visually or modified with a text editor.
Passing additional type information at registration time fixes this issue:
ar.bind(Job, object_schema={'created': ar.ClockTime})
There is now an explicit description of the type of information carried in the created member. That description is passed to the bind registration function as a schema, and it removes the need to assign a default value such as time.time(). In fact, there are longer-term considerations that make None the best default value once a proper schema declaration is present.
Readability is restored in the file contents:
{
"value": {
"body": "",
"created": "2022-10-25T16:15:54.78812",
"priority": 10,
"service": "noop",
"title": "watchdog",
"unique_id": "2cbc9901-d167-48dd-8275-0505ef093053"
}
}
The Python float type is used both for mathematical purposes and for marking time. The library understands this overloading and lets users state their intentions explicitly. Only the representation changes. The application continues to use the float type, and the library automatically performs the necessary conversions during encoding and decoding.
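The kind of conversion described above can be sketched with the standard library alone. This sketch assumes ISO 8601 text with microsecond precision and uses UTC. The time zone the library itself applies is not stated here, and both function names are hypothetical.

```python
from datetime import datetime, timezone

ISO_FORMAT = '%Y-%m-%dT%H:%M:%S.%f'


def encode_clock_time(t):
    """Hypothetical: turn a float timestamp into readable ISO 8601 text (UTC)."""
    return datetime.fromtimestamp(t, tz=timezone.utc).strftime(ISO_FORMAT)


def decode_clock_time(s):
    """Hypothetical: turn the ISO 8601 text back into a float timestamp."""
    return datetime.strptime(s, ISO_FORMAT).replace(tzinfo=timezone.utc).timestamp()


created = 1666667754.7545347          # the float from the stored sample above
text = encode_clock_time(created)     # readable, editable text for the file
restored = decode_clock_time(text)    # back to a float for the application
print(text, restored)
```

The application only ever sees the float. The text form exists purely for the stored representation.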
Note
Most applications working with times will use datetime runtime values rather than float. These are registered using the WorldTime type rather than ClockTime. ClockTime is retained for its convenience in certain situations. More detailed information relating to time values can be found here.
Declaring A Full Schema
A properly declared application type now looks like:
import ansar.encode as ar

class Job(object):
    def __init__(self, created=None, unique_id=None, title=None,
                 priority=None, service=None, body=None):
        self.created = created
        self.unique_id = unique_id
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body

JOB_SCHEMA = {
    'created': ar.ClockTime,
    'unique_id': ar.UUID,
    'title': ar.Unicode,
    'priority': ar.Integer4,
    'service': ar.Unicode,
    'body': ar.String,
}

ar.bind(Job, object_schema=JOB_SCHEMA)
Each member of the application type is now associated with a library type, e.g. ar.ClockTime and ar.Integer4. An application type with a full schema enjoys a variety of benefits:
- readability of explicit vs implicit type information,
- ability to disambiguate in scenarios such as float vs time,
- ability to describe more complex types,
- use of None as a default value.
A surprise awaits in the stored representation:
{
"value": {}
}
The expected value object has been reduced to the empty object. Where have created, unique_id and their siblings gone? The empty object is a direct consequence of an internal encoding rule: members with a value of None are omitted from stored representations. This is a simple optimization that reduces the size of resulting files. More importantly, it is part of informal version management. Further information on versioning can be found here.
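The omission rule can be pictured in a few lines of standard-library Python. This sketch assumes that members are plain instance attributes. `store_value` and `recover_value` are hypothetical names, not library functions.

```python
import json


class Job(object):
    def __init__(self, created=None, unique_id=None, title=None,
                 priority=None, service=None, body=None):
        self.created = created
        self.unique_id = unique_id
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body


def store_value(obj):
    # Members holding None are left out of the representation entirely.
    members = {name: value for name, value in vars(obj).items() if value is not None}
    return json.dumps({'value': members}, indent=4)


def recover_value(cls, text):
    # Absent members fall back to the constructor defaults, i.e. None.
    members = json.loads(text)['value']
    return cls(**members)


print(store_value(Job()))
print(store_value(Job(title='watchdog', priority=10)))
```

Missing members simply come back as None. This hints at why the rule matters for versioning: stored materials that lack a member can still be recovered into a class that declares it.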
Names For Runtime Numbers
Where operational numbers need a readable representation, an Enumeration is used to map the numbers to strings during storage and from strings back to numbers during recovery. Declaration looks like this:
import ansar.encode as ar

ModeOfTransport = ar.Enumeration(CAR=1, TRUCK=2, MOTORCYCLE=3, BOAT=100, SCOOTER=10, SKATEBOARD=11)

class Job(object):
    def __init__(self, created=None, unique_id=None, title=None,
                 priority=None, service=None, body=None,
                 transport=None):
        self.created = created
        self.unique_id = unique_id
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.transport = transport or ModeOfTransport.MOTORCYCLE

JOB_SCHEMA = {
    'created': ar.ClockTime,
    'unique_id': ar.UUID,
    'title': ar.Unicode,
    'priority': ar.Integer4,
    'service': ar.Unicode,
    'body': ar.String,
    'transport': ModeOfTransport,
}

ar.bind(Job, object_schema=JOB_SCHEMA)
A simple example of usage appears below:
>>> f = ar.File('job', Job)
>>> j = Job()
>>> f.store(j)
>>>
>>> r, _ = f.recover()
>>> r.transport
3
Lastly, the stored representation looks like this:
{
"value": {
"transport": "MOTORCYCLE"
}
}
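The number-to-name mapping that an Enumeration performs can be approximated with Python's standard enum module. This is only an analogy, because ar.Enumeration is the library's own type. The helper names below are hypothetical.

```python
from enum import IntEnum


class ModeOfTransport(IntEnum):
    CAR = 1
    TRUCK = 2
    MOTORCYCLE = 3
    BOAT = 100
    SCOOTER = 10
    SKATEBOARD = 11


def encode_transport(number):
    # Storage: runtime number -> readable name.
    return ModeOfTransport(number).name


def decode_transport(name):
    # Recovery: readable name -> runtime number.
    return int(ModeOfTransport[name])


stored = encode_transport(3)
print(stored, decode_transport(stored))
```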
Sequences And Collections
A notification feature is added to the job processing machinery. It is decided that notifications will be in the form of emails sent to configured parties. Each job needs a list of zero or more email addresses:
import ansar.encode as ar

class Job(object):
    def __init__(self, created=None, unique_id=None,
                 title=None, priority=None,
                 service=None, body=None, who=None):
        self.created = created
        self.unique_id = unique_id
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.who = who or ar.default_vector()

JOB_SCHEMA = {
    'created': ar.ClockTime,
    'unique_id': ar.UUID,
    'title': ar.Unicode,
    'priority': ar.Integer4,
    'service': ar.Unicode,
    'body': ar.String,
    'who': ar.VectorOf(ar.Unicode),
}

ar.bind(Job, object_schema=JOB_SCHEMA)
Given the updated job creation:
Job(created=ar.clock_now(),
    title='the quick',
    who=['tom.pirate@black.ship', 'gerard.diplomat@ivory.tower', 'aswan.swami@nowhere.everywhere'])
The resulting file content looks like this:
{
"value": {
"created": "2022-10-25T16:15:55.094565",
"title": "the quick",
"who": [
"tom.pirate@black.ship",
"gerard.diplomat@ivory.tower",
"aswan.swami@nowhere.everywhere"
]
}
}
A job now includes a list of email addresses. The list can hold zero or more addresses, but a list is always present, i.e. the j.who member is always an instance of a Python list.
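The guarantee comes from building a fresh empty list whenever no value is supplied. The sketch below uses a plain [] in place of ar.default_vector(). It assumes the library function likewise returns a new empty list on each call. The sketch also shows why a [] default argument would not give the same guarantee.

```python
class Job(object):
    def __init__(self, who=None):
        # A new list per instance: `who` is always a list, never None.
        self.who = who or []


class SharedDefault(object):
    def __init__(self, who=[]):  # anti-pattern: one list shared by every instance
        self.who = who


a, b = Job(), Job()
a.who.append('tom.pirate@black.ship')
print(isinstance(b.who, list), b.who)   # b is unaffected by changes to a

c, d = SharedDefault(), SharedDefault()
c.who.append('tom.pirate@black.ship')
print(d.who)                            # d sees the address appended through c
```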
Note
A good example of an application type similar to Job, but with all the bells and whistles, can be found here. Use of the default_vector() library function guarantees a valid default value for the member, but it also touches on a wider issue.
There is a small collection of functions, such as default_vector() and clock_now(), that act as shorthands and can make use of this library much cleaner. They can also be considered syntactic sugar that extends the library dependency into further areas of code.
The object_schema parameter is a dict of member names and type expressions, and VectorOf is one example of a type expression. The full set of supported sequences and collections is:
ArrayOf - a fixed-length sequence
VectorOf - a variable-length sequence
DequeOf - a double-ended queue
SetOf - a unique collection
MapOf - an associative array
The library makes all necessary conversions between portable representations and the Python type best suited to a particular concept: ArrayOf and VectorOf map to list, DequeOf to deque, SetOf to set and MapOf to dict. There is no explicit support for concepts such as a double-ended queue in JSON, or in any other known encoding. For this reason, most sequences and collections appear as lists in representations.
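The sketch below shows how richer Python containers might be flattened to JSON lists and rebuilt on recovery, using only the standard library. The pair-list form for maps mirrors the priority_queue sample in this section. It keeps non-string keys such as 100 intact, which a JSON object would not, because JSON object keys are always strings. Sorting the set is an assumption made here for deterministic output. All function names are hypothetical.

```python
import json
from collections import deque


def to_portable(value):
    """Hypothetical: flatten deque/set/dict into JSON-friendly lists."""
    if isinstance(value, (list, deque)):
        return [to_portable(v) for v in value]
    if isinstance(value, set):
        return sorted(to_portable(v) for v in value)
    if isinstance(value, dict):
        # A map becomes a list of [key, value] pairs.
        return [[to_portable(k), to_portable(v)] for k, v in value.items()]
    return value


def from_portable(kind, items):
    """Hypothetical: rebuild the Python container named by `kind`."""
    if kind == 'deque':
        return deque(items)
    if kind == 'set':
        return set(items)
    if kind == 'map':
        return {k: v for k, v in items}
    return list(items)


queue = {100: ['a', 'b'], 10: [], 1: ['!']}
text = json.dumps(to_portable(queue))
print(text)
print(from_portable('map', json.loads(text)))
```

The caller must supply the container kind on recovery. In the library, that information comes from the registered type expression.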
Example usage appears below:
'recent_work': ar.DequeOf(ar.UUID),
'times_square': ar.ArrayOf(ar.ArrayOf(ar.ClockTime,2),2),
'time_periods': ar.DequeOf(ar.TimeSpan),
'priority_queue': ar.MapOf(int, ar.VectorOf(Job)),
Given the previously listed type expressions and the following values:
recent_work=ar.deque([
    uuid.uuid4(),
    uuid.uuid4(),
    uuid.uuid4(),
    uuid.uuid4(),
]),
times_square=[
    [ar.clock_now(), ar.clock_now()],
    [ar.clock_now(), ar.clock_now()]
],
time_periods=ar.deque([
    ar.clock_span(hours=1, minutes=2, seconds=3),
    ar.clock_span(hours=8),
    ar.clock_span(minutes=10),
    ar.clock_span(seconds=0, milliseconds=12, microseconds=500),
    ar.clock_span(days=1),
]),
priority_queue={
    100: [Job(title='a'), Job(title='b')],
    10: [],
    1: [Job(title='!')],
}
The resulting JSON file contents will look like this:
{
"value": {
"priority_queue": [
[
100,
[
{
"body": "",
"created": "2022-10-25T16:15:55.024847",
"priority": 10,
"service": "noop",
"title": "a",
"unique_id": "266354ad-e59a-4351-a3b1-3b9acbe0702b"
},
{
"body": "",
"created": "2022-10-25T16:15:55.024848",
"priority": 10,
"service": "noop",
"title": "b",
"unique_id": "266354ad-e59a-4351-a3b1-3b9acbe0702b"
}
]
],
[
10,
[]
],
[
1,
[
{
"body": "",
"created": "2022-10-25T16:15:55.024849",
"priority": 10,
"service": "noop",
"title": "!",
"unique_id": "266354ad-e59a-4351-a3b1-3b9acbe0702b"
}
]
]
],
"recent_work": [
"daa4978c-4b39-4cf3-a2bd-894de8e7a2c0",
"3925bab4-04ae-4712-bbbf-2fca9da8bd8d",
"605226d2-cb05-408e-a15c-926dae09db9e",
"06b1875e-81dc-4cc3-847a-888c5c492e68"
],
"time_periods": [
"1h2m3s",
"8h",
"10m",
"0.0125s",
"1d"
],
"times_square": [
[
"2022-10-25T16:15:55.024843",
"2022-10-25T16:15:55.024843"
],
[
"2022-10-25T16:15:55.024843",
"2022-10-25T16:15:55.024843"
]
]
}
}
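The time_periods entries above use a compact days/hours/minutes/seconds text form. The sketch below reproduces that form from a datetime.timedelta. It is inferred from the sample output only, so the exact rules the library applies, for example for negative spans, are assumptions. The function names are hypothetical.

```python
import re
from datetime import timedelta

_SPAN = re.compile(r'(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?')


def encode_span(td):
    """Hypothetical: render a non-negative timedelta as text such as '1h2m3s'."""
    us = td // timedelta(microseconds=1)       # whole span as integer microseconds
    days, us = divmod(us, 86400 * 10**6)
    hours, us = divmod(us, 3600 * 10**6)
    minutes, us = divmod(us, 60 * 10**6)
    parts = []
    if days:
        parts.append(f'{days}d')
    if hours:
        parts.append(f'{hours}h')
    if minutes:
        parts.append(f'{minutes}m')
    if us or not parts:
        # Seconds with up to six decimals, trailing zeros removed.
        seconds = f'{us / 10**6:.6f}'.rstrip('0').rstrip('.')
        parts.append(f'{seconds}s')
    return ''.join(parts)


def decode_span(text):
    """Hypothetical: parse the compact text back into a timedelta."""
    match = _SPAN.fullmatch(text)
    if not text or match is None:
        raise ValueError(f'not a time span: {text!r}')
    d, h, m, s = match.groups()
    return timedelta(days=int(d or 0), hours=int(h or 0),
                     minutes=int(m or 0), seconds=float(s or 0))


spans = [timedelta(hours=1, minutes=2, seconds=3), timedelta(hours=8),
         timedelta(minutes=10), timedelta(milliseconds=12, microseconds=500),
         timedelta(days=1)]
print([encode_span(s) for s in spans])
```

Working in integer microseconds avoids floating-point drift when splitting the span into units.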
Those Fixed-Size Arrays
All the containers except arrays have convenience functions that help with the proper initialization of members inside registered classes. The fundamental reason for that difference is that default instances of vectors, sets, maps and deques are empty. Constructing the content of an array requires additional information, namely the type of the elements and the size. The fact that the elements of an array could themselves involve further nested structure hints at the true nature of the situation.
A special function is provided that accepts any valid type expression, including expressions involving arrays, and returns a default instance of that type. The following example constructs a 4-by-2 array of integers:
>>> import ansar.encode as ar
>>> t = ar.ArrayOf(ar.ArrayOf(int,2),4)
>>> a = ar.make(t)
>>> a
[[None, None], [None, None], [None, None], [None, None]]
The make() function returns the simplest conforming instance of the specified type. For simple types such as int, that value is always None. This behaviour is a deliberate part of the informal version management and can help applications identify which version of materials is being loaded.
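The recursive idea behind make() can be sketched in plain Python. The ArrayOf and VectorOf classes below are simplified stand-ins invented for this illustration, not the library's type expressions. The sketch follows the rule stated above that simple types default to None.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ArrayOf:          # hypothetical stand-in: fixed-length sequence
    element: Any
    size: int


@dataclass(frozen=True)
class VectorOf:         # hypothetical stand-in: variable-length sequence
    element: Any


def make(t):
    """Return the simplest conforming instance of type expression `t`."""
    if isinstance(t, ArrayOf):
        # Build each element separately so nested lists are not shared.
        return [make(t.element) for _ in range(t.size)]
    if isinstance(t, VectorOf):
        return []          # a default vector is empty
    return None            # simple types such as int default to None


a = make(ArrayOf(ArrayOf(int, 2), 4))
print(a)
```

Building each row with its own recursive call avoids the aliasing that an expression such as `[[None] * 2] * 4` would introduce.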
A revised version of the Job declaration appears below:
import ansar.encode as ar

class Job(object):
    def __init__(self, created=None, unique_id=None,
                 title=None, priority=None,
                 service=None, body=None, who=None):
        self.created = created or ar.make(ar.ClockTime)
        self.unique_id = unique_id or ar.make(ar.UUID)
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.who = who or ar.make(ar.ArrayOf(ar.Unicode, 2))

JOB_SCHEMA = {
    'created': ar.ClockTime,
    'unique_id': ar.UUID,
    'title': ar.Unicode,
    'priority': ar.Integer4,
    'service': ar.Unicode,
    'body': ar.String,
    'who': ar.ArrayOf(ar.Unicode, 2),
}

ar.bind(Job, object_schema=JOB_SCHEMA)
The who member is now initialized with a fixed-size array consistent with the type information registered with the Job class:
{
"value": {
"who": [
null,
null
]
}
}
This demonstrates the use of the make() function. The who member is now a fixed-size array of email addresses, perhaps supporting the notion of primary and secondary points of contact. In a default instance of a Job, neither contact is set. For demonstration purposes, the make() function was also used to initialize the created and unique_id members. As mentioned previously, the default value for these types is None, so they are omitted from the representation.
Automated Construction
If an object is extremely complex, or there is little perceived benefit in manual, per-member initialization, the make_self() function automates that process. It assumes that all members have been initialized to None and uses the supplied schema to satisfy the guaranteed structure rule.
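The following sketch shows what that automation could look like. As stated above, it assumes every member starts as None. It walks the schema and replaces each None with a default instance of the member's type. The ArrayOf, make and make_self definitions here are simplified, hypothetical stand-ins, not the library's own.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ArrayOf:          # hypothetical stand-in for a fixed-size array type
    element: Any
    size: int


def make(t):
    # Arrays are filled element by element; simple types default to None.
    if isinstance(t, ArrayOf):
        return [make(t.element) for _ in range(t.size)]
    return None


def make_self(obj, schema):
    """Hypothetical: give every None member the default instance of its schema type."""
    for name, t in schema.items():
        if getattr(obj, name, None) is None:
            setattr(obj, name, make(t))


JOB_SCHEMA = {'title': str, 'who': ArrayOf(str, 2)}


class Job(object):
    def __init__(self, title=None, who=None):
        self.title = title
        self.who = who
        make_self(self, JOB_SCHEMA)


j = Job()
print(j.title, j.who)
```

Every construction walks the whole schema, which is where the extra construction cost comes from.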
A revised version of the Job declaration appears below:
import ansar.encode as ar

class Job(object):
    def __init__(self, created=None, unique_id=None,
                 title=None, priority=None,
                 service=None, body=None, who=None):
        self.created = created
        self.unique_id = unique_id
        self.title = title
        self.priority = priority
        self.service = service
        self.body = body
        self.who = who
        ar.make_self(self, JOB_SCHEMA)

JOB_SCHEMA = {
    'created': ar.ClockTime,
    'unique_id': ar.UUID,
    'title': ar.Unicode,
    'priority': ar.Integer4,
    'service': ar.Unicode,
    'body': ar.String,
    'who': ar.ArrayOf(ar.Unicode, 2),
}

ar.bind(Job, object_schema=JOB_SCHEMA)
The output is the same as in the previous section, at the cost of additional CPU cycles during object construction:
{
"value": {
"who": [
null,
null
]
}
}
Polymorphic Persistence
The library supports the concept of polymorphism through the Any type. Rather than declaring that a file contains a specific type:
>>> f = ar.File('job', Job)
>>> j = Job()
>>> f.store(j)
>>> r, _ = f.recover()
>>> r
<__main__.Job object at 0x7f75deaf0750>
A file can be declared in this way:
>>> f = ar.File('job', ar.Any)
>>> j = Job()
>>> f.store(j)
>>> r, _ = f.recover()
>>> r
<__main__.Job object at 0x7f75deaf0750>
To the application, the behaviour has not changed. Suppose a second class declaration is introduced:
import ansar.encode as ar

DOW_TYPE = ar.ArrayOf(bool, 7)

class Schedule(object):
    def __init__(self, area=None, tod=None, dow=None):
        self.area = area
        self.tod = tod or ar.default_vector()
        self.dow = dow or ar.make(DOW_TYPE)

SCHEDULE_SCHEMA = {
    'area': ar.Integer4,
    'tod': ar.VectorOf(ar.ClockTime),
    'dow': DOW_TYPE,
}

ar.bind(Schedule, object_schema=SCHEDULE_SCHEMA)
A fresh sequence of operations illustrates the new possibilities:
>>> f = ar.File('job', ar.Any)
>>> j = Job()
>>> s = Schedule()
>>> f.store(j)
>>> r, _ = f.recover()
>>> r
<__main__.Job object at 0x7f14419af650>
>>> f.store(s)
>>> r, _ = f.recover()
>>> r
<__main__.Schedule object at 0x7f14419afe50>
A file declared as polymorphic using the Any type can contain an instance of any registered class, such as Job or Schedule. This avoids having to represent unrelated concepts, e.g. Job and Schedule objects, within a single class. It is also extensible, in that further object types can be added without disturbing much of the existing code.
Note
The recovery of polymorphic representations also integrates seamlessly with version support. Refer to Versions, Upgrading And Migration.
How This Works
Polymorphism is an important capability. However, there can be some misunderstandings about the scope of what it can do. The simplest way to avoid any confusion is to show a sample of the stored materials:
{
"value": [
"__main__.Schedule",
{
"area": 0,
"dow": [
null,
null,
null,
null,
null,
null,
null
],
"tod": []
},
[]
]
}
The JSON value has changed from an object to a list that contains a string, an object and an empty list. The string appears to identify the type of the object, and that is exactly the case. The first element is the name of the declared class, in a form curated by the library, and the second element is exactly what normally appears as the value. This explains why a file created using ar.File('job', Job) cannot be recovered by a file declared using ar.File('job', ar.Any).
A polymorphic “recover” operation cannot recover just anything; it must be presented with materials created by a polymorphic “store”.
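The tagged layout shown above can be imitated with a small registry keyed by module-qualified class names. This is a sketch of the principle only. The registry and the `register`, `store_any` and `recover_any` names are hypothetical, and the empty third element simply stands in for the library's pointer table, which stays empty here.

```python
import json

REGISTRY = {}


def type_name(cls):
    return f'{cls.__module__}.{cls.__qualname__}'


def register(cls):
    REGISTRY[type_name(cls)] = cls
    return cls


@register
class Schedule(object):
    def __init__(self, area=None, tod=None):
        self.area = area
        self.tod = tod or []


def store_any(obj):
    members = {k: v for k, v in vars(obj).items() if v is not None}
    # [type tag, ordinary value, pointer table]
    return json.dumps({'value': [type_name(type(obj)), members, []]})


def recover_any(text):
    value = json.loads(text)['value']
    if not isinstance(value, list) or len(value) != 3:
        # e.g. materials written for one specific type, not polymorphically
        raise ValueError('not polymorphic materials')
    name, members, _pointers = value
    return REGISTRY[name](**members)


s = recover_any(store_any(Schedule(area=7)))
print(type(s).__name__, s.area, s.tod)
```

Passing the plain {"value": {...}} layout to `recover_any` raises an error. This mirrors why the output of ar.File('job', Job) cannot be read through ar.File('job', ar.Any).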
Note
The third element is an internal table required to track the movement of pointer materials. With no pointers in sight, the table is empty.
Going Incognito
As a part of polymorphism, the library includes special handling of unknown types. During a store operation the library compiles the identifying string and tags the representation with that name. During a recovery operation the library decompiles that tag to the actual Python class object.
Any failure to decompile simply means the application does not know the named type. This is either a problem on the development side (i.e. a bug) or an operations problem. An example of the latter is a file originating from a different system being presented to an unsuspecting application.
The library detects these scenarios and, during a recovery, folds the materials into an Incognito object. The outcome is that recovery operations on unknown types do not fail in the normal sense. Instead, they produce an instance of a special class:
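def work_on_job(j):
    if isinstance(j, Job):
        # Process the one-off job.
        return True
    elif isinstance(j, Schedule):
        # Process repeating job.
        return True
    elif isinstance(j, ar.Incognito):
        # An unregistered type; log() is an application-supplied function.
        log(j.type_name)
        return False
    # A registered type this function does not handle yet;
    # software_error() is also application-supplied.
    software_error()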
This implementation of work_on_job logs the name of any unregistered class. If the job is not an instance of anything the function tests for, it calls the software_error function. In that case, the class of the job is known within the application (i.e. it has been registered), but the function has not yet been updated to perform the related work.
Note
For those who are curious, or who need an explanation for something they have observed: the Incognito object never appears in stored materials. The object is un-folded during the store process in a manner matching the prior folding. Effectively, this allows representations of unknown type to pass through the application without change.
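The folding and un-folding described above can be sketched with a name-keyed registry. The Incognito class here is a hypothetical stand-in that only mimics the type_name attribute used in work_on_job; it is not ar.Incognito. The function names are likewise hypothetical.

```python
import json

REGISTRY = {}   # module-qualified class name -> class; empty here, so every type is unknown


class Incognito(object):
    """Hypothetical stand-in holding the materials of an unregistered type."""
    def __init__(self, type_name, saved):
        self.type_name = type_name
        self.saved = saved


def recover_any(text):
    name, members, _pointers = json.loads(text)['value']
    cls = REGISTRY.get(name)
    if cls is None:
        return Incognito(name, members)     # fold: recovery does not fail
    return cls(**members)


def store_any(obj):
    if isinstance(obj, Incognito):
        # Un-fold: write back exactly what was read, under the original name.
        return json.dumps({'value': [obj.type_name, obj.saved, []]})
    name = f'{type(obj).__module__}.{type(obj).__qualname__}'
    members = {k: v for k, v in vars(obj).items() if v is not None}
    return json.dumps({'value': [name, members, []]})


foreign = json.dumps({'value': ['other.system.Report', {'pages': 3}, []]})
r = recover_any(foreign)
print(r.type_name, store_any(r) == foreign)
```

Because the folded materials are kept verbatim, storing the placeholder reproduces the original text. The unknown type therefore passes through unchanged.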