Context is an information architecture that helps manage sets of
data so they are stable (unchanging) and retrievable at any time. It is a vital
component of batch processing, both at end of day and especially when batches
run multiple times intraday.
A traditional batching scenario involving two systems that each supply two
stable sets of data (for a total of four) for processing typically looks like
this:
- At a point in time appropriate for System A, it "freezes" input to one
or more databases and runs programs to dump out the two sets of data, usually
in a flat file form, named with the processing date as a component of the
file name. The files
are typically moved to an "ftp hub" or some other staging/access mechanism
that is visible to all participants in the workflow.
- At a point in time appropriate for System B, it largely does the same
thing as System A.
- Either through a timer or a polling process, the consumer System C tries
to acquire all four sets of data and, having done so, begins to execute.
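The consumer side of this workflow can be sketched as a simple polling check. The staging directory layout and the file-naming convention below (system, dataset, and processing date joined by underscores) are assumptions for illustration, not part of the scenario above:

```python
import os
import datetime

def expected_files(date):
    """Names of the four stable sets for a processing date.
    The <system>_<dataset>_<YYYYMMDD>.dat convention is hypothetical."""
    d = date.strftime("%Y%m%d")
    return [f"{system}_{dataset}_{d}.dat"
            for system in ("systemA", "systemB")
            for dataset in ("ds1", "ds2")]

def all_present(hub_dir, date):
    """System C's gating check: it may begin executing only once
    all four files have appeared on the staging hub."""
    return all(os.path.exists(os.path.join(hub_dir, name))
               for name in expected_files(date))
```

A timer or poller would call `all_present` repeatedly; note how much implementation detail (directory, naming, file count) the consumer must hard-code.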
As straightforward as this sounds, the subtle issues are:
- There is no explicit description and/or metadata of what the dumped
datasets represent. From outside Systems A and B, the files "just appear."
Of course, implicitly System C knows what files it is consuming and
why, but without consulting the development and operations teams of Systems
A, B, and C, it is difficult to holistically map who is producing what for whom.
- The consumer needs to know a great deal about the implementation details
of delivery.
- The ability to replay/revend datasets is questionable. Some might be
able to do it, some not. There is also the issue of on-demand historic
retrieval of stable sets.
A Context architecture revolves around a stateless service that permits
participants to both create context definitions (name, metadata, and content)
and ask to be vended same. This is not a central database of everything.
It is a central locus of information that describes how a stable set of data
can be vended, and the vending almost always will come from a separate
technology. There are a few simple rules for Context:
- A context of data is immutable. Once created, a context of data,
regardless of the underlying implementation that vends it, must always yield
exactly the same set of data in exactly the same order. Note that
a practical issue is emergency repairs or contexts created in error. This
is accommodated by associating a version
number with each context; only the most recent version is ever
vended.
- Context data lives forever. It is always possible to retrieve a
context of data. Because the details of implementation are hidden from the
consumer, the architecture can do clever things like switch to flat files for
older material. Note that in general, files of data (not necessarily flat)
are very attractive for this sort of need because they are fast, simple, and
tend to "age" well.
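The registry these rules describe can be sketched in a few lines. The class and method names here are hypothetical, chosen only to illustrate append-only versioning and latest-version vending; the `vend_info` field holds the description of *how* the data is vended (a flat file path, a query, etc.), not the data itself:

```python
class ContextService:
    """Minimal sketch of a context registry. Definitions are
    append-only: an existing version is never modified, so a given
    version always describes exactly the same set of data."""

    def __init__(self):
        self._store = {}  # name -> list of versions, append-only

    def create(self, name, metadata, vend_info):
        """Register a context definition. A repeated name (e.g. an
        emergency repair) appends a new version rather than mutating
        the old one. Returns the new version number."""
        versions = self._store.setdefault(name, [])
        versions.append({"version": len(versions) + 1,
                         "metadata": metadata,
                         "vend_info": vend_info})
        return versions[-1]["version"]

    def vend(self, name):
        """Return the most recent version of a named context."""
        return self._store[name][-1]
```

Because consumers go through `vend`, a repaired context supersedes the bad one transparently, and the service can change the underlying vending technology (e.g. archiving old contexts to flat files) without consumers noticing.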
With Context in place, participants outside the development and operations
teams of Systems A, B, and C can observe create/consume dynamics among them.
The Context concept scales to hundreds of systems with thousands of datasets.
Site copyright © 2013 Buzz Moschetti. All rights reserved