MBI/MBD: Map Based Interface / Map Based Driver to Persistence

21-Oct-2013

Background

A tad over 20 years ago, I invented the DBI/DBD concept for perl. You can google for that if you feel so inclined. It wasn't called DBI/DBD initially, and heaven knows really dedicated folks like Tim Bunce and Alligator Descartes carried the torch. The motivation at the time was to standardize the way that connections and SQL commands could be passed to DB engines, and the results consumed. Lack of standardization in SQL notwithstanding, for medium-duty tasks, the concept worked -- and continues to work -- perfectly fine. But it always bothered me that SQL on the way in and a ResultSet on the way out was somewhat restrictive in terms of the shapes of data that could be manipulated through the interface.

In my recent design & development efforts, I have been refining an ecosystem for the manipulation of Map-based data. Bespoke objects like class Trade and class RateCurve and class UserProfile are great for doing bespoke things, but often when you want to just combine data -- not behavior -- and externalize it in some way (putting it on a screen, passing it in a message, writing it to a file, etc.), the so-called "map of maps" or MOM design pattern becomes easier to work with. MOM is a way to manage rich nested structures of data that contain other Maps and Lists, and a small set of well-understood types like String, Date, BigDecimal, Integer, Double, and byte[] as the catch-all for other content. 99.9% of all interesting data structures can be expressed in this way.

For the MOM ecosystem to be generally useful, there must be a set of reasonably high performance (faster than 1 million operations/sec) core capabilities that allow the consumer to generically manipulate the content, whether it is a derivatives trade or a list of router configurations:

Field Addressability. A means must exist to get and set a field at an arbitrary depth in the rich structure without manually "walking" the structure.
Sorting. A List of Maps may be sorted by 1 or more fields appearing at an arbitrary depth in the rich structure, either ascending or descending.
Filtering. A filter expression can be applied to a Map and a boolean result returned. Of course, the design of the filter expression itself is the real challenge, and the PQL specification has been created and a reference implementation produced to satisfy this capability.
Merging. Map B is merged into Map A with a predictable outcome particularly with respect to how nested structures are overwritten.
Difference. Map A is compared to Map B and a List of generic Difference data is produced.
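
To make the capabilities above a little more concrete, here is a minimal sketch of four of them (field addressability, sorting, merging, and difference) over plain java.util Maps. It is not the MOM reference implementation: the class name MomOps, the dotted-path syntax ("owner.address.city"), the rule that nested Maps merge recursively while everything else is overwritten, and the choice to report differences as a List of paths are all assumptions made for illustration.

```java
import java.util.*;

// Hypothetical helper class; names and semantics are illustrative assumptions.
class MomOps {
    // Field addressability: read the value at a dotted path, or null if absent.
    @SuppressWarnings("unchecked")
    static Object get(Map<String, Object> root, String path) {
        Object cur = root;
        for (String key : path.split("\\.")) {
            if (!(cur instanceof Map)) return null;
            cur = ((Map<String, Object>) cur).get(key);
        }
        return cur;
    }

    // Field addressability: set a value, creating intermediate Maps as needed.
    @SuppressWarnings("unchecked")
    static void set(Map<String, Object> root, String path, Object value) {
        String[] keys = path.split("\\.");
        Map<String, Object> cur = root;
        for (int i = 0; i < keys.length - 1; i++) {
            Object next = cur.get(keys[i]);
            if (!(next instanceof Map)) {
                next = new HashMap<String, Object>();
                cur.put(keys[i], next);
            }
            cur = (Map<String, Object>) next;
        }
        cur.put(keys[keys.length - 1], value);
    }

    // Sorting: comparator on a nested field; missing values sort first.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Comparator<Map<String, Object>> byPath(String path, boolean ascending) {
        Comparator<Map<String, Object>> c = (a, b) -> {
            Comparable x = (Comparable) get(a, path);
            Comparable y = (Comparable) get(b, path);
            if (x == null || y == null) return x == null ? (y == null ? 0 : -1) : 1;
            return x.compareTo(y);
        };
        return ascending ? c : c.reversed();
    }

    // Merging: nested Maps merge recursively; any other value in b overwrites a.
    @SuppressWarnings("unchecked")
    static void merge(Map<String, Object> a, Map<String, Object> b) {
        for (Map.Entry<String, Object> e : b.entrySet()) {
            Object old = a.get(e.getKey());
            if (old instanceof Map && e.getValue() instanceof Map) {
                merge((Map<String, Object>) old, (Map<String, Object>) e.getValue());
            } else {
                a.put(e.getKey(), e.getValue());
            }
        }
    }

    // Difference: dotted paths whose values differ between a and b, in key order.
    @SuppressWarnings("unchecked")
    static List<String> diff(Map<String, Object> a, Map<String, Object> b, String prefix) {
        List<String> out = new ArrayList<>();
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        for (String k : keys) {
            Object x = a.get(k), y = b.get(k);
            String p = prefix.isEmpty() ? k : prefix + "." + k;
            if (x instanceof Map && y instanceof Map) {
                out.addAll(diff((Map<String, Object>) x, (Map<String, Object>) y, p));
            } else if (!Objects.equals(x, y)) {
                out.add(p);
            }
        }
        return out;
    }
}
```

A multi-field sort falls out of chaining, e.g. list.sort(MomOps.byPath("owner.lname", true).thenComparing(MomOps.byPath("dat1", false))). Two caveats on the sketch: diff() compares byte[] values by reference because it relies on Objects.equals, and merge() stores b's nested values in a without copying them.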

The core capabilities above provide the foundation for 2 broad classes of utilities:

Content Formatters. Generic XML, JSON, Avro, GPB, Thrift, and other representations.
Persistence. Broadly, this means the ability to perform CRUD on a set of Maps against a persistence backend. This is where MBI/MBD comes into the picture.
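
As an illustration of why a small, fixed type set makes content formatters generic, here is a minimal sketch of a JSON formatter that walks any MOM structure. The class name MomJson is hypothetical, and the encodings chosen for the two types JSON lacks (Date as an ISO-8601 string, byte[] as Base64) are assumptions, not part of any MOM specification.

```java
import java.util.*;

// Hypothetical generic formatter; encodings for Date and byte[] are assumptions.
class MomJson {
    static String toJson(Object v) {
        StringBuilder sb = new StringBuilder();
        write(v, sb);
        return sb.toString();
    }

    private static void write(Object v, StringBuilder sb) {
        if (v == null) {
            sb.append("null");
        } else if (v instanceof Map) {
            sb.append('{');
            String sep = "";
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                sb.append(sep);
                quote(String.valueOf(e.getKey()), sb);
                sb.append(':');
                write(e.getValue(), sb);
                sep = ",";
            }
            sb.append('}');
        } else if (v instanceof List) {
            sb.append('[');
            String sep = "";
            for (Object o : (List<?>) v) {
                sb.append(sep);
                write(o, sb);
                sep = ",";
            }
            sb.append(']');
        } else if (v instanceof Number || v instanceof Boolean) {
            sb.append(v);                       // NaN/Infinity are not handled here
        } else if (v instanceof Date) {
            quote(((Date) v).toInstant().toString(), sb);
        } else if (v instanceof byte[]) {
            quote(Base64.getEncoder().encodeToString((byte[]) v), sb);
        } else {
            quote(v.toString(), sb);
        }
    }

    // Writes a JSON string literal, escaping quotes, backslashes, and control characters.
    private static void quote(String s, StringBuilder sb) {
        sb.append('"');
        for (char ch : s.toCharArray()) {
            if (ch == '"' || ch == '\\') sb.append('\\').append(ch);
            else if (ch < 0x20) sb.append(String.format("\\u%04x", (int) ch));
            else sb.append(ch);
        }
        sb.append('"');
    }
}
```

The formatter never needs to know whether it is looking at a trade or a router configuration; it only dispatches on the handful of MOM types. Note that the output is lossy in one direction: a reader cannot tell a Date string from an ordinary String without a schema.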

What is MBI/MBD?

MBI is an interface specification to persistence and querying of MOM data. There is a single set of MBI classes that define client connections, collections/domains, cursors, data items, etc.
MBD is an implementation that binds a particular backend persistor to the MBI.

In truth, the MBI is not particularly exciting, just like O/JDBC is not particularly exciting. The power of O/JDBC lies in the SQL that is passed through it, not the spec itself. Similarly, the power of MBI lies in the PQL query language. PQL is not as feature-rich as SQL, nor does it need to be; it is sufficient to perform a number of filtering operations on MOM data.

Hello world examples for non-trivial interfaces are always a little difficult to create, but the example below should illuminate the purpose of MBI:

    // For Postgres:
    MBClient client = new PGImpl(url, userID, userPassword);
    // OR, for MongoDB:
    // MBClient client = new MongoDBImpl(machine, port, otherArgs);

    //
    // From here down, we are vendor neutral.
    //

    Database a = client.getDatabase("mydb");
    Domain d = a.getDomain("things");
	
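    final long DAYS = 24L * 60 * 60 * 1000;   // milliseconds in one day
    long now = System.currentTimeMillis();

    //  This PQL is equivalent to:
    //  select * from things where dat1 >= TO_DATE(now - 4days) and lname = 'moschetti'
    //  query is declared outside the helper blocks so it is still in scope below.
    Map<String, Object> query = new HashMap<>();
    List<Object> l2 = new ArrayList<>();
    {
        Map<String, Object> m3 = new HashMap<>();
        Map<String, Object> m2 = new HashMap<>();
        m2.put("dat1", new java.util.Date(now - (4 * DAYS)));
        m3.put("gte", m2);
        l2.add(m3);
    }
    {
        Map<String, Object> m3 = new HashMap<>();
        Map<String, Object> m2 = new HashMap<>();
        m2.put("lname", "moschetti");
        m3.put("eq", m2);
        l2.add(m3);
    }
    query.put("and", l2);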
    	
    // Clearly, variants of query() exist for projections, preferences, etc.
    Cursor c = d.query(query);
    
    Item item;   // "Item" is an assumed type name; the declaration is not shown in the original
    while((item = c.next()) != null) {
         Map m = item.getData();
         Date dt = (Date) m.get("createdOn");
    }
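
The same nested query Map can be applied directly to a Map already in memory, which is what the Filtering capability described earlier boils down to. The following is a minimal sketch only, not the PQL reference implementation: it assumes just the three operators visible in the example ("and", "eq", "gte"), looks at top-level fields only, and the class name PqlEval is made up for illustration.

```java
import java.util.*;

// Hypothetical evaluator for the query shape used in the example:
//   {"and": [expr, ...]}   {"eq": {field: value}}   {"gte": {field: value}}
class PqlEval {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean matches(Map<String, Object> expr, Map<String, Object> data) {
        // Each expression Map carries exactly one operator key.
        Map.Entry<String, Object> op = expr.entrySet().iterator().next();
        switch (op.getKey()) {
            case "and":
                for (Object sub : (List<Object>) op.getValue()) {
                    if (!matches((Map<String, Object>) sub, data)) return false;
                }
                return true;
            case "eq":
            case "gte": {
                Map.Entry<String, Object> cond =
                    ((Map<String, Object>) op.getValue()).entrySet().iterator().next();
                Object actual = data.get(cond.getKey());
                if (actual == null) return false;   // a missing field never matches
                int cmp = ((Comparable) actual).compareTo(cond.getValue());
                return op.getKey().equals("eq") ? cmp == 0 : cmp >= 0;
            }
            default:
                throw new IllegalArgumentException("unsupported operator: " + op.getKey());
        }
    }
}
```

With the query built above, PqlEval.matches(query, item.getData()) would return true only for items whose dat1 falls within the last four days and whose lname is "moschetti". The comparison relies on the field and the query value having the same Comparable type; a mismatch raises a ClassCastException in this sketch.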

Traditionally, it has been "easy" to save rich data in all sorts of persistors and drag it all out into the application layer to perform filtering. Easy -- but at times horribly slow/expensive. So as part of the MBI/MBD design, a basic framework for SQL rewrite is also offered. This enables a PQL statement to be converted by the MBD implementation into some amount of SQL that can be used to filter content at the database level before doing the final filtering in the application space.
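
To give a feel for the rewrite idea, here is a minimal sketch that turns the "and" query from the example into a parameterized WHERE clause. It is an illustration under stated assumptions, not the MBD framework itself: the class name SqlRewrite, the notion that each pushable field maps to a column of the same name, and the supported operators ("eq", "gte") are all hypothetical. Conditions that cannot be pushed down are simply skipped, so the database returns a superset of the matches and the application applies the full PQL filter to what comes back.

```java
import java.util.*;

// Hypothetical partial PQL-to-SQL rewrite for a top-level "and" of simple conditions.
class SqlRewrite {
    // Returns a WHERE clause with '?' placeholders and appends the bound values to params.
    // Only fields named in pushableColumns are emitted, which also keeps arbitrary
    // field names out of the SQL text.
    @SuppressWarnings("unchecked")
    static String where(Map<String, Object> query, Set<String> pushableColumns,
                        List<Object> params) {
        List<Object> conds = (List<Object>) query.get("and");
        if (conds == null) return "";            // nothing we know how to push down
        List<String> parts = new ArrayList<>();
        for (Object c : conds) {
            Map.Entry<String, Object> op =
                ((Map<String, Object>) c).entrySet().iterator().next();
            String sqlOp = op.getKey().equals("eq") ? "="
                         : op.getKey().equals("gte") ? ">=" : null;
            if (sqlOp == null) continue;         // left for application-side filtering
            Map.Entry<String, Object> field =
                ((Map<String, Object>) op.getValue()).entrySet().iterator().next();
            if (!pushableColumns.contains(field.getKey())) continue;
            parts.add(field.getKey() + " " + sqlOp + " ?");
            params.add(field.getValue());
        }
        return parts.isEmpty() ? "" : "WHERE " + String.join(" AND ", parts);
    }
}
```

Dropping a condition is only safe because the conditions are joined by "and": removing a conjunct can widen the result but never lose a match. The same shortcut would be wrong under "or", which is one reason a real rewrite has to be more careful than this sketch.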

Benefits of the MOM and MBI/MBD Ecosystem

MOM and MBI/MBD provide a developer with a standard set of tools, expressions, and types across many data spaces: persistor-managed data, messages, caches, GUIs, and formatted files. The ecosystem renders the functions vendor-neutral.

As an example of this, MBD reference implementations have been created for MongoDB, Oracle, Postgres, and Cassandra. Those familiar with MongoDB will appreciate that the MBD implementation is relatively lightweight since MongoDB "speaks" rich MOM for basic i/o. Oracle and Postgres implementations use a more sophisticated arrangement of raw content plus "helper columns" in combination with SQL rewrite to achieve acceptable performance. The Cassandra implementation over CQL3 is fairly similar to that of Oracle and Postgres, but somewhat restricted due to the simpler feature set of CQL. An MBD implementation for Ehcache would be even more straightforward than MongoDB. Hybridized MBD implementations combining a cache and a to-disk persistor are also fairly straightforward. The logic for query/index optimization when using traditional RDBMS is the tough part and that has already been created.

Challenges of the MOM and MBI/MBD Ecosystem

The transactional, resilience, and performance profiles are highly variable across backend persistors. This means that although the basic i/o functions may be portable, certain macro design concepts such as performing large updates across multiple domains of data within one commit/rollback transaction may not be portable.
The subtlety of schema has not been finalized. Currently, the JSON schema work looks to be the most promising vendor/environment neutral way to specify names of complex types, the fields and types that comprise them, and the optional validation rules that can be applied to content. Schemas are vital parts of communications contracts and are necessary to ensure that participants interpret shared data consistently. MBI/MBD has a "placeholder" for schema but no active functionality for it yet.

Like this? Dislike this? Let me know