Converting MongoDB Javascript Query Language for Java

17-Jan-2022 Like this? Dislike this? Let me know

The MongoDB query language ("MQL") does not need a parser per se in the same way that is required of SQL. Instead of syntax and whitespace and commas and quotes, MQL boils downs to "maps of maps". For example, the following expression (using "relaxed" Javascript as permitted by mongosh, the MongoDB CLI):

 {$match: {$expr: {$eq:[0,{$size: "$someArray"}]} }}
is really just the following in Java. The Document class conveniently (and importantly!) implements java.util.Map:
    Document d1 = new Document("$size", "$someArray");
    List l1 = Arrays.asList(0, d1);
    Document d2 = new Document("$eq", l1);
    Document d3 = new Document("$expr", d2);
    Document d4 = new Document("$match", d3);

or merged:

    Document d4 = new Document("$match",
        new Document("$expr", 
            new Document("$eq", Arrays.asList(0, new Document("$size", "$someArray")))))
For modestly sized MQL expressions, in Java it is reasonable to piecemeal-construct the expression. This is perhaps bit more tedious than a simple SQL string but on the flip side, it makes programmatic construction including conditional add/delete of steps and variable substitution very easy and straightforward.

A challenge arises when the the expression is not modestly sized. A popular way to experiment with and construct desired complex aggregation pipelines is to use Compass or a plain text editor to build a Javascript/mongosh expression because readability and general understanding of the purpose of the pipeline is dramatically improved:

  1. Javascript/mongosh permits relaxed quoting of keys, e.g. fieldName: "value" instead of "fieldName": "value".
  2. New objects and arrays are created thanks to the language syntax itself using {} and [], respectively. In a map-of-maps model like MQL, this dramatically reduces the "language boilerplate."
Consider, however, the following pipeline which will take an array field friends and pick a random, unique sample of 6 items from that array and reset friends to that picklist:
{$addFields:
  {
    friends: {$reduce: {
	input: {$range:[0,6]},
	initialValue: {aa: "$friends", picks: []},
        in: {
	    $let: {
		vars: {idx: {$floor:{$multiply:[{$rand: {}},{$size:"$$value.aa"}]}},
		       sz: {$size:"$$value.aa"}
		      },
	        in: {
		    picks: {$concatArrays: [ "$$value.picks", {$slice:["$$value.aa","$$idx",1]} ]},
		    aa: {$cond: [{$eq:["$$idx",0]},
			 {$slice:["$$value.aa",1,{$subtract:["$$sz",1]}]},
			 {$cond: [
			     {$eq:["$$idx",{$subtract:["$$sz",1]}]},
			     {$slice:["$$value.aa",0,{$subtract:["$$sz",1]}]},
			     {$concatArrays: [
				 {$slice:["$$value.aa",0,"$$idx"]},
				 {$slice:["$$value.aa",{$add:["$$idx",1]},{$subtract:["$$sz",2]}]}
			     ]}
			 ]}
		    ]}
		}
	    }}
        }}
  }
}
Turning this into Java borders on being a non-starter. If in doubt, try pasting the above into Compass and execute the "Export as Java" feature:
a = Arrays.asList(new Document("$addFields",
  new Document("friends",
  new Document("$reduce",
  new Document("input",
  new Document("$range", Arrays.asList(0L, 6L)))
          .append("initialValue",
  new Document("aa", "$friends")
            .append("picks", Arrays.asList()))
          .append("in",
  new Document("$let",
  new Document("vars",
  new Document("idx",
  new Document("$floor",
  new Document("$multiply", Arrays.asList(new Document("$rand",
                      new Document()),
                      new Document("$size", "$$value.aa")))))
                .append("sz",
  new Document("$size", "$$value.aa")))
              .append("in",
  new Document("picks",
  new Document("$concatArrays", Arrays.asList("$$value.picks",
         new Document("$slice", Arrays.asList("$$value.aa", "$$idx", 1L)))))
         .append("aa",
  new Document("$cond", Arrays.asList(new Document("$eq", Arrays.asList("$$idx", 0L)),
            new Document("$slice", Arrays.asList("$$value.aa", 1L,
                new Document("$subtract", Arrays.asList("$$sz", 1L)))),
            new Document("$cond", Arrays.asList(new Document("$eq", Arrays.asList("$$idx",
                new Document("$subtract", Arrays.asList("$$sz", 1L)))),
                    new Document("$slice", Arrays.asList("$$value.aa", 0L,
                       new Document("$subtract", Arrays.asList("$$sz", 1L)))),
                    new Document("$concatArrays", Arrays.asList(new Document("$slice", Arrays.asList("$$valu\
e.aa", 0L, "$$idx")),
                     new Document("$slice", Arrays.asList("$$value.aa",
                        new Document("$add", Arrays.asList("$$idx", 1L)),
                                                          new Document("$subtract", Arrays.asList("$$sz", 2L\
))))))))))))))))))
After the initial paste, it is fair to say it is largely "read-only" after this. Trying to modify the pipeline except for variable substitution becomes ... tedious. And without an editor that performs brace and bracket matching, it becomes nearly impossible.

An Easier Approach

A helper class that makes it easy to concatenate a JSON string representation of the pipeline plus the Document.parse() utility combine to vastly simplify translating the Javascript to Java. This is the helper class:
    private static class StageHelper {
        private StringBuilder txt;
        public StageHelper() {
            this.txt = new StringBuilder();
        }
        public void add(String expr, Object ... subs) {
            expr.replace("'", "\""); // This is the helpful part.
	    if(subs.length > 0) {
		expr = String.format(expr, subs);  // this too
	    }
            txt.append(expr);
        }	
        public Document fetch() {
            return Document.parse(txt.toString());
        }
    }
The trick here is to avoid getting lost in a sea of escaped double quotes by using single quotes and letting StageHelper convert them into the double quotes required for the JSON parser. Furthermore, the add() method has varargs for easy substitution of material. The Javascript above is "wrapped" line by line using the add() method. This can be done with a macro in the editor. There are no restrictions on where the "newline" has to go; all material is added to a StringBuilder. In general, the idea is to use visual formatting to assist in the understanding of the pipeline:
	List pipeline = new ArrayList();

	int n_picks = 6;
	
        StageHelper s = new StageHelper();

	s.add("{$addFields: {");
	s.add(" friends: {$reduce: {");

	s.add("	input: {$range:[0,%d]},", n_picks); // varags substitution!

	s.add("	initialValue: {aa: '$friends', picks: []},");
	s.add(" in: {");
	s.add("	   $let: {");
	s.add("		vars: {idx: {$floor:{$multiply:[{$rand: {}},{$size:'$$value.aa'}]}},");
	s.add("		       sz: {$size:'$$value.aa'}");
	s.add("		      },");
	s.add("	        in: {");
	s.add("		    picks: {$concatArrays: [ '$$value.picks', {$slice:['$$value.aa','$$idx',1]} ]},");
	s.add("		    aa: {$cond: [{$eq:['$$idx',0]},");
	s.add("			 {$slice:['$$value.aa',1,{$subtract:['$$sz',1]}]},");
	s.add("			 {$cond: [");
	s.add("			     {$eq:['$$idx',{$subtract:['$$sz',1]}]},");
	s.add("			     {$slice:['$$value.aa',0,{$subtract:['$$sz',1]}]},");
	s.add("			     {$concatArrays: [");
	s.add("				 {$slice:['$$value.aa',0,'$$idx']},");
	s.add("				 {$slice:['$$value.aa',{$add:['$$idx',1]},{$subtract:['$$sz',2]}]}");
	s.add("			     ]}");
	s.add("			 ]}");
	s.add("				]}");
	s.add("		}");
	s.add("	    }}");
	s.add("        }}");
	s.add("  }");
	s.add("}");
	pipeline.add(s.fetch());

	AggregateIterable output = coll.aggregate(pipeline);
	MongoCursor iterator = output.iterator();
	while (iterator.hasNext()) {
	    Document doc = iterator.next();
	    // ...
	    }
Those developers already on Java 13 have it much easier thanks to text blocks. You simply have to paste the javascript into the block and go. You don't even have to convert double quotes to single quotes:
	List pipeline = new ArrayList();

	String agg = """
	{$addFields:
  {
    friends: {$reduce: {
	input: {$range:[0,%d]},
	initialValue: {aa: "$friends", picks: []},
        in: {
	    $let: {
		vars: {idx: {$floor:{$multiply:[{$rand: {}},{$size:"$$value.aa"}]}},
		       sz: {$size:"$$value.aa"}
		      },
	        in: {
		    picks: {$concatArrays: [ "$$value.picks", {$slice:["$$value.aa","$$idx",1]} ]},
		    aa: {$cond: [{$eq:["$$idx",0]},
			 {$slice:["$$value.aa",1,{$subtract:["$$sz",1]}]},
			 {$cond: [
			     {$eq:["$$idx",{$subtract:["$$sz",1]}]},
			     {$slice:["$$value.aa",0,{$subtract:["$$sz",1]}]},
			     {$concatArrays: [
				 {$slice:["$$value.aa",0,"$$idx"]},
				 {$slice:["$$value.aa",{$add:["$$idx",1]},{$subtract:["$$sz",2]}]}
			     ]}
			 ]}
		    ]}
		}
	    }}
        }}
  }
	}
	""";
	pipeline.add( Document.parse(String.format(agg, 6)) );

	AggregateIterable output = coll.aggregate(pipeline);
	MongoCursor iterator = output.iterator();
	while (iterator.hasNext()) {
	    Document doc = iterator.next();
	    // ...
	    }
It is possible to work up StageHelper so that it could handle a multi-stage pipeline and parse and return a List in one shot, but there is little extra value captured in such a design. The workhorse function in the whole setup (Document.parse()) will only parse one stage's worth of material (one Document), not a List, so it is best to manage the List in the query setup space. This also makes it more straightforward to comment out or otherwise deactivate stages for debugging/development purposes because it becomes obvious where a stage is added to the growing pipeline. Keeping the design at the stage level and not the whole pipeline also means you can mix and match techniques for creating the stages. For example, to changing the output projection of the stage above, we could do the following (top parts of stage not repeated here for brevity):
    ...
    	s.add("        }}");
	s.add("  }");
	s.add("}");
	pipeline.add(s.fetch());
    	
	// Direct object setup instead of StageHelper:
        pipeline.add(new Document("$addFields",
				  new Document("friends", "$friends.picks")));
	
	return pipeline;

Some Special Considerations

The strings passed to StageHelper are JSON and are parsed by the relaxed JSON parser in the Document class. This is different than constructing Javascript and Java objects and the difference is most obvious and common with datetimes. In Javascript and Java you can do this:
    Java:     new Document("d", new Document("$gt", new java.util.Date(Util.toDate("2022-01-01"))))
    Javascript:  {"d":{"$gt": new ISODate("2022-01-01")}}
but the same approach does not work for StageHelper::
    This will compile and also execute in the query engine without fail but it is
    wrong because types are mismatched.  Field d is a datetime in the database but
    here it is a string!  There is no java.util.Date or ISODate in JSON.
    
    s.add("{$match: {d: {$gt: '2022-01-17T21:08:45.820Z'}} } ");
You must therefore use the MongoDB EJSON conventions to identify the type:
    s.add("{$match: {d: {$gt: {$date: '2022-01-17T21:08:45.820Z'}} }} ");

or more likely through variable substitution:

    s.add("{$match: {d: {$gt: {$date: '%s'}} }} ", "2022-01-17T21:08:45.820Z");	    

Templates and Post-Parsing Substitution

Document.parse() returns a Document that is a map-of-maps and implements Map. It is therefore possible to use utilities such as dotpath navigators to directly substitute type-proper values in the structure without resorting to EJSON. For example,
	StageHelper s = new StageHelper();
	
	// Don't worry about -1 or X; they will be replaced:
	s.add("{$match: {'idx':-1, d:{'$gt':'X'}}}");
	Document doc = s.fetch();

	Map match_arg = (Map) DotPath.extractObject(doc, "$match");
	match_arg.put("idx",3);
	Map date_arg = (Map) DotPath.extractObject(doc, "$match.d");	
	date_arg.put("$gt",new java.util.Date(1642636800000L));

	// The doc is now:
	Document{{$match=Document{{idx=3, d=Document{{$gt=Wed Jan 19 19:00:00 EST 2022}}}}}}

	// If we were to use this approach on the random slice pipeline
	// example above to change the argument to $range to change the
	// number of picks, it would be:
	List a = (List) Dotpath.extract("$addFields.friends.$reduce.input.$range");
	a.put(1, 8); // change idx 1 (second item in $range array) to 8

Template substitution demands that the "shape" of the query and the dotpaths to the items being substituted are kept in sync of course, but it offers an interesting way to keep the query structure pristine and perform any substitutions with the proper types later on. Note that both the value and the type of the placeholders like -1 and X are arbitrary; they are there simply to satisfy the JSON parser. All placeholders for any type could be -999 for easier eyeballing, for example.

Going One Step Beyond

Expanding upon post-parsing substitution, it should be clear that it is simple to construct a routine where special strings encountered in the map-of-map construct are mapped to replacement values:
	StageHelper s = new StageHelper();
	
	s.add("{$match: {'idx':'@idx', d:{'$gt':'@targetDate'}}}");

	Document doc = s.fetch();

	// If parse OK, move on to subs:
	// Pseudojava:
	Map vars = {"idx": 3, "targetDate": new java.util.Date("2020-01-01")}
	StageHelper.substitute(doc, '@', vars);
This is not an imprecise or clumsy regex substitution. Document.parse() yields a simple (albeit very recursive) data structure -- not a string like JSON or SQL, but a tree structure -- that can be very precisely "walked" and values replaced with real data types, not toString() equivalents.

Like this? Dislike this? Let me know


Site copyright © 2013-2024 Buzz Moschetti. All rights reserved