Posts tagged ‘java’

Caching Filter Queries with Coherence

A pretty nice thing that I recently tried with a customer was storing a query result in a query cache.

For instance, consider the following method:

public Collection getByFilter1(String cacheName, Filter f) {
    NamedCache c = CacheFactory.getCache(cacheName);
    return c.entrySet(f);
}

A query is executed across all nodes containing cache data in a cluster. The filter acts on the values in the cache, not the keys. And as all values are usually not available locally on every node the query needs to execute on the separate nodes.

The query above is correct, but not very efficient. In the above case we query and retrieve the data as well from the separate nodes. A more efficient way would be to only get the keys for the matching values from the nodes and then retrieve the values from the local near cache:

public Collection getByFilter2(String cacheName, Filter f) {
    NamedCache c = CacheFactory.getCache(cacheName);
    Set keys = c.keySet(f);
    return c.getAll(keys).values();
}

However, in this case we still perform the query every time.

Now - the clever part. Not so much on my side, but the Coherence engineers have thought things through and made the Filter implementations have good hashCode and equals implementations. This together with the fact that they are serializable makes them possible to use as keys in a cache! Sweet! Without changing our method’s interface we can add a query cache so that each query only is performed once.

public Collection getByFilter3(String cacheName, Filter f) {
    NamedCache c = CacheFactory.getCache(cacheName);
 
    NamedCache queryC = CacheFactory.getCache(cacheName + ".querycache");
    Set keys = (Set)queryC.get(f);
 
    if(keys == null) {
        keys = c.keySet(f);
        queryC.put(f, keys);
    }
 
    return c.getAll(keys).values();
}

Note that we only save the keys in the query cache. This to avoid having several caches with the same data. When near caching is used, getting the data for the keys can still be a local only operation. Compared to getting the data from the separate nodes it’s several orders of magnitude faster - depending on the usage patterns.

Of course if queries are different every time, the query cache will not help much. But in most high load applications the same data tends to be needed several times.

Additionally, the properties queried on should in most cases be indexed. This is important to avoid too much overhead when searching for an entry in a cache.

One thing to think about when adding query caches is: how is data updated?

As part of updating the value caches the query caches should preferably be emptied of the outdated cached queries. This could be done programmatically “manually” when the data is updated, or by hooking in code to clean the entries from the query cache using the map listener mechanism. The relevant code could do a query using the ContainsFilter.

To summarize: a little code can go a long way to improve performance without affecting the interface used by an application. Good when query heavy applications are adapted to use a distributed cache like Coherence.

/M

Scala XML and Java DOM

How do we go from and DOM to Scala XML the most efficiently? Well… That’s what I asked myself the other day. The Scala API scala.xml.XML object has helper functions to create Scala XML structures from the usual suspects: Strings, InputStream, Reader, File, etc. But DOM Document or Element is missing.

Going via a Byte array, Char array or String is simple. For instance, the following outputs the DOM to a char array, which is then used as the source of the Scala XML creation:

Using Char array
//dom is the DOM Element
val charWriter = new CharArrayWriter()
TransformerFactory.newInstance.newTransformer.transform(new DOMSource(dom), new StreamResult(charWriter))
val xml = XML.load(new CharArrayReader(charWriter.toCharArray))

This works fine, and is reasonably fast. However, it does allocate some unnecessary memory (the char array) and performs some unnecessary parsing (of the char array) - both of which we’d really like to avoid.

How do we do this? Well, here’s one option that I came up with:

Using SAX
val saxHandler = new NoBindingFactoryAdapter()
saxHandler.scopeStack.push(TopScope)
TransformerFactory.newInstance.newTransformer.transform(new DOMSource(dom), new SAXResult(saxHandler))
saxHandler.scopeStack.pop
val xml = factoryHandler.rootElem

What’s going on here? Well, the Scala XML library uses SAX to parse XML and create the XML structure. One way of generating SAX events is to walk a DOM tree, which is handled by the javax.xml.transform.Transformer with a DOMSource as input and a SAXResult as output. The extension of the DefaultHandler needed for handling the SAX events is implemented by the scala.xml.parsing.FactoryAdapter, which is extended by the NoBindingFactoryAdapter used to construct the XML structure. Because of this, we can do violence on the API and use the NoBindingFactoryAdapter directly as a SAX DefaultHandler - nice! The scopeStack calls are done to maintain the scope information, which I stole from the loadXML method in the AdapterFactory class.

However, let’s take a moment to reflect on this. Using the Scala XML library in this way is not really good. Even if it’s possible to do it this way, I’ve not seen it described as a supported way of using it and therefore it should be done only after considering that the next release of Scala might remove this possibility.

[Update 2008-07-02: Burak Emir kindly added a comment; "Don't worry, the SAX factory adapter is not going to go away." - good to know!]

That said - let’s consider this an exploration of possibilities which could potentially lead to an update of the Scala XML API to allow a DOM to be used as a source instead…!

A quick test gives the following result.

Test data is a 6897 bytes XML file containing 118 elements with some 4 different namespaces.

I ran each test in 1000 iterations, with a full garbage collection before the first iteration. For every 100 iterations I printed the delta of the free memory and then timed the time for the complete 1000 iterations.

Char array: 100 iterations use around 28 MB, full test: 1414 ms
SAX: 100 iterations use around 18 MB, full test: 970 ms

So, in conclusion, not overwhelming difference, but around 1/3 faster and 1/3 less memory consumed. Can we do better? I’m not sure. :)

The next step is to do Scala XML to DOM… This could be more interesting. I see two options:

  1. Implement the DOM API wrapping Scala XML
  2. Generate SAX events based on the Scala XML and use that to build a DOM

Option 1 would be more efficient - but the DOM API isn’t fun implementing. Option 2 would be much simpler, but probably would be less efficient and require more allocations. Gotta think about this one…

/M