One interesting and useful class in core Java (not Javascript) is the class java.util.HashMap. It’s an implementation of the java.util.Map interface which represents a Map data structure. A Map (sometimes also called associative array) is a data structure that contains “mappings” of a “key” to a corresponding “value”. Unlike arrays, we don’t access the elements by their position (index), but rather by a previously associated object. The object can be as simple as a String, or as complex as a JavaBean. Another implementation of the Map interface is the class java.util.TreeMap, which we’ll also take a look at since it’s just as interesting and useful.
On the lookout
One way we can utilize a HashMap is by using it as a lookup table. A lookup table is useful when we need to combine data from two or more datasource based on a certain category. Let’s say we have a sales database and an inventory database. The sales database contains data about sales forecast of each product for next month. The inventory database contains data about the quantity of each product currently available. We are then asked to create a report that compares next month’s forecast against currently available quantity of each product.
On the fly
There are several ways to do this. The simplest (and probably the worst) one would be something like:
- Query the forecast database for all products, then iterate the result.
- For each iteration, query the inventory database for the corresponding product.
The algorithm above would execute n+1 queries against the database, where n is the number of products. Not a very efficient one, considering that a database query has quite an expensive latency overhead.
Once and for all
A better strategy would be something like:
- Query the inventory database for all products, keep the result for later use.
- Query the forecast database for all products, then iterate the result.
- For each iteration of the forecast data, find the quantity of the corresponding product by iterating the inventory data we get from no.1.
This algorithm will only execute 2 queries against the database, but it will execute as many as n? (termial function of n) conditional operations (for the if statement to find the corresponding product). Let’s say we have 100 products identified by a String product-code. The above algorithm will call java.lang.String’s equals() 5050 times (100? = 1+2+...+100 = 5050). Although such call will happen in local memory (as opposed to going through the network stack as in the case of database queries), it still is inefficient. It will impair scalability, and in certain cases could impose a considerable performance penalty.
Put it on the map
Another way to create such report would be using a HashMap:
- Query the inventory database for all products, and iterate the result to put each element into a HashMap, with the product’s code as the key.
- Query the forecast database for all products, then iterate the result.
- For each iteration of the forecast data, find the quantity of the corresponding product by “getting” it from the HashMap.
With this algorithm, we have 2 queries, and n HashMap's put() and get() calls (for a total of 200 calls for 100 products). According to the javadoc, HashMap’s get() and put() method has a constant-time performance. This means that given a properly configured HashMap, each get() and put() will take the same amount of (relatively short) time. Therefore, this algorithm will have a (much) better performance than the previous two, and it will scale well.
Sum and substance
Another situation where a HashMap would be useful is when we need to perform some sort of aggregation. Let’s say we have a multinational company that keeps its sales data in separate databases for each country. We are then asked to create a report that shows the worldwide sales of each product.
We can achieve that using something like this:
- Pick one (any) database, query the sales data, then iterate and put it into a HashMap with the product’s code as the key.
- For each of the other databases, query the sales data, and iterate the result.
- For each iteration of the sales data, get the (accumulated) sales value of the corresponding product from the HashMap. Add that to the current sales value, and put it back into the HashMap (overwriting previous value).
When this algorithm finish, we will have the value of worldwide sales for each product in the HashMap. We can retrieve that using the values() method.
The missing link
A HashMap makes no guarantee as to the ordering of its elements. This means that the entrySet(), keyset() and values() methods may return a collection of data (mappings / keys / values) that’s in different order from the order in which we put them in the first place. The ordering is unpredictable, it could even change between method invocations. If we need a HashMap with a predictable ordering of its elements, we could use java.util.LinkedHashMap.
We can configure a LinkedHashMap to use either the insertion-ordering or the access-ordering. With insertion-ordering, the elements are ordered according to the order they were put into the Map for the first time. With access-ordering, the elements are ordered according to the order they were last accessed. Please note that the “access” operation covers the get(), put(), and putAll() methods.
Let’s sort it out
As mentioned at the beginning of this article, there’s another interesting implementation of the Map interface, the TreeMap. A TreeMap is also predictable in the ordering of its elements, but unlike a LinkedHashMap, a TreeMap maintains the ordering according to its keys. The ordering can be based upon the natural ordering, or upon a Comparator provided at construction time.
Let’s modify the multinational company example above a little bit. This time, each country can sell different sets of products. The set of products may overlap between two or more countries. This means that we must not assume that sales data queried from any database contains a complete set of products. In other words, we might add new products as we’re processing each country. We’re asked to create the same report, but this time with the data sorted by product codes.
The algorithm looks something like this:
- Pick one (any) database, query the sales data, then iterate and put it into a TreeMap with the product’s code as the key.
- For each of the other databases, query the sales data, and iterate the result.
- For each iteration of the sales data, check whether we already have an entry in the TreeMap for the corresponding product.
- If we do, get the (accumulated) sales value, add that to the current sales value, and put it back into the TreeMap (overwriting previous value).
- If we don’t, put the current sales value into the TreeMap with the product’s code as the key.
When this algorithm finish, we will have the value of worldwide sales for each product in the TreeMap. Since we use the product code as the Map’s key, the data returned by the values() method (and also the keyset() and entrySet() method) will be in the order of the product code. This is because the TreeMap always maintains the ordering of its elements as we modify it.
No comments:
Post a Comment