Code Musing

“I am sorry I have had to write you such a long letter, but I did not have time to write you a short one”
Pascal, Blaise (1623–1662), French philosopher and mathematician.

Around the age of 18 he invented one of the first mechanical calculating machines.


So why do we, as programmers, keep making the same mistake of writing long where short would do? Let's review a few code examples.


Instead of the following code:

private boolean isItemPutEligible(final SolrDocument doc) {
    String putEligibility = "N";
    Object obj = doc.getFieldValue(IS_PUT_ELIGIBLE);
    if (obj != null) {
        putEligibility = obj.toString();
    }
    if ("Y".equalsIgnoreCase(putEligibility)) {
        return true;
    }
    return false;
}

Could be shortened to:

private boolean isItemPutEligible(final SolrDocument doc) {
    Object obj = doc.getFieldValue(IS_PUT_ELIGIBLE);
    if (obj != null && "Y".equalsIgnoreCase(obj.toString())) {
        return true;
    }
    return false;
}
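Taken one step further, the whole body collapses into a single return expression. A runnable sketch, using a plain Map as a stand-in for SolrDocument (the Map, the class name, and the hard-coded field name are assumptions for illustration only):

```java
import java.util.Map;

class PutEligibilitySketch {

    static final String IS_PUT_ELIGIBLE = "IS_PUT_ELIGIBLE";

    // Null-safe fetch plus a case-insensitive comparison in one expression.
    static boolean isItemPutEligible(Map<String, Object> doc) {
        Object obj = doc.get(IS_PUT_ELIGIBLE);
        return obj != null && "Y".equalsIgnoreCase(obj.toString());
    }

    public static void main(String[] args) {
        System.out.println(isItemPutEligible(Map.of(IS_PUT_ELIGIBLE, "y"))); // true
        System.out.println(isItemPutEligible(Map.of(IS_PUT_ELIGIBLE, "N"))); // false
        System.out.println(isItemPutEligible(Map.of()));                     // false
    }
}
```

Note that `equalsIgnoreCase` avoids the extra `toUpperCase()` allocation and sidesteps locale-sensitive case mapping.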

Instead of this code:

Object obj = doc.getFieldValue(PROD_ID);
if (obj != null) {
    if (obj instanceof ArrayList<?>) {
        ArrayList<?> al = (ArrayList<?>) obj;
        if (!al.isEmpty()) {
            Object o = al.get(0);
            if (o != null) {
                result = o.toString();
            }
        }
    } else {
        result = obj.toString();
    }
}

Could be shortened to:

Object obj = doc.getFirstValue(PROD_ID);
if (obj != null) {
    return obj.toString();
}
return null;

Both shortened versions are, I think, much easier to read and understand.

In the first case we simply avoid an unnecessary intermediate variable.


In the second case we use a method, getFirstValue, that returns the first element when the field holds a List, the single Object itself when it holds one, or null when the field is absent from the document.
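To make that behavior concrete, here is an illustrative re-implementation of such a "first value" lookup over a plain Map (SolrJ's SolrDocument.getFirstValue behaves along these lines; the Map stand-in and the class name are assumptions for the sketch):

```java
import java.util.List;
import java.util.Map;

class FirstValueSketch {

    // Returns the first element when the field holds a List, the value
    // itself when it holds a single Object, or null when absent/empty.
    static Object getFirstValue(Map<String, Object> doc, String field) {
        Object obj = doc.get(field);
        if (obj instanceof List<?>) {
            List<?> values = (List<?>) obj;
            return values.isEmpty() ? null : values.get(0);
        }
        return obj;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("PROD_ID", List.of("12345", "67890"));
        System.out.println(getFirstValue(doc, "PROD_ID")); // 12345
        System.out.println(getFirstValue(doc, "MISSING")); // null
    }
}
```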


I think just reviewing such examples will inspire you to write less code, leaving more time for your favorite drink!

What is new in Solr 6.x

Solr 6 builds on the innovations of Solr 5, so first let's take a look at what Solr 5 delivered.
The "bin/solr" and "bin/post" scripts were improved, making it easy to start Solr and add new documents, and more APIs were introduced.
The user interface was rewritten in a modern framework (AngularJS) to allow for more innovation and enhancements in the near future.
Security had been requested for a long time, and it was introduced in Solr 5: plugins were written for Kerberos and for basic authentication and authorization, and there are plugin examples for customization.
Solr 5.4 introduced basic authentication.
In Solr 5.5 rule-based authorization was expanded and became more flexible, and APIs such as the ConfigSet API and the Collections API were extended to manage collections more elegantly.
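Both authentication and authorization plugins are wired up through a security.json file uploaded to ZooKeeper. A minimal sketch of its shape (the user name, role, and placeholder hash below are illustrative, not working credentials):

```json
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": { "solr": "<base64 sha256 hash> <base64 salt>" }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [ { "name": "security-edit", "role": "admin" } ],
    "user-role": { "solr": "admin" }
  }
}
```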
There is a new "bin/solr" command for importing and exporting ZooKeeper configs, and there are performance optimizations for faceting on DocValues fields.

Solr 6 brings quite a few features, but let's focus on the big ones: Parallel SQL, Cross Data Center Replication, graph traversal, modernized APIs, and Jetty 9.3 with improved performance and support for HTTP/2.

Parallel SQL was introduced to support relational algebra in a scalable manner; it seamlessly combines SQL with Solr's full-text capabilities.

Parallel SQL has two modes: real-time MapReduce and facet aggregation. MapReduce mode is for high-cardinality fields and performs aggregations over distributed joins. It uses the concept of shuffling, much like MapReduce implementation frameworks, which partition the data for greater scalability, so the partitioning key is a very important piece of the data there. The other mode, facet aggregation, pushes the aggregation down to the nodes so that only aggregated data comes back; if you have a lot of data but low cardinality, this option is quite performant.
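In practice a Parallel SQL query is a statement sent to the /sql request handler, with the mode selected via the aggregationMode parameter (map_reduce or facet). A sketch of such a request, where the collection and field names are made up for illustration and URL-encoding is omitted for readability:

```
/sql?aggregationMode=facet&stmt=SELECT prod_id, count(*) FROM techproducts GROUP BY prod_id
```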

Parallel SQL builds on two capabilities already present in previous versions of Solr: the export request handler and the Streaming API.

The export request handler provides the capability of streaming an entire result set, so even very large result sets can be exported out of Solr.
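As a sketch, an export request requires explicit sort and fl parameters, and the requested fields must be indexed with docValues; the collection and field names below are illustrative:

```
/solr/techproducts/export?q=*:*&sort=id asc&fl=id,name
```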

The search function is not the only function available in the Streaming API. There are also stream sources and stream decorators, which define how data is retrieved and how any aggregation is performed; they are designed to work with the entire result set, and they can be composed or wrapped to perform several operations at once.
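A streaming expression composes such functions directly; for example, wrapping the search stream source in the unique decorator (the collection and field names here are assumptions for the sketch):

```
unique(
  search(techproducts, q="*:*", fl="id,prod_id", sort="prod_id asc", qt="/export"),
  over="prod_id"
)
```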

Solr 6.x supports graph queries to find interconnected data. The graph query parser is a local-params query parser that can follow edges between nodes, and it allows optional filters to be applied during the traversal. For example, you can find which of your social media friends like "Honda Civic R 2017", or which airlines your friends have flown with.
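Syntactically this is the graph query parser in local-params form; the field names, filter, and root query below are hypothetical:

```
q={!graph from=parent_id to=id traversalFilter='type:person' maxDepth=3}id:alice
```

The traversal starts from documents matching the root query and repeatedly follows "from" field values to matching "to" field values, applying the optional traversalFilter at each hop.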

Solr 6.x APIs are more consistent and versioned, endpoint names are friendlier, and JSON is the default output format, though the "wt" parameter is still supported.

Lucene recently switched its default text scoring: instead of TF*IDF it now uses BM25.
Solr 6.x relies on the latest Lucene, so it inherits the same scoring algorithm. BM25 is a probabilistic model, versus the term-frequency-based model used previously.

There is a new API to perform backups and restores.

Moving to Solr 6.x

First of all, Solr 6.x expects Java 8 or higher to be installed on the host.
There is no longer a default schemaFactory; ManagedIndexSchemaFactory is used instead, so there is no schema.xml anymore but a managed-schema file.
If no similarity factory is defined, it defaults to SchemaSimilarityFactory, and any fieldType that omits a similarity definition defaults to BM25.
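A sketch of what that looks like in a managed schema, with a hypothetical fieldType overriding the BM25 default via an explicit similarity element:

```xml
<!-- Schema-wide default in Solr 6 when nothing is declared -->
<similarity class="solr.SchemaSimilarityFactory"/>

<!-- A fieldType with no <similarity> of its own falls back to BM25;
     declaring one explicitly restores, e.g., classic TF-IDF scoring -->
<fieldType name="text_classic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <similarity class="solr.ClassicSimilarityFactory"/>
</fieldType>
```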