
Apache Jackrabbit Technology

In late October, I managed to improve the search experience on one of my employer’s main web applications. I did it with an open source tool called Jackrabbit. After viewing the analytics, I was amazed by the improvements and ended up tweeting about it.

This article will focus on how I did it and how you may be able to utilize this Open Source tool to achieve similar results.
FYI: The search improvements were done for my employer and I don’t have authorization to share the exact numbers. I can, however, share what I learned and how it was done. I apologize in advance for the use of mockups and hypothetical numbers below.

The Problem

Before search improvement - Complex UI

As with most web applications, ours has a search feature. Like most searches, it is basic:

  • Users enter their search term.
  • Click the search button.
  • The page reloads with a list of results.
  • Two optional filters exist, allowing users to narrow the results.

 

But having this feature was not enough. Our users wanted a more accurate search that could respond in the shortest possible time. Another major problem was that our users look for content inside PDFs, Word documents and spreadsheets. That content does not live in a traditional database, where data is indexed and stored in a format convenient for queries.

User segmentation is another challenge. Interests vary from one segment to the next, so it is important that the search feature prioritizes results for each segment.

 

Here Comes The Rabbit, Jackrabbit.

First let me explain what Jackrabbit is. In non-technical terms, it is an open source content repository for both structured and unstructured content. It provides full-text search, versioning and transaction management through its hierarchical content store, in a scalable manner.

In technical terms, Jackrabbit implements the Java Content Repository (JCR) specification to manage content efficiently, which makes that content searchable as if it were part of a database system. This scalable, high-performing tool is maintained by the Apache Software Foundation and is not your traditional data store. It differs from a regular database (e.g. MongoDB or MySQL) in various ways. Here are a few:

  • It stores content as a hierarchy of nodes and properties, persisted to the file system or a database.
  • It is a cross-platform tool.
  • It supports various data formats and is highly configurable.
  • It can be embedded in a Java application or run as a standalone server, so other stacks (e.g. Node.js, Python and PHP) can reach it over HTTP (WebDAV). This lets you leverage the technology in your business.
  • You can also run it in Docker containers.

While Jackrabbit provides the repository, we need an indexing engine that allows us to interpret and contextualize the data. Lucene is one of the many indexing engines at our disposal and it integrates well with Jackrabbit. Lucene is the second part of the secret sauce behind the implementation discussed below.
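
To make the idea of an index concrete, here is a minimal sketch of an inverted index in plain Python. It is not Lucene's or Jackrabbit's API, only an illustration of the general technique an indexing engine builds on; the document names and text are made up, and it assumes the text has already been extracted from each file.

```python
import re
from collections import defaultdict

# Hypothetical documents standing in for text already extracted from files.
DOCUMENTS = {
    "report.pdf": "Quarterly sales report for the Caribbean region",
    "policy.docx": "Travel policy and expense report guidelines",
    "budget.xlsx": "Regional budget and sales forecast",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, query):
    """Return the documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


index = build_index(DOCUMENTS)
print(sorted(search(index, "sales report")))
```

Because the lookup goes from term to documents, a query only touches the entries for its own terms instead of scanning every file, which is where much of the speed of an indexed search comes from.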

Our Implementation.

Improved search UI - with suggestions and simpler interface.

We are at the first stage of our implementation, but the results are already compelling. We revamped the search feature to make it more robust, faster and easier to use. As you can see in the image, we’ve included a suggestion feature that shows results as you type.

The modifications include:

  • Real-time search feedback. As you type, the suggestions update to match your search term.
  • Full sentence/phrase suggestions. Similar to DuckDuckGo and Google, we provide real-time suggestions for the search term.
  • Predictive suggestions. Instead of matching only on the search term, we improved the feature to include suggestions of a similar type.
  • Spell checker. Users can auto-correct a misspelled search term.
  • Automatic search submission. When a query is corrected, the search is re-executed automatically.
  • Search term highlights. In the results, we highlight the matching terms, which helps users see at a glance why each result matched.
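
Three of the features above (suggestions, spell correction and highlighting) can be sketched in a few lines of plain Python. This is not our production code and not the Jackrabbit or Lucene API; the phrase list, function names and similarity cutoff are all assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical list of known phrases, e.g. document titles or past queries.
PHRASES = [
    "annual leave policy",
    "annual sales report",
    "expense claim form",
    "travel policy",
]


def suggest(prefix, phrases, limit=5):
    """Return phrases that start with what the user has typed so far."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    return [p for p in phrases if p.lower().startswith(prefix)][:limit]


def correct(term, vocabulary):
    """Return the closest known word, or the term itself if none is close."""
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else term


def highlight(text, term):
    """Wrap each case-insensitive occurrence of term in ** markers."""
    if not term:
        return text
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: "**" + m.group(0) + "**", text)


# Build the spell-check vocabulary from the words in the known phrases.
vocabulary = sorted({word for phrase in PHRASES for word in phrase.split()})

print(suggest("annual", PHRASES))
print(correct("polcy", vocabulary))
print(highlight("Travel Policy for staff", "policy"))
```

In a real deployment the suggestions and corrections would come from the search index rather than a hard-coded list, and the highlight markers would be replaced by whatever markup the results page uses.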

 

My Findings

From tracking in Google Analytics, I noticed a 50% reduction in the amount of time spent on the search page. This could be attributed to the suggestion feature and to more accurate results. That was not all. Fewer users went to the second page of the search results. Keystrokes were down by a similar rate, even though we noticed an increase in clicks on results. I used these and other data to estimate a 50% improvement in customer satisfaction with this feature. It is a clear indication that users spend less time searching and more time on what matters most to them.


We’re yet to implement phase two. That phase will add full-text search across all of our documents, which will benefit one of our user segments in particular. We have a treasure trove of documents and they’ll become searchable in short order. Additionally, we’ll be using a search boosting feature to dynamically prioritize results based on user segment. All of this will be done using Apache Jackrabbit.
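
One way to picture segment-based boosting is as a multiplier applied to each result's base relevance score. The sketch below is plain Python with made-up segments, categories, scores and boost factors; it illustrates the idea only and is not how Jackrabbit or Lucene expose boosting.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    category: str
    score: float  # base relevance from the search engine (made-up values)


# Hypothetical boost factors per user segment and content category.
SEGMENT_BOOSTS = {
    "finance": {"reports": 2.0},
    "hr": {"policies": 2.0},
}


def rank_for_segment(results, segment):
    """Sort results by base score multiplied by the segment's boost."""
    boosts = SEGMENT_BOOSTS.get(segment, {})
    return sorted(
        results,
        key=lambda r: r.score * boosts.get(r.category, 1.0),
        reverse=True,
    )


results = [
    Result("Leave policy", "policies", 0.8),
    Result("Sales report", "reports", 0.6),
    Result("Office map", "general", 0.5),
]

for segment in ("finance", "hr", "unknown"):
    print(segment, [r.title for r in rank_for_segment(results, segment)])
```

A user in an unknown segment gets the unboosted order, so the boost only ever reorders results and never hides them.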

I expect to see a similar rate of success after we implement these next steps.

 

Open Source Wins Again

Data is the new gold and having the tools to access that gold is very important. Apache Jackrabbit is a powerful tool that gives us the potential to unlock the value of our data. We are achieving this without purchasing a software license or increasing operational cost. As our application scales, we’ll be able to scale this technology and maintain the satisfaction level our users currently experience.

Irrespective of your business structure or technology stack, Apache Jackrabbit can help you access your data. It is an embedded technology, so you’ll be able to integrate it into your existing infrastructure, whether through an API, a Docker container, plugins or other means. Adding it as an extension to your existing search feature will help provide more accurate and predictive search results.

