Taming Search using Rules
You cannot tame a dragon with a history lesson.
~George R.R. Martin. 🐉
Lucene-powered search engines such as Solr and Elasticsearch are information retrieval beasts with enormous potential. At Walmart Labs, on ASDA.com, we went through a phase of migrating from Oracle Endeca Server, a closed black-box hybrid search-analytical database, to the mighty open-source dragon Apache Solr, a popular, blazing-fast enterprise search platform built on Apache Lucene. This blog covers what we learned on a small part of that journey.
We will look at solutions for requirements like:
- Boost Apple-brand products when people search for iPhone.
- Show Halloween sweets and costumes on top during the Halloween period.
- Deboost plastic mugs when the user searches for beer mugs.
and many more. Such scenarios come up often in e-commerce.
Origin
A coin has two sides
In the same way, an e-commerce search has two sides: the customer and the merchandisers. All of us are familiar with the customer side and relevance tuning. Here we will discuss the other side of the wall, i.e., the merchandisers.
Merchandisers translate business knowledge into business rules, which help us improve the click-through rate (CTR). These rules are useful because merchandisers have already done extensive research on user behaviour and customer experience. A rule-based system is an excellent alternative AI approach to improving the search experience, because it is:
1. Easy to implement.
2. Easy to understand and maintain.
3. More accessible to non-technical users.
A better search experience is one where users find what they are looking for with less filtering, sorting, or pagination.
For example, if someone searches for Apple, or for a term matching a pattern such as iPhone 5/6/7/X/XR, we do not want to show every product that has "apple" in it; we want Apple electronics products on top. Rules come in handy in such cases.
Endeca already has a rule engine called XM (Experience Manager), which merchandisers used to define rules for search and catalogue navigation. While migrating from Endeca to Solr, one of the biggest mountains to climb was making sure the new search engine offered the same functionality.
Reaching Foothills of Misty Mountain
We then went through a lot of brainstorming to work out how to carry the rules over to the new system. One idea was to put the rules in a key-value store such as Redis and retrieve them by search term. But it wasn't as easy as it looks.
This idea was rejected because it had several problems. Among them: search terms can be stemmed, and a rule may need to match only part of the query phrase, so an exact key lookup is not enough.
A rule, in general, consists of four major parts:
- Search term: the term for which the rule is applicable.
- Set of instructions: applied to the original query as boost/bury/filter/sort/facets.
- Criteria: these help to decide whether the rule should be applied or not.
- Decorators: these are passed through as-is in the final response from the engine.
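The four parts can be pictured as a simple record. This is only an illustrative sketch: the `Rule` class, its field names, and the sample values (including the `banner` decoration) are hypothetical and are not taken from Endeca, Solr, or Querqy.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Rule:
    """Hypothetical container for the four parts of a merchandising rule."""
    search_term: str                      # the term the rule applies to
    instructions: List[str]               # boost / bury / filter / sort / facets
    criteria: Dict[str, Any] = field(default_factory=dict)     # decides whether the rule applies
    decorations: Dict[str, Any] = field(default_factory=dict)  # returned as-is in the response


# Example: push plastic mugs down when the user searches for beer mugs.
rule = Rule(
    search_term="beer mugs",
    instructions=["DOWN(100): plastic"],
    criteria={"enabled": True, "priority": 100},
    decorations={"banner": "beer-week"},
)
```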
The search term for which a rule is defined can also have a match type, i.e., how it should match the query. For example:
- Match Exact: the query matches the search term exactly.
- Match Phrase: the query partially matches the search term, via shingles.
- Match Any: any token from the search term can match.
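The three match types can be sketched in a few lines. This is a minimal illustration under my own reading of the list above (in particular, that Match Phrase means the search term appears as a contiguous token sequence inside the query). The function names are hypothetical, and no stemming or analysis is applied beyond lowercasing and whitespace splitting.

```python
def tokens(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()


def match_exact(search_term, query):
    """The whole query equals the search term."""
    return tokens(search_term) == tokens(query)


def match_phrase(search_term, query):
    """The search term occurs as a contiguous run of tokens (a shingle) in the query."""
    term, q = tokens(search_term), tokens(query)
    n = len(term)
    return n > 0 and any(q[i:i + n] == term for i in range(len(q) - n + 1))


def match_any(search_term, query):
    """At least one token of the search term occurs in the query."""
    return bool(set(tokens(search_term)) & set(tokens(query)))
```

With the search term "beer mugs", the query "cheap beer mugs" fails the exact match but passes the phrase match, while "mugs for beer" only passes the any-token match.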
After lots of discussion, thinking, and POCs, we had two options for the rule engine:
- One, which I came up with, was to keep the rules within Solr as a separate collection.
- Sachin Lala discovered the second: he shared the idea of Querqy and how we could leverage it in the current system. I then focused on evaluating it.
In this blog, we will focus on our encounter with Querqy.
Climbing the Forest of Rules: Querqy
Querqy is a framework for query preprocessing in Java-based search engines. It comes with a powerful, rule-based preprocessor named ‘Common Rules’ rewriter, which provides query-time synonyms, query-dependent boosting and down-ranking, and query-dependent filters.
René Kriegler, Committer/Maintainer
In this blog, we will explain what Querqy is, how to install it under Solr, and how to configure rules.
When we encountered Querqy, it looked like the best fit for our problem statement. Internally, Querqy looks up rules using a trie, which is pretty fast. But we also found some gaps between what Querqy provided and what we wanted.
Querqy solved the search term and instructions parts, but we still had multiple criteria for selecting a rule, plus other information, like facets, that we needed to apply. We discussed these possibilities with René Kriegler, and after discussions and several pull requests, we found a solution and contributed it to Querqy.
The criteria include filters, limit, and sort:
- Filters: for a search term, select only the rules that pass specific filters; for example, the rule must be active.
- Limit: when multiple rules match, how many of them should be applied.
- Sort: when multiple rules match, sort them on a property. In our case, we use the priority defined by merchandisers, and then the update date in case of a tie.
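The interplay of filter, sort, and limit can be sketched as follows. This is a toy model, not Querqy code: the `select_rules` function, the `updated` field, and the sort direction (higher priority first, newer update date first on a tie) are assumptions made for illustration.

```python
from datetime import date

# Hypothetical rules that all matched the same search term.
rules = [
    {"_id": "ID1", "enabled": True, "priority": 100, "updated": date(2019, 4, 3)},
    {"_id": "ID2", "enabled": False, "priority": 300, "updated": date(2019, 5, 1)},
    {"_id": "ID3", "enabled": True, "priority": 100, "updated": date(2019, 6, 9)},
    {"_id": "ID4", "enabled": True, "priority": 50, "updated": date(2019, 7, 1)},
]


def select_rules(matched, limit):
    """Filter out inactive rules, sort by priority then update date, keep the top `limit`."""
    active = [r for r in matched if r["enabled"]]                      # Filters
    ranked = sorted(active,
                    key=lambda r: (r["priority"], r["updated"]),
                    reverse=True)                                      # Sort
    return ranked[:limit]                                              # Limit


selected_ids = [r["_id"] for r in select_rules(rules, limit=2)]
```

Here ID2 is dropped because it is inactive, even though it has the highest priority, and the tie between ID1 and ID3 is broken by the update date.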
A sample Querqy rule looks like this:
# This format uses the @ character
notebook =>
  SYNONYM: laptop
  DOWN(100): case
  @_id: "ID1"
  @_log: "notebook,modified 2019-04-03"
  @group: "electronics"
  @enabled: true
  @priority: 100
  @tenant: ["t1", "t2", "t3"]
  @culture: {"lang": "en", "country": ["gb", "us"]}

# This format represents the properties as a JSON object
notebook =>
  SYNONYM: laptop
  DOWN(100): case
  @{
    _id: "ID1",
    _log: "notebook,modified 2019-04-03",
    group: "electronics",
    enabled: true,
    priority: 100,
    tenant: ["t1", "t2", "t3"],
    culture: {
      "lang": "en",
      "country": ["gb", "us"]
    }
  }@
Moving in-depth
Querqy's implementation is straightforward to understand. It reads the rules from a file in the given format and stores them in a prefix trie and a suffix trie.
This lets us match the search term both partially and fully.
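To show why a trie suits this kind of lookup, here is a minimal sketch of a token-level prefix trie. It is not Querqy's actual data structure: `RuleTrie`, `add`, and `lookup` are hypothetical names, the trie works on whole tokens rather than characters, and only the prefix side is covered.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # token -> TrieNode
        self.rule_ids = []   # rules whose search term ends at this node


class RuleTrie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, search_term, rule_id):
        """Insert a rule under the token sequence of its search term."""
        node = self.root
        for token in search_term.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.rule_ids.append(rule_id)

    def lookup(self, query):
        """Return ids of rules whose search term occurs as a token sequence in the query."""
        q = query.lower().split()
        found = []
        for start in range(len(q)):          # try every starting position
            node = self.root
            for token in q[start:]:
                node = node.children.get(token)
                if node is None:
                    break
                found.extend(node.rule_ids)
        return found


trie = RuleTrie()
trie.add("notebook", "ID1")
trie.add("beer mugs", "ID2")
```

Walking the trie from each position of the query finds the full match ("notebook") and the partial match ("beer mugs" inside "cheap beer mugs") without scanning every rule.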
Various rewriters help the parser/handler rewrite the query after fetching the matching rule from Querqy.
Querqy treats all actions as instructions to be applied to the query, for example sort, boost, bury, filter, and delete.
To select a specific rule, you can provide various criteria, for example date range, active state, priority, and many more.
Nirvana: When Solr Meets Querqy
There is a path for everyone
Querqy is pretty flexible when it comes to adding it to Solr; the setup is easy and quick. Querqy reads all the rules from the file rules.txt and internally builds a prefix trie for matching. Once this preprocessing is complete, our index is ready to serve the rules.
When a new query comes in, we fetch the matching rules from Querqy and apply their instructions to the query. These instructions can be:
- Decorate: Decorate rules are not strictly query rewriting rules, but they are quite handy for adding query-dependent information to search results.
- Synonyms: Querqy gives you a mighty toolset for using synonyms at query time.
- Up and Down: UP and DOWN rules add a positive or negative boost query to the user query, which helps to bring documents that match the boost query further up or down in the result list.
- Filters: Filter rules work similarly to UP and DOWN rules, but instead of moving search results up or down the result list, they restrict search results to those that match the filter query.
- Delete: Delete rules allow you to remove keywords from a query.
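The effect of these instruction types on a query can be pictured with a toy model. This is not how Querqy represents or rewrites queries internally: the `apply_instructions` function, the `(kind, value, weight)` tuples, and the `category:glassware` filter are all made up for illustration.

```python
def apply_instructions(query, instructions):
    """Apply (kind, value, weight) instructions to a query and describe the result."""
    result = {
        "terms": query.lower().split(),
        "synonyms": [],
        "boosts": [],        # (term, signed weight); negative means down-ranking
        "filters": [],
        "decorations": [],
    }
    for kind, value, weight in instructions:
        if kind == "SYNONYM":
            result["synonyms"].append(value)
        elif kind == "UP":
            result["boosts"].append((value, weight))
        elif kind == "DOWN":
            result["boosts"].append((value, -weight))
        elif kind == "FILTER":
            result["filters"].append(value)
        elif kind == "DELETE":
            result["terms"] = [t for t in result["terms"] if t != value]
        elif kind == "DECORATE":
            result["decorations"].append(value)
        else:
            raise ValueError(f"unknown instruction: {kind}")
    return result


rewritten = apply_instructions(
    "cheap beer mugs",
    [
        ("DELETE", "cheap", None),
        ("DOWN", "plastic", 100),
        ("FILTER", "category:glassware", None),
    ],
)
```

The user's terms, the boosts, and the filters stay separate in the result, which mirrors the idea that UP/DOWN and FILTER add to the user query rather than replace it.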
The Querqy query parser and its rewriter are registered in solrconfig.xml:
<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">
  <lst name="rewriteChain">
    <lst name="rewriter">
      <!--
        Note the rewriter ID:
      -->
      <str name="id">common1</str>
      <str name="class">querqy.solr.SimpleCommonRulesRewriterFactory</str>
      <str name="rules">rules.txt</str>
      <!-- ... -->
      <!--
        Define a selection strategy, named 'expr':
      -->
      <lst name="rules.selectionStrategy">
        <lst name="strategy">
          <str name="id">expr</str>
          <!--
            This selection strategy implementation allows us to select and order rules by properties:
          -->
          <str name="class">querqy.solr.ExpressionSelectionStrategyFactory</str>
        </lst>
      </lst>
    </lst>
  </lst>
</queryParser>
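With the rewriter ID `common1` and the strategy ID `expr` from the configuration, the selection criteria are passed as request parameters. The sketch below only builds the query string. The parameter names (`querqy.<rewriterId>.criteria.strategy`, `.filter`, `.sort`, `.limit`) and the JsonPath-style filter are written as I recall them from the Querqy documentation for this style of configuration; treat them as assumptions and verify them against the Querqy version you run. The `querqy_params` helper is hypothetical.

```python
from urllib.parse import urlencode

REWRITER_ID = "common1"  # matches <str name="id">common1</str> in the rewriter config


def querqy_params(user_query, limit=1):
    """Build Solr request parameters that select rules by their properties.

    Parameter names are assumptions based on the Querqy docs; check your version.
    """
    prefix = f"querqy.{REWRITER_ID}.criteria"
    return [
        ("q", user_query),
        ("defType", "querqy"),                            # use the parser named 'querqy'
        (f"{prefix}.strategy", "expr"),                   # the strategy ID from the config
        (f"{prefix}.filter", "$[?(@.enabled == true)]"),  # only active rules
        (f"{prefix}.sort", "priority desc"),              # highest priority first
        (f"{prefix}.limit", str(limit)),                  # apply at most `limit` rules
    ]


query_string = urlencode(querqy_params("notebook"))
```

The resulting string can be appended to the collection's select URL. The `enabled` and `priority` names refer to the rule properties declared with `@` in the rules file.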
Release the Dragon 🐲
Once Querqy was integrated with Solr, our beast was ready to serve the needs of merchandisers.
We created an ingestion flow to Solr, which we will discuss in follow-up posts.
One potential problem remained: SolrCloud uses ZooKeeper for config sync, so what if the rules file exceeded ZooKeeper's 1 MB file size limit? This was also resolved by recent changes in Querqy, which now supports GZ-compressed rules files and also accepts a comma-separated list of rules files.
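A quick way to check whether compression brings a rules file under the limit is to gzip it and compare sizes. This is a minimal sketch using Python's standard `gzip` module. The 1 MB constant reflects ZooKeeper's default limit mentioned above, the helper names are hypothetical, and the generated sample is deliberately repetitive, so real rules files will compress less well.

```python
import gzip

ZK_LIMIT_BYTES = 1024 * 1024  # ZooKeeper's default size limit is about 1 MB


def compress_rules(rules_text):
    """Return the gzip-compressed bytes of a rules file's content."""
    return gzip.compress(rules_text.encode("utf-8"))


def fits_in_zookeeper(payload):
    """True if the payload is below the assumed ZooKeeper size limit."""
    return len(payload) < ZK_LIMIT_BYTES


# Artificial sample: the same rule repeated until the raw file exceeds 1 MB.
rules_text = "notebook =>\n  SYNONYM: laptop\n  DOWN(100): case\n\n" * 50_000
raw = rules_text.encode("utf-8")
packed = compress_rules(rules_text)
```

If a compressed file is still too large, the comma-separated list of rules files gives a second way out: split the rules across several smaller files.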
In my opinion, Querqy is a good fit for this kind of problem. It is simple and quite performant.
Querqy also has a pretty good UI to manage and publish the rules.
Note that Querqy is available for both Solr and Elasticsearch.
Conclusion
In this blog, we discussed how to incorporate rules into search. This approach can give merchandisers a massive advantage.
We discussed one of the approaches we discovered, i.e., Querqy. I will write a follow-up post on the other strategy, i.e., rules in a collection.
Thanks for reading. Please keep watching this space for more updates.