When tuning website search, many people either fix individual queries or run A/B tests. However, if you want to be confident that your search is working as it should, consider using offline synthetic testing. It lets you iterate on search algorithms rapidly and eliminates the risk of losing conversions, because you test changes safely offline.
Every e‑commerce business wants to have a great search function, but the road to it is tricky. It requires great observational skills, good knowledge of search analytics, and a systematic approach. Most people, however, just follow their gut feeling, which is not the ideal way to go.
The power of offline synthetic testing
One of the best methods we’ve found for improving search results is called offline synthetic testing. Offline because you don’t need live users for testing, and synthetic because you use only measured data as your baseline – live users may behave a bit differently. The advantage of this method is that you risk no negative impact on your conversion rate, you get results rather quickly, and you only need search logs (or more precisely: queries, results, and user interactions with those results).
The method works like this: you scan users’ past searches and run them again with a new ranking algorithm. If you know what results your search returned in the past and which of those results were clicked on or converted, you can compare them with the results from the new ranking algorithm.
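The replay step described above can be sketched roughly as follows. This is a minimal illustration, not a description of any particular product: the `LoggedSearch` record, the `replay` function, and the `new_ranker` callable are all hypothetical names, and the log format is assumed to hold a query, the old result order, and the items users clicked on or converted.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoggedSearch:
    # Hypothetical log record: one past search and what the user did with it.
    query: str
    results: list[str]    # product IDs in the order the old search returned them
    converted: set[str]   # product IDs that were clicked on or converted


def replay(
    log: list[LoggedSearch],
    new_ranker: Callable[[str], list[str]],
) -> list[tuple[str, str, Optional[int], Optional[int]]]:
    """Re-run each logged query and report where the converted items land.

    Returns rows of (query, product, old_position, new_position) with
    1-based positions; a position is None if the product is not returned.
    """
    rows = []
    for entry in log:
        new_results = new_ranker(entry.query)
        for product in sorted(entry.converted):
            old_pos = entry.results.index(product) + 1 if product in entry.results else None
            new_pos = new_results.index(product) + 1 if product in new_results else None
            rows.append((entry.query, product, old_pos, new_pos))
    return rows
```

Each row shows how far a historically successful result moved under the new ranking, which is the raw material for the comparisons discussed next.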
Analyzing results from offline synthetic testing
Imagine that for query x, the old search returned product X at position 1, and that product X was the highest-converting item for this query. With the new algorithm, the same query returns product X at position 10. Can you guess which search was better? The right answer is the original search.
That was easy! Now let’s take a harder example: in the old search, query x returns product Y as the top result. Some people click on it, so it seems like a relevant match, but the real best result, X, is not among the results at all. Your new search fixes this by ranking product X first and product Y second. Measured against historical clicks, this may look like a regression, because Y dropped one position and X has no recorded interactions, but in reality the results for query x are better.
As you see, in many cases analyzing what’s better isn’t that simple and you need complex models to get reliable and actionable results. There are various quantitative models designed to measure ranking quality such as Normalized Discounted Cumulative Gain (NDCG), Discounted Cumulative Gain, Mean Reciprocal Rank, or ranking correctness (Precision, Mean Average Precision, etc.).
All these metrics are often measured only over the top-n results because these have the highest chance of being seen by users. There are even more robust and complex models based on implicit feedback that are better suited to search quality modeling. Each of these metrics has different characteristics and models different aspects of search quality, but in the end, whichever you choose, you will have a quantitative measure of your search that you can use to compare different ranking algorithms.
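To make two of these metrics concrete, here is a minimal sketch of NDCG and reciprocal rank, both cut off at the top k results. The function names and the `relevance` mapping (product ID to a gain value, for example 1.0 for a converted item) are assumptions made for illustration; how you derive gains from clicks and conversions is a separate modeling decision.

```python
import math


def dcg_at_k(gains: list[float], k: int) -> float:
    # Discounted Cumulative Gain: each gain is divided by log2(position + 1),
    # with positions starting at 1, so lower-ranked items count for less.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked: list[str], relevance: dict[str, float], k: int = 10) -> float:
    # Normalize by the DCG of the ideal ordering so scores fall in [0, 1].
    gains = [relevance.get(product, 0.0) for product in ranked]
    ideal = sorted(relevance.values(), reverse=True)
    best = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / best if best > 0 else 0.0


def reciprocal_rank(ranked: list[str], relevance: dict[str, float], k: int = 10) -> float:
    # 1 / position of the first relevant result within the top k, else 0.
    for position, product in enumerate(ranked[:k], start=1):
        if relevance.get(product, 0.0) > 0:
            return 1.0 / position
    return 0.0
```

With product X as the only relevant item, a ranking that puts X first scores an NDCG of 1.0, while one that puts X at position 10 scores about 0.29. Mean Reciprocal Rank is then the average of `reciprocal_rank` over all replayed queries.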
Beyond measuring search quality change
Offline synthetic testing lets you do more than just measure the change in search quality: it helps you understand why performance differs. If done right, you can get rich reports and aggregations of individual historical query performance with the new ranking. You can then start asking questions and see the exact queries where the new ranking helped, or where it made the results worse.
In our experience, going through a few queries where the search performance was most severely impacted can help you identify patterns where the new ranking is underperforming. You can then correct your assumptions, update the ranking algorithm, and re-run the offline test.
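A per-query report of this kind could look something like the sketch below. It assumes you already have one quality score per query for the old and the new ranking, computed with whichever metric you chose; the `regression_report` name and the dictionary inputs are hypothetical.

```python
def regression_report(
    old_scores: dict[str, float],
    new_scores: dict[str, float],
    worst: int = 5,
) -> tuple[float, list[tuple[str, float]]]:
    """Summarize how a new ranking changed per-query quality scores.

    Returns the mean score change across queries present in both inputs,
    and the `worst` queries with the largest drop (most negative first).
    """
    deltas = {
        query: new_scores[query] - old_scores[query]
        for query in old_scores
        if query in new_scores
    }
    mean_delta = sum(deltas.values()) / len(deltas) if deltas else 0.0
    worst_queries = sorted(deltas.items(), key=lambda item: item[1])[:worst]
    return mean_delta, worst_queries
```

The mean gives the overall direction of the change, while the list of worst queries is the place to start reading individual results and looking for patterns.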
The best way is to combine methods
The good thing about offline synthetic testing is that it reliably tells you the direction of a change. If an offline synthetic test shows that your new search is much worse than your current search, then it will generally be worse with live users too. If it shows that your new search is much better than your current search, then it will generally be better. But by how much?
That’s something that this method cannot tell you since it doesn’t represent real-world behavior. To find out, you need to run a live A/B test. So, next time you want to fix something:
- Find a query that needs improvement.
- Update the ranking algorithm.
- Run offline synthetic tests until you are sure that there’s an improvement.
- Run a live A/B test to confirm.
- Rinse and repeat.
This is how we do it at Luigi’s Box, and it’s given us the know-how you’ve grown to trust. The good news is we’ve integrated offline synthetic testing into Luigi’s Box, so if you want to try it out, you don’t have to develop it yourself.
To find out more about this feature, feel free to contact our sales representatives.
Tomáš is the CTO & Co-Founder of Luigi’s Box. He received his Ph.D. in Computer Software Engineering from FIIT STU. Tomáš has been researching cutting-edge search technologies for more than ten years and continually works to revolutionize the concept of search and navigation to provide the best possible customer experience for site search users.