Offline Synthetic Testing: A Quick and Safe Method for Improving Search Results

    Search optimization requires a lot of testing. Want to be sure the changes impact your website positively? Try offline synthetic testing.

    Tomáš Kramár, CTO & Co-Founder

    When tuning website search, many people either fix individual queries or run A/B tests. However, if you want evidence that a change helps before any user sees it, consider offline synthetic testing. It lets you iterate on search algorithms rapidly and, because the tests run offline, removes the risk of losing conversions while you experiment.

    Every e‑commerce business wants to have a great search function, but the road to it is tricky. It requires great observational skills, good knowledge of search analytics, and a systematic approach. Most people, however, just follow their gut feeling, which is not the ideal way to go.

    The power of offline synthetic testing

    One of the best methods we've found for improving search results is offline synthetic testing. It is offline because you don't need live users for testing, and synthetic because your only baseline is previously recorded data, which live users may not reproduce exactly. The advantages are that you risk no negative impact on your conversion rate, you get results quickly, and you only need search logs (more precisely: queries, results, and user interactions with those results).

    The method works like this: you take users' past searches and run them again with a new ranking algorithm. If you know which results your search returned in the past and which of those results were clicked on or converted, you can compare them with the results from the new ranking algorithm.
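
    The replay loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation: `LoggedSearch`, `replay`, the toy index, and the product IDs are hypothetical names and data invented for the example, and `new_ranker` stands in for whatever function calls your new ranking algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoggedSearch:
    """One historical search: the query, what was shown, and what was clicked."""
    query: str
    old_results: list[str]  # product IDs in the order the old ranking returned them
    clicked: str            # product ID the user clicked or converted on


def position(results: list[str], product_id: str) -> Optional[int]:
    """Return the 1-based position of product_id in results, or None if absent."""
    try:
        return results.index(product_id) + 1
    except ValueError:
        return None


def replay(log: list[LoggedSearch], new_ranker: Callable[[str], list[str]]) -> list[dict]:
    """Re-run every logged query through new_ranker and record how the clicked item moved."""
    rows = []
    for entry in log:
        new_results = new_ranker(entry.query)
        rows.append({
            "query": entry.query,
            "clicked": entry.clicked,
            "position_from": position(entry.old_results, entry.clicked),
            "position_to": position(new_results, entry.clicked),
        })
    return rows


# Hypothetical log entries and a toy ranker standing in for a real search backend.
log = [
    LoggedSearch("red shoes", ["p3", "p1", "p2"], clicked="p1"),
    LoggedSearch("blue hat", ["p7", "p8"], clicked="p7"),
]
toy_index = {"red shoes": ["p1", "p3", "p2"], "blue hat": ["p8", "p9"]}

report = replay(log, lambda q: toy_index.get(q, []))
for row in report:
    print(row)
```

    In this toy data, the clicked product for "red shoes" moves from position 2 to position 1, while the clicked product for "blue hat" disappears from the new results entirely (`position_to` is `None`), which is exactly the kind of case worth inspecting by hand.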

    Analyzing results from offline synthetic testing

    Imagine that for query x, the old search returned product X at position 1, and X was also the highest-converting item for this query. With the new algorithm, the same query returns product X at position 10. Can you guess which search was better? The right answer is the original search, because the new one pushed the product users actually wanted down the list.

    That was easy! Now let's take a harder example: the old search for query x returns product Y as the top result. Some people click on it, so it looks like a relevant match, but the real best result, product X, is not among the results at all. Your new search fixes this by ranking product X first and product Y second. Because the clicked product Y dropped from position 1 to position 2, this may look like a regression, but it is not: the results for query x are better overall.
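
    To make the harder example concrete, here is a small Python sketch. It assumes hypothetical relevance grades (product X = 3, product Y = 1) and a filler product Z; none of these numbers come from real data. A metric that only looks at the position of the clicked product reports a regression, while a graded metric such as Discounted Cumulative Gain reports an improvement.

```python
import math


def reciprocal_rank(results, clicked):
    """1 / position of the clicked product, or 0.0 if it is not in the results."""
    return 1.0 / (results.index(clicked) + 1) if clicked in results else 0.0


def dcg(results, gains):
    """Discounted Cumulative Gain: each product's gain divided by log2(position + 1)."""
    return sum(gains.get(p, 0) / math.log2(i + 2) for i, p in enumerate(results))


old_results = ["Y", "Z"]       # old search: the best product X is missing
new_results = ["X", "Y", "Z"]  # new search: X first, Y second

# View 1: only the historically clicked product (Y) counts.
rr_old = reciprocal_rank(old_results, "Y")
rr_new = reciprocal_rank(new_results, "Y")

# View 2: graded relevance, with hypothetical grades chosen for illustration.
gains = {"X": 3, "Y": 1}
dcg_old = dcg(old_results, gains)
dcg_new = dcg(new_results, gains)

print(f"clicked-item reciprocal rank: {rr_old:.2f} -> {rr_new:.2f}")
print(f"graded DCG:                   {dcg_old:.2f} -> {dcg_new:.2f}")
```

    The clicked-item view drops while the graded view rises, which is why judging a new ranking purely by where previously clicked items land can be misleading.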

    As you can see, deciding which ranking is better often isn't that simple, and you need a proper model to get reliable, actionable results. Several quantitative metrics are designed to measure ranking quality, such as Discounted Cumulative Gain (DCG), its normalized form (NDCG), Mean Reciprocal Rank (MRR), and correctness measures such as Precision and Mean Average Precision (MAP).

    All these metrics are often measured only over top-n results because these have the highest chance to be seen by the users. There are even more robust and complex models based on implicit feedback that are more suited to search quality modeling. Each one of these metrics has different characteristics and models different search quality aspects, but in the end, whichever you choose, you will have a quantitative measure of your search that you can use to compare different ranking algorithms.
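
    As an illustration of two of these metrics, here is a minimal Python sketch of NDCG@k and Mean Reciprocal Rank. The function names and input shapes are assumptions made for this example. Note one simplification: the ideal ordering is computed from the gains of the returned results only, so this version does not penalize relevant products that are missing from the list altogether.

```python
import math


def dcg_at_k(gains, k):
    """DCG over the top k results; gains[i] is the relevance of the result at rank i + 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k=10):
    """DCG@k divided by the DCG@k of the same gains sorted best-first (0.0 if all gains are 0)."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def mean_reciprocal_rank(first_relevant_positions):
    """Average of 1 / rank of the first relevant result per query.

    Each entry is a 1-based rank, or None when the query had no relevant result
    (which contributes 0 to the average).
    """
    if not first_relevant_positions:
        return 0.0
    total = sum(1.0 / p for p in first_relevant_positions if p)
    return total / len(first_relevant_positions)


# Hypothetical inputs: 1 = clicked/converted, 0 = not.
print(ndcg_at_k([1, 0, 0]))                   # relevant result already on top
print(ndcg_at_k([0, 1, 0]))                   # relevant result at rank 2
print(mean_reciprocal_rank([1, 2, None]))     # three queries, one with no relevant result
```

    Restricting the sum to the top k results mirrors the top-n convention mentioned above: results further down the list are unlikely to be seen, so they do not contribute to the score.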

    Beyond measuring search quality change

    Offline synthetic testing lets you do more than measure the change in search quality: it helps you understand why performance changed. Done right, it gives you rich reports and aggregations showing how each historical query performs under the new ranking. You can then ask questions and see the exact queries where the new ranking helped or where it made the results worse.

    From our experience, going through a few queries where the search performance was most severely impacted can help you identify patterns where the new ranking is underperforming. You can then correct your assumptions, update the ranking algorithm and re-run the offline test.
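
    A report of this kind can be sketched as follows. This is an illustrative sketch only: the `summarize` function, its output fields, and the per-query scores are hypothetical, and the scores could come from any metric (for example NDCG@10) computed once with the old ranking and once with the new one.

```python
def summarize(per_query, worst_n=3):
    """Aggregate per-query metric changes.

    per_query: list of (query, old_score, new_score) tuples.
    Returns counts of improved/worsened/unchanged queries, the relative change
    of the mean score, and the worst_n queries with the largest drop.
    """
    if not per_query:
        return {"improved": 0, "worsened": 0, "unchanged": 0,
                "relative_change": None, "worst": []}

    deltas = [(query, new - old) for query, old, new in per_query]
    improved = sum(1 for _, d in deltas if d > 0)
    worsened = sum(1 for _, d in deltas if d < 0)

    old_mean = sum(old for _, old, _ in per_query) / len(per_query)
    new_mean = sum(new for _, _, new in per_query) / len(per_query)
    relative_change = (new_mean - old_mean) / old_mean if old_mean else None

    # Most severely impacted queries first; keep only genuine regressions.
    worst = [item for item in sorted(deltas, key=lambda item: item[1]) if item[1] < 0]

    return {
        "improved": improved,
        "worsened": worsened,
        "unchanged": len(deltas) - improved - worsened,
        "relative_change": relative_change,
        "worst": worst[:worst_n],
    }


# Hypothetical per-query scores (old ranking, new ranking).
scores = [
    ("red shoes", 0.5, 1.0),
    ("blue hat", 1.0, 0.25),
    ("green scarf", 0.5, 0.5),
    ("black belt", 0.25, 0.75),
]
summary = summarize(scores)
print(summary)
```

    In this toy data the mean score goes up even though "blue hat" got noticeably worse, so the `worst` list is where you would start looking for a pattern before re-running the test.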

    Figure: Summary output of a single test run. In this case, the ranking has improved by 43%. Below the summary is a performance breakdown (in buckets of 25% granularity) and the number of queries that improved or deteriorated. Notice that even though this change resulted in an overall improvement, there are queries where the ranking is now worse.

    Figure: Example output from an offline synthetic test. It shows the user's query, the result that was clicked in that particular search, and how its position changed in the new ranking (position_from and position_to). It also shows the computed metric, NDCG@10 (Normalized Discounted Cumulative Gain computed over the top 10 results).

    The best way is to combine methods

    The good thing about offline synthetic testing is that its results point reliably in the right direction. If an offline synthetic test shows that your new search is much worse than your current search, it will generally be worse with live users too. If it shows that your new search is much better, it will generally be better. But by how much?

    That’s something that this method cannot tell you since it doesn’t represent real-world behavior. To find out, you need to run a live A/B test. So, next time you want to fix something:

    1. Find a query that needs improvement.
    2. Update the ranking algorithm.
    3. Run offline synthetic tests until you are sure that there’s an improvement.
    4. Run a live A/B test to confirm.
    5. Rinse and repeat.

    This is how we do it at Luigi’s Box, and it’s given us the know-how you’ve grown to trust. The good news is we’ve integrated offline synthetic testing into Luigi’s Box, so if you want to try it out, you don’t have to develop it yourself.

    To find out more about this feature, feel free to contact our sales representatives.
