Offline Synthetic Testing: A Quick and Safe Method for Improving Search Results

Search optimization requires a lot of testing. Want to be sure the changes impact your website positively? Try offline synthetic testing.

Author Tomáš Kramár
Last update January 27, 2026
Read time 2 minutes

When tuning website search, many people either fix individual queries or run A/B tests. However, if you want more confidence that a change actually improves your search before real users see it, consider offline synthetic testing. It lets you iterate on search algorithms rapidly and eliminates the risk of losing conversions, because you test changes safely offline.

Every e-commerce business wants to have a great search function, but the road to it is tricky. It requires great observational skills, good knowledge of search analytics, and a systematic approach. Most people, however, just follow their gut feeling, which is not the ideal way to go.

The power of offline synthetic testing

One of the best methods we’ve found for improving search results is offline synthetic testing. It is offline because you don’t need live users for testing, and synthetic because you use only previously measured data as your baseline, so live users may behave a bit differently. The advantages of this method are that you risk no negative impact on your conversion rate, you get results quickly, and you only need search logs (more precisely: queries, results, and user interactions with those results).

The method works like this: you take users’ past searches and run them again with a new ranking algorithm. If you know which results your search returned in the past and which of those results were clicked on or converted, you can compare them with the results from the new ranking algorithm.
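
As a rough illustration of the replay step, here is a minimal Python sketch. The `LoggedSearch` structure, the `replay` function, and the toy ranker are hypothetical names introduced for this example; they are not part of any Luigi’s Box API, and a real search log would carry more detail than a single clicked product per search.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoggedSearch:
    """One historical search: the query, what was shown, and what was clicked."""
    query: str
    shown: list[str]   # product IDs in the order the old ranking returned them
    clicked: str       # product ID the user clicked or converted on


def position_of(product_id: str, results: list[str]) -> Optional[int]:
    """Return the 1-based position of a product, or None if it is absent."""
    try:
        return results.index(product_id) + 1
    except ValueError:
        return None


def replay(logs: list[LoggedSearch],
           new_ranker: Callable[[str], list[str]]) -> list[dict]:
    """Re-run every logged query with the new ranker and record position changes."""
    rows = []
    for entry in logs:
        new_results = new_ranker(entry.query)
        rows.append({
            "query": entry.query,
            "clicked": entry.clicked,
            "position_from": position_of(entry.clicked, entry.shown),
            "position_to": position_of(entry.clicked, new_results),
        })
    return rows


def toy_ranker(query: str) -> list[str]:
    """Stand-in for the new ranking algorithm (assumption: fixed toy results)."""
    return {"red shoes": ["B", "A", "C"]}[query]


logs = [LoggedSearch("red shoes", ["A", "B", "C"], clicked="B")]
rows = replay(logs, toy_ranker)
```

In this toy run the clicked product moves from position 2 to position 1, which is the kind of before-and-after comparison the method relies on.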

Analyzing results from offline synthetic testing

Imagine that for query x, the old search returned product X at position 1, and product X was the highest-converting item for this query. With the new algorithm, the same query returns product X at position 10. Can you guess which search was better? The right answer is the original search.

That was easy! Now let’s take a harder example: the old search for query x returns product Y as the top result. Some people click on it, so it looks like a relevant match, but the real best result, product X, is not among the results at all. Your new search fixes this by ranking product X first and product Y second. Measured against the historical clicks, this may look like a regression, because the clicked product Y dropped one position. In reality it is not, because the results for query x are now better.

As you can see, in many cases deciding which ranking is better isn’t that simple, and you need more sophisticated models to get reliable and actionable results. There are various quantitative metrics designed to measure ranking quality, such as Discounted Cumulative Gain (DCG), Normalized Discounted Cumulative Gain (NDCG), and Mean Reciprocal Rank (MRR), as well as metrics of ranking correctness (Precision, Mean Average Precision, etc.).

These metrics are often measured only over the top n results, because those have the highest chance of being seen by users. There are also more robust and complex models based on implicit feedback that are better suited to modeling search quality. Each of these metrics has different characteristics and captures a different aspect of search quality, but whichever you choose, you end up with a quantitative measure of your search that you can use to compare different ranking algorithms.
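
To make two of these metrics concrete, here is a small Python sketch of NDCG over the top k results and of the reciprocal rank of a single query (averaging it over all queries gives MRR). It assumes the common log2 discount with linear gain, and it assumes relevance has already been derived somehow, for example 1 for a clicked result and 0 otherwise. The article does not specify which variant or relevance labeling is used in practice.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted Cumulative Gain over the top-k positions (log2 discount)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """DCG divided by the DCG of the ideal ordering (relevances sorted descending)."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank(relevances: list[float]) -> float:
    """1 / position of the first relevant result, or 0.0 if there is none."""
    for position, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / position
    return 0.0


# Relevance of each returned result, in ranked order (1 = clicked, 0 = not).
old_ranking = [0, 1, 0]   # clicked product was at position 2
new_ranking = [1, 0, 0]   # new ranker moves it to position 1

old_ndcg = ndcg_at_k(old_ranking, k=10)
new_ndcg = ndcg_at_k(new_ranking, k=10)
```

Here `new_ndcg` is 1.0 because the only relevant result is ranked first, while `old_ndcg` is lower because that result sat one position down.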

Beyond measuring search quality change

Offline synthetic testing lets you do more than just measure the change in search quality: it helps you understand why performance differs. If done right, you get rich reports and aggregations showing how individual historical queries perform under the new ranking. You can then start asking questions and see the exact queries where the new ranking helped or where it made the results worse.

In our experience, going through a few of the queries where search performance was most severely impacted can help you identify patterns where the new ranking underperforms. You can then correct your assumptions, update the ranking algorithm, and re-run the offline test.
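
One simple way to surface those queries is to sort them by how much their metric dropped. The sketch below assumes you already have a per-query score (for example NDCG@10) under the old and the new ranking; the `most_impacted` helper and the sample scores are made up for illustration.

```python
def most_impacted(old_scores: dict[str, float],
                  new_scores: dict[str, float],
                  n: int = 3) -> list[tuple[str, float]]:
    """Return the n queries whose score dropped the most, worst first."""
    deltas = [
        (query, new_scores[query] - old_scores[query])
        for query in old_scores
        if query in new_scores   # only queries evaluated under both rankings
    ]
    deltas.sort(key=lambda item: item[1])   # most negative change first
    return deltas[:n]


# Hypothetical per-query scores under the old and new ranking.
old_scores = {"shoes": 1.0, "bag": 0.5, "hat": 0.5}
new_scores = {"shoes": 0.5, "bag": 1.0, "hat": 0.5}

worst = most_impacted(old_scores, new_scores, n=2)
```

Reading through the top of this list by hand is usually where patterns in the regressions become visible.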

[Figure: test run summary output showing the ranking improvement]

Summary output of a single test run. In this case, the ranking has improved by 43%. Below the summary, you can see a performance breakdown (in buckets of 25%) and the number of queries that improved or deteriorated. Notice that even though this change resulted in an overall improvement, there are queries where the ranking is now worse.
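
A summary along these lines could be computed as in the following Python sketch. It is only an approximation of the idea: the `summarize` function is hypothetical, and both the relative-change formula and the way per-query changes are grouped into 25-point buckets are assumptions, since the article does not describe how the report is actually calculated.

```python
import math
from collections import Counter


def summarize(old_scores: dict[str, float], new_scores: dict[str, float]) -> dict:
    """Aggregate per-query metric changes into a small test-run summary."""
    queries = [q for q in old_scores if q in new_scores]
    if not queries:
        raise ValueError("no queries were evaluated under both rankings")

    old_mean = sum(old_scores[q] for q in queries) / len(queries)
    new_mean = sum(new_scores[q] for q in queries) / len(queries)

    buckets = Counter()
    for q in queries:
        delta_points = (new_scores[q] - old_scores[q]) * 100
        # Lower edge of the 25-point bucket this change falls into.
        buckets[math.floor(delta_points / 25) * 25] += 1

    return {
        # Relative change of the mean metric; None if the old mean is zero.
        "relative_change": (new_mean - old_mean) / old_mean if old_mean else None,
        "improved": sum(1 for q in queries if new_scores[q] > old_scores[q]),
        "deteriorated": sum(1 for q in queries if new_scores[q] < old_scores[q]),
        "buckets": dict(buckets),
    }


# Hypothetical per-query NDCG scores.
old_scores = {"a": 0.5, "b": 1.0, "c": 0.5}
new_scores = {"a": 1.0, "b": 0.5, "c": 1.0}

summary = summarize(old_scores, new_scores)
```

With this toy data the mean score rises by 25%, two queries improve, and one deteriorates, which mirrors the point that an overall gain can hide individual regressions.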

[Figure: example output from an offline synthetic test]

Example output from an offline synthetic test. You can see the user’s query, the result that was clicked in that particular search, and how its position changed in the new ranking (position_from – position_to). You can also see the computed metric, NDCG@10 (Normalized Discounted Cumulative Gain computed over the top 10 results).

The best way is to combine methods

The good thing about offline synthetic testing is that it tells you the direction of a change. If an offline synthetic test shows that your new search is much worse than your current search, it will generally be worse with live users too. If it shows that your new search is much better, it will generally be better. But by how much?

That’s something that this method cannot tell you since it doesn’t represent real-world behavior. To find out, you need to run a live A/B test. So, next time you want to fix something:

1. Find a query that needs improvement.
2. Update the ranking algorithm.
3. Run offline synthetic tests until you are sure that there’s an improvement.
4. Run a live A/B test to confirm.
5. Rinse and repeat.

This is how we do it at Luigi’s Box, and it’s given us the know-how you’ve grown to trust. The good news is we’ve integrated offline synthetic testing into Luigi’s Box, so if you want to try it out, you don’t have to develop it yourself.

To find out more about this feature, feel free to contact our sales representatives.

Frequently asked questions

What is offline synthetic testing in e-commerce?

Offline synthetic testing is a method that allows e-commerce businesses to evaluate changes to their search & discovery systems using a copy of their live data, without affecting actual customers. This approach enables testing of new configurations, ranking signals, or features in a controlled environment, ensuring that any potential issues are identified and addressed before going live.

Why is offline synthetic testing important for e-commerce businesses?

Offline synthetic testing is crucial because it helps identify and resolve potential issues before they affect real customers. Simulating changes in a risk-free environment lets businesses ensure that new features or adjustments will enhance the user experience and performance without causing disruptions or errors in the live system.

Can I perform offline synthetic testing with Luigi’s Box?

Yes, Luigi’s Box uses offline synthetic testing to maintain good search performance and verify improvements.
