Statistical methodology for count trend experiments

Last updated:

|Edit this page

Trends experiments for count-based data use Bayesian statistics with a Gamma-Poisson model to evaluate the win probabilities and credible intervals for an experiment. Read the statistics primer for an overview if you haven't already.

What is a Gamma-Poisson model?

Imagine you run a pizza shop and want to know how many slices a customer typically orders. Some days customers might order 1 slice, others 3 slices, and occasionally someone might order 6 slices! This kind of count data (1, 2, 3, etc.) follows what's called a Poisson distribution.

The Poisson distribution has one key number: the average rate. In our pizza example, maybe it's 2.5 slices per customer. But here's the catch - we don't know the true rate for sure. We only have our observations to guess from.

This is where the Gamma distribution comes in. It helps us model our uncertainty about the true rate:

  • When we have very little data, the Gamma distribution is wide, saying "hey, the true rate could be anywhere in this broad range".
  • As we collect more data, the Gamma distribution gets narrower, saying "we're getting more confident about what the true rate is".

So when we say we're using a Gamma-Poisson model for experiments, we're:

  1. Using the Poisson distribution to model how count data naturally varies.
  2. Using the Gamma distribution to express our uncertainty about the true rate.
  3. Getting more confident in our estimates over time.

One more thing worth noting: Bayesian inference starts with an initial guess that then gets updated as more data comes in. Our model uses a "minimally informative prior" of ALPHA_PRIOR = 1 and BETA_PRIOR = 1, which is like starting with a blank slate instead of making an upfront assumption about the results.

Win probabilities

The win probability tells you how likely it is that a given variant has the highest rate compared to all other variants in the experiment. It helps you determine whether the experiment shows a statistically significant real effect vs. simply random chance.

Let's say you're testing a new menu design and have these results:

  • Control (old menu): 250 slices ordered by 100 customers (rate of 2.5 slices per customer)
  • Test (new menu): 300 slices ordered by 100 customers (rate of 3.0 slices per customer)

To calculate the win probabilities for the experiment, our methodology:

  1. Models each variant's rate using a Gamma distribution:

    • Control: Gamma(250 + ALPHA_PRIOR, 100 + BETA_PRIOR)
    • Test: Gamma(300 + ALPHA_PRIOR, 100 + BETA_PRIOR)
  2. Takes 10,000 random samples from each distribution.

  3. Checks which variant had the higher rate for each sample.

  4. Calculates the final win probabilities:

    • Control wins in 154 out of 10,000 samples = 1.54% probability
    • Test wins in 9,846 out of 10,000 samples = 98.46% probability

These results tell us we can be 98.46%% confident that the new menu design leads to more slice orders per customer than the old menu.

Credible intervals

A credible interval tells you the range where the true rate lies with 95% probability. This is different than a confidence interval, which describes how often such intervals would contain the true rate if you repeated the experiment many times (not a direct probability statement about where the rate lies).

For example, if you have these results:

  • Control (old menu): 250 slices ordered by 100 customers (rate of 2.5 slices per customer)
  • Test (new menu): 300 slices ordered by 100 customers (rate of 3.0 slices per customer)

To calculate the credible intervals for the experiment, our methodology will:

  1. Create a Gamma distribution for each variant:

    • Control: Gamma(250 + ALPHA_PRIOR, 100 + BETA_PRIOR)
    • Test: Gamma(300 + ALPHA_PRIOR, 100 + BETA_PRIOR)
  2. Find the 2.5th and 97.5% percentiles of each distribution:

    • Control: [2.2, 2.8] = "You can be 95% confident customers order between 2.2 and 2.8 slices on average with the old menu"
    • Test: [2.7, 3.3] = "You can be 95% confident customers order between 2.7 and 3.3 slices on average with the new menu"

Since these intervals barely overlap, you can be quite confident that the new menu design results in more slice orders per customer. The intervals will become narrower as you collect more data, reflecting your increasing certainty about the true rates.

Questions?

Was this page useful?

Next article

Statistical methodology for property value trend experiments

Trends experiments for property values use Bayesian statistics with a log-normal model and Normal-Inverse-Gamma prior to evaluate the win probabilities and credible intervals for an experiment. Read the statistics primer for an overview if you haven't already. What is a log-normal model with Normal-Inverse-Gamma prior? A log-normal model with Normal-Inverse-Gamma prior sounds like something you'd learn about in a quantum physics class, but its a lot less intimidating than it seems. The…

Read next article