Thursday, October 16

Retail media and Simpson's paradox

I have said before that retail media are measurable. That does not mean that understanding what you measure is easy. Simpson's paradox lurks even when you have measured carefully enough to assign credit where it is due.

The classic example of Simpson’s paradox is medical:

Two drugs, A and B, are tested, each on 100 people. Each group is a mixture of men and women. Drug A performs better than drug B for men, and it also performs better for women. Despite this, drug B performs better across the whole sample. This sounds wrong: within every subgroup A is better, so how can B do better overall? It sounds paradoxical.

A couple of examples of Simpson's paradox will be described in detail, after which you should be able to imagine what is going on with the drugs. When you reach the end, stop and consider that the sample is not just made up of men and women, but of blondes and brunettes, tall and short, old and young, and a huge number of other variants, any of which could be having the same hidden effect.

Example 1: Simpson's Paradox and Apple Trees

Two plantings of apple trees are being compared, one in fields belonging to Ben and the other in fields belonging to Jerry. Each planting is spread across ten different fields on the relevant farm.

Each planting is of one of the two types of tree being compared. Rather than choosing different varieties of apple, which might confuse the issue, we are going to compare big trees with little trees. Big trees produce twice as many apples as little trees (being twice the size).

Trees on good ground produce lots of apples: little trees produce 10 apples, big trees produce 20. Trees on stony ground produce very few apples; little trees manage only a single apple, while big trees manage two.

Ben has a very poor set of fields: nine out of ten of them are stony ground. Jerry has a good farm with only one stony field out of ten. Ben's fields hold the big trees and Jerry's hold the little trees. Let’s see how the numbers stack up.

With one tree per field, nine of the big trees in our sample produce 2 apples each and one produces 20, making a total of 38 apples. Nine of the little trees produce 10 apples each and one produces a single apple, giving a total of 91 apples.

So the score is 91:38 in favour of little trees being the more productive, even though on each type of ground a big tree produces twice as many apples as a little one. Simpson's paradox is hiding the real causes of productivity and effectiveness. While Ben and Jerry have separate farms, it is fairly easy to spot that the farms might differ and to unravel the paradox, realising that the problem is not the type of tree but the type of field. If Ben and Jerry ever merged their farms, it would be almost impossible to spot that there was a difference, and so the figures would stand: buy little trees, they produce higher yields than big trees. Simpson's paradox is harder to unravel the more deeply the confounding factor (in this case the type of field) is hidden.
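The apple-tree arithmetic can be checked with a short Python sketch. It reads the example as one tree per field, with Ben's farm (nine stony fields) holding the big trees and Jerry's farm (one stony field) holding the little trees, as the totals imply; the names `YIELD`, `FIELDS` and `total_apples` are illustrative only.

```python
# Apples per tree on each type of ground, from the example:
# big trees yield twice as much as little trees on the same ground.
YIELD = {
    ("big", "good"): 20,
    ("big", "stony"): 2,
    ("little", "good"): 10,
    ("little", "stony"): 1,
}

# Assumed layout: one tree per field, ten fields per farm.
# Ben's farm (big trees): 9 stony, 1 good. Jerry's farm (little trees): 1 stony, 9 good.
FIELDS = {
    "big": {"stony": 9, "good": 1},
    "little": {"stony": 1, "good": 9},
}


def total_apples(tree: str) -> int:
    """Sum the apples produced by one planting across all of its fields."""
    return sum(YIELD[(tree, ground)] * n for ground, n in FIELDS[tree].items())


# On each type of ground, big trees win.
for ground in ("good", "stony"):
    print(f"{ground:>5} ground: big {YIELD[('big', ground)]}, little {YIELD[('little', ground)]}")

# Pooled across the fields they were actually planted in, little trees win.
print("totals:", {tree: total_apples(tree) for tree in FIELDS})
# totals: {'big': 38, 'little': 91}
```

The per-ground comparison and the pooled totals point in opposite directions, which is the whole paradox in two print statements.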

Example 2: Simpson's Paradox in Advertising

Now, for a more meaningful retail media example, take a web site with two banner ads. Bear in mind that this could just as easily be a set of different creatives within retail car-parks, each with a short code, or any other set of competing media.

Assume a hypothetical health and fitness product, advertised using two different banner creatives, A and B. As a test, the banners have been put up through a network that allows limited contextual targeting, with the wellbeing and sport categories selected. The results come back:

Banner A has a conversion rate of 0.76 and banner B has 0.72, so the case seems clear and it is time to go for a large-scale push.

But wait a moment: the figures bear closer examination. The confounding factor that gives rise to Simpson's paradox here is that we have wellbeing sites and we have sports sites. To see where the paradox is lurking, it is necessary to see how each banner did in each category.

Banner A in sports 0.6
Banner A in wellbeing 0.78

Banner B in sports 0.7
Banner B in wellbeing 0.8

Banner B is better for both types of placement, so why does it look worse overall? The answer is really quite simple: the sports placements convert less well than the wellbeing placements, and banner B had more of them (like the stony ground versus the fertile fields, see table below).
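The category rates alone are enough to recover roughly how each banner's traffic was split, assuming each overall rate is a weighted average of its two category rates, weighted by whatever the conversion rate is measured against (impressions or clicks; the post does not say). Solving overall = w × sports + (1 − w) × wellbeing for the sports share w gives the sketch below; `implied_sports_share` is a hypothetical helper, not part of any tool.

```python
from fractions import Fraction

# Conversion rates quoted in the example, held as exact fractions to avoid float noise.
RATES = {
    "A": {"sports": Fraction("0.6"), "wellbeing": Fraction("0.78"), "overall": Fraction("0.76")},
    "B": {"sports": Fraction("0.7"), "wellbeing": Fraction("0.8"), "overall": Fraction("0.72")},
}


def implied_sports_share(r: dict) -> Fraction:
    """Solve overall = w * sports + (1 - w) * wellbeing for w, the share from sports."""
    return (r["wellbeing"] - r["overall"]) / (r["wellbeing"] - r["sports"])


for banner, r in RATES.items():
    w = implied_sports_share(r)
    print(f"Banner {banner}: {float(w):.0%} of its traffic from sports placements")
# Banner A: 11% of its traffic from sports placements
# Banner B: 80% of its traffic from sports placements
```

Under that assumption, banner A ran almost entirely on wellbeing sites (one ninth sports) while banner B ran mostly on sports sites (four fifths), which is exactly the stony-ground imbalance from the apple example.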

Perhaps sports placements are displayed to people who watch sport but are not interested in becoming fit themselves? In any case, having a larger share of the less fruitful placements puts banner B at a disadvantage overall. The optimal thing to do, in a trade-driving sense, would be to drop the sports category and use only banner B. If the brand wishes to be associated with sport, that would argue for using banner B in the sports placements too, though not as a trade driver. In either case banner A is less successful than banner B, so it should be dropped.

Simpson's paradox can arise whenever a confounding factor is allowed to remain hidden. It is never enough to say that one campaign is more successful than another without at least trying to dig deeper to discover why it is more successful.
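One way to start digging is to break the results down by each suspected confounder and check whether the per-segment winner matches the headline winner. A minimal Python sketch, assuming you have raw counts (successes and trials) per segment; the function names are hypothetical, and the data are the apple-tree figures, read as apples per tree on each type of ground.

```python
from typing import Dict, Tuple

# segment -> variant -> (successes, trials); for the apples: (apples, trees)
Counts = Dict[str, Dict[str, Tuple[int, int]]]


def segment_winners(counts: Counts) -> Dict[str, str]:
    """Return the best-performing variant within each segment."""
    return {
        segment: max(variants, key=lambda v: variants[v][0] / variants[v][1])
        for segment, variants in counts.items()
    }


def pooled_winner(counts: Counts) -> str:
    """Return the best variant once segments are merged (the headline figure)."""
    successes: Dict[str, int] = {}
    trials: Dict[str, int] = {}
    for variants in counts.values():
        for name, (s, n) in variants.items():
            successes[name] = successes.get(name, 0) + s
            trials[name] = trials.get(name, 0) + n
    return max(successes, key=lambda v: successes[v] / trials[v])


def simpson_reversal(counts: Counts) -> bool:
    """True when one variant wins every segment yet loses once segments are pooled."""
    winners = set(segment_winners(counts).values())
    return len(winners) == 1 and pooled_winner(counts) not in winners


# Apple-tree figures from Example 1, one tree per field.
apples: Counts = {
    "stony": {"big": (18, 9), "little": (1, 1)},
    "good": {"big": (20, 1), "little": (90, 9)},
}

print(segment_winners(apples))   # {'stony': 'big', 'good': 'big'}
print(pooled_winner(apples))     # little
print(simpson_reversal(apples))  # True
```

The check only finds reversals along the segments you think to supply, so it is a prompt to keep asking "split by what else?", not a guarantee that nothing is hidden.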

If you do not have a level playing field, you are not evaluating the contestants, just the ground they are playing on. Would you expect an athlete running up a steep hill to be faster than a fairly fit person running on the flat? It might depend more on how steep the hill is than on the difference between the two contestants.

If you are told that two people are racing, you will tend to assume that they are racing over the same distance on the same type of ground and that everything else is equal. If you are comparing two different ways to spend your advertising budget, you will tend to assume the same thing. The difference is that with your budget there is no reason to expect the two to be in any way equivalent. It is up to you to find the hidden differences and make sure you have factored them all in; otherwise you may be comparing stony ground with fertile soil.
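Once the hidden differences have been found, one standard way to factor them in is direct standardisation: score every option against the same reference mix of segments, so that none of them benefits from having been given easier ground. The sketch below applies it to the banner rates; the even 50/50 reference mix and the helper name `standardised_rate` are assumptions for illustration, not figures or tools from the post.

```python
# Conversion rates by placement category, from the banner example.
RATES = {
    "A": {"sports": 0.6, "wellbeing": 0.78},
    "B": {"sports": 0.7, "wellbeing": 0.8},
}

# Hypothetical reference mix: an even split between the two categories.
REFERENCE_MIX = {"sports": 0.5, "wellbeing": 0.5}


def standardised_rate(rates_by_segment: dict, mix: dict) -> float:
    """Weight each segment's rate by a shared reference mix (direct standardisation)."""
    if abs(sum(mix.values()) - 1) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    return sum(share * rates_by_segment[seg] for seg, share in mix.items())


for banner, rates in RATES.items():
    print(f"Banner {banner}: {standardised_rate(rates, REFERENCE_MIX):.2f}")
# Banner A: 0.69
# Banner B: 0.75
```

Because banner B beats banner A in both categories, B comes out ahead under any reference mix; the choice of mix changes the headline numbers, not the ranking.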

In short, I challenge all my readers to a race, and I will stake a year’s wages on the outcome. The hidden factor is that you have to race over a 100-mile course with your finish line at the North Pole, whereas I race over a 100 cm course with my finish line in my sitting room. Naturally we must start at the same time; that is only fair. Any takers?

Rufus Evison
