Correlation vs Causation: Understand the Difference for Your Product

Correlation and causation can seem deceptively similar. But recognizing the difference between them can decide whether you waste effort on low-value features or create a product that your customers can’t stop raving about.

In this piece, we focus on correlation and causation as they relate specifically to building digital products and understanding user behavior. Product managers, data scientists, and analysts will find this useful for drawing the right insights for product growth, such as whether certain features impact user retention[1] or engagement[2].

After reading this article you will:

  • Know the key differences between correlation and causation
  • Know two robust solutions your team can use to test for causation

What’s the difference between correlation and causation?

While causation and correlation can exist at the same time, correlation does not imply causation. Causation explicitly applies to cases where action A causes outcome B. On the other hand, correlation is simply a relationship. Action A relates to action B, but one event doesn’t necessarily cause the other event to happen.

Correlation and causation are often confused because the human mind likes to find patterns even when they do not exist. We often fabricate these patterns when two variables appear to be so closely associated that one is dependent on the other. That would imply a cause and effect relationship where the dependent event is the result of an independent event.

However, we cannot simply assume causation even if we see two events happening, seemingly together, before our eyes. One, our observations are purely anecdotal. Two, there are so many other possibilities for an association, including:

  • The opposite is true: B actually causes A.
  • The two are correlated, but there’s more to it: A and B are correlated, but they’re actually caused by C.
  • There’s another variable involved: A does cause B—as long as D happens.
  • There is a chain reaction: A causes E, which in turn causes B (but you only observed that A was followed by B).
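
To make the second possibility concrete, here is a minimal sketch in Python. It assumes a hidden variable C (a user’s general engagement level) that drives both A (joining a community) and B (being retained), while A has no effect on B at all. Every probability below is a made-up number chosen purely for illustration.

```python
# Hypothetical probabilities, invented only for this illustration.
P_ENGAGED = 0.30                        # C: share of highly engaged users
P_JOIN = {True: 0.50, False: 0.05}      # P(A | C): chance of joining a community
P_RETAIN = {True: 0.80, False: 0.20}    # P(B | C): chance of being retained


def retention_given_join(joined: bool) -> float:
    """P(retained | joined) when joining has NO causal effect on retention.

    Retention depends only on engagement (C), yet conditioning on
    community membership (A) still changes the observed retention rate.
    """
    numerator = 0.0
    denominator = 0.0
    for engaged, p_c in ((True, P_ENGAGED), (False, 1 - P_ENGAGED)):
        p_join = P_JOIN[engaged] if joined else 1 - P_JOIN[engaged]
        numerator += p_c * p_join * P_RETAIN[engaged]
        denominator += p_c * p_join
    return numerator / denominator


joiners = retention_given_join(True)
non_joiners = retention_given_join(False)
print(f"Retention among joiners:     {joiners:.1%}")
print(f"Retention among non-joiners: {non_joiners:.1%}")
```

With these assumed numbers, joiners show far higher retention than non-joiners even though joining does nothing by construction. The whole gap comes from engaged users being more likely to do both.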

An example of correlation vs. causation in product analytics

You might expect to find causality in your product, where specific user actions or behaviors result in a particular outcome.

Picture this: you just launched a new version of your mobile app. You make the key bet that user retention[3] for your product is linked to in-app social behaviors. You ask your team to develop a new feature that allows users to join “communities.”

A month after you release and announce your new communities feature, adoption sits at about 20% of all users. Curious about whether communities impact retention, you create two equally-sized cohorts with randomly selected users. One cohort only has users who joined communities, and the other only has users who did not join communities.

Your analysis reveals a shocking finding: Users who joined at least one community are being retained at a rate far greater than the average user.

Nearly 90% of those who joined communities are still around on Day 1 compared to 50% of those who didn’t. By Day 7, you see 60% retention in community-joiners and about 18% retention for those who did not join. This seems like a massive coup.

But hold on. The rational you knows that you don’t have enough information to conclude whether joining communities causes better retention. All you know is that the two are correlated.
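
The cohort comparison in this example could be computed along the following lines. This is only a sketch: the user records, the field names, and the definition of Day N retention (active exactly N days after signup) are assumptions made for illustration, not the output of any particular analytics tool.

```python
from datetime import date, timedelta

# Hypothetical toy data: each user has a signup date, a flag for whether they
# joined a community, and the set of dates on which they were active.
USERS = [
    {"signup": date(2024, 1, 1), "joined_community": True,
     "active_dates": {date(2024, 1, 2), date(2024, 1, 8)}},
    {"signup": date(2024, 1, 1), "joined_community": True,
     "active_dates": {date(2024, 1, 2)}},
    {"signup": date(2024, 1, 3), "joined_community": False,
     "active_dates": {date(2024, 1, 4)}},
    {"signup": date(2024, 1, 3), "joined_community": False,
     "active_dates": set()},
]


def day_n_retention(users, joined_community, n):
    """Share of the cohort that was active exactly n days after signing up."""
    cohort = [u for u in users if u["joined_community"] == joined_community]
    if not cohort:
        return 0.0
    retained = sum(
        1 for u in cohort if u["signup"] + timedelta(days=n) in u["active_dates"]
    )
    return retained / len(cohort)


for n in (1, 7):
    joiners = day_n_retention(USERS, True, n)
    others = day_n_retention(USERS, False, n)
    print(f"Day {n}: joiners {joiners:.0%}, non-joiners {others:.0%}")
```

A gap between the two cohorts computed this way describes an association only. Because users chose for themselves whether to join, the calculation says nothing about why the cohorts differ.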

How to test for causation in your product

Causal relationships don’t happen by accident.

It might be tempting to associate two variables as “cause and effect.” But doing so without confirming causality in a robust analysis can lead to a false positive, where a causal relationship seems to exist but actually isn’t there. This can occur if you don’t extensively test the relationship between a dependent and an independent variable.

False positives are problematic in generating product insights because they can mislead you into thinking you understand the link between important outcomes and user behaviors[6]. For example, you might think you know which specific key activation event[7] results in long-term user retention, but without rigorous testing you run the risk of basing important product decisions on the wrong user behavior.

Run robust experiments to determine causation

Once you find a correlation, you can test for causation by running experiments that “control the other variables and measure the difference[8].”

Two such experiments or analyses you can use to identify causation with your product are:

  • Hypothesis testing
  • A/B/n experiments
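
Both approaches rest on controlling for other variables, and one common way to do that is to assign users to groups at random, so that differences such as engagement level balance out on average. The sketch below shows one possible assignment scheme based on hashing a user ID. The function name, the experiment name, and the variant labels are hypothetical.

```python
import hashlib
from collections import Counter


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant by hashing their ID.

    The same user always lands in the same variant for a given experiment,
    and the assignment does not depend on anything the user has done.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Check that the split is roughly even across 10,000 hypothetical users.
counts = Counter(
    assign_variant(f"user-{i}", "communities-test") for i in range(10_000)
)
print(counts)
```

Passing more than two labels in `variants` extends the same idea to an A/B/n setup. Because assignment ignores user behavior, a retention difference between the groups is much harder to explain away with a hidden third variable.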

1. Hypothesis testing

The most basic hypothesis test involves an H0 (null hypothesis) and an H1 (your primary hypothesis). You can also have a secondary hypothesis, tertiary hypothesis, and so on.

The null hypothesis is the opposite of your primary hypothesis. Why? Because you can never prove your primary hypothesis with 100% certainty, but you can gather enough evidence to reject your null hypothesis at a chosen confidence level (commonly 95% or 99%).
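
As an illustration of testing a null hypothesis, the sketch below runs a two-proportion z-test, one standard way to check whether two retention rates differ by more than chance would explain. It is not necessarily the test your analytics tool uses. The cohort sizes of 1,000 users each are an assumption; only the 60% and 18% Day 7 retention rates come from the example above.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for H0: both groups share the same underlying rate.

    Returns the z statistic and the p-value.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both groups are all successes or all failures: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


# Assumed cohort sizes of 1,000 each; 60% vs. 18% retained on Day 7.
z, p_value = two_proportion_z_test(600, 1000, 180, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
if p_value < 0.05:
    print("Reject H0: the retention rates differ.")
```

Rejecting H0 here only establishes that the two rates differ. If the cohorts were self-selected, as in the communities example, the difference is still a correlation. The same test supports a causal reading only when users were randomly assigned to the groups.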

The primary hypothesis points to the causal relationship you’re researching and should identify an independent variable and dependent variable.

It is best to first create your H1, then identify its opposite and use that for your H0. Your H1 should identify the relationship you’re expecting between your independent and dependent variables. If we use the earlier example of the impact of in-app social features on retention, your independent variable would be joining communities and your dependent variable would be retention. Your hypotheses might be:

H1: If a user joins a community within our product in the first month, then they will remain a customer for more than one year.

Then, negate your H1 to generate your null hypothesis:
