What is a cohort and why should I care?

This is an early rough draft of an article that was ultimately published on LinkedIn:

What is a cohort and why should I care?


  • Who: You, your customers
  • What: A way to look at your growth and retention in detail
  • When: Always!
  • Why: To track growth and retention separately from each other
  • Where: Reports

A manager of a company or business line is like a restaurant chef.

The chef knows all the ingredients, steps, and techniques to cook every plate on the menu. If you’re a guest at the chef’s restaurant, it’s perfectly fine to look at a finished plate of beautiful food and think how wonderful it will taste, without knowing a thing about how it was made. But the chef must know.

Most managers look at reports and see time series of rolled-up data, e.g., a total number of “something” over time. It’s like they’re a guest at their own company who does not understand the recipe behind their own results!

In this article, I’ll cover the difference between simple time series and cohorts, from simple to complex. If you already know the basics, stick around and prepare to become a chef!

[Image]

A time series of customer data like this is “flat”, and it hides important information. It is like the finished plate, after all the results have been cooked: it shows the outcome but tells you nothing about the recipe. I call this a “vanity metric”, and many managers rely on “flat” data like this to do their job.

E.g., when looking at a report like “Active Users by Month”, a time series shows you the total number of customers who transacted with you during the month. You don’t see how many of those are new users and how many are existing users. You don’t see what percentage of your users came from each earlier period, and you don’t see retention or growth.

A cohort triangle, on the other hand, separates out the new customers (growth) from the existing customers (retention). I like to think of a 1-D time series getting blown up, like this:

[Image]

How to read a cohort triangle. First, the simple stuff

A cohort is a 2-D view into a time series:

[Image]

  1. The cohort “period” appears in both the left column and the top row.
  2. The new customers for each period sit where that cohort’s row meets its own period’s column.
  3. The existing repeat customers for a cohort flow to the right for each subsequent period. In the “Apr 2018” cohort there were 4,260 new customers, with 2,449 of them transacting again in the “Jul 2018” period.
  4. The cohort’s leading edge always shows the new group of customers who came in during that time period, with the previous cohorts’ repeat customers stacked above it.
  5. Sum all the values in a column to get the aggregate at the bottom.
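If you’d rather see the recipe than the plate, here is a minimal Python sketch of how a triangle like this could be built from raw transactions. It assumes a customer’s cohort is the period of their first transaction. The sample data and the `build_cohort_triangle` name are made up for illustration and are not from the original reports.

```python
from collections import defaultdict

# Hypothetical sample data: (customer_id, "YYYY-MM" period of a transaction).
transactions = [
    ("a", "2018-04"), ("a", "2018-05"), ("a", "2018-07"),
    ("b", "2018-04"), ("b", "2018-07"),
    ("c", "2018-05"), ("c", "2018-06"),
    ("d", "2018-06"),
    ("e", "2018-07"),
]

def build_cohort_triangle(transactions):
    """Return (periods, triangle), where triangle[cohort][period] is the number
    of distinct customers from `cohort` who transacted in `period`."""
    # Assumption: a customer's cohort is the period of their first transaction.
    first_period = {}
    for customer, period in transactions:
        if customer not in first_period or period < first_period[customer]:
            first_period[customer] = period

    # Collect the distinct customers behind each (cohort, period) cell.
    cells = defaultdict(set)
    for customer, period in transactions:
        cells[(first_period[customer], period)].add(customer)

    # "YYYY-MM" strings sort chronologically, so plain string comparison works.
    periods = sorted({period for _, period in transactions})
    triangle = {
        cohort: {p: len(cells[(cohort, p)]) for p in periods if p >= cohort}
        for cohort in periods
    }
    return periods, triangle

periods, triangle = build_cohort_triangle(transactions)

# Summing each column gives back the flat "active customers" time series.
totals = {p: sum(row.get(p, 0) for row in triangle.values()) for p in periods}

for cohort in periods:
    print(cohort, [triangle[cohort].get(p, "") for p in periods])
print("Total  ", [totals[p] for p in periods])
```

The diagonal (`triangle[p][p]`) is the leading edge of new customers, and everything to its right is retention.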

Boom! You just gave yourself lots of clarity on growth and retention!

Now some extra power

Still keeping it simple though…

Adding value to the cohort

There are two ways to add value to a cohort triangle:

  1. Add more relevant information using the existing cohort data
  2. Make the cohort easier to read

I’ll build up a cohort step-by-step and show you why it’s so powerful:

Add more relevant information using the existing cohort data

[Image]

  1. Add summary rows for New and Existing users
  2. Calculate period-over-period growth
  3. Compare the New/Existing user ratio
  4. Calculate gross churn
  5. Calculate CGR(3) and CGR(6)
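The summary rows above can be sketched in a few lines of Python. Several definitions here are my assumptions, since this draft does not spell them out: gross churn is taken as the share of last period’s active customers who did not show up as existing customers this period, and CGR(n) is read as the compound per-period growth rate of total active customers over the last n periods. The triangle values are hypothetical.

```python
# Hypothetical right-aligned triangle: triangle[cohort][period] = active customers.
periods = ["P1", "P2", "P3", "P4"]
triangle = {
    "P1": {"P1": 100, "P2": 60, "P3": 50, "P4": 45},
    "P2": {"P2": 120, "P3": 70, "P4": 60},
    "P3": {"P3": 150, "P4": 90},
    "P4": {"P4": 180},
}

def summary_rows(periods, triangle):
    """Build one summary record per period (one column of the triangle)."""
    rows = []
    prev_total = None
    for p in periods:
        total = sum(row.get(p, 0) for row in triangle.values())
        new = triangle.get(p, {}).get(p, 0)   # leading edge = new customers
        existing = total - new                # everything stacked above it
        rows.append({
            "period": p,
            "new": new,
            "existing": existing,
            "total": total,
            # Period-over-period growth of total active customers.
            "growth": None if not prev_total else total / prev_total - 1,
            # New customers as a share of all active customers.
            "new_share": new / total if total else None,
            # Assumed definition: share of last period's actives who did not return.
            "gross_churn": None if not prev_total else 1 - existing / prev_total,
        })
        prev_total = total
    return rows

def compound_growth_rate(rows, n):
    """Assumed meaning of CGR(n): compound per-period growth over the last n periods."""
    if len(rows) <= n or not rows[-1 - n]["total"]:
        return None
    return (rows[-1]["total"] / rows[-1 - n]["total"]) ** (1 / n) - 1

rows = summary_rows(periods, triangle)
for row in rows:
    print(row)
print("CGR(3):", compound_growth_rate(rows, 3))
print("CGR(6):", compound_growth_rate(rows, 6))  # None: not enough history yet
```

Note that this churn figure is approximate: customers who skip a period and come back later count as existing when they return.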

Make the cohort easier to read

[Image]

  1. Put the cohort size on the left with a spark chart
  2. Make the growth and retention data different colors
  3. Graph revenue growth by New/Existing over time

     [Image]

  4. Graph New vs. Existing customer percentage over time

     [Image]

Create cohorts with different metrics

Revenue, Total Number of Transactions, AOV, ARPU, LTV, and other relevant metrics give you new insights. All the same methods and rules apply.

[Image]

Note: Keep the same cohort size and spark charts in the left columns. This makes it easy to cross-reference the same cohort across many different metrics.

Left-aligned cohorts are used to compare progress over time

To transform a regular cohort triangle into a left-aligned cohort, simply slide every row all the way to the left and change the header row to indicate generic periods instead of specific periods of time.

[Image]

Note: A right-aligned cohort is the preferred initial view, because of the extra data which can be summed and compared across time (as demonstrated above). When you left-align a cohort, most of the summary rows from your right-aligned cohort no longer make sense, so remove them.
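Sliding every row to the left is a small transformation. Here is a rough Python sketch, using a hypothetical triangle and a made-up `left_align` helper, where each cohort row becomes a list indexed by “periods since the cohort started” instead of by calendar period.

```python
# Hypothetical right-aligned triangle: triangle[cohort][period] = active customers.
periods = ["P1", "P2", "P3", "P4"]
triangle = {
    "P1": {"P1": 100, "P2": 60, "P3": 50, "P4": 45},
    "P2": {"P2": 120, "P3": 70, "P4": 60},
    "P3": {"P3": 150, "P4": 90},
    "P4": {"P4": 180},
}

def left_align(periods, triangle):
    """Slide each cohort row to the left: index 0 is the cohort's first period,
    index 1 is one period later, and so on."""
    aligned = {}
    for start, cohort in enumerate(periods):
        row = triangle.get(cohort, {})
        aligned[cohort] = [row.get(p, 0) for p in periods[start:]]
    return aligned

left_aligned = left_align(periods, triangle)

# The header is now generic ("Period 0", "Period 1", ...) rather than calendar dates.
header = [f"Period {i}" for i in range(len(periods))]
print(header)
for cohort, values in left_aligned.items():
    print(cohort, values)
```

Reading down a column now compares every cohort at the same age, which is what makes the conditional formatting useful.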

Once you have a left-aligned cohort it is easy to apply conditional formatting in Excel or Sheets in order to visually see trends both across rows and down columns. Are cohorts getting better or worse over time?

[Image]

The graph of this view is called a spaghetti graph, and it lets you see trends over time.

[Image]

Now turn the numbers in your left-aligned cohort into percentages. The initial cohort size is the denominator and the current period is the numerator. Just like before, apply conditional formatting and create a spaghetti graph to get clarity on cohort retention over time. Is performance getting better or worse? Why?
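As a quick sketch of that percentage step in Python (the numbers and the `retention_pct` name are hypothetical), each value in a left-aligned row is divided by the first value in that row, which is the initial cohort size:

```python
# Hypothetical left-aligned cohort: index 0 is each cohort's initial size.
left_aligned = {
    "P1": [100, 60, 50, 45],
    "P2": [120, 70, 60],
    "P3": [150, 90],
    "P4": [180],
}

def retention_pct(left_aligned):
    """Convert counts to percentages of the initial cohort size (one decimal)."""
    result = {}
    for cohort, counts in left_aligned.items():
        size = counts[0]  # denominator: initial cohort size
        result[cohort] = [
            round(100 * count / size, 1) if size else None  # numerator: current period
            for count in counts
        ]
    return result

retention = retention_pct(left_aligned)
for cohort, values in retention.items():
    print(cohort, values)
```

Each row of `retention` is one strand of the spaghetti graph.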

[Image] [Image]

Now you should have 3 cohort triangles for each metric (Active Customers, Revenue, LTV, etc.). One is right-aligned and two are left-aligned. You’ve only just begun… with any luck, by the end of this exercise there will be many more.

Don’t be “average”!

Now we have an incredible amount of data in an easy-to-understand format. But we’re not done yet, not by a long shot, because “data” is easy to get, “knowledge” is not, and “strategy” is way off in the distance.

What we have right now is the 2-D cohort view of every customer in the company rolled up into one big group, which is not helpful to a chef like you. It’s time to introduce customer segments.

Segmentation is key

The “overall” cohort triangle we built for Active Customers is interesting, and far more useful than the rolled-up time series of the same information. But you won’t use it to make decisions. We need to divide the data to separate out different groups of customers because they spend and purchase differently.

Segmentation happens on variables we have in the data (called “columns” by data engineers and “covariates” in statistics), which allow us to partition customers into different groups. We often segment on product, region, marketing channel, demographics, or any other grouping of customers that our data makes available and that we think may be a valuable view into the customer population.

A “segment” of customers is everybody who shares a specific value of a variable. If you’re segmenting on region, group the customers in each region into a cohort. If you have a long tail of regions (more than 10 makes it harder to compare groups), then roll them up into less granular categories, or put the long tail into an “other” category.

  • Create a cohort for each segment of customers. E.g.

    [Image]

  • Graph and compare the growth and retention for each segment. Are some segments growing faster or retaining better than others? Do you know why?
  • If you have relevant marketing or finance metrics, it now becomes interesting to understand unit economics for each segment. Are you in the black for every segment? Are there some segments which aren’t profitable? Do you know why?
  • On the LTV cohort, how long does it take to break even on CAC?
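Here is one way the segmentation step might look in Python. It is only a sketch: the region names, the sample transactions, and both helper functions are hypothetical, and it assumes each customer belongs to exactly one region. The long tail is rolled into “Other” by keeping only the largest segments.

```python
from collections import Counter, defaultdict

# Hypothetical data: one region per customer, plus (customer, "YYYY-MM") transactions.
customer_region = {"a": "North", "b": "North", "c": "North",
                   "d": "South", "e": "South", "f": "East", "g": "West"}
transactions = [
    ("a", "2018-04"), ("a", "2018-05"), ("b", "2018-04"),
    ("c", "2018-05"), ("c", "2018-06"),
    ("d", "2018-04"), ("d", "2018-06"), ("e", "2018-05"),
    ("f", "2018-04"), ("f", "2018-05"), ("g", "2018-06"),
]

def roll_up_long_tail(customer_segment, max_segments=10, other_label="Other"):
    """Keep the `max_segments` largest segments; relabel the rest as "Other"."""
    counts = Counter(customer_segment.values())
    keep = {segment for segment, _ in counts.most_common(max_segments)}
    return {customer: (segment if segment in keep else other_label)
            for customer, segment in customer_segment.items()}

def cohort_triangles_by_segment(transactions, customer_segment):
    """Return triangles[segment][cohort][period] = distinct active customers.
    Cells with no activity are simply absent (treat them as zero)."""
    first_period = {}
    for customer, period in transactions:
        first_period[customer] = min(period, first_period.get(customer, period))

    cells = defaultdict(set)
    for customer, period in transactions:
        key = (customer_segment[customer], first_period[customer], period)
        cells[key].add(customer)

    triangles = defaultdict(lambda: defaultdict(dict))
    for (segment, cohort, period), customers in cells.items():
        triangles[segment][cohort][period] = len(customers)
    return triangles

# max_segments=2 only to make the roll-up visible on this tiny sample.
segments = roll_up_long_tail(customer_region, max_segments=2)
triangles = cohort_triangles_by_segment(transactions, segments)
for segment in sorted(triangles):
    print(segment, {cohort: dict(row) for cohort, row in triangles[segment].items()})
```

From here, every technique above (summary rows, left-aligning, percentages) can be applied to each segment’s triangle separately.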

There are some questions about how to treat segments that are important to know in practice, but not practical to include in this article. They can be posted elsewhere:

  • Mutually exclusive and collectively exhaustive vs. duplicates?
  • Multi-covariate segmentation?

[Image]

This view takes you from “data” to “information”. You probably already know this information at a gut level, e.g., one product grows faster than another, or one region retains better than another. But now it’s clear and quantified: you have the numbers to back it up, and you can make more informed decisions about where to allocate resources, where to focus marketing efforts, what needs to be optimized, and possibly which segments to abandon altogether so you can focus resources elsewhere.

RFM Segmentation

To take segmentation to another level, conduct an RFM (recency, frequency, monetary value) segmentation and label each customer with exactly one category. You will have a mutually exclusive, collectively exhaustive segmentation of your best/worst/other users, and you can use this segmentation to create cohorts.

[Image]

Image credit to Putler

Note: If you did this segmentation again next week, some customers would move to different segments and your cohorts would be different. But for today, this is a valuable look into your customer base.

The RFM segmentation will clearly show how different categories of customers develop over time. The Champions purchase early and often. The Lost customers have not purchased in a long time.
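To make the idea concrete, here is a deliberately simplified RFM sketch in Python. The thresholds, the rule order, and the sample orders are all assumptions of mine for illustration. Real RFM schemes usually score each dimension by quantiles, and this is not the scheme behind the image above. The point is only that ordered rules give every customer exactly one label.

```python
from datetime import date

# Hypothetical orders: (customer_id, order_date, amount).
orders = [
    ("a", date(2018, 4, 5), 100.0), ("a", date(2018, 5, 10), 100.0),
    ("a", date(2018, 6, 15), 100.0), ("a", date(2018, 7, 20), 100.0),
    ("b", date(2018, 4, 1), 50.0), ("b", date(2018, 5, 1), 50.0),
    ("b", date(2018, 6, 1), 50.0),
    ("c", date(2018, 7, 10), 40.0),
    ("d", date(2017, 12, 1), 80.0),
    ("e", date(2018, 3, 1), 30.0), ("e", date(2018, 5, 1), 30.0),
]

def rfm_table(orders, today):
    """Recency (days since last order), frequency (order count), monetary (total spend)."""
    raw = {}
    for customer, day, amount in orders:
        entry = raw.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        entry["last"] = max(entry["last"], day)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return {
        customer: {
            "recency_days": (today - entry["last"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in raw.items()
    }

def rfm_label(rfm):
    """Assign exactly one label. Thresholds are hypothetical; the first matching rule wins."""
    if rfm["recency_days"] > 180:
        return "Lost"
    if rfm["recency_days"] <= 30 and rfm["frequency"] >= 4 and rfm["monetary"] >= 300:
        return "Champions"
    if rfm["frequency"] >= 3:
        return "Loyal Customers"
    if rfm["recency_days"] <= 60:
        return "Potential Loyalists"
    return "Other"

table = rfm_table(orders, today=date(2018, 8, 1))
labels = {customer: rfm_label(rfm) for customer, rfm in table.items()}
print(labels)
```

The `today` argument is why the labels shift when you rerun the segmentation later: recency changes every day.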

Now we go from “information” to “strategy”:

Graph each segment’s cohort performance against the Loyal Customers cohort performance. Find the inflection points where different groups begin to act differently and develop a thesis on how to treat them differently than the others. Perhaps an outreach effort in month 2 is appropriate? Maybe a coupon campaign in month 3 for the Potential Loyalists would be smart?
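One rough way to spot where a segment starts to behave differently is to compare its retention curve against the Loyal Customers curve, period by period. The sketch below simplifies “inflection point” to “the first period where the gap exceeds a threshold”. The retention numbers, the 10-point threshold, and the `first_divergence` name are all hypothetical.

```python
# Hypothetical left-aligned retention percentages per RFM segment.
retention = {
    "Loyal Customers":     [100, 80, 75, 72, 70],
    "Potential Loyalists": [100, 78, 70, 55, 40],
    "Lost":                [100, 40, 10, 0, 0],
}

def first_divergence(baseline, segment, threshold_pts=10.0):
    """Return the first period index where `segment` trails `baseline` by more
    than `threshold_pts` percentage points, or None if it never does."""
    for index, (base, value) in enumerate(zip(baseline, segment)):
        if base - value > threshold_pts:
            return index
    return None

baseline = retention["Loyal Customers"]
divergence = {
    name: first_divergence(baseline, curve)
    for name, curve in retention.items()
    if name != "Loyal Customers"
}
print(divergence)
```

A result like this is a starting point for a thesis, not a conclusion: the period just before the gap opens is a candidate moment for an outreach or coupon test.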

Plating your dish

Now you have tens, maybe even hundreds, of cohorts and associated graphs. You started with the “overall” segment and then separated the customer base into segments. You made right-aligned and left-aligned cohorts. You visualized the data in graphs.

Now it’s time to show it to somebody.

Anybody who has seen a cohort triangle before knows exactly what to look for. If they don’t know what a cohort is, you should explain it to them the same way I explained it to you in this article.

No matter their experience level, the cohorts are only “information”. It’s up to you to translate meaning from the data. The final product is often shown as a combination of a deck and a spreadsheet. The deck includes screenshots and descriptions of the relevant cohorts and graphs. The spreadsheet includes everything, because once somebody realizes the power of this information they love to dig in.

If you need help pulling your cohorts together, or analyzing what the patterns in your cohorts mean, reach out to us.

How to get help

Cohort analysis is part of the Acquitention product at Epic Labs. Acquitention stands for “acquisition + retention” strategy.

We’ve analyzed billions of dollars of customer revenue and helped improve companies all over the world. A simple Acquitention Heuristic can be done in days, and follow-on strategy implementation is available as well. We often increase retention by 5+% and improve marketing conversion by 10%, and we’re willing to structure our fee based on your success.

[Reach out to schedule a Zoom meeting or learn more]

