By Ben Raue
GetUp sends a lot of emails, and we are quite selective about which emails we send and who they are sent to.
We run dozens of subject line tests, as well as tests of different email content, and sometimes compare entirely different email campaigns to see which one performs better. It is not uncommon for us to test a fundraising campaign on a small segment of our list, and then use those results to determine whether we would raise enough money to justify a send to the full GetUp list, or even to a large segment of the list.
To decide which emails should be sent more broadly, we need to be able to project the result an email will produce (how many actions will be taken, how much money will be raised, etc.).
GetUp’s email list is usually segmented in complicated ways – we rarely send a single email to our entire list. We also try to avoid sending more than one email to a person in one day, or more than three in a week.
While we always attempt to use representative samples of the larger list when conducting a test, other emails can interfere with our list and change its composition (for example, by removing one part of the list which overlaps with a different send). This can result in the sample no longer being representative of the full list.
In order to correct for these issues, we are developing two analytical tools to allow us to project results:
- Data on how much of the final result of a send should be expected after a certain length of time, adjusted for time of week of the send.
- A method to calculate the ‘quality’ of a list of GetUp members based on their previous action rates, and a way to use this calculation to compare two different lists of different quality.
This document outlines our current thinking on these two concepts and where we are going with them. We would be interested in feedback from other groups who have done anything similar.
Action time curves
While a large proportion of actions take place shortly after an email is sent, that email continues to result in actions for at least the next day.
There are reasonably predictable trends about what proportion of the activity generated by an email will have taken place at a particular point in time. For example, roughly 40% of actions have usually taken place about two hours after sending.
Using data for the 2015 calendar year, we have constructed curves showing what proportion of actions takes place in each of the first 24 hours after a send. There is a separate curve for each hour of the week in which an email is sent, except that Monday to Thursday are combined, giving a single curve for each hour of the day across those weekdays.
We are currently trialling the use of our time curves and recording the results of our predictions. There is some evidence that the curves are not sufficiently accurate when projecting after a short time period (say, two hours): they seem to overestimate how many actions or donations are yet to come from that send.
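As a rough illustration of how such a curve could be used, the sketch below scales a partial result up to a projected final total by dividing it by the share of activity expected to have arrived by that point. The function name, the dictionary layout and every curve value other than the roughly 40% at two hours are assumptions made for the example, not GetUp's actual data or tooling.

```python
def project_final_total(observed_so_far, hours_since_send, cumulative_curve):
    """Scale a partial result up to a projected final result.

    cumulative_curve maps hours since the send to the assumed share (0 to 1)
    of the final total that has arrived by that hour.
    """
    share = cumulative_curve[hours_since_send]
    if share <= 0:
        raise ValueError("no activity expected yet; cannot project")
    return observed_so_far / share


# Hypothetical curve for one send slot. Only the 0.40 at two hours echoes
# the rough figure quoted above; the other values are invented.
example_curve = {1: 0.25, 2: 0.40, 4: 0.55, 8: 0.70, 24: 1.00}

# 200 actions seen two hours after sending.
projected = project_final_total(200, 2, example_curve)
print(round(projected))
```

In practice a separate curve would be looked up for the hour of the week in which the email was sent, and the projection is only as good as that curve.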
We’ll be conducting further analysis to identify a margin of error and a range of reasonable possibilities, to improve future projections.
GetUp email list-cuts are rarely as simple as emailing one of our campaign lists, and it’s very rare for an email to be sent to our entire membership. For this reason, we’ve been seeking a measure by which we can judge the relative “quality” of different subsamples of the GetUp membership.
When we talk about quality we are referring to the likely performance of an email on different lists – some email lists contain members who are more likely to take action, and would thus be expected to produce better results.
We have developed a simple score which adds up the actions taken by members of a list over the last twelve months and divides the total by the size of the list. Our database categorises actions into “time” (e.g. calling an MP), “money” (donating) and “voice” (e.g. signing a petition). The score gives one point to a “voice” action, ten points to a “time” action and one point per dollar donated, all within the last year. The full list has an approximate score of 8.31 per member.
This measure is useful, first, for judging whether two lists are representative of each other: a representative sample should have a similar score to the larger list that it is sampled from. If the list that has already been emailed has a very similar score to the list that you are planning to email, this suggests that you can project a final outcome without too much trouble (although there are still other factors which will affect the result).
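The scoring rule described above can be sketched in a few lines. The point weights come from the description; the `MemberActivity` record, the function names and the three sample members are hypothetical and only there to make the example self-contained.

```python
from dataclasses import dataclass

VOICE_POINTS = 1       # per "voice" action, e.g. signing a petition
TIME_POINTS = 10       # per "time" action, e.g. calling an MP
POINTS_PER_DOLLAR = 1  # per dollar donated


@dataclass
class MemberActivity:
    """One member's activity over the last twelve months (assumed layout)."""
    voice_actions: int
    time_actions: int
    dollars_donated: float


def member_score(member):
    return (member.voice_actions * VOICE_POINTS
            + member.time_actions * TIME_POINTS
            + member.dollars_donated * POINTS_PER_DOLLAR)


def list_score(members):
    """Total points for the list divided by the number of members on it."""
    if not members:
        raise ValueError("cannot score an empty list")
    return sum(member_score(m) for m in members) / len(members)


# Invented members: scores of 3, 31 and 0 points respectively.
sample = [
    MemberActivity(voice_actions=3, time_actions=0, dollars_donated=0),
    MemberActivity(voice_actions=1, time_actions=1, dollars_donated=20),
    MemberActivity(voice_actions=0, time_actions=0, dollars_donated=0),
]
print(round(list_score(sample), 2))
```

Note that inactive members count in the denominator, so a list padded with members who never act scores lower even if its active members are unchanged.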
Activity levels are far from consistent across the GetUp membership. Those in the top percentile of activity level have an average score of 498, while only about 18% have a score above 1. This means that removing a relatively small (but very active) part of a list can significantly damage the value of the remainder of that list.
This is not an uncommon occurrence: often a campaigner will lose a small chunk of their list and is then surprised at the poor performance of their email to the remainder. Almost always, clashes between emails take place because multiple campaigners want to email the same people, and those people are usually above-average action-takers. So, unless campaigners start specifically seeking out inactive members to email, this phenomenon will always damage the remaining lists.
Our own internal analysis has shown that it is a reasonable assumption that there is a 1:1 ratio between an increased member value score for a list and an increased action rate, all else being equal. In other words, doubling the per-member score for a list should double the action rate, if you assume the content is the same and other factors don’t interfere.
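Under that 1:1 assumption, a test result can be carried over to a list of different quality by scaling the action rate by the ratio of the two list scores. The sketch below is one way to write that down; the function name and the sample figures (a 2% action rate on a test list scoring 10.0) are hypothetical, while 8.31 is the full-list score quoted earlier.

```python
def project_action_rate(sample_rate, sample_score, target_score):
    """Scale an observed action rate to a list with a different score.

    Assumes action rate is proportional to the per-member list score,
    with content and all other factors held equal.
    """
    if sample_score <= 0:
        raise ValueError("sample list score must be positive")
    return sample_rate * (target_score / sample_score)


# Hypothetical test: 2% of a sample list scoring 10.0 took action.
# Project the rate for a list with the full-list score of 8.31.
projected_rate = project_action_rate(0.02, 10.0, 8.31)
print(f"{projected_rate:.4%}")
```

Multiplying the projected rate by the size of the target list then gives an expected number of actions, subject to the same caveats about content and timing.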
We are also planning to use this “list quality” method to help us develop better benchmarks for open rates, click rates, action rates, and fundraising return per email. Our campaigners have been asking for benchmarks to know whether raising a certain amount of funds from a particular email is good or not. Rather than trying to use campaign categories (which always change), we plan to group emails into buckets based on the quality of the email list and produce benchmarks for each bucket.
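One possible shape for such benchmarks is sketched below: past emails are grouped by the score of the list they were sent to, and the median action rate is reported for each group. The bucket boundaries, the choice of the median and the sample data are all assumptions for illustration; the plan above does not specify any of them.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median

# Hypothetical boundaries: bucket 0 is below 5, bucket 1 is 5 to under 10,
# bucket 2 is 10 to under 20, bucket 3 is 20 and above.
BUCKET_EDGES = [5.0, 10.0, 20.0]


def bucket_for(list_score):
    return bisect_right(BUCKET_EDGES, list_score)


def benchmarks(past_emails):
    """past_emails: iterable of (list_score, action_rate) pairs.

    Returns the median action rate for each list-quality bucket.
    """
    rates = defaultdict(list)
    for score, rate in past_emails:
        rates[bucket_for(score)].append(rate)
    return {bucket: median(r) for bucket, r in sorted(rates.items())}


# Invented history of sends.
history = [(3.0, 0.01), (4.0, 0.02), (8.31, 0.03),
           (12.0, 0.04), (15.0, 0.06), (25.0, 0.10)]
result = benchmarks(history)
print(result)
```

The same grouping could be repeated for open rates, click rates or funds raised per email by swapping the second value in each pair.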