In part 1 of my introduction to predictive scoring models I
wrote about how some data variables have particular ‘predictive power’. This is
based on the correlation they have with the thing we want to predict, the
dependent variable.
In my example I wanted to find variables that would predict the
likelihood of making a gift, so I compiled a list of the data
variables that appeared to have the strongest correlation with being an
existing donor. The next stage of the process is to tie this together.
Dealing with the output has four key steps:
Step 1 – Produce the output file
You need to produce a report from your database with one row for each
constituent in your sample and one column for each of your data variables.
Ask questions of your data that require answers of YES or NO. Code the
results as 1 for YES and 0 for NO.
It will look something like this:
The first binary column in the table represents the
dependent variable: has this constituent done the thing you are trying to
predict? If you have the skills and the tools to produce this yourself, great.
If not, you need to cultivate your relationship with your database team.
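As a rough sketch of what producing that file involves, the Python below turns a few constituent records into rows of 1s and 0s. The records, the field names (`gifts`, `email`, `events_attended`) and the YES/NO questions are all made up for illustration; your own database and variables will differ.

```python
# Hypothetical constituent records; the field names are illustrative only.
constituents = [
    {"id": 1, "gifts": 3, "email": "a@example.org", "events_attended": 2},
    {"id": 2, "gifts": 0, "email": None, "events_attended": 0},
    {"id": 3, "gifts": 0, "email": "c@example.org", "events_attended": 1},
]


def yes_no(condition):
    """Code a YES/NO answer as 1 or 0."""
    return 1 if condition else 0


def to_output_row(record):
    """Turn one constituent record into a row of binary columns."""
    return {
        "id": record["id"],
        # Dependent variable: has this constituent made a gift?
        "is_donor": yes_no(record["gifts"] > 0),
        # Independent variables: one YES/NO question each (assumed examples).
        "has_email": yes_no(record["email"] is not None),
        "attended_event": yes_no(record["events_attended"] > 0),
    }


output_rows = [to_output_row(record) for record in constituents]
for row in output_rows:
    print(row)
```

Keeping the dependent variable as the first binary column, as described above, makes it easy to leave out of the score later.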
Step 2 – Calculate the score
The next thing to do is add up the 1s and 0s for all the
independent variables you have included in your file to produce a score for
each row/person. It doesn’t matter how many variables you have included in
your model. Don’t include the dependent variable in your score calculation.
It will look something like this:
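As a minimal sketch of the scoring step, assuming an output file with an `id` column, a dependent variable called `is_donor` and three independent variables, the score is just the sum of the remaining columns. All column names and values here are invented for the example.

```python
ID_COLUMN = "id"
DEPENDENT = "is_donor"  # the thing we are trying to predict; never scored

# Made-up rows in the 1/0 format described in Step 1.
rows = [
    {"id": 1, "is_donor": 1, "has_email": 1, "attended_event": 1, "is_alumnus": 1},
    {"id": 2, "is_donor": 0, "has_email": 0, "attended_event": 0, "is_alumnus": 1},
    {"id": 3, "is_donor": 0, "has_email": 1, "attended_event": 1, "is_alumnus": 0},
]


def score_row(row):
    """Add up every 1/0 column except the ID and the dependent variable."""
    return sum(
        value
        for name, value in row.items()
        if name not in (ID_COLUMN, DEPENDENT)
    )


# Build new rows with a score column rather than changing the originals.
scored_rows = [{**row, "score": score_row(row)} for row in rows]
for row in scored_rows:
    print(row["id"], row["is_donor"], row["score"])
```

Because the function skips columns by name, adding another independent variable to the file changes nothing in the code.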
Step 3 – Analyse the results
The thing we want to determine is whether the score has a relationship with the number of donors found. Storing the output columns as numbers lets you
add up and group the results by score easily.
It is not easy to see from the raw counts whether or not the model has
produced anything useful. Because there are vastly different numbers of
constituents at each score, what I need to show is the percentage of
constituents at each score who are donors.
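Here is one way that calculation might look in Python. The `(score, is_donor)` pairs are invented for the example and are not my real results.

```python
from collections import defaultdict

# Made-up (score, is_donor) pairs, one per constituent.
scored = [
    (0, 0), (0, 0), (0, 0), (0, 1),
    (1, 0), (1, 0), (1, 1),
    (2, 0), (2, 1),
    (3, 1),
]


def donor_rate_by_score(pairs):
    """Return {score: (donors, constituents, percent_donors)}."""
    totals = defaultdict(int)
    donors = defaultdict(int)
    for score, is_donor in pairs:
        totals[score] += 1
        donors[score] += is_donor  # works because donors are coded as 1
    return {
        score: (donors[score], totals[score], 100.0 * donors[score] / totals[score])
        for score in sorted(totals)
    }


rates = donor_rate_by_score(scored)
for score, (found, total, percent) in rates.items():
    print(f"score {score}: {found} of {total} are donors ({percent:.1f}%)")
```

Printing the count next to the percentage matters: a score with only one constituent will show 0% or 100%, which tells you very little on its own.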
I also find it best to show the effectiveness of a score by
plotting it on a graph.
This shows quite clearly that the higher the score the
higher the percentage of donors found.
The problem here is that there are very few constituents at the higher
levels. Only one constituent scored 8 points and they happened to be a
donor. Not much good for segmentation.
Step 4 – Split into percentiles
It is important to look at results of your model based on
percentiles of your sample. What does the top 25% of constituents look like
when compared to the bottom 25%?
I broke my sample into four quartiles of roughly equal size:
So looking at the fourth quartile, 23.3% are donors: 551 donors from a
possible 2,366.
What we have done is identify a large group of people who are not donors but
look like our donors: the other 1,815 constituents in that quartile. That is
also a large enough group to start segmenting our data for calling or mailing.
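For anyone who wants to try the quartile split themselves, here is a small Python sketch. It assumes you rank everyone by score and cut the ranked list into four groups by position, which means people with the same score can land either side of a boundary. That is only one way to do it, and the sample data is made up.

```python
def split_into_quartiles(pairs):
    """Rank (score, is_donor) pairs by score and cut into four groups.

    The first group holds the lowest scores and the fourth the highest.
    Cutting by position is an assumption of this sketch, so tied scores
    may be split across neighbouring quartiles.
    """
    ranked = sorted(pairs, key=lambda pair: pair[0])
    n = len(ranked)
    bounds = [n * k // 4 for k in range(5)]
    return [ranked[bounds[k]:bounds[k + 1]] for k in range(4)]


def summarise(group):
    """Return (donors, constituents, percent_donors) for one quartile."""
    donors = sum(is_donor for _, is_donor in group)
    return donors, len(group), 100.0 * donors / len(group)


# Made-up (score, is_donor) pairs, one per constituent.
sample = [
    (0, 0), (0, 0), (1, 0), (1, 0), (1, 1), (2, 0),
    (2, 0), (3, 1), (3, 0), (4, 1), (5, 0), (6, 1),
]

quartiles = split_into_quartiles(sample)
summaries = [summarise(group) for group in quartiles]
for number, (found, total, percent) in enumerate(summaries, start=1):
    print(f"quartile {number}: {found} of {total} are donors ({percent:.1f}%)")
```

The same arithmetic gives the figure quoted above: 551 divided by 2,366 is roughly 23.3%.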
Until next time....
Paul





