What’s Driving a Predictive Analytics Tool?

Over the last few posts, I’ve talked broadly about predictive analytics and higher education. While there are many great insights to be gained from predictive analytics, you need to consider the data science driving the solution. Is it based on linear regression, or is it machine learning? That could be the difference between achieving and missing your goals.

The distinction between linear regression and machine learning (a branch of artificial intelligence) is critical for making accurate and actionable predictions when enrolling a class. To make this tangible from the perspective of an enrollment management professional, consider the following examples. Linear regression can effectively address simple straight-line relationships – e.g., the closer a prospective student lives to your institution, the more likely they are to enroll. But enrollment management can involve variables whose effect depends on a range, or synergies between variables, and machine learning is required to address these scenarios.

SAT scores are one example of a potentially highly predictive variable where linear regression doesn’t work. The models we’ve built often indicate a range or “sweet spot” for SAT scores of prospective students: the institution’s historic data will show that students whose SAT scores fall within that range have a higher probability of enrollment, while students with an SAT score either below or above the sweet spot have a lower likelihood of enrollment. A straight line can only rise or fall across the whole score range, so it cannot represent a peak in the middle.
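To see why a sweet spot trips up a straight line, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the SAT bands, the enrollment counts, and the helper names (`fit_line`, `linear_prediction`, `band_rate`) are invented, and a plain least-squares line on a 0/1 "enrolled" outcome stands in for "linear regression". It is not the model behind any particular product.

```python
# Hypothetical data: (SAT band, students admitted, students who enrolled).
# The numbers are invented to illustrate a "sweet spot"; they are not real results.
admits = [
    (900, 100, 10),
    (1000, 100, 20),
    (1100, 100, 40),
    (1200, 100, 40),
    (1300, 100, 20),
    (1400, 100, 10),
]

# Expand to one row per student: x = SAT score, y = 1 if enrolled, else 0.
xs, ys = [], []
for sat, n, enrolled in admits:
    xs += [sat] * n
    ys += [1] * enrolled + [0] * (n - enrolled)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(xs, ys)


def linear_prediction(sat):
    """Enrollment probability according to the fitted straight line."""
    return intercept + slope * sat


# Observed enrollment rate per band: the pattern a range-aware model can learn.
rates = {sat: enrolled / n for sat, n, enrolled in admits}


def band_rate(sat):
    return rates[sat]


for sat, _, _ in admits:
    print(sat, round(linear_prediction(sat), 3), band_rate(sat))
```

With this made-up data the fitted line is essentially flat, so it predicts roughly the same 23% enrollment chance for every student, while the observed rates run from 10% at the extremes to 40% in the middle. The straight line has averaged the sweet spot away.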

Another example is the synergy between being in-state/out-of-state and the impact of scholarships. Typically, being in-state will increase the likelihood of enrollment (regardless of scholarships), and scholarships increase the likelihood of enrollment (regardless of in-state/out-of-state). However, it is entirely plausible that offering scholarships to out-of-state students will induce a much bigger increase in enrollment likelihood than offering them to in-state students. A linear regression that treats each variable separately cannot capture this interaction unless someone specifies it by hand - but machine learning can learn it from the data.
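The scholarship example can be sketched the same way. The four enrollment rates below are invented for illustration, as are the helper names (`additive_prediction`, `observed_rate`, `lift`). The "additive" fit is what a least-squares model with one term for residency and one for scholarship, and no interaction term, produces on a balanced two-by-two table like this one.

```python
# Hypothetical enrollment rates, invented purely for illustration.
# Keys are (in_state, scholarship); values are the share of admits who enroll.
observed = {
    (True, False): 0.40,
    (True, True): 0.50,
    (False, False): 0.10,
    (False, True): 0.40,
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Overall average and the average for each level of each variable.
grand = mean(observed.values())
state_mean = {
    s: mean(r for (in_state, _), r in observed.items() if in_state == s)
    for s in (True, False)
}
schol_mean = {
    s: mean(r for (_, scholarship), r in observed.items() if scholarship == s)
    for s in (True, False)
}


def additive_prediction(in_state, scholarship):
    """Main-effects-only least-squares fit for a balanced 2x2 table."""
    return state_mean[in_state] + schol_mean[scholarship] - grand


def observed_rate(in_state, scholarship):
    return observed[(in_state, scholarship)]


def lift(rate_fn, in_state):
    """Change in enrollment rate from offering a scholarship."""
    return rate_fn(in_state, True) - rate_fn(in_state, False)


for in_state in (True, False):
    label = "in-state" if in_state else "out-of-state"
    print(
        label,
        "additive lift:", round(lift(additive_prediction, in_state), 2),
        "observed lift:", round(lift(observed_rate, in_state), 2),
    )
```

In this toy data a scholarship lifts in-state enrollment by 10 points and out-of-state enrollment by 30 points, yet the additive fit reports the same 20-point lift for both groups. An analyst could add an interaction term by hand to fix this one case; the appeal of machine learning is that such interactions can be learned from the data without being specified in advance.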

What does this mean?

If the science behind your predictive analytics solution is linear regression and NOT machine learning, there are two downsides:

(1) the predictions you get may be less accurate, and
(2) there may be highly predictive variables in your data that your solution will fail to properly recognize and consider.

If you want the most accurate predictions, then you need a solution built with machine learning, not linear regression. However, not all machine learning is created equal. More on that in my next post.