Sunday, November 7, 2010

Predictive Analytics in Education - Predicting University Drop Outs

Why does the world need this blog?
 
I can’t be sure that it does. However, it seems fitting to use this first post to describe the encounter that led me to think it may have at least something to offer.
 
Recently I was sitting in the lounge at LAX after attending Predictive Analytics World and some meetings in the US. I struck up a conversation with a nice couple sitting across from me. They were both specialists in education. "Specialists" would actually be an understatement: they were both academics with several degrees each, and well respected in their fields. We got onto the topic of some research one of them was interested in doing.

The research was to gain further insight into what causes a student to abandon University study (otherwise known as dropping out). The overall objective was to predict which students are at risk of dropping out early enough to intervene, and further to identify the factors most likely to cause the abandonment. This has broad implications for both the effectiveness of the education delivered and the financial results of the Universities themselves. I asked him what data he had, and he said he had several million records from a number of Universities, which included full data about students, courses studied, etc., and whether they had dropped out or not. I asked him a little about the data, and then told him that this should actually be quite easy.
 
At which point he looked at me like I had three heads!

Needless to say, I couldn’t help myself: I pulled out my laptop and showed him how software like 11Ants Model Builder could make a job like this quite trivial. It was something like watching someone have a religious experience. I could actually see excitement and enthusiasm cross his face as it dawned on him that he would be able to perform this data analysis himself, and without it taking months.

(Just briefly, to ensure that I deliver on the promise of the title of this post: in simple terms, what you do is consolidate all the historical data by student. If you picture it in Excel format, each row would be an individual student, and each column would be a data point about that student. These could include the specific classes they have taken, previous education, and so on, even the brand of cell phone they carry (if you suspected it might be predictive). We refer to these as ‘input columns’. The final column would be what we refer to as the ‘target column’. It holds one of two values, “Abandoned” or “Completed”, and you tag every student with one of them depending on whether they abandoned or completed their studies. Then you would use a tool like 11Ants Model Builder to analyze the data for patterns, trying to find a relationship between the input columns and the target column. If the patterns were sufficiently strong, the result would be a predictive model which could then be applied to unseen students, and a prediction made as to whether each falls into the category of Abandon or Complete. You can go further and build a propensity model, which ranks every student from most likely to abandon to least likely to abandon. This means that with limited resources you can just work your way down the list with intervention programs, knowing that your resources are focused on those most at risk. You can also get some concrete sense of which of the inputs are the most useful predictors of a student dropping out. This all sounds rather complex, but the reality is that it is not. If you want to see how something like this works, there are some quite good short videos with other data at www.11AntsAnalytics.com, or feel free to email me.)
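For readers who like to see the idea in code, here is a minimal sketch of the workflow just described. It is not how 11Ants Model Builder works internally; it swaps in a plain logistic regression written in standard-library Python purely for illustration. The two input columns (`attendance_rate`, `failed_courses`), the student labels and every number in the toy data are made-up assumptions, not figures from any real university.

```python
import math

# Hypothetical toy data: one row per student, two input columns, one target column.
# (attendance_rate, failed_courses, target) -- all values are invented.
TRAINING = [
    (0.95, 0, "Completed"), (0.90, 0, "Completed"), (0.85, 1, "Completed"),
    (0.80, 0, "Completed"), (0.92, 1, "Completed"), (0.70, 2, "Completed"),
    (0.50, 3, "Abandoned"), (0.55, 2, "Abandoned"), (0.60, 3, "Abandoned"),
    (0.75, 1, "Abandoned"), (0.45, 4, "Abandoned"), (0.65, 2, "Abandoned"),
]
INPUT_NAMES = ["attendance_rate", "failed_courses"]


def column_stats(rows):
    """Mean and standard deviation of each input column, used for scaling."""
    stats = []
    for i in range(len(INPUT_NAMES)):
        values = [row[i] for row in rows]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats.append((mean, std or 1.0))  # avoid dividing by zero
    return stats


def scale(inputs, stats):
    """Put every input on a comparable scale (zero mean, unit spread)."""
    return [(x - mean) / std for x, (mean, std) in zip(inputs, stats)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, stats, epochs=2000, learning_rate=0.5):
    """Fit a logistic regression by batch gradient descent."""
    weights = [0.0] * len(INPUT_NAMES)
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for *inputs, target in rows:
            x = scale(inputs, stats)
            y = 1.0 if target == "Abandoned" else 0.0
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def propensity(inputs, stats, weights, bias):
    """Estimated probability that a student with these inputs abandons."""
    x = scale(inputs, stats)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


stats = column_stats(TRAINING)
weights, bias = train(TRAINING, stats)

# Unseen students (hypothetical), ranked from most to least likely to abandon.
UNSEEN = {"new_high": (0.50, 3), "new_mid": (0.75, 1), "new_low": (0.95, 0)}
ranked = sorted(
    UNSEEN,
    key=lambda s: propensity(UNSEEN[s], stats, weights, bias),
    reverse=True,
)
for student in ranked:
    print(student, round(propensity(UNSEEN[student], stats, weights, bias), 2))

# Rough indication of which inputs the model leans on most (scaled inputs only).
for name, w in sorted(zip(INPUT_NAMES, weights), key=lambda p: abs(p[1]), reverse=True):
    print(name, round(w, 2))
```

The ranked list is the propensity model in miniature: you would work down it from the top with your intervention programs. The weights at the end give only a rough sense of which inputs matter, and a strong weight shows association in the data rather than proof of cause.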

So, returning to the airport lounge party... suddenly there was a new linkage between two (until that point disjointed) areas of specialty: namely (1) his long-time interest in and understanding of educational factors and data, and now (2) the area of predictive analytics.

His new-found understanding of what was possible (and more importantly accessible to him) with predictive analytics would be an example of where 2+2 equals significantly more than four.

So rather than living with the knowledge that students were dropping out of college because I wasn’t spending enough time in airport lounges, I thought it would be good to create a forum where people from business, science and government could learn more about predictive analytics and its applications.
 
It is my intention to continue posting examples of real-world applications for predictive analytics.

You will probably find me referring to our software, 11Ants Model Builder, now and then, but that is only because I genuinely believe it is the easiest way on the planet for anyone to begin to understand and harness the power of predictive analytics. The software is used by everyone from complete beginners through to PhDs in data mining. However, first and foremost this is a blog about educating people as to real-world applications for predictive analytics. I definitely welcome any questions, suggestions or requests.