The reality is that predictive analytics can be thought of as a science
experiment, or more correctly, lots of science experiments. In a science
experiment we are testing a hypothesis: we may suspect we know the outcome, but
we can’t really be sure until we have completed the experiment.
In our case, we decide what we would like to predict, then we ask ourselves ‘what
data could plausibly be correlated with what we are trying to predict?’ We then
put that data into a predictive analytics tool (e.g. 11Ants Model Builder, SAS
Enterprise Miner, IBM SPSS Modeler, etc.), build a model, back-test it and see
how well it actually predicts. Sometimes we get a satisfactory outcome, and
sometimes we don’t.
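The build-and-back-test loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how 11Ants Model Builder or any other tool named here works internally: it fits a simple least-squares line on the earliest observations, scores it on the later, held-out ones, and compares its error with a naive baseline that always predicts the training-period average. The function names and the data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def back_test(xs, ys, train_fraction=0.75):
    """Fit on the earliest observations, then score on the later, held-out ones.

    Returns (model_mae, baseline_mae): the mean absolute errors of the fitted
    line and of a naive baseline that always predicts the training average.
    """
    split = int(len(xs) * train_fraction)
    a, b = fit_line(xs[:split], ys[:split])
    test_x, test_y = xs[split:], ys[split:]
    model_mae = mean(abs(y - (a + b * x)) for x, y in zip(test_x, test_y))
    baseline = mean(ys[:split])
    baseline_mae = mean(abs(y - baseline) for y in test_y)
    return model_mae, baseline_mae


# Hypothetical, time-ordered data: a candidate input (xs) and the target (ys).
xs = [10, 12, 15, 18, 20, 22, 25, 28]
ys = [26, 28, 36, 40, 46, 48, 56, 60]

model_mae, baseline_mae = back_test(xs, ys)
print(f"model MAE: {model_mae:.2f}, naive baseline MAE: {baseline_mae:.2f}")
```

If the model cannot beat the naive baseline on the held-out period, the hypothesis that the chosen input helps predict the target is not supported, and it is time for the next experiment.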
This blog is about applying predictive analytics to real-world problems in business, science and government.
Saturday, October 29, 2011
Microsoft Data Explorer: Predictive Analytics’ Next Inflection Point?
A very interesting development from Microsoft that may well
mark a major inflection point in the adoption of
predictive analytics.
On October 13, 2011, a post appeared on Tim Mallalieu's blog.
Tim is a Group Program Manager at Microsoft, and his post revealed that
for the last 14 months Microsoft have been very quietly developing
an ETL tool. The product goes by Microsoft Codename Data Explorer (it was
previously referred to as Montego). ETL stands for Extract, Transform and Load.
At the risk of greatly oversimplifying it, an ETL tool lets you access data held
in multiple storage silos, extract only the parts of that data you need, and load
them into one place where you can work on them. For example, we might have some
data in our CRM system, some in our transactional database, and (say) weather
data from the Weather Channel website; with an ETL tool we can bring all of it
into the same place, ready for us to work with, and constantly refresh it so that
it is always the most recent data (with the transformations applied). Not only
is the data in one place, but only the relevant data that we actually require is
there, ready for us to do something with. This is pretty neat, though it is not
particularly new. For more, see ETL.
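To make the extract, transform and load steps concrete, here is a minimal sketch in plain Python using the standard library's sqlite3 module. It is not Data Explorer, whose interface is not described here; the three in-memory lists stand in for the CRM system, the transactional database and the weather feed, and every table, field and function name is hypothetical. The transform step drops the fields we don't need (customer names, sales reps, humidity) and joins each transaction to its region's temperature on the same day, while refresh simply re-runs the whole pipeline.

```python
import sqlite3

# Hypothetical records standing in for three separate storage silos.
crm_customers = [
    {"customer_id": 1, "name": "Acme Ltd", "region": "North", "sales_rep": "Ann"},
    {"customer_id": 2, "name": "Bolt Inc", "region": "South", "sales_rep": "Ben"},
]
transactions = [
    {"customer_id": 1, "date": "2011-10-01", "amount": 120.0},
    {"customer_id": 1, "date": "2011-10-02", "amount": 80.0},
    {"customer_id": 2, "date": "2011-10-01", "amount": 200.0},
]
weather = [
    {"region": "North", "date": "2011-10-01", "max_temp_c": 14.0, "humidity": 70},
    {"region": "North", "date": "2011-10-02", "max_temp_c": 17.0, "humidity": 65},
    {"region": "South", "date": "2011-10-01", "max_temp_c": 21.0, "humidity": 55},
]


def extract():
    """Extract: pull the raw records from each source (here, in-memory lists)."""
    return crm_customers, transactions, weather


def transform(customers, txns, wx):
    """Transform: keep only the needed fields and join sources, one row per transaction."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    temp_of = {(w["region"], w["date"]): w["max_temp_c"] for w in wx}
    rows = []
    for t in txns:
        region = region_of[t["customer_id"]]
        temp = temp_of.get((region, t["date"]))  # None if no weather reading exists
        rows.append((t["customer_id"], region, t["date"], t["amount"], temp))
    return rows


def load(rows, conn):
    """Load: rebuild a single table holding the prepared rows, ready for modelling."""
    conn.execute("DROP TABLE IF EXISTS sales_with_weather")
    conn.execute(
        "CREATE TABLE sales_with_weather "
        "(customer_id INTEGER, region TEXT, date TEXT, amount REAL, max_temp_c REAL)"
    )
    conn.executemany("INSERT INTO sales_with_weather VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def refresh(conn):
    """Re-run the whole pipeline so the table reflects the latest source data."""
    load(transform(*extract()), conn)


conn = sqlite3.connect(":memory:")
refresh(conn)
for row in conn.execute("SELECT * FROM sales_with_weather ORDER BY customer_id, date"):
    print(row)
```

Calling refresh(conn) again after the sources change rebuilds the table, which mirrors the idea of constantly refreshing the data in the simplest possible way; a production pipeline might instead refresh incrementally.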
What is brand new, though, and what really holds the
potential to be game-changing, is that now nearly anyone will be able to do this.
It was previously a highly complicated affair, laden with integration and a lot
of complex work; people didn’t want to try this at home, and if they weren’t
experts they didn’t really want to try it at work either. But it appears that
Microsoft have completely changed that (I say that without having used the
product, but the vision here is very clear). This is good in its own right (ETL
has many uses), but when it comes to predictive analytics it has some quite
serious, and positive, implications.
At the risk of stating the obvious, predictive analytics at its
heart relies on building predictive models, and predictive models rely on data:
often lots of data, and often disparate data.
Our experience when we launched our desktop modelling tools
(11Ants Model Builder, 11Ants Customer Churn Analyzer, and 11Ants Customer
Response Analyzer) was that, overnight, we were able to trivialize the
technically most complex part of model building (I say overnight... the
technology took us over three years to develop, but one night it was finished!),
and suddenly people who hadn’t contemplated building predictive models found
they could build them with very little effort. A business analyst with a basic
understanding of Excel could now build models without needing to understand
machine learning algorithms. Experienced model builders, too, could lift the
quality of their models while reducing development time (for a paper on how to
beat 85% of the submissions in an international data mining contest with less
than 50 minutes’ work, see 11Ants Customer Churn Analyzer outperforms 85% of Submissions in International Predictive Analytics Contest).
However, as all good students of the Theory of Constraints
know, as soon as we remove one constraint, we clear the way for the next
constraint to become the rate limiter (there is a good book about this,
incidentally: The Goal). Well, it turns out that once algorithm selection and
evaluation have been trivialized, the new rate-limiting step is the
extraction and preparation of the data.
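The constraint argument can be illustrated with a toy calculation. Assuming work flows through stages in sequence, each able to handle a certain number of experiments per week, overall throughput is capped by the lowest-capacity stage; speed that stage up and the next-slowest stage becomes the limit. The stage capacities below are invented purely for illustration and are not measurements.

```python
def constraint(capacities):
    """Return (stage, throughput) for a sequential pipeline of stages.

    In the Theory of Constraints view, work can only flow through the pipeline
    as fast as its lowest-capacity stage allows, so throughput equals the
    minimum stage capacity.
    """
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]


# Hypothetical capacities, in experiments per week each stage could handle.
before = {"data extraction and preparation": 5, "algorithm selection and evaluation": 1}
after = {"data extraction and preparation": 5, "algorithm selection and evaluation": 20}

print(constraint(before))  # ('algorithm selection and evaluation', 1)
print(constraint(after))   # ('data extraction and preparation', 5)
```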
So a big challenge in running multiple experiments on different and disparate
data is bringing the data in. (We have already solved the problem of running
different experiments on the same data by automating algorithm selection with
tools like 11Ants Model Builder.) The data lives all over the place, and when
you have to herd cats to bring it into one place before you can begin working
on it, you have a legitimate constraint.
Effectively, Microsoft appears to have made the herding of
the cats a lot easier for a lot more
people. When you make something accessible to a lot more people, interesting
things start happening: a lot more science projects get performed, and a lot more useful applications begin to develop.
If a relatively small company had developed this, I am not sure that I would claim that it was
going to herald an inflection point, but the fact that it is Microsoft means
that there is going to be plenty of air time, plenty of credibility, plenty
of sales effort, and generally plenty of attention, and I think we will find
that the combination of all the above will indeed cause an inflection point.