Crowdsourcing Analytics
Netflix is offering a million dollars to anyone who can improve the accuracy of its movie recommendation engine by 10% or more.
Of late, many companies have been experimenting with models that depend on work done outside the company walls by volunteers and low-paid amateurs in their spare time, mostly in expectation of reward and recognition. Examples range from Innocentive, an Eli Lilly-funded intermediary that brings together a large group of experts and companies with specific research problems, for corporate R&D, to Threadless.com for t-shirt design.
However, the Netflix challenge is probably the first example of crowdsourcing a tough analytics problem rather than relying on a consulting company or in-house statisticians to build the model. No one has won the Netflix challenge as yet, but the question remains: is crowdsourcing a viable model for tough analytical problems within a company?
Karim Lakhani and his team at Harvard Business School have been studying this phenomenon in the context of scientific problem solving. Their working paper draws on data from Innocentive's winning entries (30% of all problems posted on Innocentive have been solved). Below are some of the study's findings, followed by my commentary on their applicability to analytics:
Problems were more likely to be solved when the scientists submitting solutions had heterogeneous scientific interests.
In short, diversity in the problem solvers' areas of expertise was key. Analytics lends itself well to diversity, as it is a highly multidisciplinary field that draws on various branches of science, e.g. economics, mathematics, engineering, psychology and operations research.
72.5% of winning solvers stated that their submissions were partially or fully based on previously developed solutions.
Significant research in analytics goes on at universities and even at some companies, and many models and techniques are out there looking for applications.
The study also analyzed the winning solvers and found that:
The probability of being a winning solver is significantly and positively correlated with both a desire to win the award money and intrinsic motivations, such as enjoying problem solving and cracking a tough problem.
Having free time to participate in the problem-solving effort is significantly and positively correlated with being a winning solver.
Both these aspects are essential for any open-source model to work.
One thing to note in the study is that companies post to Innocentive only those problems that their internal R&D teams have not been able to solve, so these are genuinely tough problems. That is an important point, because crowdsourcing a simple problem is not efficient: there is much less control over time frames, and the overhead of managing the process makes it not worthwhile.
In conclusion:
- An open-source strategy for solving tough analytical problems is certainly worth exploring for companies, and initial research suggests that analytical problems can be good candidates for this model.
- There is an opportunity for an intermediary (like Innocentive) to develop an open-source community for analytics.
The majority of organizations still spend much of their time on data integrity issues, struggling to organize information to generate insights rather than using data to recommend actions. However, that will change over time.
This will be an interesting area to watch over the next couple of years.
Also of interest: the original Wired magazine article that introduced crowdsourcing.
