trying to predict the future, one stat at a time
According to a number of business magazines and a small feature in Popular Science, the future will belong to analysts who can collect, organize, and comb endless reams of data, looking for hints of what’s coming. This is by no means just speculation. There really are companies out there trying to predict the newest fad from blog posts, social networking sites, consumer and business purchasing habits, and media buzz. But as someone who would technically be qualified to take on such a job and try to read the digital tea leaves, I wonder if sifting through tweets and status updates on Facebook would actually yield anything interesting or actionable. Sure, there’s always a very low signal-to-noise ratio, but certainly, the signal must be there for those who know where to look, right? And maybe it is, but the big question here is whether you even know what signal you’re expecting.
Now, trying to predict the future with statistical analyses isn’t quite like breaking out the Ouija board and trying to become one with the spirit of the average representative of your target demographic. You’re working with an actual data set and writing all sorts of tools to better parse it in a search for insight. But the problem is that a very large data set about consumer behavior really only tells you what consumers like at the moment and the emerging fads of the day, rather than alerting you to what’s going to be really popular and marketable in the next six months to a year, which is the lead time you’d need to develop and test your product, as well as its marketing. The idea of looking for statistical patterns in complex data sets has been tried before on the stock market with very mixed results. Pretty much all systems that billed themselves as excellent predictors of where the market will move tomorrow, or that week, have failed. Even the best, most treasured, and most sought-after systems, the ones you’ll have to pay thousands of dollars to order, are generally very conservative bets almost guaranteed to slowly go up over time. They outperform virtually any get-rich-quick scheme, since those schemes rely on predictable trader behaviors, fair and equal distribution of relevant information, and total transparency, things the market doesn’t have.
OK, the stock market is one thing. Why couldn’t we use a stream of consumer data to make predictions? The prescient wonk approach to data analysis assumes that humans are more or less rational, and that what they do and say now can be extrapolated into the near future. However, we’re far messier than that, and what we say in public isn’t always what we do in private. No data mining is going to explain why the very same people who post a long-winded rant on their personal blogs about the demise of good literature own, and love, every single book of the Twilight series. Or why so many mediocre, widely panned creative works gain the success they do. All you’ll see are the double standards and contradictions writ large across your data set, tampering with all your significance tests. In effect, you would be trying to predict the actions of people who change their minds day in, day out, quickly embrace and abandon trends and fads based solely on how they’ve felt over the last several months, indulge in guilty pleasures, and jump on bandwagons depending on how close they are with certain friends, whose relationships can change at any moment. Considering these challenges in trying to predict the human psyche, it may be easier to simply ride the current fad rather than try to catch the future by the tail, or to launch a trend of your own and ride the resulting S-curve of demand.