Sunday, December 29, 2013

Big Data and the Role of Intuition - Harvard Business Review

http://blogs.hbr.org/2013/12/big-data-and-the-role-of-intuition/

HBR Blog Network


Big Data and the Role of Intuition



Many people have asked me over the years about whether intuition has a role in the analytics and data-driven organization. I have always reassured them that there are plenty of places where intuition is still relevant. For example, a hypothesis is an intuition about what’s going on in the data you have about the world. The difference with analytics, of course, is that you don’t stop with the intuition — you test the hypothesis to learn whether your intuition is correct.
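
To make that concrete, here is a minimal sketch (in Python, using SciPy) of turning a hunch into a testable hypothesis and letting the data decide; the pages and conversion counts are made up purely for illustration:

    # Minimal sketch: turn a hunch into a hypothesis and test it against data.
    # The pages and conversion counts below are made up for illustration.
    from scipy import stats

    # Hunch: the redesigned checkout page converts better than the old one.
    old_page = {"visitors": 10_000, "conversions": 520}
    new_page = {"visitors": 10_000, "conversions": 610}

    # 2x2 contingency table: conversions vs. non-conversions for each page.
    table = [
        [old_page["conversions"], old_page["visitors"] - old_page["conversions"]],
        [new_page["conversions"], new_page["visitors"] - new_page["conversions"]],
    ]

    # Chi-squared test of independence: is the difference larger than chance allows?
    chi2, p_value, dof, expected = stats.chi2_contingency(table)
    print(f"p-value: {p_value:.4f}")  # a small p-value supports the hunch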
Another place where intuition is found in analytical companies is in the choice of the business area where analytical initiatives are undertaken. Few companies undertake a rigorous analytical study of what areas need analytics the most! The choice of a target domain is typically based on the gut feelings of executives. For example, at Caesars Entertainment — an early and continuing user of analytics in its business — the initial focus was on analytics for customer loyalty and service. CEO Gary Loveman noted that he knew that Caesars (then Harrah’s) had low levels of customer loyalty across its nationwide network of casinos. He had also done work while at Harvard Business School on the “service profit chain” — a theory that companies that improve customer service can improve financial results. While the theory had been applied and tested in several industries, it hadn’t been applied to gaming firms. But Loveman’s intuition about the value of loyalty and service was enough to propel years of analytics projects in those areas.
Of course, as with hypotheses, it’s important to confirm that your intuitions about where to apply analytics are actually valid. Loveman insists on an ROI from each analytics project at Caesars. Intuition plays an important role at the early stages of analytics strategy, however. In short, intuition’s role may be more limited in a highly analytical company, but it’s hardly extinct.
But how about with big data? Surely intuition isn’t particularly useful when there are massive amounts of data available for analysis. The companies in the online business that were early adopters of big data — Google, Facebook, LinkedIn, and so forth — had so much clickstream data available that no one needed hunches any more, correct?
Well, no, as it turns out. Major big data projects to create new products and services are often driven by intuition as well. Google’s self-driving car, for example, is described by its leaders as a big data project. Sebastian Thrun, a Google Fellow and Stanford professor, leads the project. He had an intuition that self-driving cars were possible well before all the necessary data, maps, and infrastructure were available. Motivated in part by the death of a friend in a traffic accident, he said in an interview that he formed a team to address the problem at Stanford without knowing what he was doing.
At LinkedIn, one of the company’s most successful data products, the People You May Know (PYMK) feature, was developed by Jonathan Goldman (now at Intuit) based on an intuition that people would be interested in what their former classmates and colleagues are up to. As he put it in an interview with me, he was “playing with ideas about how to help people build their networks.” That certainly sounds like an intuitive process.
Pete Skomoroch, who became Principal Data Scientist at LinkedIn a few years after PYMK was developed, believes that creativity and intuition are critical to the successful development of data products. He told me in an interview this week that companies with the courage to get behind the intuition of data scientists — without a lot of evidence yet that their ideas will be successful — are the ones that will develop successful data products. As with traditional analytics, Skomoroch notes that you have to eventually test your creativity with data and analysis. But he says that it may take several years before you know if an idea will really pay off.
So whether you’re talking about big data or conventional analytics, intuition has an important role to play. One might even say that developing the right mix of intuition and data-driven analysis is the ultimate key to success with this movement. Neither an all-intuition nor an all-analytics approach will get you to the promised land.

Independent Report: Pivotal Data Dispatch Reached Payback in 3 Months with 377% Annual ROI at NYSE Euronext

This week, Nucleus Research compiled an independent report profiling how NYSE Euronext solved the challenge of big data in their organization, with big returns, using Pivotal technology that we have now released to the market as Pivotal Data Dispatch. By all accounts, NYSE Euronext was caught between the proverbial rock and a hard place with their data requirements. With their stock trades representing one third of the world's equities volume, and federal regulations requiring them to keep 7 years of history, by 2006 their data volume was staggering. The cost of maintaining this data, and the penalties the business was paying in data latency, had become unacceptable.
However, instead of following the industry norm and continuing to invest in monolithic data stores with fixed data ceilings, NYSE Euronext anticipated where the market was heading and embraced modern big data strategies early, resulting in a scalable and affordable data solution. As a result, they are now recognized as an early leader in the big data market, achieving payback on their efforts in just 3 months and enjoying a staggering 377% annual ROI.
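
The Nucleus methodology is not spelled out here, but the headline numbers hang together under the standard simple-ROI and payback formulas. The short Python sketch below uses hypothetical cost and benefit figures (not numbers from the report) purely to show how a roughly 3-month payback and a 377% annual ROI relate:

    # Sketch of how headline payback and ROI figures relate under simple formulas.
    # The cost and benefit below are hypothetical placeholders, not figures
    # taken from the Nucleus Research report.

    def payback_months(initial_cost, annual_benefit):
        """Months until cumulative benefit covers the initial cost."""
        return 12 * initial_cost / annual_benefit

    def simple_annual_roi(initial_cost, annual_benefit):
        """Annual ROI as net benefit over cost, expressed as a percentage."""
        return 100 * (annual_benefit - initial_cost) / initial_cost

    cost = 1_000_000        # hypothetical initial investment
    benefit = 4_770_000     # hypothetical annual benefit

    print(f"Payback:    {payback_months(cost, benefit):.1f} months")  # ~2.5 months
    print(f"Annual ROI: {simple_annual_roi(cost, benefit):.0f}%")     # 377%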

Background

To give an idea of the scope of the problem: by 2007, when NYSE Euronext started looking at viable solutions to their growing challenge, even the lower-cost alternative of Massively Parallel Processing (MPP) still carried a market price of about $22,000 per terabyte. At the time, they were generating an average of ¾ of a terabyte a day. With data volumes only going to grow, and 7 years of history to keep on hand, storing the data alone was a crippling prospect.
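
A rough back-of-the-envelope calculation shows why. Assuming a flat ¾ TB/day ingest rate and the quoted ~$22,000/TB price (both from the paragraph above, with no allowance for growth or compression), seven years of retained data comes to roughly 1,900 TB and over $40 million in storage alone:

    # Back-of-the-envelope arithmetic behind the "crippling prospect" above.
    # Assumes a flat 3/4 TB/day ingest rate and the quoted ~$22,000/TB MPP price;
    # real costs would vary with data growth and compression.

    tb_per_day = 0.75          # average daily data volume in 2007
    retention_years = 7        # regulatory retention requirement
    price_per_tb = 22_000      # approximate MPP market price, USD

    retained_tb = tb_per_day * 365 * retention_years
    storage_cost = retained_tb * price_per_tb

    print(f"Data retained: {retained_tb:,.0f} TB")    # ~1,916 TB
    print(f"Storage cost:  ${storage_cost:,.0f}")     # ~$42 million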
Actually using the data was also problematic. Typical queries had to be run by experienced DBAs so as not to overload the infrastructure; the average query took 7 hours and frequently required additional filters and follow-up queries. NYSE Euronext executives often waited up to 3 weeks to receive the results of their queries. With data neither ubiquitous nor available in even near real time, the NYSE knew it was missing opportunities; big opportunities, as it turns out.

The Solution

Their first attempt to solve these data challenges started in 2007. NYSE Euronext set out to develop a solution built on our Greenplum technology (inherited from EMC in our spin-out) and IBM Netezza. The idea was to build a Data Lake in which active and historical data could be self-provisioned on demand by data analysts. Because the lake was kept separate from real-time operations, analysts were free to use the data at will.
Data volumes continued to grow, and by 2010 NYSE Euronext was collecting 2 TB a day. While MPP processing was becoming cheaper, the growth rate meant no further savings were being realized; their estimated costs were approaching $4.5 million a year just to store the data. However, because the data could be federated across commodity hardware, this solution was still estimated to cost roughly 1/40th of traditional data storage in an analytics environment, and it provided broader access to analysts, something that delivered enormous value to daily operations.
With that in mind, NYSE Euronext took a leap that many organizations to date have not: they went big on big data. With help from vendors like Pivotal, they built a system, now publicly available as Data Dispatch, that should be treated as a model for the enterprise.

Key Benefits

For full financial disclosure on the economics, please read the full report by Nucleus Research. However, a few notable highlights from the report speak volumes about this aggressive approach to big data, and about why big data leaders like NYSE Euronext are showing the market that investing in strategies to make data ubiquitous and real-time really pays off:
  • Power user productivity. With IT removed from the active process of harvesting data, business users are not only in control, they are empowered to use data daily to fully understand their markets. With the data readily available, they use it more and make better business decisions.
  • Increased productivity. With the back-and-forth between IT and the business eliminated from every data request, both the business and IT can focus on their own areas of expertise. Data requests are fulfilled more quickly, with the person who knows what they are actually after, and what the data means, in the driver's seat. IT also services the business more effectively while spending less time on direct support. This is a win-win for everyone.
  • Reduced IT labor costs. The Pivotal Data Dispatch tool services about 2,000 data requests each month. Historically, each request took a DBA about 1 hour, so approximately 2,000 hours, or over 83 man-days, of DBA work can be refocused into other areas of their massive data infrastructure (see the quick calculation after this list).
  • Reduced decision latency. With the data request cycle compressed from 3 weeks to hours, the nearly 250 data analysts at NYSE Euronext are by default working with fresher data. This reduces decision latency, allowing them to use near real-time data to make important inferences and prove empirically what their markets need.
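
As a quick sanity check on the labor figure above (assuming, as the report's numbers imply, that a "man day" is counted as a full 24-hour day):

    # Quick check of the DBA-labor figure cited in the list above.
    requests_per_month = 2000   # data requests serviced by Pivotal Data Dispatch
    hours_per_request = 1       # historical DBA effort per request

    dba_hours = requests_per_month * hours_per_request
    man_days = dba_hours / 24   # the report's "man days" appear to be 24-hour days

    print(f"DBA hours freed per month: {dba_hours}")      # 2000 hours
    print(f"Equivalent man-days:       {man_days:.0f}")   # ~83 days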

More on NYSE Euronext and Pivotal Data Dispatch

- See more at: http://blog.gopivotal.com/case-studies-2/independent-report-pivotal-data-dispatch-reached-payback-in-3-months-with-377-annual-roi-at-nyse-euronext