“Big Data” is everywhere, and attempts to define it yield as many questions as they do bad answers. The good news is that it represents a legitimate progression of analytics and computing. Unfortunately, like many new concepts, the term has been misused, mangled, misconstrued, and misperceived, which can easily lead to bad practices and failed projects. Many organizations, especially their IT departments, are struggling to incorporate it into their infrastructure and application landscapes, to justify the high costs, and to answer questions about how it affects their current Enterprise Data Warehouse (EDW).
What is “Big Data”?
“Big Data” is a new industry buzz term that has taken on many connotations, but it essentially means the use of “large” data volumes and higher-powered computing to provide deeper and richer analyses. The problem (and often the confusion) is that it can be used to describe everything from architecture to software to process. While I agree with the usage in each case, the breadth does nothing to clarify the terminology for those still trying to figure out what it is, how it works, and how to implement it at their organization. Like the “Cloud”, Big Data is not as hard as it seems and can provide tremendous value to everyone (when done correctly).
There are three key areas involved with “Big Data”: 1) Servers/Infrastructure; 2) Applications/Software; 3) Analysis. Some vendors integrate multiple areas into a single offering (aka appliances), providing even further synergies. The most critical part of “Big Data” is what to do with all of it, meaning how it will be used to make better decisions faster.
I like the way that Dave Feinleib has grouped the different “Big Data” vendors by their Applications, Infrastructure, and Technology in his Forbes article. His visualization also shows why “Big Data” is so confusing: it seems to be a part of everything. I highly recommend reading his entire article, as it does a great job of debunking the “Big Data” marketing hype and relating it to tangible actions for both IT and Business.
What is an EDW anyway?
Be careful when you ask this question because it can be quite divisive. Respondents will invariably reveal their Inmon, Kimball, or hybrid design preference in their answer, and that preference is often a strong one. The key problem is that there is no single answer to this question. As Facebook says, it’s complicated. Organizations, departments, or functional units will define their own right answer for organizing and analyzing their information, based on their particular mix of transactional system data and reporting requirements. This leads us to define an EDW, in its simplest form, as a central collection of information and data used to create reports and visualizations for historical or predictive analysis.
Where does IT end and Business begin?
While this is a philosophical argument, the line between IT and Business has been blurring for many years now. Analytics is a good indicator of this shift, since IT was classically the source of all Business KPIs. Nowadays, we see more and more “Self Service” models, where the IT department implements a software package that lets business users extract, format, and print or export their data so they can answer their own questions. This blurs the line in one respect and shifts the focus of IT in another. The IT department’s goal, in this sense, is to tune the data store(s) for optimal performance and flexibility. The ability to extract information (not data) at the speed of thought/business is a critical determinant of success for a present-day Enterprise Data Warehouse. However, with such large data volumes, frequent requests, and the need for “Google”-like response times, the classic EDW is threatened by its inability to react and respond to these demands.
The Agility of Information
In July 1969, NASA landed a manned spacecraft on the moon using computing power similar to that in your toaster. The guidance computer was designed by MIT and built by Raytheon, with a whopping 64 kilobytes of memory and 0.043 MHz of processing capability. By comparison, my mobile phone has 64 gigabytes of storage and an A7 processor capable of 1.3 GHz (and I still complain that it’s slow), on top of all its other features such as a high-resolution camera.
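To put those numbers side by side, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted above; the variable names are illustrative, and it assumes binary units (1 kilobyte = 1,024 bytes). It also treats the two capacity figures as directly comparable, which is a simplification.

```python
# Figures quoted in the paragraph above; names are illustrative only.
# Assumption: binary units (1 KB = 1024 bytes, 1 GB = 1024**3 bytes).
APOLLO_CAPACITY_BYTES = 64 * 1024       # 64 kilobytes
APOLLO_CLOCK_HZ = 0.043e6               # 0.043 MHz
PHONE_CAPACITY_BYTES = 64 * 1024**3     # 64 gigabytes
PHONE_CLOCK_HZ = 1.3e9                  # 1.3 GHz

# Simple ratios: how many times larger or faster the phone is.
capacity_ratio = PHONE_CAPACITY_BYTES / APOLLO_CAPACITY_BYTES
clock_ratio = PHONE_CLOCK_HZ / APOLLO_CLOCK_HZ

print(f"Capacity: about {capacity_ratio:,.0f} times larger")
print(f"Clock speed: about {clock_ratio:,.0f} times faster")
```

Under those assumptions, the phone holds roughly a million times more data and runs at roughly thirty thousand times the clock speed. Clock speed alone understates the real gap in processing power.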
Technology is changing very rapidly (Moore’s Law is commonly quoted as a doubling of computing capability about every 18 months), and the same pace of change applies to processes as much as to hardware and software. Businesses that can make faster and more accurate predictions about where their customers are headed next will have a distinct competitive advantage. This is what “Big Data” promises us…competitive advantage! In our case, the technology behind in-memory databases, columnar databases, and distributed computing is not new, but we have made significant strides over the past couple of years in integrating it into the analytics framework. BI tools are now embracing this shift in data volume as a necessary next step in their functionality. These improvements put greater pressure on IT departments to make their analytics perform fast enough to support accurate and timely decisions in near real-time (aka at the speed of business).
Knowledge is Power
I don’t truly believe that the EDW is set for extinction any time soon; however, we are seeing cracks in its armor. New in-memory technology, with rapid insertion/extraction and sub-second response times, is making the EDW look more and more like a legacy technology. Even so, there will always be a place for historical reporting, the need to keep historical transactions (aggregated or raw) for 7+ years, and departments (such as Finance) looking to maintain a set of (Financial) reports over time without changes. Currently, in-memory databases are often implemented as an accelerator or bolt-on to the existing EDW. That progression will continue as the cost of the technology drops, but there will be a tipping point.
What next?
Let Axian come to the rescue and help define your BI strategy, develop a roadmap, work with your business community to identify the next project, and provide clarity and direction to a daunting task. For more details about Axian, Inc. and the Business Intelligence practice, click here to view our portfolio or email us directly to set up a meeting.
Sources:
1) Forbes.com; July 2012; “Big Data Trends”; http://www.forbes.com/sites/davefeinleib/2012/07/24/big-data-trends/
2) Computer Weekly; July 2009; “Apollo 11: The computers that put man on the moon”; http://www.computerweekly.com/feature/Apollo-11-The-computers-that-put-man-on-the-moon
3) Apple iPhone Specs; http://www.apple.com/iphone-5s/specs/