By Jnan Dash
January 11, 2014 12:38 PM EST
In the scientific community, high-performance computing (HPC) was used for meteorology (weather simulation) and for solving engineering equations. Hadoop is used more for discovery and pattern matching. The underlying technology is similar: clustering, parallel processing and distributed file systems. Hadoop addresses the “volume” aspect of Big Data, mostly for offline analytics.
NoSQL products such as MongoDB address the “variety” aspect of Big Data: how to represent different data types efficiently with humongous read/write scalability and high availability for transactional systems operating in real time. The existing RDBMS solutions are inadequate to address this need with their schema rigidity and lack of scale-out solutions at low cost. Therefore, Hadoop and NoSQL are complementary in nature and do not compete at all.
Whether data is in NoSQL or RDBMS databases, Hadoop clusters are required for batch analytics (using Hadoop’s distributed file system and MapReduce computing model). Several Hadoop solutions, such as Cloudera’s Impala and Hortonworks’ Stinger, are introducing high-performance SQL interfaces for easy query processing.
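To make the batch model concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps in plain Python. It is only an illustration of the idea, not Hadoop's actual API: the function names and the purchase records are hypothetical, and a real cluster would spread each step across many machines and a distributed file system.

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit one (key, value) pair per input record."""
    for customer_id, amount in records:
        yield customer_id, amount

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}

def run_job(records):
    """Run the three steps in sequence, as one batch job."""
    return reduce_phase(shuffle(map_phase(records)))

# Hypothetical input: (customer_id, purchase_amount) pairs.
purchases = [("c1", 20), ("c2", 5), ("c1", 30), ("c3", 12), ("c2", 7)]
totals = run_job(purchases)
print(totals)
```

Because each map call looks at one record and each reduce call looks at one key, both steps can be split across many servers, which is what lets this style of job scale to large volumes.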
Hadoop’s low cost and high efficiency have made it very popular. As an example, Sears’ process for analyzing marketing campaigns for loyalty club members used to take six weeks on mainframe, Teradata and SAS servers. The new process running on Hadoop can be completed weekly.
The Hadoop systems, at 200TB, cost about one-third as much as 200TB relational platforms. Mainframe costs have been reduced by more than US$500,000 per year while delivering 50x to 100x better performance on batch jobs. The volume of data on Hadoop is currently at 2PB. Sears uses Datameer, a spreadsheet-style tool that supports data exploration and visualization directly on Hadoop. It claims to develop interactive reports in three days, a process that used to take six to 12 weeks.
One reason for the interest in NoSQL is its data model. With RDBMS, there was an impedance mismatch when an object-oriented programming model had to map to the row-column structure of the database (like translating Swahili to French). The rich data model of NoSQL can handle varieties of data with full indexing and ad hoc query capabilities.
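The impedance mismatch can be sketched with a small Python example. The `Customer` and `Address` classes and both mapping functions are hypothetical, chosen only to contrast the two shapes: a relational mapping splits one object into rows for separate tables, while a document-style mapping keeps the nested structure as a single record. No particular database API is implied.

```python
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class Address:
    city: str
    country: str

@dataclass
class Customer:
    customer_id: int
    name: str
    addresses: List[Address] = field(default_factory=list)

def to_rows(customer):
    """Relational mapping: one object becomes rows for two tables."""
    customer_row = (customer.customer_id, customer.name)
    address_rows = [
        (customer.customer_id, a.city, a.country) for a in customer.addresses
    ]
    return customer_row, address_rows

def to_document(customer):
    """Document mapping: the nested object is kept as one nested record."""
    return asdict(customer)

ada = Customer(1, "Ada", [Address("London", "UK"), Address("Paris", "FR")])
print(to_rows(ada))
print(to_document(ada))
```

In the relational form, reading the customer back requires a join on `customer_id`; in the document form, the application object and the stored record have the same shape.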
The other reason is its ability to scale horizontally over commodity servers and provide massively parallel processing. This aspect is similar to Hadoop’s distributed architecture. However, NoSQL has to deal with the operational aspects of production databases running on premises or in the cloud, whereas Hadoop basically operates in offline batch mode for analysis.
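Horizontal scaling can be illustrated with a simple hash-based partitioning sketch. This is an assumption made for illustration only: real NoSQL products use their own schemes (range-based or consistent hashing, for example), and the `shard_for` and `distribute` helpers below are hypothetical names, not any product's API.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the key, so placement is deterministic."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def distribute(keys, num_shards):
    """Assign every key to exactly one of num_shards servers."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards

# Hypothetical customer keys spread over three commodity servers.
customer_keys = [f"customer-{n}" for n in range(10)]
for shard_id, members in distribute(customer_keys, 3).items():
    print(shard_id, members)
```

Each read or write touches only the server that owns the key, so capacity grows by adding servers. Note that this naive modulo scheme moves most keys when the shard count changes, which is one reason production systems tend to use more elaborate partitioning.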
NoSQL is used by large enterprises to build “systems of engagement.” Enterprise IT has spent decades building “systems of record” to run the business, essentially technology that contains a database. Now, CIOs are under pressure to build systems of engagement in which the focus is on using modern technology and the Internet to better communicate internally and externally.
One such system of engagement was recently built at MetLife, the 145-year-old insurance company. The goal was to provide a 360-degree view of the customer (switching from a policy-centric view to a customer-centric view), whose information was scattered across 20 legacy systems of record. This way, any agent at MetLife can get a complete picture of a customer’s activities using a mobile device, anytime, from anywhere.