Hadoop & NoSQL – Friends, not frenemies (Published in SDTimes, January 7, 2014)

The term Big Data is an all-encompassing phrase that has various subdivisions addressing different needs of the customers. The most common description of Big Data talks about the four V’s: Volume, Velocity, Variety and Veracity.

Volume represents terabytes to exabytes of data, but this is data at rest. Velocity talks about streaming data requiring milliseconds to seconds of response time and is about data in motion. Variety is about data in many forms: structured, unstructured, text, spatial, and multimedia. Finally, veracity means data in doubt arising out of inconsistencies, incompleteness and ambiguities.

Hadoop is the first commercial version of Internet-scale supercomputing, akin to what HPC (high-performance computing) has done for the scientific community. It performs, and is affordable, at scale. No wonder it originated with companies operating at Internet scale: it was built on ideas Google published in the early 2000s, developed largely at Yahoo from 2006, and then adopted at Facebook and Twitter.

In the scientific community, HPC was used for meteorology (weather simulation) and for solving engineering equations. Hadoop is used more for discovery and pattern matching. The underlying technology is similar: clustering, parallel processing and distributed file systems. Hadoop addresses the “volume” aspect of Big Data, mostly for offline analytics.

NoSQL products such as MongoDB address the “variety” aspect of Big Data: how to represent different data types efficiently with humongous read/write scalability and high availability for transactional systems operating in real time. The existing RDBMS solutions are inadequate to address this need with their schema rigidity and lack of scale-out solutions at low cost. Therefore, Hadoop and NoSQL are complementary in nature and do not compete at all.

Whether data is in NoSQL or RDBMS databases, Hadoop clusters are required for batch analytics (using its distributed file system and Map/Reduce computing algorithm). Several Hadoop solutions, such as Cloudera’s Impala and Hortonworks’ Stinger, are introducing high-performance SQL interfaces for easy query processing.
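To make the Map/Reduce idea concrete, here is a minimal single-process sketch in Python. It only illustrates the programming model (map, group by key, reduce) and is not Hadoop's API; the function names and the word-count task are assumptions chosen for illustration. On a real cluster the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input: two small text records.
counts = map_reduce(["big data big clusters", "data at rest"])
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread over as many nodes as the data volume requires.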

Hadoop’s low cost and high efficiency have made it very popular. As an example, Sears’ process for analyzing marketing campaigns for loyalty club members used to take six weeks on mainframe, Teradata and SAS servers. The new process running on Hadoop can be completed weekly. The Hadoop systems, at 200TB, cost about one-third as much as 200TB relational platforms. Mainframe costs have been reduced by more than US$500,000 per year while delivering 50x to 100x better performance on batch jobs. The volume of data on Hadoop is currently at 2PB. Sears uses Datameer, a spreadsheet-style tool that supports data exploration and visualization directly on Hadoop. The company claims to develop interactive reports in three days, a process that used to take six to 12 weeks.

NoSQL products such as MongoDB are becoming hugely popular in the developer community. They blend seamlessly with modern programming languages such as JavaScript, Ruby and Python, which makes for high coding velocity. This simplicity has won them a large following in a short amount of time.

With RDBMS, there was an impedance mismatch when an object-oriented programming model had to map to the row-column structure of the database (like translating Swahili to French). By contrast, the rich document data model of a NoSQL store such as MongoDB can handle varieties of data with full indexing and ad hoc query capabilities.
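The impedance mismatch described above can be illustrated with a small Python sketch. The customer record, its field names and the two-table layout are hypothetical; the point is only that a nested object must be split into several row sets for a relational schema, while a document store can keep it in one piece.

```python
import json

# Hypothetical application object: a customer with nested policies.
customer = {
    "name": "Ada Example",
    "policies": [
        {"type": "life", "premium": 120.0},
        {"type": "auto", "premium": 80.5},
    ],
}


def to_rows(doc, customer_id):
    """Relational mapping: split the object across two row-column tables."""
    customer_row = (customer_id, doc["name"])
    policy_rows = [
        (customer_id, p["type"], p["premium"]) for p in doc["policies"]
    ]
    return customer_row, policy_rows


def to_document(doc):
    """Document mapping: the nested object is serialized as-is, here as JSON."""
    return json.dumps(doc, sort_keys=True)


customer_row, policy_rows = to_rows(customer, customer_id=1)
stored = to_document(customer)
restored = json.loads(stored)  # comes back in the same shape the code uses
```

In the relational case the application has to reassemble the object with a join on the customer ID; in the document case the stored form and the in-memory form already match.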

The other reason for NoSQL’s popularity is its ability to scale horizontally over commodity servers and provide massively parallel processing. This aspect is similar to Hadoop’s distributed architecture. However, NoSQL has to deal with the operational aspects of production databases running on premises or in the cloud, whereas Hadoop basically operates in offline batch mode for analysis.
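Horizontal scaling of this kind rests on partitioning the data so that each server holds only a slice of it. The following Python sketch shows one simple scheme, hash-based partitioning across in-memory "shards". It is a generic illustration under assumed names (`shard_for`, `ShardedStore`), not MongoDB's actual sharding implementation.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard by hashing the key; a stable hash keeps placement repeatable."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store that spreads documents over several shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one commodity server.
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, doc):
        self.shards[shard_for(key, len(self.shards))][key] = doc

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"customer-{i}", {"id": i})

# Number of documents that landed on each shard.
sizes = [len(shard) for shard in store.shards]
```

Reads and writes for different keys touch different shards, so adding servers adds capacity. A plain modulo scheme like this one would move most keys whenever the shard count changes, which is why production systems generally rely on range-based chunks or consistent hashing instead.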

NoSQL is used by large enterprises to build “systems of engagement.” Enterprise IT has spent decades building “systems of record” to run their business—essentially technology that contains a database. Now, CIOs are under pressure to build systems of engagement in which the focus is on using modern technology and the Internet to better communicate internally and externally.

One such system of engagement was recently built at MetLife, the 145-year-old insurance company. The goal was to provide a 360-degree view of the customer (switching from a policy-centric view to a customer-centric view), whose information was scattered across 20 legacy systems of record. This way, any agent at MetLife can get a complete picture of a customer’s activities using a mobile device, anytime, from anywhere.

The entire system was developed and deployed in three months using the MongoDB platform. The reasons for the rapid deployment were attributed to MongoDB’s flexible data model, linear scaling via its sharding architecture, high coding velocity, and iterative development using JSON.

NoSQL and Hadoop have a peaceful coexistence. MongoDB, for example, offers a Hadoop connection pipe for easy movement of data between the two stores. Similarly, Oracle offers a connection for data movement between Hadoop and the Oracle DB. Future additions to Hadoop such as YARN and Tez are aimed at extending it for real-time data loading and queries, but not to solve the needs of mission-critical production systems (the domain of NoSQL).

Jnan Dash is a technology visionary and executive consultant in Silicon Valley. He spent 10 years at Oracle and was the Group Vice President of Systems Architecture and Technology. Prior to joining Oracle, he spent 16 years at IBM in various positions, including in development of the DB2 family of products and leading IBM’s database architecture and technology efforts.



Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.
