Hadoop & NoSQL – Friends, not frenemies (Published in SDTimes, January 7, 2014)

The term Big Data is an all-encompassing phrase that has various subdivisions addressing different needs of the customers. The most common description of Big Data talks about the four V’s: Volume, Velocity, Variety and Veracity.

Volume represents terabytes to exabytes of data, but this is data at rest. Velocity talks about streaming data requiring milliseconds to seconds of response time and is about data in motion. Variety is about data in many forms: structured, unstructured, text, spatial, and multimedia. Finally, veracity means data in doubt arising out of inconsistencies, incompleteness and ambiguities.

Hadoop is the first commercial version of Internet-scale supercomputing, akin to what HPC (high-performance computing) has done for the scientific community. It performs, and is affordable, at scale. No wonder it originated with companies operating at Internet scale, such as Yahoo in the 1990s, and then at Google, Facebook and Twitter.

In the scientific community, HPC was used for meteorology (weather simulation) and for solving engineering equations. Hadoop is used more for discovery and pattern matching. The underlying technology is similar: clustering, parallel processing and distributed file systems. Hadoop addresses the “volume” aspect of Big Data, mostly for offline analytics.

NoSQL products such as MongoDB address the “variety” aspect of Big Data: how to represent different data types efficiently with humongous read/write scalability and high availability for transactional systems operating in real time. The existing RDBMS solutions are inadequate to address this need with their schema rigidity and lack of scale-out solutions at low cost. Therefore, Hadoop and NoSQL are complementary in nature and do not compete at all.

Whether data is in NoSQL or RDBMS databases, Hadoop clusters are required for batch analytics (using its distributed file system and Map/Reduce computing algorithm). Several Hadoop solutions, such as Cloudera’s Impala and Hortonworks’ Stinger, are introducing high-performance SQL interfaces for easy query processing.
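To make the Map/Reduce idea concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the record fields (`member_id`, `amount`) and function names are invented for illustration. A real cluster would run the map and reduce steps in parallel across many nodes and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: emit one (key, value) pair per input record."""
    for record in records:
        yield record["member_id"], record["amount"]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(records):
    """Chain the three steps into one batch job."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical purchase records, made up for this example.
purchases = [
    {"member_id": "A", "amount": 20},
    {"member_id": "B", "amount": 5},
    {"member_id": "A", "amount": 15},
]
totals = run_job(purchases)  # {"A": 35, "B": 5}
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is what makes this style of batch job suit very large volumes.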

Hadoop’s low cost and high efficiency have made it very popular. As an example, Sears’ process for analyzing marketing campaigns for loyalty club members used to take six weeks on mainframe, Teradata and SAS servers. The new process running on Hadoop can be completed weekly.
The Hadoop systems, at 200TB, cost about one-third of 200TB relational platforms. Mainframe costs have been reduced by more than US$500,000 per year while delivering 50x to 100x better performance on batch jobs. The volume of data on Hadoop is currently at 2PB. Sears uses Datameer, a spreadsheet-style tool that supports data exploration and visualization directly on Hadoop. It claims to develop interactive reports in three days, a process that used to take six to 12 weeks.

NoSQL products such as MongoDB are getting hugely popular in the developer community. They seamlessly blend with modern programming languages like JavaScript, Ruby and Python, thus imparting high coding velocity. This simplicity has made them very popular in a short amount of time.

With an RDBMS, there was an impedance mismatch when an object-oriented programming model had to be mapped to the row-column structure of the database (like translating Swahili to French). The rich data model of NoSQL products such as MongoDB, by contrast, can handle varieties of data with full indexing and ad hoc query capabilities.
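The mismatch is easier to see side by side. The sketch below is illustrative only: the `Customer` and `Address` classes and both mapping functions are hypothetical and use no database driver. One function splits a nested object into rows for two tables; the other keeps it as a single JSON document.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Address:
    city: str
    kind: str


@dataclass
class Customer:
    customer_id: int
    name: str
    addresses: list = field(default_factory=list)


def to_rows(customer):
    """Relational mapping: split the object into rows for two tables joined by a key."""
    customer_row = (customer.customer_id, customer.name)
    address_rows = [
        (customer.customer_id, address.city, address.kind)
        for address in customer.addresses
    ]
    return customer_row, address_rows


def to_document(customer):
    """Document mapping: the nested object becomes one JSON document."""
    return json.dumps(asdict(customer), sort_keys=True)
```

In the relational case the application has to reassemble the object with a join on every read; in the document case the stored shape already matches the shape the code works with.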

The other reason for NoSQL’s popularity is its ability to scale horizontally over commodity servers and provide massively parallel processing. This aspect is similar to Hadoop’s distributed architecture. However, NoSQL has to deal with the operational aspects of production databases running on premises or in the cloud, whereas Hadoop basically operates in offline batch mode for analysis.
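Horizontal scaling usually means partitioning the data so that each server holds only a slice of it. The following is a generic hash-partitioning sketch, assuming in-memory dictionaries stand in for servers; the `ShardedStore` class is hypothetical and does not describe how any particular product routes its data.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard by hashing the key, so the same key always lands on the same shard."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store that spreads keys across several in-memory shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one commodity server.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Every read or write touches only one shard, so adding servers adds capacity. The simple modulo scheme used here would move most keys whenever the shard count changes, which is one of the operational problems a production system has to solve.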

NoSQL is used by large enterprises to build “systems of engagement.” Enterprise IT has spent decades building “systems of record” to run their business—essentially technology that contains a database. Now, CIOs are under pressure to build systems of engagement in which the focus is on using modern technology and the Internet to better communicate internally and externally.

One such system of engagement was recently built at MetLife, the 145-year-old insurance company. The goal was to provide a 360-degree view of the customer (switching from a policy-centric view to a customer-centric view), whose information was scattered across 20 legacy systems of record. This way, any agent at MetLife can get a complete picture of a customer’s activities using a mobile device, anytime, from anywhere.
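The switch from a policy-centric to a customer-centric view can be pictured as a regrouping of records. The sketch below is purely illustrative and is not MetLife's implementation: the system names, field names (`customer_id`, `policy`) and the `build_customer_views` function are all assumptions made for the example.

```python
from collections import defaultdict


def build_customer_views(systems):
    """Regroup policy records from several source systems into one view per customer."""
    views = defaultdict(lambda: {"policies": [], "sources": []})
    for system_name, records in systems.items():
        for record in records:
            view = views[record["customer_id"]]
            view["policies"].append(record["policy"])
            # Remember which systems of record contributed to this view.
            if system_name not in view["sources"]:
                view["sources"].append(system_name)
    return dict(views)
```

Each resulting view is a single nested record per customer, which is the kind of shape that maps naturally onto a document store.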

The entire system was developed and deployed in three months using the MongoDB platform. The reasons for the rapid deployment were attributed to MongoDB’s flexible data model, linear scaling via its sharding architecture, high coding velocity, and iterative development using JSON.

NoSQL and Hadoop have a peaceful coexistence. MongoDB, for example, offers a Hadoop connection pipe for easy movement of data between the two stores. Similarly, Oracle offers a connection for data movement between Hadoop and the Oracle DB. Future additions to Hadoop such as YARN and Tez are aimed at extending it for real-time data loading and queries, but not to solve the needs of mission-critical production systems (the domain of NoSQL).

Jnan Dash is a technology visionary and executive consultant in Silicon Valley. He spent 10 years at Oracle and was the Group Vice President of Systems Architecture and Technology. Prior to joining Oracle, he spent 16 years at IBM in various positions, including in development of the DB2 family of products and leading IBM’s database architecture and technology efforts.



More Stories By Jnan Dash

Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.
