|By Barry Morris||
|December 13, 2013 08:00 AM EST||
In my first post in this three part series I talked about the need for distributed transactional databases that scale-out horizontally across commodity machines, as compared to traditional transactional databases that employ a "scale-up" design. Simply adding more machines is a quicker, cheaper and more flexible way of increasing database capacity than forklift upgrades to giant steam-belching servers. It also brings the promise of continuous availability and of geo-distributed operation.
The second post in this series provided an overview of the three historical approaches to designing distributed transactional database systems, namely: 1. Shared Disk Designs (e.g., ORACLE RAC); 2. Shared Nothing Designs (e.g. the Facebook MySQL implementation); and 3) Synchronous Commit Designs (e.g. GOOGLE F1). All of them have some advantages over traditional client-server database systems, but they each have serious limitations in relation to cost, complexity, dependencies on specialized infrastructure, and workload-specific performance trade-offs. I noted that we are very excited about a recent innovation in distributed database design, introduced by NuoDB's technical founder Jim Starkey. We call the concept Durable Distributed Cache (DDC), and I want to spend a little time in this third and final post talking about what it is, with a high-level overview of how it works.
Memory-Centric vs. Storage-Centric
The first insight Jim had was that all general-purpose relational databases to-date have been architected around a storage-centric assumption, and that this is a fundamental problem when it comes to scaling out. In effect, database systems have been fancy file systems that arrange for concurrent read/write access to disk-based files such that users do not trample on each other. The Durable Distributed Cache architecture inverts that idea, imagining the database as a set of in-memory container objects that can overflow to disk if necessary, and can be retained in backing stores for durability purposes. Memory-Centric vs. Storage-Centric may sound like splitting hairs, but it turns out that it is really significant. The reasons are best described by example.
Suppose you have a single, logical DDC database running on 50 servers (which is absolutely feasible to do with an ACID transactional DDC-based database). And suppose that at some moment server 23 needs object #17. In this case, server 23 might determine that object #17 is instantiated in memory on seven other servers. It simply requests the object from the most responsive server. Note that as the object was in memory, the operation involved no disk IO - it was a remote memory fetch, which is orders of magnitude faster than going to disk.
You might ask about the case in which object #17 does not exist in memory elsewhere. In the Durable Distributed Cache architecture this is handled by certain servers "faking" that they have all the objects in memory. In practice, of course, they are maintaining backing stores on disk, SSD or whatever they choose (in the NuoDB implementation they can use arbitrary Key/Value stores such as Amazon S3 or Hadoop HDFS). As it relates to supplying objects, these "backing store servers" behave exactly like the other servers except they can't guarantee the same response times.
So all servers in the DDC architecture can request objects and supply objects. They are peers in that sense (and in all other senses). Some servers have a subset of the objects at any given time, and can therefore only supply a subset of the database to other servers. Other servers have all the objects and can supply any of them, but will be slower to supply objects that are not resident in memory.
Let's call the servers with a subset of the objects Transaction Engines (TEs), and the other servers Storage Managers (SMs). TEs are pure in memory servers that do not need to use disks. They are autonomous and can unilaterally load and eject objects from memory according to their needs. Unlike TEs, SMs can't just drop objects on the floor when they are finished with them; instead they must ensure they are safely placed in durable storage.
For readers familiar with caching architectures, you might have already recognized that these TEs are in effect a distributed DRAM cache, and the SMs are specialized TEs that ensure durability. Hence the name Durable Distributed Cache.
Resilience to Failure
It turns out that any object state that is present on a TE is either already committed to the disk (i.e. safe on one or more SMs) or part of an uncommitted transaction that will simply fail at application level if the object goes away. This means that the database has the interesting property of being resilient to the loss of TEs. You can shut a TE down or just unplug it and the system does not lose data. It will lose throughput capacity of course, and any partial transactions on the TE will be reported to the application as failed transactions. But transactional applications are designed to handle transaction failure. If you reissue the transaction at the application level it will be assigned to a different TE and will proceed to completion. So the DDC architecture is resilient to the loss of TEs.
What about SMs? Recall that you can have as many SMs as you like. They are effectively just TEs that secretly stash away the objects in some durable store. And, unless you configure it not to, each SM might as well store all the objects. Disks are cheap, which means that you have as many redundant copies of the whole database as you want. In consequence, the DDC architecture is not only resilient to the loss of TEs, but also to the loss of SMs.
In fact, as long as you have at least one TE and one SM running, you still have a running database. Resilience to failure is one of the longstanding but unfulfilled promises of distributed transactional databases. The DDC architecture addresses this directly.
Elastic Scale-out and Scale-in
What happens if I add a server to a DDC database? Think of the TE layer as a cache. If the new TE is given work to do, it will start asking for objects and doing the assigned work. It will also start serving objects to other TEs that need them. In fact, the new TE is a true peer of the other TEs. Furthermore, if you were to shut down all of the other TEs, the database would still be running, and the new TE would be the only server doing transactional work.
SMs, being specialized TEs, can also be added and shut down dynamically. If you add an "empty" (or stale) SM to a running database, it will cycle through the list of objects and load them into its durable store, fetching them from the most responsive place as is usual. Once it has all the objects, it will raise its hand and take part as a peer to the other SMs. And, just as with the new TE described above, you can delete all other SMs once you have added the new SM. The system will keep running without missing a beat or losing any data.
So the bottom line is that the DDC architecture delivers capacity on demand. You can elastically scale-out the number of TEs and SMs and scale them back in again according to workload requirements. Capacity on demand is a second promise of distributed databases that is delivered by the DDC architecture.
The astute reader will no doubt be wondering about the hardest part of this distributed database problem -- namely that we are talking about distributed transactional databases. Transactions, specifically ACID transactions, are an enormously simplifying abstraction that allows application programmers to build their applications with very clean, high-level and well-defined data guarantees. If I store my data in an ACID transactional database, I know it will isolate my program from other programs, maintain data consistency, avoid partial failure of state changes and guarantee that stored data will still be there at a later date, irrespective of external factors. Application programs are vastly simpler when they can trust an ACID compliant database to look after their data, whatever the weather.
The DDC architecture adopts a model of append-only updates. Traditionally, an update to a record in a database overwrites that record, and a deletion of a record removes the record. That may sound obvious, but there is another way, invented by Jim Starkey about 25 years ago. The idea is to create and maintain versions of everything. In this model, you never do a destructive update or destructive delete. You only ever create new versions of records, and in the case of a delete, the new version is a record version that notes the record is no longer extant. This model is called MVCC (multi-version concurrency control), and it has a number of well-known benefits, even in scale-up databases. MVCC has even greater benefits in distributed database architectures, including DDC.
We don't have the space here to cover MVCC in detail, but it is worth noting that one of the things it does is to allow a DBMS to manage read/write concurrency without the use of traditional locks. For example, readers don't block writers and writers do not block readers. It also has some exotic features, including that if you wanted to you could theoretically maintain a full history of the entire database. But as it relates to DDC and the Distributed Transactional Database challenge, there is something very neat about MVCC. DDC leverages a distributed variety of MVCC in concert with DDC's distributed object semantics that allows almost all the inter-server communications to be asynchronous.
The implications of DDC being asynchronous are very far-reaching. In general, it allows much higher utilization of system resources (cores, networks, disks, etc.) than synchronous models can. But specifically, it allows the system to be fairly insensitive to network latencies, and to the location of the servers relative to each other. Or to put it a different way, it means you can start up your next TE or SM in a remote datacenter and connect it to the running database. Or you can start up half of the database servers in your datacenter and the other half on a public cloud.
Modern applications are distributed. Users of a particular web site are usually spread across the globe. Mobile applications are geo-distributed by nature. Internet of Things (IoT) applications are connecting gazillions of consumer devices that could be anywhere at any time. None of these applications are well served by a single big database server in a single location, or even a cluster of smaller database servers in a single location. What they need is a single database running on a group of database servers in multiple datacenters (or cloud regions). That can give them higher performance, datacenter failover and the potential to manage issues of data privacy and sovereignty.
The third historical promise of Distributed Transactional Database systems is Geo-Distribution. Along with the other major promises (Resilience to Failure and Elastic Scalability), Geo-Distribution has heretofore been an unattainable dream. The DDC architecture, with its memory-centric distributed object model and its asynchronous inter-server protocols, finally delivers on this capability.
This short series of posts has sought to provide a quick overview of distributed database designs, with some high level commentary on the advantages and disadvantages of the various approaches. There has been great historical success with Shared Disk, Shared Nothing and Synchronous Commit models. We see the advanced technology companies delivering some of the most scalable systems in the world using these distributed database technologies. But to date, distributed databases have never really delivered anything close to their full promise. They have also been inaccessible to people and organizations that lack the development and financial resources of GOOGLE or Facebook.
With the advent of DDC architectures, it is now possible for any organization to build global systems with transactional semantics, on-demand capacity and the ability to run for 10 years without missing a beat. The big promises of Distributed Transactional Databases are Elastic Scalability and Geo-Distribution. We're very excited that due to Jim Starkey's Durable Distributed Cache, those capabilities are finally being delivered to the industry.
SYS-CON Events announced today that HPM Networks will exhibit at the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. For 20 years, HPM Networks has been integrating technology solutions that solve complex business challenges. HPM Networks has designed solutions for both SMB and enterprise customers throughout the San Francisco Bay Area.
Aug. 1, 2015 04:45 PM EDT Reads: 481
For IoT to grow as quickly as analyst firms’ project, a lot is going to fall on developers to quickly bring applications to market. But the lack of a standard development platform threatens to slow growth and make application development more time consuming and costly, much like we’ve seen in the mobile space. In his session at @ThingsExpo, Mike Weiner, Product Manager of the Omega DevCloud with KORE Telematics Inc., discussed the evolving requirements for developers as IoT matures and conducted a live demonstration of how quickly application development can happen when the need to comply wit...
Aug. 1, 2015 03:15 PM EDT Reads: 329
The Internet of Everything (IoE) brings together people, process, data and things to make networked connections more relevant and valuable than ever before – transforming information into knowledge and knowledge into wisdom. IoE creates new capabilities, richer experiences, and unprecedented opportunities to improve business and government operations, decision making and mission support capabilities.
Aug. 1, 2015 10:00 AM EDT Reads: 296
Explosive growth in connected devices. Enormous amounts of data for collection and analysis. Critical use of data for split-second decision making and actionable information. All three are factors in making the Internet of Things a reality. Yet, any one factor would have an IT organization pondering its infrastructure strategy. How should your organization enhance its IT framework to enable an Internet of Things implementation? In his session at @ThingsExpo, James Kirkland, Red Hat's Chief Architect for the Internet of Things and Intelligent Systems, described how to revolutionize your archit...
Jul. 30, 2015 07:30 PM EDT Reads: 1,413
MuleSoft has announced the findings of its 2015 Connectivity Benchmark Report on the adoption and business impact of APIs. The findings suggest traditional businesses are quickly evolving into "composable enterprises" built out of hundreds of connected software services, applications and devices. Most are embracing the Internet of Things (IoT) and microservices technologies like Docker. A majority are integrating wearables, like smart watches, and more than half plan to generate revenue with APIs within the next year.
Jul. 30, 2015 02:30 PM EDT Reads: 127
Growth hacking is common for startups to make unheard-of progress in building their business. Career Hacks can help Geek Girls and those who support them (yes, that's you too, Dad!) to excel in this typically male-dominated world. Get ready to learn the facts: Is there a bias against women in the tech / developer communities? Why are women 50% of the workforce, but hold only 24% of the STEM or IT positions? Some beginnings of what to do about it! In her Opening Keynote at 16th Cloud Expo, Sandy Carter, IBM General Manager Cloud Ecosystem and Developers, and a Social Business Evangelist, d...
Jul. 30, 2015 12:00 PM EDT Reads: 2,069
In his keynote at 16th Cloud Expo, Rodney Rogers, CEO of Virtustream, discussed the evolution of the company from inception to its recent acquisition by EMC – including personal insights, lessons learned (and some WTF moments) along the way. Learn how Virtustream’s unique approach of combining the economics and elasticity of the consumer cloud model with proper performance, application automation and security into a platform became a breakout success with enterprise customers and a natural fit for the EMC Federation.
Jul. 30, 2015 09:00 AM EDT Reads: 2,170
The Internet of Things is not only adding billions of sensors and billions of terabytes to the Internet. It is also forcing a fundamental change in the way we envision Information Technology. For the first time, more data is being created by devices at the edge of the Internet rather than from centralized systems. What does this mean for today's IT professional? In this Power Panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists addressed this very serious issue of profound change in the industry.
Jul. 29, 2015 03:00 PM EDT Reads: 1,287
Discussions about cloud computing are evolving into discussions about enterprise IT in general. As enterprises increasingly migrate toward their own unique clouds, new issues such as the use of containers and microservices emerge to keep things interesting. In this Power Panel at 16th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the state of cloud computing today, and what enterprise IT professionals need to know about how the latest topics and trends affect their organization.
Jul. 29, 2015 02:00 PM EDT Reads: 1,194
It is one thing to build single industrial IoT applications, but what will it take to build the Smart Cities and truly society-changing applications of the future? The technology won’t be the problem, it will be the number of parties that need to work together and be aligned in their motivation to succeed. In his session at @ThingsExpo, Jason Mondanaro, Director, Product Management at Metanga, discussed how you can plan to cooperate, partner, and form lasting all-star teams to change the world and it starts with business models and monetization strategies.
Jul. 28, 2015 04:30 PM EDT Reads: 1,772
Converging digital disruptions is creating a major sea change - Cisco calls this the Internet of Everything (IoE). IoE is the network connection of People, Process, Data and Things, fueled by Cloud, Mobile, Social, Analytics and Security, and it represents a $19Trillion value-at-stake over the next 10 years. In her keynote at @ThingsExpo, Manjula Talreja, VP of Cisco Consulting Services, discussed IoE and the enormous opportunities it provides to public and private firms alike. She will share what businesses must do to thrive in the IoE economy, citing examples from several industry sectors.
Jul. 28, 2015 11:00 AM EDT Reads: 2,047
There will be 150 billion connected devices by 2020. New digital businesses have already disrupted value chains across every industry. APIs are at the center of the digital business. You need to understand what assets you have that can be exposed digitally, what their digital value chain is, and how to create an effective business model around that value chain to compete in this economy. No enterprise can be complacent and not engage in the digital economy. Learn how to be the disruptor and not the disruptee.
Jul. 27, 2015 10:00 AM EDT Reads: 2,038
Akana has released Envision, an enhanced API analytics platform that helps enterprises mine critical insights across their digital eco-systems, understand their customers and partners and offer value-added personalized services. “In today’s digital economy, data-driven insights are proving to be a key differentiator for businesses. Understanding the data that is being tunneled through their APIs and how it can be used to optimize their business and operations is of paramount importance,” said Alistair Farquharson, CTO of Akana.
Jul. 27, 2015 09:00 AM EDT Reads: 331
Business as usual for IT is evolving into a "Make or Buy" decision on a service-by-service conversation with input from the LOBs. How does your organization move forward with cloud? In his general session at 16th Cloud Expo, Paul Maravei, Regional Sales Manager, Hybrid Cloud and Managed Services at Cisco, discusses how Cisco and its partners offer a market-leading portfolio and ecosystem of cloud infrastructure and application services that allow you to uniquely and securely combine cloud business applications and services across multiple cloud delivery models.
Jul. 27, 2015 08:00 AM EDT Reads: 1,907
The enterprise market will drive IoT device adoption over the next five years. In his session at @ThingsExpo, John Greenough, an analyst at BI Intelligence, division of Business Insider, analyzed how companies will adopt IoT products and the associated cost of adopting those products. John Greenough is the lead analyst covering the Internet of Things for BI Intelligence- Business Insider’s paid research service. Numerous IoT companies have cited his analysis of the IoT. Prior to joining BI Intelligence, he worked analyzing bank technology for Corporate Insight and The Clearing House Payment...
Jul. 26, 2015 09:00 PM EDT Reads: 1,581
"Optimal Design is a technology integration and product development firm that specializes in connecting devices to the cloud," stated Joe Wascow, Co-Founder & CMO of Optimal Design, in this SYS-CON.tv interview at @ThingsExpo, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 25, 2015 02:00 PM EDT Reads: 400
SYS-CON Events announced today that CommVault has been named “Bronze Sponsor” of SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. A singular vision – a belief in a better way to address current and future data management needs – guides CommVault in the development of Singular Information Management® solutions for high-performance data protection, universal availability and simplified management of data on complex storage networks. CommVault's exclusive single-platform architecture gives companies unp...
Jul. 25, 2015 01:00 PM EDT Reads: 1,966
Electric Cloud and Arynga have announced a product integration partnership that will bring Continuous Delivery solutions to the automotive Internet-of-Things (IoT) market. The joint solution will help automotive manufacturers, OEMs and system integrators adopt DevOps automation and Continuous Delivery practices that reduce software build and release cycle times within the complex and specific parameters of embedded and IoT software systems.
Jul. 25, 2015 12:15 PM EDT Reads: 482
"ciqada is a combined platform of hardware modules and server products that lets people take their existing devices or new devices and lets them be accessible over the Internet for their users," noted Geoff Engelstein of ciqada, a division of Mars International, in this SYS-CON.tv interview at @ThingsExpo, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 25, 2015 12:00 PM EDT Reads: 1,544
Internet of Things is moving from being a hype to a reality. Experts estimate that internet connected cars will grow to 152 million, while over 100 million internet connected wireless light bulbs and lamps will be operational by 2020. These and many other intriguing statistics highlight the importance of Internet powered devices and how market penetration is going to multiply many times over in the next few years.
Jul. 25, 2015 09:00 AM EDT Reads: 1,494