Scaling Big Data Fabrics

The size of the network might be the least interesting aspect of scaling Big Data fabrics

When people talk about Big Data, the emphasis is usually on the Big. Certainly, Big Data applications are distributed largely because the size of the data on which computations are executed warrants more than a typical application can handle. But scaling the network that provides connectivity between Big Data nodes is not just about creating massive interconnects.

In fact, the size of the network might be the least interesting aspect of scaling Big Data fabrics.

Just how big is Big Data?

Not that long ago, I asked the question: how large is a typical Big Data deployment? I was expecting, as I suspect many people are, that the Big in the title meant that the deployments would be, in a word, big. But the average Big Data deployment is actually far smaller than most people realize. I grabbed a list from HadoopWizard in an article dating back to last year.

What is remarkable about this list is just how unremarkable the sizes of the deployments are. Sure, the list is dated, and deployments have certainly gotten larger. And yes, companies like Yahoo! are pushing scaling limits. But the average deployment, if you take Yahoo! out, is a mere 113 nodes. Even if every node is multi-homed to two switches, this means the average deployment could be handled by 4 access switches.

Even if every deployment quadrupled, you would still only be talking about 16-access-switch deployments. When our industry talks about scaling, we usually think well beyond 16 switches.
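As a rough sketch of that arithmetic: the switch counts depend on how many ports an access switch offers, which the list does not say. The example below assumes 64 usable ports per switch; that figure, the function name, and the defaults are illustrative assumptions, not numbers from the source.

```python
import math


def access_switches_needed(nodes: int,
                           links_per_node: int = 2,
                           ports_per_switch: int = 64) -> int:
    """Estimate how many access switches a cluster needs.

    ports_per_switch=64 is an assumed value chosen for illustration.
    links_per_node=2 models every node being multi-homed to two switches.
    """
    total_ports = nodes * links_per_node
    return math.ceil(total_ports / ports_per_switch)


if __name__ == "__main__":
    average = 113  # average deployment size quoted above, Yahoo! excluded
    print(access_switches_needed(average))      # average deployment
    print(access_switches_needed(average * 4))  # the same deployment, quadrupled
```

Under these assumptions the average deployment needs 4 switches and a quadrupled one needs 15, just under the 4 x 4 = 16 upper bound. A smaller port count raises both numbers, but not by enough to change the conclusion.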

Is scaling an issue?

So if deployments are small, does that mean scaling is a solved issue? The answer is both yes and no. If the end game is building individual networks for each Big Data application, then yes. While the web-scale companies will always need more, the vast majority of customers will be well served by the scaling limits that are around today.

But the issue with Big Data is that it isn’t really just Big Data. When we talk about Big Data, we usually ought to be using a different moniker. For most people, Big Data is less about Hadoop and more about clustered applications (at least so far as the network is concerned). By expanding the definition to clustered applications, you move past Hadoop and into clustered compute and even clustered storage environments. Anything clustered has a dependency on some kind of interconnect.

The challenge in clustered environments

The challenge of all these types of clustered environments is that their requirements vary. For Hadoop, job completion times are dominated by the compute side of things, so the network is really about providing a congestion-free interconnect that is always available. For clustered compute, latency might be more important. And for multi-tenant environments, it might be most important to isolate traffic. Whatever the application, the point is that the requirements are highly contextual.

Which brings us back to scaling.

The real issue in scaling Big Data fabrics is less about making a small interconnect larger. Networks are not going to scale along the lines of single applications (or at least they shouldn’t). The actual scaling challenge is plotting a course from a single Big Data application to an environment that hosts multiple clustered applications, each with different requirements.

This might seem dead simple, but it isn’t. When people deploy Big Data applications today, the Big part leads people to purpose-build architecture with massive data workloads in mind. In many cases, this includes building out separate networks aimed at specific workloads.

But even in the best cases, Hadoop makes use of things like rack awareness, which help provide application resilience while minimizing traffic across the network. Regardless of whether you view this as a benefit for the application or for the network, the result is that proximity and locality are built into the infrastructure. This creates interesting considerations (and potentially limitations) when expanding. If you want to grow a cluster, you can’t just use any available server in the datacenter; some servers are preferable to others based solely on their physical location.
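To make the locality point concrete, here is a minimal sketch of ranking candidate servers by how close they sit to an existing cluster. It uses a tree distance in the spirit of the convention Hadoop applies to its network topology (0 for the same node, 2 for the same rack, 4 for a different rack in the same datacenter, 6 across datacenters). The topology paths, node names, and the averaging rule are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def topology_distance(a: str, b: str) -> int:
    """Hops from each node up to their closest common ancestor, summed.

    Nodes are written as paths such as "/dc1/rack1/node1".
    """
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")
    common = 0
    for part_a, part_b in zip(path_a, path_b):
        if part_a != part_b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def rank_candidates(cluster: list[str], candidates: list[str]) -> list[str]:
    """Order candidate servers from nearest to farthest.

    "Nearest" here means the lowest average distance to the current
    cluster members, which is an assumed scoring rule.
    """
    def average_distance(candidate: str) -> float:
        return mean(topology_distance(candidate, node) for node in cluster)

    return sorted(candidates, key=average_distance)


if __name__ == "__main__":
    cluster = ["/dc1/rack1/node1", "/dc1/rack1/node2", "/dc1/rack2/node3"]
    candidates = ["/dc2/rack1/node5", "/dc1/rack3/node4", "/dc1/rack1/node9"]
    for server in rank_candidates(cluster, candidates):
        print(server)
```

A free server in an already-populated rack ranks ahead of one in a new rack, and both rank ahead of a server in another datacenter. That ordering is the sense in which physical location alone makes some servers preferable when a cluster grows.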

Scalability is more than scaling

Making a scalable interconnect for these types of clustered applications is more than just supporting a large (or as I mentioned previously, not so large) number of nodes. The objective for scalability is to provide a graceful path from start to finish. This means architectures need to consider not just what the ending state is but also how to get from here to there.

With Hadoop, this means that things like locality have to be an explicit consideration in architecting the interconnect. Is the right answer a bunch of cross-connects zigzagging across the datacenter? Maybe. Or it might be a different architectural approach to providing interconnect between clustered servers.

Additionally, it isn’t just about one application. Architecting for bandwidth because you have a Hadoop-y application is great, but what if the next clustered application is latency-sensitive? Or if it brings with it a set of auditing and compliance requirements more typical of HIPAA-style applications?

If the architecture doesn’t explicitly consider how to expand beyond a single application, even if it can grow to thousands of switches, it won’t really matter.

The bottom line

The punch line here is that scaling is not only about growing larger. It also means potentially growing more diverse. And if there is one thing that the Hadoop deployment numbers tell me, it’s that people are still experimenting. If you are still experimenting, how can you predict with certainty what the next 5 or 10 years will mean in terms of applications for your business? You can’t. Which means that the most important architectural objective might go well beyond the number of switches in a deployment. Scalability could be about building flexibility into your datacenter. How do you get a bunch of different purpose-built capabilities into a single, general-purpose network? Answering that might be the real key to determining how to scale Big Data fabrics.

[Today’s fun fact: It is against the law to use the Star Spangled Banner as dance music in Massachusetts. There go my party plans!]

The post Scaling Big Data fabrics appeared first on Plexxi.

More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks, where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsys and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lends new meaning to the marketing phrase "This isn't rocket science."
