By Manuel Weiss
July 22, 2014 02:15 PM EDT
Some of the major challenges today when building infrastructure are predictability, scalability and automated recovery. A predictable system promotes the exact same artifact that you tested into your production system, so intermittent failures during deployment cannot cause trouble. A scalable system makes it trivial, ideally automatic, to deal with any rise in traffic. And automated recovery makes sure your team can focus on building a better product and sleep at night instead of constantly maintaining infrastructure.
At Codeship we’ve found that an infrastructure made up of immutable components has helped us tremendously with these goals.
Julian Dunn from Chef recently released a blog post about their stance on immutable infrastructure.
Chad Fowler summed it up very well in a tweet:
@flomotlik pretty weak IMO. It conflates “containerisation” & “immutable infrastructure” then harps on a rigid definition of “immutable”
(Chad Fowler, @chadfowler)
Instead of going over every piece of the article, I want to present an overview of the experience we – and others – have had in making parts of our infrastructure immutable.
What is Immutable Infrastructure
Immutable infrastructure consists of immutable components that are replaced for every deployment, rather than being updated in place. Those components are started from a common image that is built once per deployment and can be tested and validated. The common image can be built through automation, but doesn’t have to be. Immutability is independent of any tool or workflow for building the images.
Its best use case is in a cloud or virtualized environment. While it’s possible in non-virtualized environments, the benefit doesn’t outweigh the effort.
The main criticism against immutable infrastructure – as stated in the Chef blog post – is that there is always state somewhere in the system and, therefore, the whole system isn’t immutable. That misses the point of immutable components. The main advantage when it comes to state in immutable infrastructure is that it is siloed. The boundaries between layers storing state and the layers that are ephemeral are clearly drawn and no leakage can possibly happen between those layers. There simply is no way to mix state into different components when you can’t expect them to be up and running the next minute.
Atomic Deployments and Validation
Updating an existing server can easily have unintended consequences. That’s why Chef, Puppet, CFEngine or other such tools exist – to take care of consistency across your infrastructure. A central system is necessary to manage the expected state of each server and to take action to ensure compliance. Deployment is not an atomic action but a transition that can go wrong and lead to an unknown state. This becomes very hard and complex to debug, as the exact state you are in is hard to know. Chef, Puppet or CFEngine are very complex systems as they have to deal with an overly complex problem.
Another solution to that problem is to build completely new images and servers that contain the application and the environment every time you want to deploy. In that case, the deployment doesn’t depend on the status the servers were in before, so the result is much more predictable and repeatable. Any third-party issues that may cause the deployment to fail can be caught by validating the new image and ensuring no production system was impacted. This one image can then be used to start any number of servers and switch atomically from the old machines to the new ones by changing the load balancer, for example.
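The flow described above can be sketched in a few lines of Python. This is a minimal in-memory model, not Codeship's actual tooling: `Image`, `LoadBalancer`, `deploy` and the `validate` callback are hypothetical names introduced only for illustration, and server names stand in for real machines.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Image:
    """An immutable build artifact; frozen so it cannot change after creation."""
    version: str


@dataclass
class LoadBalancer:
    """Hypothetical load balancer that routes traffic to one pool of servers."""
    pool: List[str] = field(default_factory=list)

    def switch(self, new_pool: List[str]) -> List[str]:
        # Replace the whole pool in a single assignment and hand back the
        # old pool so the caller can terminate those servers afterwards.
        old_pool, self.pool = self.pool, list(new_pool)
        return old_pool


def deploy(image: Image, count: int,
           validate: Callable[[Image], bool], lb: LoadBalancer) -> List[str]:
    """Validate the image, start new servers from it, then switch traffic."""
    # Validation happens before anything in production is touched.
    if not validate(image):
        raise RuntimeError(f"image {image.version} failed validation")
    # Every server is started from the same image (names are placeholders).
    new_servers = [f"{image.version}-{i}" for i in range(count)]
    return lb.switch(new_servers)
```

If validation fails, `deploy` raises before the load balancer is changed, so the old pool keeps serving traffic. The servers returned by `deploy` are the ones that can be shut down once the new pool is healthy.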
There are of course downsides to rebuilding your images with every deployment. A full rebuild of the system takes a lot longer than simply updating and restarting the application. You can optimize this by layering your builds: one repository builds a base image, and each deployment only adds your application on top of that base image to produce the deployment image. It will still be a slower process, though.
Another problem is that you introduce dependencies on third parties during deployment. If you install packages in the system and your apt repository is slow or down, the deployment can fail. While this can be a problem in a non-immutable infrastructure as well, you typically interact less with third-party systems when you just push new code into an already provisioned system.
By deploying from a pre-provisioned base image and updating that base image regularly you can soften that problem, but it’s still there and might fail a deployment from time to time.
Building the automation currently still takes more time at the beginning of the project, as the tools for building immutable infrastructure are still new or need to be developed. It is definitely more investment in the beginning, but pays off immediately.
You can still use Chef, Puppet, CFEngine or Ansible to build your images, but as they aren’t built for an immutable infrastructure workflow they tend to be more complex than necessary.
Fast Recovery by preserving History
As all deployments are done by building new images, history is preserved automatically for rollback when necessary. The same process and automation that is used to deploy the next version can be used to roll back, which ensures the process of rolling back will work. By automating the creation of the images, you can even recreate historical images and branch off from earlier points in the history of the infrastructure.
Data schema changes are a potential problem, but that’s a general issue with rollbacks. Backwards compatibility and zero downtime deployments are a way to make sure this will work regardless of the changes.
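The rollback idea above, where rolling back uses the same code path as deploying, can be sketched as follows. This is an illustrative model only: `DeploymentHistory` and the `activate` callback are hypothetical names, and image identifiers are plain strings.

```python
from typing import Callable, List, Optional


class DeploymentHistory:
    """Keeps every deployed image ID so an earlier one can be reactivated."""

    def __init__(self, activate: Callable[[str], None]) -> None:
        # `activate` is whatever starts servers from an image and switches
        # traffic to them; it is an assumption of this sketch.
        self._activate = activate
        self._images: List[str] = []
        self._index: Optional[int] = None

    @property
    def current(self) -> Optional[str]:
        return None if self._index is None else self._images[self._index]

    def deploy(self, image_id: str) -> None:
        self._activate(image_id)
        self._images.append(image_id)
        self._index = len(self._images) - 1

    def rollback(self) -> str:
        if self._index is None or self._index == 0:
            raise RuntimeError("no earlier image to roll back to")
        previous = self._images[self._index - 1]
        # Same activation path as a normal deploy, so it is exercised
        # on every release and not only in an emergency.
        self._activate(previous)
        self._index -= 1
        return previous
```

Because `rollback` calls the same `activate` function as `deploy`, the rollback path is tested with every regular deployment. It does nothing about data schema changes, which still need backwards-compatible migrations as noted above.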
As you control the whole environment and application, any experiments with new versions of the language, operating system or dependencies are easy. With strict testing and validation in place, and the ability to roll back if necessary, much of the fear of upgrading a dependency is removed. Experimentation becomes an integral and trivial part of building your infrastructure.
Makes you collect your logs and metrics in a central location
With immutable components in place, it’s easy to simply kill a misbehaving server. Errors are often just a product of the environment, for example a third-party system misbehaving, and can be ignored, but some will keep coming up. Not having access to the servers gives the team the right incentive to collect and store logs and system metrics externally. This way, debugging can happen even when the server is long gone.
If the logs and metrics needed to properly debug an issue are missing, it’s easy to add more data collection to the infrastructure and replace all existing servers. Then, once the error comes up again, you can debug it fully from the data stored on an external system.
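One way to picture shipping logs off the server is a logging handler that forwards every record to an external sink. The sketch below uses Python's standard `logging` module; `ExternalLogHandler`, `build_logger` and the `send` callback are hypothetical names, and `send` stands in for whatever transport reaches your central log store.

```python
import logging
from typing import Callable


class ExternalLogHandler(logging.Handler):
    """Forwards each formatted record to an external sink via `send`."""

    def __init__(self, send: Callable[[str], None], server_id: str) -> None:
        super().__init__()
        self._send = send            # assumed transport to the central store
        self._server_id = server_id  # lets you find the server after it is gone

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._send(f"{self._server_id} {self.format(record)}")
        except Exception:
            # Standard hook for reporting failures inside a handler.
            self.handleError(record)


def build_logger(send: Callable[[str], None], server_id: str) -> logging.Logger:
    """Create a logger that writes nothing locally and ships INFO and above."""
    logger = logging.getLogger(f"app.{server_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False   # keep records out of local root handlers
    logger.handlers.clear()    # avoid duplicate handlers on repeated calls
    handler = ExternalLogHandler(send, server_id)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

In a real setup, the standard library's `logging.handlers.SysLogHandler` or `HTTPHandler` could play the role of the custom handler. Tagging each line with a server identifier is a design choice that keeps the data useful after the server has been replaced.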
Immutable components as part of your infrastructure are a way to reduce inconsistency and improve trust in your deployment process. Atomic deployments, combined with validation of the image and easy rollback, make managing your infrastructure a lot easier.
It forces teams to silo data and expect failures that are inherent when building on top of a cloud infrastructure or when building systems in general. This increases resilience and trains you in a process to withstand any problems, especially in an automated fashion. Furthermore, it helps with building simple and independent components that are easy to deploy and scale.
And it’s not a theoretical idea. At Codeship, we’ve built our infrastructure this way for a long time. Heroku and other PaaS providers are built as immutable components and lots of companies – small and very large – have used immutability as a core concept of their infrastructure.
Tools like Packer have made building immutable components very easy. Together with existing cloud infrastructure they are a powerful concept to help you build better and safer infrastructure. Let me know in the comments if you have any questions or interesting insights to share.
I got great feedback on this article from the following people. Thanks for taking the time and helping me make it much clearer and simply better.
- Chad Fowler, https://twitter.com/chadfowler
- Mitchell Hashimoto, https://twitter.com/mitchellh
- Evan Cooke, https://twitter.com/emcooke