|By Peter Lubbers, Frank Greco||
|March 11, 2010 10:15 AM EST||
Lately there has been a lot of buzz around HTML5 Web Sockets, which defines a full-duplex communication channel that operates through a single socket over the Web. HTML5 Web Sockets is not just another incremental enhancement to conventional HTTP communications; it represents a colossal advance, especially for real-time, event-driven web applications.
HTML5 Web Sockets provides such a dramatic improvement from the old, convoluted "hacks" that are used to simulate a full-duplex connection in a browser that it prompted Google's Ian Hickson - the HTML5 specification lead - to say:
"Reducing kilobytes of data to 2 bytes...and reducing latency from 150ms to 50ms is far more than marginal. In fact, these two factors alone are enough to make Web Sockets seriously interesting to Google."
Let's look at how HTML5 Web Sockets can offer such an incredibly dramatic reduction of unnecessary network traffic and latency by comparing it to conventional solutions.
Polling, Long-Polling, and Streaming - Headache 2.0
Normally when a browser visits a web page, an HTTP request is sent to the web server that hosts that page. The web server acknowledges this request and sends back the response. In many cases - for example, for stock prices, news reports, ticket sales, traffic patterns, medical device readings, and so on - the response could be stale by the time the browser renders the page. If you want to get the most up-to-date "real-time" information, you can constantly refresh that page manually, but that's obviously not a great solution.
With polling, the browser sends HTTP requests at regular intervals and immediately receives a response. This technique was the first attempt for the browser to deliver real-time information. Obviously, this is a good solution if the exact interval of message delivery is known, because you can synchronize the client request to occur only when information is available on the server. However, real-time data is often not that predictable, making unnecessary requests inevitable and, as a result, many connections are opened and closed needlessly in low-message-rate situations.
With long-polling, the browser sends a request to the server and the server keeps the request open for a set period. If a notification is received within that period, a response containing the message is sent to the client. If a notification is not received within the set time period, the server sends a response to terminate the open request. It is important to understand, however, that when you have a high message volume, long-polling does not provide any substantial performance improvements over traditional polling. In fact, it could be worse, because the long-polling might spin out of control into an unthrottled, continuous loop of immediate polls.
With streaming, the browser sends a complete request, but the server sends and maintains an open response that is continuously updated and kept open indefinitely (or for a set period of time). The response is then updated whenever a message is ready to be sent, but the server never signals to complete the response, thus keeping the connection open to deliver future messages. However, since streaming is still encapsulated in HTTP, intervening firewalls and proxy servers may choose to buffer the response, increasing the latency of the message delivery. Therefore, many streaming Comet solutions fall back to long-polling in case a buffering proxy server is detected. Alternatively, TLS (SSL) connections can be used to shield the response from being buffered, but in that case the setup and tear down of each connection taxes the available server resources more heavily.
Ultimately, all of these methods for providing real-time data involve HTTP request and response headers, which contain lots of additional, unnecessary header data and introduce latency. On top of that, full-duplex connectivity requires more than just the downstream connection from server to client. In an effort to simulate full-duplex communication over half-duplex HTTP, many of today's solutions use two connections: one for the downstream and one for the upstream. The maintenance and coordination of these two connections introduces significant overhead in terms of resource consumption and adds lots of complexity. Simply put, HTTP wasn't designed for real-time, full-duplex communication as you can see in Figure 1, which shows the complexities associated with building a Comet web application that displays real-time data from a back-end data source using a publish/subscribe model over half-duplex HTTP.
Figure 1: The complexity of Comet applications
It gets even worse when you try to scale out those Comet solutions to the masses. Simulating bi-directional browser communication over HTTP is error-prone and complex and all that complexity does not scale. Even though your end users might be enjoying something that looks like a real-time web application, this "real-time" experience has an outrageously high price tag. It's a price that you will pay in additional latency, unnecessary network traffic, and a drag on CPU performance.
HTML5 Web Sockets to the Rescue!
Defined in the Communications section of the HTML5 specification, HTML5 Web Sockets represent the next evolution of web communications - a full-duplex, bidirectional communications channel that operates through a single socket over the Web. HTML5 Web Sockets provides a true standard that you can use to build scalable, real-time web applications. In addition, since it provides a socket that is native to the browser, it eliminates many of the problems Comet solutions are prone to. Web Sockets remove the overhead and dramatically reduce complexity.
To establish a WebSocket connection, the client and server upgrade from the HTTP protocol to the WebSocket protocol during their initial handshake, as shown in the following example:
Example 1: The WebSocket handshake (browser request and server response)
GET /text HTTP/1.1\r\n
HTTP/1.1 101 WebSocket Protocol Handshake\r\n
Once established, WebSocket data frames can be sent back and forth between the client and the server in full-duplex mode. Both text and binary frames can be sent full-duplex, in either direction at the same time. The data is minimally framed with just two bytes. In the case of text frames, each frame starts with a 0x00 byte, ends with a 0xFF byte, and contains UTF-8 data in between. WebSocket text frames use a terminator, while binary frames use a length prefix.
The Showdown: Comet vs HTML5 Web Sockets
How dramatic is that reduction in unnecessary network traffic and latency? Let's compare a polling application with a WebSocket application.
For the polling example, I created a simple web application in which a web page requests real-time stock data from a RabbitMQ message broker using a traditional publish/subscribe model. It does this by polling a Java servlet that is hosted on a web server. The RabbitMQ message broker receives data from a fictitious stock price feed with continuously updating prices. The web page connects and subscribes to a specific stock channel (a topic on the message broker) and uses an XMLHttpRequest to poll for updates once per second. When updates are received, some calculations are performed and the stock data is shown in a table as shown in Figure 2.
Note: The back-end stock feed actually produces a lot of stock price updates per second, so using polling at one-second intervals is actually more prudent than using a Comet long-polling solution, which would result in a series of continuous polls. Polling effectively throttles the incoming updates here.
It all looks great, but a look under the hood reveals there are some serious issues with this application. For example, in Mozilla Firefox with Firebug (a Firefox add-on that allows you to debug web pages and monitor the time it takes to load pages and execute scripts), you can see that GET requests hammer the server at one-second intervals. Turning on Live HTTP Headers (another Firefox add-on that shows live HTTP header traffic) reveals the shocking amount of header overhead that is associated with each request. The following two examples show the HTTP header data for just a single request and response:
Example 2: HTTP request header
GET /PollingStock//PollingStock HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:18.104.22.168) Gecko/20091102 Firefox/3.5.5
Cookie: showInheritedConstant=false; showInheritedProtectedConstant=false; showInheritedProperty=false; showInheritedProtectedProperty=false; showInheritedMethod=false; showInheritedProtectedMethod=false; showInheritedEvent=false; showInheritedStyle=false; showInheritedEffect=false
Example 3: HTTP response header
HTTP/1.x 200 OK
Server: Sun Java System Application Server 9.1_02
Date: Sat, 07 Nov 2009 00:32:46 GMT
Just for fun, I counted all the characters. The total HTTP request and response header information overhead contains 871 bytes and that doesn't even include any data. Of course, this is just an example and you can have less than 871 bytes of header data, but I have also seen cases where the header data exceeded 2,000 bytes. In this example application, the data for a typical stock topic message is only about 20 characters long. As you can see, it's effectively drowned out by the excessive header information, which wasn't even required in the first place.
What happens when you deploy this application to a large number of users? Let's take a look at the network throughput for just the HTTP request and response header data associated with this polling application in three different use cases.
- Use case A: 1,000 clients polling every second:
Network throughput is (871 x 1,000) = 871,000 bytes = 6,968,000 bits per second (6.6 Mbps)
- Use case B: 10,000 clients polling every second:
Network throughput is (871 x 10,000) = 8,710,000 bytes = 69,680,000 bits per second (66 Mbps)
- Use case C: 100,000 clients polling every 1 second:
Network throughput is (871 x 100,000) = 87,100,000 bytes = 696,800,000 bits per second (665 Mbps)
That's an enormous amount of unnecessary network throughput. If only we could just get the essential data over the wire. Well, guess what? You can with HTML5 Web Sockets. I rebuilt the application to use HTML5 Web Sockets, adding an event handler to the web page to asynchronously listen for stock update messages from the message broker (check out the many how-tos and tutorials on www.tech.kaazing.com/documentation for more information on how to build a WebSocket application). Each of these messages is a WebSocket frame that has just two bytes of overhead (instead of 871). Take a look at how that affects the network throughput overhead in our three use cases.
- Use case A: 1,000 clients receive 1 message per second:
Network throughput is (2 x 1,000) = 2,000 bytes = 16,000 bits per second (0.015 Mbps)
- Use case B: 10,000 clients receive 1 message per second:
Network throughput is (2 x 10,000) = 20,000 bytes = 160,000 bits per second (0.153 Mbps)
- Use case C: 100,000 clients receive 1 message per second:
Network throughput is (2 x 100,000) = 200,000 bytes = 1,600,000 bits per second (1.526 Mbps)
As you can see in Figure 3, HTML5 Web Sockets provide a dramatic reduction of unnecessary network traffic compared to the polling solution.
Figure 3: Comparison of the unnecessary network throughput overhead between the polling and the WebSocket applications
What about the reduction in latency? Take a look at Figure 4. In the top half, you can see the latency of the half-duplex polling solution. If we assume, for this example, that it takes 50 milliseconds for a message to travel from the server to the browser, then the polling application introduces a lot of extra latency, because a new request has to be sent to the server when the response is complete. This new request takes another 50ms and during this time the server cannot send any messages to the browser, resulting in additional server memory consumption.
In the bottom half of the figure, you see the reduction in latency provided by the WebSocket solution. Once the connection is upgraded to WebSocket, messages can flow from the server to the browser the moment they arrive. It still takes 50 ms for messages to travel from the server to the browser, but the WebSocket connection remains open so there is no need to send another request to the server.
Figure 4: Latency comparison between the polling and WebSocket applications
HTML5 Web Sockets and the Kaazing WebSocket Gateway
Today, only Google's Chrome browser supports HTML5 Web Sockets natively, but other browsers will soon follow. To work around that limitation, however, Kaazing WebSocket Gateway provides complete WebSocket emulation for all the older browsers (I.E. 5.5+, Firefox 1.5+, Safari 3.0+, and Opera 9.5+), so you can start using the HTML5 WebSocket APIs today.
WebSocket is great, but what you can do once you have a full-duplex socket connection available in your browser is even greater. To leverage the full power of HTML5 Web Sockets, Kaazing provides a ByteSocket library for binary communication and higher-level libraries for protocols like Stomp, AMQP, XMPP, IRC and more, built on top of WebSocket.
For example, if you use a high-level library for protocols such as Stomp or AMQP (supplied with the Kaazing Gateway), you can talk directly to back-end message brokers like the RabbitMQ broker described in the example. By having a direct connection to services, there is no need for additional application server logic to translate those bi-directional, full-duplex TCP backend protocols to uni-directional, half-duplex HTTP connections; the browser can simply understand these protocols natively.
Figure 5: Kaazing WebSocket Gateway extends TCP-based messaging to the browser with ultra high performance
HTML5 Web Sockets provides an enormous step forward in the scalability of the real-time web. As you have seen in this article, HTML5 Web Sockets can provide a 500:1 or - depending on the size of the HTTP headers - even a 1000:1 reduction in unnecessary HTTP header traffic and 3:1 reduction in latency. That is not just an incremental improvement; that is a revolutionary jump - a quantum leap.
Kaazing WebSocket Gateway makes HTML5 WebSocket code work in all the browsers today, while providing additional protocol libraries that allow you to harness the full power of the full-duplex socket connection that HTML5 Web Sockets provides and communicate directly to back-end services. For more information about Kaazing WebSocket Gateway, visit www.kaazing.com and the Kaazing technology network at tech.kaazing.com.
- HTML5 Web Sockets Specification: http://dev.w3.org/html5/websockets/
- WebSockets.org: http://www.websockets.org
- Kaazing corporation: http://www.kaazing.com
- Download the Kaazing WebSocket Gateway: http://www.kaazing.com/download
- Skills Matter HTML5 Communication Training Courses: http://skillsmatter.com/expert-profile/java-jee/peter-lubbers
- Book: Pro HTML5 Programming: Powerful APIs for Richer Internet Application Development (Link)
|PeterLubbers 03/15/10 03:55:00 PM EDT|
An additional problem with streaming is that you can get unexpected results if there are intermediaries such as buffering proxy servers in the way. Streaming solutions therefore often fall back to long-polling if they detect such intermediaries (if they are smart enough to detect them at all). With WebSocket and WebSocket secure you have a better chance of traversing proxy servers and firewalls.
|tbayes 03/11/10 10:11:00 AM EST|
Hi Peter and Frank,
Thank you for this clear and well-written article, which does much to effectively explain the benefits of the WebSocket protocol.
If I may point out one minor correction: In the WebSocket section of Example 3, Use case B and C should show overhead rates of 0.153Mbps and 1.526Mbps respectively, as opposed to 0.153Kbps and 1.526Kbps.
Perhaps a more natural comparison would be between Web Sockets and Comet-streaming (via, for example, the "Forever IFrame" technique), as I feel it a little unsporting to compare Web Sockets directly to polling or long-polling! At around 20 bytes per message, the overhead associated with clients receiving messages via Comet-style streaming is still more than WebSocket's 2 bytes per message, but is still one to two orders of magnitude less than the overhead associated with polling techniques. Similarly, latency whether receiving messages using Comet-streaming or WebSocket is the same. Of course, clients sending messages will gain the advantages you describe only with Web Sockets, as all other approaches do indeed require new HTTP requests.
I completely agree, however, that WebSocket is a definite improvement over the Comet IFrame-approaches, which have an air of string and glue about them.
I hope that we will see WebSocket support in all production mainstream browsers very soon.
Cloud computing is being adopted in one form or another by 94% of enterprises today. Tens of billions of new devices are being connected to The Internet of Things. And Big Data is driving this bus. An exponential increase is expected in the amount of information being processed, managed, analyzed, and acted upon by enterprise IT. This amazing is not part of some distant future - it is happening today. One report shows a 650% increase in enterprise data by 2020. Other estimates are even higher....
Jun. 26, 2016 05:00 PM EDT Reads: 1,301
SYS-CON Events announced today that Bsquare has been named “Silver Sponsor” of SYS-CON's @ThingsExpo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. For more than two decades, Bsquare has helped its customers extract business value from a broad array of physical assets by making them intelligent, connecting them, and using the data they generate to optimize business processes.
Jun. 26, 2016 05:00 PM EDT Reads: 1,209
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, wh...
Jun. 26, 2016 04:30 PM EDT Reads: 949
Internet of @ThingsExpo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 19th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devices - comp...
Jun. 26, 2016 04:00 PM EDT Reads: 1,256
19th Cloud Expo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterpri...
Jun. 26, 2016 04:00 PM EDT Reads: 1,313
Connected devices and the industrial internet are growing exponentially every year with Cisco expecting 50 billion devices to be in operation by 2020. In this period of growth, location-based insights are becoming invaluable to many businesses as they adopt new connected technologies. Knowing when and where these devices connect from is critical for a number of scenarios in supply chain management, disaster management, emergency response, M2M, location marketing and more. In his session at @Th...
Jun. 26, 2016 03:00 PM EDT Reads: 977
It is one thing to build single industrial IoT applications, but what will it take to build the Smart Cities and truly society changing applications of the future? The technology won’t be the problem, it will be the number of parties that need to work together and be aligned in their motivation to succeed. In his Day 2 Keynote at @ThingsExpo, Henrik Kenani Dahlgren, Portfolio Marketing Manager at Ericsson, discussed how to plan to cooperate, partner, and form lasting all-star teams to change t...
Jun. 26, 2016 02:00 PM EDT Reads: 1,132
The cloud market growth today is largely in public clouds. While there is a lot of spend in IT departments in virtualization, these aren’t yet translating into a true “cloud” experience within the enterprise. What is stopping the growth of the “private cloud” market? In his general session at 18th Cloud Expo, Nara Rajagopalan, CEO of Accelerite, explored the challenges in deploying, managing, and getting adoption for a private cloud within an enterprise. What are the key differences between wh...
Jun. 26, 2016 01:00 PM EDT Reads: 800
The 19th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Digital Transformation, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportuni...
Jun. 26, 2016 12:00 PM EDT Reads: 1,289
Internet of @ThingsExpo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with the 19th International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world and ThingsExpo Silicon Valley Call for Papers is now open.
Jun. 26, 2016 12:00 PM EDT Reads: 1,110
There is little doubt that Big Data solutions will have an increasing role in the Enterprise IT mainstream over time. Big Data at Cloud Expo - to be held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA - has announced its Call for Papers is open. Cloud computing is being adopted in one form or another by 94% of enterprises today. Tens of billions of new devices are being connected to The Internet of Things. And Big Data is driving this bus. An exponential increase is...
Jun. 26, 2016 12:00 PM EDT Reads: 1,348
In his general session at 18th Cloud Expo, Lee Atchison, Principal Cloud Architect and Advocate at New Relic, discussed cloud as a ‘better data center’ and how it adds new capacity (faster) and improves application availability (redundancy). The cloud is a ‘Dynamic Tool for Dynamic Apps’ and resource allocation is an integral part of your application architecture, so use only the resources you need and allocate /de-allocate resources on the fly.
Jun. 26, 2016 12:00 PM EDT Reads: 1,087
In his keynote at 18th Cloud Expo, Andrew Keys, Co-Founder of ConsenSys Enterprise, provided an overview of the evolution of the Internet and the Database and the future of their combination – the Blockchain. Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life sett...
Jun. 26, 2016 11:00 AM EDT Reads: 1,115
Machine Learning helps make complex systems more efficient. By applying advanced Machine Learning techniques such as Cognitive Fingerprinting, wind project operators can utilize these tools to learn from collected data, detect regular patterns, and optimize their own operations. In his session at 18th Cloud Expo, Stuart Gillen, Director of Business Development at SparkCognition, discussed how research has demonstrated the value of Machine Learning in delivering next generation analytics to imp...
Jun. 26, 2016 09:45 AM EDT Reads: 626
There are several IoTs: the Industrial Internet, Consumer Wearables, Wearables and Healthcare, Supply Chains, and the movement toward Smart Grids, Cities, Regions, and Nations. There are competing communications standards every step of the way, a bewildering array of sensors and devices, and an entire world of competing data analytics platforms. To some this appears to be chaos. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, Bradley Holt, Developer Advocate a...
Jun. 26, 2016 08:45 AM EDT Reads: 748
Cognitive Computing is becoming the foundation for a new generation of solutions that have the potential to transform business. Unlike traditional approaches to building solutions, a cognitive computing approach allows the data to help determine the way applications are designed. This contrasts with conventional software development that begins with defining logic based on the current way a business operates. In her session at 18th Cloud Expo, Judith S. Hurwitz, President and CEO of Hurwitz & ...
Jun. 25, 2016 03:00 PM EDT Reads: 1,618
SYS-CON Events announced today that ReadyTalk, a leading provider of online conferencing and webinar services, has been named Vendor Presentation Sponsor at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. ReadyTalk delivers audio and web conferencing services that inspire collaboration and enable the Future of Work for today’s increasingly digital and mobile workforce. By combining intuitive, innovative tec...
Jun. 24, 2016 01:00 PM EDT Reads: 1,362
Amazon has gradually rolled out parts of its IoT offerings, but these are just the tip of the iceberg. In addition to optimizing their backend AWS offerings, Amazon is laying the ground work to be a major force in IoT - especially in the connected home and office. In his session at @ThingsExpo, Chris Kocher, founder and managing director of Grey Heron, explained how Amazon is extending its reach to become a major force in IoT by building on its dominant cloud IoT platform, its Dash Button strat...
Jun. 24, 2016 12:00 PM EDT Reads: 1,654
industrial company for a multi-year contract initially valued at over $4.0 million. In addition to DataV software, Bsquare will also provide comprehensive systems integration, support and maintenance services. DataV leverages advanced data analytics, predictive reasoning, data-driven diagnostics, and automated orchestration of remediation actions in order to improve asset uptime while reducing service and warranty costs.
Jun. 22, 2016 11:00 AM EDT Reads: 1,371
Vidyo, Inc., has joined the Alliance for Open Media. The Alliance for Open Media is a non-profit organization working to define and develop media technologies that address the need for an open standard for video compression and delivery over the web. As a member of the Alliance, Vidyo will collaborate with industry leaders in pursuit of an open and royalty-free AOMedia Video codec, AV1. Vidyo’s contributions to the organization will bring to bear its long history of expertise in codec technolo...
Jun. 19, 2016 12:45 PM EDT Reads: 1,278