Agile Computing: Article

i-Technology Viewpoint: How Amazon S3 is Going to Change the World

'Goodbye' scalability problems, 'Hello' unlimited storage

Web 2.0 Journal Contributing Editor Alex Iskold writes: We are observing the transformation of the web from an ecosystem into an operating system. Building blocks such as websites, blogs, web services, podcasts and RSS are coming together and giving rise to a new computing platform. The web operating system is emerging, and it is bigger than the sum of its parts.
 
Remember your operating systems class, when you learned that every operating system has a handful of fundamental concepts such as storage, virtual memory and scheduling? The new web is no exception. However, since the Internet is a gigantic network of computers working in parallel, the basic operating system concepts take on a different shape.
 
For example, when you try to save a file on your computer, there is a (rare) possibility that the disk is full, and the write will fail. But with the new web operating system, this does not have to be the case. If the disk of one computer becomes full, the web operating system can switch and store the file on another computer. So on the web, you practically never run out of space.
 
Amazon S3 – the new virtual storage service from Amazon
 
Virtual storage has been in the news for some time now. Dion Hinchcliffe has written a survey of virtual storage providers in a recent post. He particularly commended Amazon S3 (Simple Storage Service) for its innovative API.

In essence, Amazon S3 offers developers a huge hashtable. The minimalistic API, available in both SOAP and REST, is focused on basic management of objects: write, read and delete. By default, the service works over HTTP and supports storage of objects up to 5 gigabytes in size. There is also support for BitTorrent and a plan to add other protocols in the future. To use the service, you need an Amazon Web Services account.
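To make the hashtable analogy concrete, here is a minimal Python sketch of the write, read and delete operations. It is an in-memory stand-in, not the real service or its API: the class name `ToyObjectStore` and its method names are hypothetical, chosen only for illustration.

```python
class ToyObjectStore:
    """In-memory stand-in for the bucket/key model; it does not talk to S3."""

    MAX_OBJECT_BYTES = 5 * 1024 ** 3  # the 5 GB per-object limit quoted above

    def __init__(self):
        self._buckets = {}  # bucket name -> {key: bytes}

    def write(self, bucket, key, data):
        if len(data) > self.MAX_OBJECT_BYTES:
            raise ValueError("object exceeds the 5 GB limit")
        self._buckets.setdefault(bucket, {})[key] = bytes(data)

    def read(self, bucket, key):
        return self._buckets[bucket][key]  # raises KeyError if absent

    def delete(self, bucket, key):
        self._buckets.get(bucket, {}).pop(key, None)


store = ToyObjectStore()
store.write("my-bucket", "greeting.txt", b"hello")
print(store.read("my-bucket", "greeting.txt"))  # b'hello'
store.delete("my-bucket", "greeting.txt")
```

The point of the sketch is how little surface area there is: three operations on a two-level mapping.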
 
Amazon has done a very thorough job documenting and supporting the service. The resources page contains a wealth of useful information to get you going, most notably the API documentation and the user forums. There are also code samples in various languages that illustrate how to use the Amazon S3 API.
 
Storing and retrieving objects
 
The objects in an S3 account are placed into buckets. Each account is allowed to have up to 100 buckets, and each bucket name has to be unique across all S3 users.


 


Figure 1: Example from the Amazon S3 API showing a CREATE BUCKET request
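As a rough illustration of what a REST CREATE BUCKET call looks like, the Python sketch below builds (but does not send) an HTTP PUT request for a bucket. The endpoint `http://s3.amazonaws.com`, the path-style URL and the helper name `build_create_bucket_request` are assumptions made for this example, and the Authorization value is a placeholder rather than a real signature.

```python
from email.utils import formatdate
from urllib.parse import quote
from urllib.request import Request

S3_ENDPOINT = "http://s3.amazonaws.com"  # assumed path-style endpoint


def build_create_bucket_request(bucket, authorization, date=None):
    """Return an unsent PUT request that would ask the service to create `bucket`."""
    headers = {
        "Date": date or formatdate(usegmt=True),  # RFC 1123 style date in GMT
        "Authorization": authorization,           # placeholder, see the security section
    }
    return Request(f"{S3_ENDPOINT}/{quote(bucket)}", headers=headers, method="PUT")


req = build_create_bucket_request("my-example-bucket", "AWS <key-id>:<signature>")
print(req.get_method(), req.full_url)
```

Because bucket names are shared across all users, a real client should expect the request to fail if the name is already taken.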


Each object inside a bucket has to have a unique, UTF-8 compliant key assigned by the developer. Since S3 imposes no specific key structure, developers are free to do what best suits their needs. The documentation hints at using slashes to create a directory-like structure, but does not insist on it. The absence of a prescribed key format and of a directory interface in the API is not a limitation but added flexibility: needs differ, and implementing directory-like storage is just a matter of following naming conventions in the code.
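The naming-convention idea can be sketched in a few lines of Python. The function below treats slashes in flat keys as directory separators and lists the immediate "files" and "subdirectories" under a prefix. The function name `list_dir` and the sample keys are hypothetical, and the filtering here is done purely on the client side.

```python
def list_dir(keys, prefix="", delimiter="/"):
    """Split flat keys under `prefix` into direct files and direct subdirectories."""
    files, subdirs = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the first delimiter: the key lives in a subdirectory.
            subdirs.add(prefix + head + delimiter)
        else:
            files.append(key)
    return sorted(files), sorted(subdirs)


keys = [
    "notes.txt",
    "photos/2006/beach.jpg",
    "photos/2006/city.jpg",
    "photos/index.html",
]
print(list_dir(keys, "photos/"))
# (['photos/index.html'], ['photos/2006/'])
```

Nothing in the store itself knows about directories; the structure exists only in how the keys are named and read back.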


Figure 2: Example from the Amazon S3 API showing a GET BUCKET request
 
The API also allows the developer to list all the keys in a particular bucket. This is implemented using a flavor of the Iterator pattern and the concept of a marker. With the first query, no marker is supplied. If the bucket contains more objects than specified in the max-keys parameter, then a marker for the next starting point is returned. To obtain the next set of results, the marker is passed back in the subsequent request.
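The marker protocol is easy to simulate. The Python sketch below pages through a sorted list of keys held in memory: `list_page` plays the role of the service and `iter_keys` plays the role of the client loop. Both names are hypothetical, and the sketch assumes keys are returned in sorted order, that the marker is the last key of the previous page, and that max-keys is at least 1.

```python
import bisect


def list_page(sorted_keys, max_keys, marker=None):
    """Return one page of keys after `marker`, plus the marker for the next page (or None)."""
    start = 0 if marker is None else bisect.bisect_right(sorted_keys, marker)
    page = sorted_keys[start:start + max_keys]
    truncated = start + max_keys < len(sorted_keys)
    next_marker = page[-1] if truncated and page else None
    return page, next_marker


def iter_keys(sorted_keys, max_keys):
    """Client-side loop: keep passing the marker back until no marker is returned."""
    marker = None
    while True:
        page, marker = list_page(sorted_keys, max_keys, marker)
        yield from page
        if marker is None:
            break


keys = ["a", "b", "c", "d", "e"]
print(list_page(keys, 2))        # (['a', 'b'], 'b')
print(list(iter_keys(keys, 2)))  # ['a', 'b', 'c', 'd', 'e']
```

Because the marker is just a position in the key order, the server does not have to remember anything between requests.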
 


Figure 3: Amazon S3 diagram

 
Since it might be expensive to fetch an entire object from S3, the API allows you to associate metadata with each object. The metadata can be retrieved with a HEAD request on the object, without downloading its contents, while a GET BUCKET listing returns basic attributes such as size and last-modified time alongside each key. This functionality is particularly handy for storing and looking up information about large media files.
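In the REST API, user metadata travels as HTTP headers whose names start with `x-amz-meta-`. The Python sketch below shows one way to map a dictionary of metadata to such headers and back again. The helper names `to_headers` and `from_headers` and the sample values are hypothetical.

```python
META_PREFIX = "x-amz-meta-"


def to_headers(metadata):
    """Turn {'title': 'Beach'} into {'x-amz-meta-title': 'Beach'} for a PUT request."""
    return {META_PREFIX + name.lower(): str(value) for name, value in metadata.items()}


def from_headers(headers):
    """Pick the user metadata back out of a response's headers (names compared case-insensitively)."""
    return {
        name.lower()[len(META_PREFIX):]: value
        for name, value in headers.items()
        if name.lower().startswith(META_PREFIX)
    }


print(to_headers({"Title": "Beach", "duration": 215}))
# {'x-amz-meta-title': 'Beach', 'x-amz-meta-duration': '215'}
print(from_headers({"Content-Type": "audio/mpeg", "X-Amz-Meta-Title": "Beach"}))
# {'title': 'Beach'}
```

For a large media file, a few such headers (title, duration, owner) are often all an application needs in order to decide whether to download the object at all.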
 
S3 Security
 
S3 has a built-in security model for both connecting to the service and setting the access policy for individual objects and buckets. To access the service, each developer is required to obtain an Access Key ID and a Secret Key. The Access Key ID, the same ID used for all Amazon Web Services, has to be passed in with every request. The Secret Key is never sent; instead, it is used as the key for an HMAC-SHA1 signature computed over pieces of the request, which proves the requestor's identity.
 
The authentication is somewhat involved, and there are quite a few posts on the forums about authentication problems. The Amazon API documentation has a page dedicated to it. In addition, there are code samples in various languages that illustrate the correct usage of authentication. If you do not know much about cryptography, look for the code sample in your language; it will save you hours of debugging.
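For readers who want to see the moving parts, here is a simplified Python sketch of the REST signing scheme: build a string from the request, compute an HMAC-SHA1 over it with the Secret Key, Base64-encode the result, and send it in the Authorization header next to the Access Key ID. The function names and the credentials are hypothetical, and the header canonicalization is deliberately reduced (it lowercases and sorts `x-amz-` headers but does not merge duplicates or handle sub-resources), so treat it as an outline and defer to Amazon's own samples for real use.

```python
import base64
import hashlib
import hmac


def string_to_sign(verb, resource, date, content_md5="", content_type="", amz_headers=None):
    """Assemble the newline-separated string that gets signed (simplified canonicalization)."""
    pairs = sorted((name.lower(), value) for name, value in (amz_headers or {}).items())
    amz = "".join(f"{name}:{value}\n" for name, value in pairs)
    return f"{verb}\n{content_md5}\n{content_type}\n{date}\n{amz}{resource}"


def authorization_header(access_key_id, secret_key, to_sign):
    """Sign with HMAC-SHA1 and format the value of the Authorization header."""
    digest = hmac.new(secret_key.encode("utf-8"), to_sign.encode("utf-8"), hashlib.sha1).digest()
    return f"AWS {access_key_id}:{base64.b64encode(digest).decode('ascii')}"


# Hypothetical credentials and request, for illustration only.
to_sign = string_to_sign("GET", "/my-bucket/photo.jpg", "Wed, 14 Jun 2006 12:00:00 GMT")
print(repr(to_sign))
print(authorization_header("EXAMPLE-KEY-ID", "example-secret", to_sign))
```

Most forum complaints about authentication come down to the signed string differing by a single character from what the service reconstructs, which is why printing it with `repr` is a useful first debugging step.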
 
In addition to authentication, the S3 API provides an Access Control List (ACL) for every bucket and every object. The ACL can be set via a separate request or at the time the bucket or object is created. Here is the set of currently supported ACL choices.




Figure 4: Current S3 ACL choices
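Setting an ACL at creation time amounts to adding one header to the PUT request. The Python sketch below validates a canned ACL name and returns the corresponding `x-amz-acl` header. The figure itself is not reproduced in this text, so the four names listed are an assumption based on the canned ACLs in the S3 REST documentation, and the helper name `acl_header` is hypothetical.

```python
# Canned ACL names assumed from the S3 REST documentation, not taken from the figure.
CANNED_ACLS = {"private", "public-read", "public-read-write", "authenticated-read"}


def acl_header(canned_acl="private"):
    """Return the header that requests a canned ACL when a bucket or object is created."""
    if canned_acl not in CANNED_ACLS:
        raise ValueError(f"unknown canned ACL: {canned_acl!r}")
    return {"x-amz-acl": canned_acl}


print(acl_header("public-read"))  # {'x-amz-acl': 'public-read'}
```

Rejecting unknown names on the client side is a small safeguard against silently publishing data with the wrong policy because of a typo.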

Using AJAX to access S3

S3 opens an intriguing possibility of dramatically simplifying the back end for some applications. You can envision an architecture in which an application simply consists of a client that communicates directly with S3. This client can be a desktop application, an applet or an Ajax application embedded in the browser. Such a model is not appropriate for enterprise systems that require complex data processing, transactions and caching, but it could work well for things like Google Sync or del.icio.us. Simplicity, instant scalability, a built-in security model and Amazon's reputation make a strong case.
 
If you are writing a Firefox extension or an Ajax-based web application, you can either write your own S3 wrapper in JavaScript or use S3Ajax, developed by Les Orchard. S3Ajax basically mimics the S3 API and takes away the pain of formatting the raw requests and signing them.
 
It appears that the developer has plans to build this out further. It would be good to have a higher level of abstraction built in, as well as a way to loop through the objects in a bucket and the ability to fetch multiple objects concurrently.
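To illustrate what fetching multiple objects concurrently could look like, here is a small sketch in Python rather than JavaScript, and unrelated to S3Ajax's actual code. The function `fetch_all` is hypothetical; it takes any single-object fetch function and runs it over a list of keys on a thread pool. The fake fetcher below stands in for a real network call.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch_one, keys, max_workers=4):
    """Call fetch_one(key) for every key on a thread pool and return {key: result}."""
    keys = list(keys)  # materialize once so keys and results stay aligned
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each key with its own result.
        return dict(zip(keys, pool.map(fetch_one, keys)))


# Stand-in for a network fetch: looks the key up in a local dictionary.
fake_bucket = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}
print(fetch_all(fake_bucket.__getitem__, ["a.txt", "c.txt"]))
# {'a.txt': b'alpha', 'c.txt': b'gamma'}
```

Since each object is addressed independently by key, requests like these do not need to coordinate with one another, which is what makes the concurrent approach attractive.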
 
This all sounds very sweet, but what about the performance?
 
Needless to say, performance is one of the top questions on the Amazon S3 forum. Amazon does not provide any specific data and does not have an SLA in place. But the design requirements and the design principles from the S3 creators show a strong focus on performance and scalability.
 
There are a few benchmarks that independent developers have posted on the forums. I've seen these benchmarks improve since S3 launched. The latest results indicate low latency. A benchmark on March 17 said: “Putting a 2 MB mp3 took 0.8s, retrieving it took 1.085s -- quite fast, quite responsive”.
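The quoted timings translate into effective transfer rates with simple arithmetic, shown below as a short Python calculation. The figures include connection and request overhead, so they understate raw bandwidth; the function name is hypothetical.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Effective transfer rate, with request latency included in the elapsed time."""
    return size_mb / seconds


put_rate = throughput_mb_per_s(2, 0.8)    # 2 MB uploaded in 0.8 s
get_rate = throughput_mb_per_s(2, 1.085)  # 2 MB downloaded in 1.085 s
print(f"PUT: {put_rate:.2f} MB/s, GET: {get_rate:.2f} MB/s")
# PUT: 2.50 MB/s, GET: 1.84 MB/s
```

A single data point from one developer's connection says little about the service as a whole, but it gives a sense of scale for a typical media file.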
 
So who is using Amazon S3 right now?
 
Even though the service launched just a few months ago, there are already a number of companies leveraging it for a wide range of purposes. In the personal online storage space, we note ElephantDrive (Windows), JungleDisk (Windows, Linux, Mac) and Filicio.us (Web-based). From the discussion forums it is clear that a few companies are planning to use S3 to store media files, but we cannot tell who these companies are. At adaptiveblue we are using S3 to store the bluemarks of our users' favorite books, movies and music. And if you want to get a feel for what other things are possible, take a look at this top 10 list.
 
Conclusion
 
Amazon S3 is an innovative, exciting new service that is going to change the way we do computing on the web. Michael Arrington of TechCrunch said this in one of his posts:
 
“S3 provides a terrific opportunity for startups with great ideas for a storage user interface to avoid building a back end storage infrastructure. Amazon is offering extremely low pricing and a very dependable infrastructure. For some people, S3 will allow them to launch a service that they otherwise couldn’t have built.”
 
If you have not done so already, take a look at S3 and see for yourself. A good place to start playing with it is jSh3ll, which offers a shell-like command-line interface to the service. Then join the Amazon Web Services program and start coding. The possibilities are endless!
 
 

More Stories By Alex Iskold

Alex Iskold is the Founder and CEO of adaptiveblue (http://www.adaptiveblue.com), where he is developing browser personalization technology. His previous startup, Information Laboratory, created an innovative software analysis and visualization tool called Small Worlds. After Information Laboratory was acquired by IBM, Alex worked as the architect of IBM Rational Software Analysis tools. Before starting adaptiveblue, Alex was the Chief Architect at DataSynapse, where he developed the GridServer and FabricServer virtualization platforms. He holds an M.S. in Computer Science from New York University, where he taught an award-winning software engineering class for undergraduate students.

Most Recent Comments
Jeff Miller 06/16/06 07:43:35 AM EDT

Terrific eye-opener. And it led me to a Hypertext adventure of further learning about many of the Web 2.0 and 3.0 technologies and vision. Well done. Thank you!

Martin Kochanski 06/14/06 01:13:19 PM EDT

We've integrated S3 support into our Cardbox end-user database (http://www.cardbox.com). We now have an automated backup feature that copies all databases to a backup store on S3, automatically, at user-specified intervals. Because Cardbox is doing the backup itself, it can even back up databases that are currently open and being worked on. More at http://cardbox.wordpress.com/2006/06/13/amazon-s3-and-cardbox/.

The Amazon API is beautifully designed and beautifully documented: a pleasure to work with. But the best thing of all is that it's all automatic: "backup without doing backups"!

Joe Labbe 06/13/06 09:21:20 PM EDT

We've integrated S3 as the document storage for our Ratchet-X product and the results have been superb. The service is blazing fast and reliable. We've toyed in the past with the idea of adding document storage but hesitated because of the operational requirements. Amazon's S3 has removed this impediment thus allowing us to build a business grade storage service for our product with minimal effort. I highly recommend it.

ambit.io.us 06/11/06 04:08:50 AM EDT

Great article. More AJAX in action!

