i-Technology Viewpoint: How Amazon S3 is Going to Change the World

'Goodbye' scalability problems, 'Hello' unlimited storage

Web 2.0 Journal Contributing Editor Alex Iskold writes: We are observing the transformation of the web from an ecosystem into an operating system. Building blocks such as websites, blogs, web services, podcasts and RSS are coming together and giving rise to a new computing platform. The web operating system is emerging, and it is bigger than the sum of its parts.
 
Remember your operating systems class, when you learned that every operating system has a handful of fundamental concepts such as storage, virtual memory and scheduling? The new web is no exception. However, since the Internet is a gigantic network of computers working in parallel, the basic operating system concepts take on a different shape.
 
For example, when you try to save a file on your computer, there is a (rare) possibility that the disk is full, and the write will fail. But with the new web operating system, this does not have to be the case. If the disk of one computer becomes full, the web operating system can switch and store the file on another computer. So on the web, you practically never run out of space.
 
Amazon S3 – the new virtual storage service from Amazon
 
Virtual storage has been in the news for some time now. Dion Hinchcliffe surveyed virtual storage providers in a recent post. He particularly commended Amazon S3, the Simple Storage Service, for its innovative API.

In essence, Amazon S3 offers developers a huge hash table. The minimalistic API, available in both SOAP and REST flavors, focuses on basic object management: write, read and delete. By default, the service works over HTTP and supports objects up to 5 gigabytes in size. There is also support for BitTorrent, and Amazon plans to add other protocols in the future. To use the service, you need an Amazon Web Services account.
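To make the "huge hash table" picture concrete, here is a minimal in-memory sketch of the write, read and delete operations. It is not the S3 API: the class name `ToyObjectStore` and its method names are made up for illustration, and the only detail taken from the service description is the 5 gigabyte object limit.

```python
from typing import Dict, Tuple


class ToyObjectStore:
    """In-memory stand-in for the hash table model: (bucket, key) -> bytes.

    Hypothetical illustration only; this is not how S3 is implemented.
    """

    MAX_OBJECT_BYTES = 5 * 1024 ** 3  # the 5 GB object limit mentioned above

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        """Write: store (or overwrite) the object under bucket/key."""
        if len(data) > self.MAX_OBJECT_BYTES:
            raise ValueError("object exceeds the 5 GB limit")
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        """Read: return the object, or raise KeyError if it does not exist."""
        return self._objects[(bucket, key)]

    def delete(self, bucket: str, key: str) -> None:
        """Delete: remove the object; deleting a missing key is a no-op here."""
        self._objects.pop((bucket, key), None)


store = ToyObjectStore()
store.put("my-bucket", "songs/track01.mp3", b"example bytes")
print(store.get("my-bucket", "songs/track01.mp3"))
```

The point of the sketch is how little surface area there is: three operations on a key, with everything else left to the application.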
 
Amazon has done a very thorough job documenting and supporting the service. The resources page contains a wealth of useful information to get you going, most notably the API documentation and the user forums. There are also code samples in various languages that illustrate how to use the Amazon S3 API.
 
Storing and retrieving objects
 
Objects in an S3 account are placed into buckets. Each account may have up to 100 buckets, and every bucket name has to be unique across all S3 users.


 


Figure 1: Example from the Amazon S3 API showing a CREATE BUCKET request


Each object inside a bucket has to have a unique, UTF-8 compliant key assigned by the developer. Since S3 imposes no specific key structure, developers are free to do what best suits their needs. The documentation hints at using slashes to create a directory-like structure, but does not insist on it. The lack of key structure and of a directory interface in the API is not a limitation but added flexibility: needs differ, and implementing directory-like storage is just a matter of following naming conventions in the code.
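As a sketch of what "following naming conventions in the code" can look like, the helper below treats slashes in flat keys as directory separators and lists one level of a pretend directory. The function name `list_directory` and the sample keys are hypothetical; it works on a plain list of keys and does not call S3.

```python
from typing import Iterable, List, Set, Tuple


def list_directory(keys: Iterable[str], prefix: str = "",
                   delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Split flat keys under `prefix` into (files, subdirectories).

    A key counts as a file if nothing but its name follows the prefix,
    and as part of a subdirectory if another delimiter follows.
    """
    files: List[str] = []
    subdirs: Set[str] = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Keep only the first path segment below the prefix.
            subdirs.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        elif rest:
            files.append(key)
    return sorted(files), sorted(subdirs)


keys = ["books/fiction/dune.txt", "books/index.txt", "music/a.mp3", "readme.txt"]
files, subdirs = list_directory(keys, prefix="books/")
print(files)    # ['books/index.txt']
print(subdirs)  # ['books/fiction/']
```

Nothing in the store knows about directories; the structure exists only in how the client reads the keys, which is exactly the flexibility described above.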


Figure 2: Example from the Amazon S3 API showing a GET BUCKET request
 
The API also allows the developer to list all the keys in a particular bucket. This is implemented using a flavor of the Iterator pattern and the concept of a marker. The first query supplies no marker. If the bucket contains more objects than specified in the max-keys parameter, then a marker for the next starting point is returned. To obtain the next set of results, the marker is passed back in the subsequent request.
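The marker-based iteration described above can be sketched as follows. This is a simulation, not a client: `list_keys_page` is a hypothetical stand-in for one listing request against an in-memory list of keys, and it assumes that the marker is simply the last key of the previous page and that `max_keys` is at least 1.

```python
from typing import Iterator, List, Optional, Tuple


def list_keys_page(all_keys: List[str], max_keys: int,
                   marker: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Simulate one listing request.

    Returns up to `max_keys` keys that sort after `marker`, plus the marker
    to send with the next request (None when there are no more results).
    """
    ordered = sorted(all_keys)
    remaining = [k for k in ordered if marker is None or k > marker]
    page = remaining[:max_keys]
    next_marker = page[-1] if len(remaining) > max_keys else None
    return page, next_marker


def iterate_keys(all_keys: List[str], max_keys: int = 2) -> Iterator[str]:
    """Walk the whole bucket by passing each returned marker back in."""
    marker: Optional[str] = None
    while True:
        page, marker = list_keys_page(all_keys, max_keys, marker)
        yield from page
        if marker is None:
            break


print(list(iterate_keys(["d", "a", "c", "b", "e"], max_keys=2)))
# ['a', 'b', 'c', 'd', 'e']
```

Because the marker is a position in key order rather than a server-side cursor, the client holds all the iteration state and each request is independent of the previous one.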
 


Figure 3: Amazon S3 diagram

 
Since it might be expensive to fetch the entire object from S3, the API allows you to associate metadata with each object. A GET BUCKET listing returns basic metadata such as size and last-modified date together with each key, and the metadata you attach yourself can be read with a HEAD request, without downloading the object body. This functionality is particularly handy for storing and looking up information about large media files.
 
S3 Security
 
S3 has a built-in security model for both connecting to the service and setting the access policy for individual objects and buckets. To access the service, each developer has to obtain an Access Key ID and a Secret Key. The Access Key ID, the same ID used for all Amazon Web Services, has to be passed in with every request. The Secret Key is never sent; it is used as the key for a keyed hash (HMAC-SHA1) computed over pieces of the request, and the resulting signature proves the requester's identity.
 
The authentication is somewhat involved, and there are quite a few posts on the forums complaining about authentication problems. The Amazon API documentation has a page dedicated to it. In addition, there are code samples in various languages which illustrate the correct usage of authentication. If you do not know much about cryptography, look for the code sample in your language; it will save you hours of debugging.
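For a feel of what those samples do, here is a minimal sketch of request signing as I understand the REST authentication scheme of the time: the client builds a canonical string from the request, computes an HMAC-SHA1 over it with the Secret Key, and sends the Base64 result next to the Access Key ID. The function names and the credentials are made up, and the sketch assumes a request with no `x-amz-` headers; treat Amazon's own samples as the reference.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key: str, verb: str, resource: str, date: str,
                 content_md5: str = "", content_type: str = "") -> str:
    """Return the Base64-encoded HMAC-SHA1 signature for one request.

    Assumes no x-amz-* headers, so the canonical string is just
    verb, Content-MD5, Content-Type, Date and resource joined by newlines.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key_id: str, secret_key: str,
                         verb: str, resource: str, date: str) -> str:
    """Build the Authorization header value: 'AWS <id>:<signature>'."""
    signature = sign_request(secret_key, verb, resource, date)
    return f"AWS {access_key_id}:{signature}"


# Made-up credentials and request, for illustration only.
header = authorization_header(
    access_key_id="EXAMPLEKEYID",
    secret_key="example-secret-key",
    verb="GET",
    resource="/my-bucket/songs/track01.mp3",
    date="Tue, 27 Mar 2007 19:36:42 +0000",
)
print(header)
```

Most of the forum complaints come down to the canonical string: a missing newline, a Date header that differs from the one signed, or a resource path that is not exactly what was sent will all produce a signature the service rejects.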
 
In addition to authentication, the S3 API comes with an Access Control List (ACL) for every bucket and every object. The ACL can be set via a separate request or at the time the bucket or object is created. Here is the set of currently supported ACL choices.




Figure 4: Current S3 ACL choices

Using AJAX to access S3

S3 opens up the intriguing possibility of dramatically simplifying the back end for some applications. You can envision an architecture where an application simply consists of a client that communicates directly with S3. This client can be a desktop application, an applet or an Ajax application embedded in the browser. Such a model is not appropriate for enterprise systems that require complex data processing, transactions and caching, but it could work well for things like Google Sync or del.icio.us. Simplicity, instant scalability, a built-in security model and Amazon's reputation make a strong case.
 
If you are writing a Firefox extension or an Ajax-based web application, you can either write your own S3 wrapper in JavaScript or use S3Ajax, developed by Les Orchard. S3Ajax basically mimics the S3 API and takes away the pain of formatting the raw requests and signing them.
 
It appears that the developer has plans to build this out further. It would be good to have a higher level of abstraction built in, as well as a way to loop through the objects in a bucket and the ability to fetch multiple objects concurrently.
 
This all sounds very sweet, but what about the performance?
 
Needless to say, performance is one of the top questions on the Amazon S3 forum. Amazon does not provide any specific data and does not have an SLA in place. But the design requirements and design principles from the S3 creators show a strong focus on performance and scalability.
 
There are a few benchmarks that independent developers have posted on the forums. I've seen these benchmarks improve since S3 launched. The latest results indicate low latency. A benchmark on March 17 said: “Putting a 2 MB mp3 took 0.8s, retrieving it took 1.085s -- quite fast, quite responsive”.
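To put those numbers in perspective, the quoted timings can be turned into an effective transfer rate. This is back-of-the-envelope arithmetic on a single data point; the timings include request latency, so the figures are a lower bound on raw bandwidth, and the helper name is made up.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Effective transfer rate for one request, latency included."""
    return size_mb / seconds


upload = throughput_mb_per_s(2, 0.8)      # PUT of the 2 MB file
download = throughput_mb_per_s(2, 1.085)  # GET of the same file
print(f"upload: {upload:.2f} MB/s, download: {download:.2f} MB/s")
# upload: 2.50 MB/s, download: 1.84 MB/s
```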
 
So who is using Amazon S3 right now?
 
Even though the service launched just a few months ago, there are already a number of companies leveraging it for a wide range of purposes. In the personal online storage space we note ElephantDrive (Windows), JungleDisk (Windows, Linux, Mac) and Filicio.us (web-based). From the discussion forums it is clear that a few companies are planning to use S3 to store media files, but we cannot tell who these companies are. At adaptiveblue we are using S3 to store the bluemarks of our users' favorite books, movies and music. And if you want to get a feel for what other things are possible, take a look at this top 10 list.
 
Conclusion
 
Amazon S3 is an innovative, exciting new service that is going to change the way we do computing on the web. Michael Arrington of TechCrunch said this in one of his posts:
 
“S3 provides a terrific opportunity for startups with great ideas for a storage user interface to avoid building a back end storage infrastructure. Amazon is offering extremely low pricing and a very dependable infrastructure. For some people, S3 will allow them to launch a service that they otherwise couldn’t have built.”
 
If you have not done so already, take a look at S3 and see for yourself. A good place to start playing with it is jSh3ll, which offers a shell-like command-line interface to the service. Then join the Amazon Web Services program and start coding. The possibilities are endless!
 
 


Alex Iskold is the Founder and CEO of adaptiveblue (http://www.adaptiveblue.com), where he is developing browser personalization technology. His previous startup, Information Laboratory, created an innovative software analysis and visualization tool called Small Worlds. After Information Laboratory was acquired by IBM, Alex worked as the architect of IBM Rational Software Analysis tools. Before starting adaptiveblue, Alex was the Chief Architect at DataSynapse, where he developed the GridServer and FabricServer virtualization platforms. He holds an M.S. in Computer Science from New York University, where he taught an award-winning software engineering class for undergraduate students.
