From POSIX to AWS and S3

From the Portable Operating System Interface (POSIX), first introduced in 1988, to Amazon Web Services Simple Storage Service (S3), introduced in 2006, data persistence is experiencing a revolution. In 2013, AWS announced that its S3 platform housed more than 2 trillion objects and that the number was doubling every year. At that rate, it would hold something like 16 trillion objects today, and if we assume an average object size of only 256KB, the platform would currently house somewhere well into the exabyte capacity range. That puts the platform somewhere around 1% of all the world’s digital data, probably more.
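The estimate above can be reproduced as a quick back-of-the-envelope calculation. This is only a sketch: the three years of doubling (2013 to roughly 2016) and the 256KB average object size are assumptions taken from the text, not published AWS figures, and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope estimate; every input here is an assumption from the text.
OBJECTS_2013 = 2 * 10**12       # "more than 2 trillion objects" reported in 2013
YEARS_OF_DOUBLING = 3           # assumed: doubling each year from 2013 to about 2016
AVG_OBJECT_BYTES = 256 * 1000   # assumed average object size of 256KB (decimal KB)


def estimate_objects(start_count, years):
    """Object count after doubling once per year for the given number of years."""
    return start_count * 2**years


def estimate_capacity_bytes(object_count, avg_object_bytes):
    """Total stored bytes if every object had the assumed average size."""
    return object_count * avg_object_bytes


objects_now = estimate_objects(OBJECTS_2013, YEARS_OF_DOUBLING)
total_bytes = estimate_capacity_bytes(objects_now, AVG_OBJECT_BYTES)
exabytes = total_bytes / 10**18

print(f"objects: {objects_now / 10**12:.0f} trillion")
print(f"capacity: about {exabytes:.1f} EB")
```

Under these assumptions the result is 16 trillion objects and roughly 4 exabytes; a larger average object size would push the capacity figure up proportionally.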

This is surely the single largest data storage platform on Earth.

The success of AWS and its storage platform can no longer be ignored, but does S3 make sense for the future? It's interesting to reflect on why AWS introduced such a protocol to the market and why it does or doesn't make sense.

We cannot know everything Amazon was thinking when the protocol was introduced, but traditional filesystem constraints are well known. AWS was designed to provide on-demand IT infrastructure in what today is called IaaS. Being able to start and stop services on demand was wonderful, but information created needed to be made persistent. The data persistence system needed special properties, among them:

POSIX filesystems have been the mainstay of storage since their introduction, and the hierarchical organization model of filesystems is certainly one that humans understand well. They nevertheless have constraints that make scaling difficult, if not impossible, as they grow. Many storage administrators have watched inode counts (an inode being the on-disk record that describes a single file or directory) like milk warming on the stove, knowing that beyond certain limits, the system becomes unstable.
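For readers who have not had to watch inode counts themselves, here is a minimal sketch of how that might be done on a Unix-like system with Python's standard os.statvfs call. The function name inode_usage and the choice of "/" as the path are illustrative assumptions; some filesystems report zero total inodes, which the sketch treats as "no usage data".

```python
import os


def inode_usage(path="/"):
    """Return (used, total, fraction_used) inodes for the filesystem holding path."""
    st = os.statvfs(path)          # POSIX statvfs(); available on Unix-like systems
    total = st.f_files             # total inodes on the filesystem
    free = st.f_ffree              # free inodes
    used = total - free
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    fraction = used / total if total else 0.0
    return used, total, fraction


used, total, fraction = inode_usage("/")
print(f"inodes used: {used} of {total} ({fraction:.1%})")
```

An administrator would typically run a check like this on a schedule and alert well before the fraction approaches 1.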

The users of POSIX filesystems are also a demanding lot. Full consistency is expected across the entire data tree. Any modification to any file or the directory structure must be communicated to all users of the filesystem simultaneously. Application error-handling mechanisms are not even expected to entertain the possibility that this is not true. The POSIX filesystem must publicly admit any deviation from the rules by returning a dreaded IO error.

It's not hard to see that AWS needed to break the rules in one way or another in order to scale to the dizzying heights it has attained today, and what better way to break the rules than to simply change them? By introducing an altogether new protocol, AWS was able to establish a new set of rules and conventions that allowed it to scale and provide services that made more sense to its customers. Here's a high-level view:

In the next installment, we will discuss how applications use AWS S3 storage and the many improvements to the model over the 10 years of its existence. 


Follow Brad King on Twitter