Amazon SimpleDB 101 & Why It Matters


Amazon continues to amaze us with its Amazon Web Services series of offerings. The latest is SimpleDB, which will be available in limited beta in a few weeks. And it is bound to have a major impact on web infrastructure. As Amazon says in its email to existing developers:

This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.

As we’ve already noted,

…the center of gravity is shifting away from monolithic centralized data management to massively parallel distributed data management.

If you are in the business of managing massive amounts of distributed data, you cannot gloss over the Amazon WS trifecta: data-in-the-cloud is the future and with WS, Amazon is way ahead of the pack.

What about the offerings of other vendors? Google, for example, has BigTable, and truth be told, SimpleDB has a distinctly BigTable-ish feel to it. But a side-by-side comparison makes it clear that Amazon WS in general, and SimpleDB in particular, is superior, for the following reasons:

  • Google’s offerings – not only BigTable but GoogleBase, Gdisk, etc. — all have an ad hoc, grab-bag-of-tools feeling to them, devoid of any integrated strategy. Or if there is one, it is well-hidden.
  • Amazon WS clearly involves a well-designed master plan aimed at changing the face of software as a service, each new offering akin to a chess piece in a game focused on creating strategic long-term value. And with SimpleDB, the queen has moved to the center.
  • Amazon WS is based on the YOYODA principle — You Own Your Own Data, Always. Along with Amazon S3, SimpleDB is a sharp arrow in the quiver of open data proponents.
  • Amazon WS includes a built-in, flexible payment system so users are neither forced to offer their app for free nor have an “ad-supported” model forced upon them. Now you can build a data-based web app on SimpleDB and seamlessly charge for it.

Tersely put, SimpleDB is hugely disruptive. It will take some time to evolve the new thinking patterns and new design disciplines that this technology forces us to consider. To do so, consider this breakdown of the similarities and differences between SimpleDB and conventional relational databases.

Very, very simplistically speaking, domains are like tables, with items like rows and attributes like columns. A query cannot cross domains, so in this analogy you can’t “join” domains. But that sort of thinking is a holdover from the normalized relational database model.

In reality a domain is much more like a database, so we have to stop thinking in terms of tables and joins.
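To make the domain, item and attribute vocabulary concrete, here is a toy in-memory sketch in Python. It is not the SimpleDB API: the `put` and `select` helpers and the sample items are made up for illustration, and it only assumes what is described above, namely that a domain holds named items and each item holds attributes, with no fixed schema.

```python
# Toy in-memory stand-in for a single domain (NOT the real SimpleDB API).
# Layout: item name -> attribute name -> set of values.

def put(domain, item, attribute, value):
    """Add a value to an item's attribute, creating item and attribute as needed."""
    domain.setdefault(item, {}).setdefault(attribute, set()).add(value)

def select(domain, attribute, value):
    """Return the names of all items whose attribute contains the given value."""
    return {
        name
        for name, attrs in domain.items()
        if value in attrs.get(attribute, set())
    }

domain = {}
put(domain, "item1", "color", "red")
put(domain, "item1", "color", "blue")   # a second value for the same attribute
put(domain, "item2", "color", "blue")
put(domain, "item2", "size", "large")   # item1 simply has no "size" attribute

print(sorted(select(domain, "color", "blue")))  # ['item1', 'item2']
```

Items in the same domain need not share attributes, which is one reason the table analogy only goes so far.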

Say we had an SQL database with tables for “Company,” “Departments” and “Employees.” In SimpleDB, the items (rows) for all three could go in one domain (database). You can then run queries on this domain and, using operators like UNION and INTERSECT, do the equivalent of joins.

Existing web technologies such as Ruby on Rails, Django and Hibernate all rely on Object Relational Mapper (ORM) layers, which map language objects to relational database tables.
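As a rough illustration of the single-domain layout described above, the following Python sketch keeps company, department and employee items side by side and combines query results with set intersection and union. Everything here is an assumption for illustration: the `type` attribute used to tell item kinds apart, the sample data, and the `match` helper are not taken from SimpleDB itself. How far such set operations really go toward replacing joins is debated in the comments below.

```python
# One toy "domain" holding three kinds of items (not the real SimpleDB API).
# The "type" attribute is a hypothetical convention for telling kinds apart.
domain = {
    "acme":  {"type": {"company"}, "name": {"Acme"}},
    "eng":   {"type": {"department"}, "company": {"acme"}, "name": {"Engineering"}},
    "sales": {"type": {"department"}, "company": {"acme"}, "name": {"Sales"}},
    "alice": {"type": {"employee"}, "department": {"eng"}, "city": {"Seattle"}},
    "bob":   {"type": {"employee"}, "department": {"sales"}, "city": {"Seattle"}},
    "carol": {"type": {"employee"}, "department": {"eng"}, "city": {"Boston"}},
}

def match(attribute, value):
    """Names of items whose attribute contains the value (one simple predicate)."""
    return {
        name
        for name, attrs in domain.items()
        if value in attrs.get(attribute, set())
    }

# INTERSECT: employees who are in the "eng" department AND located in Seattle.
eng_in_seattle = (
    match("type", "employee") & match("department", "eng") & match("city", "Seattle")
)

# UNION: items that are in the "eng" department OR located in Boston.
eng_or_boston = match("department", "eng") | match("city", "Boston")

print(sorted(eng_in_seattle))
print(sorted(eng_or_boston))
```

Note that both operations combine predicates over the same set of items; relating an employee item to the attributes of its department item still takes a second lookup.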

If designers of these ORMs want to stay in the scalable apps game, they should take a serious look at using SimpleDB as a data store. Better yet, they should build ORMs from the ground up to integrate with SimpleDB.

More than two years ago I wrote that Web 2.0 needs Data 2.0. The combination of EC2, S3 and SimpleDB is a toolkit for assembling massively scalable, REST-addressable web databases. Data 2.0 is now officially here. May the fun and games begin.

Nitin Borwankar is a database guru based in the San Francisco Bay Area. You can find his writings on his blog, TagSchema.

59 Comments

What will the next e-commerce look like?

[…] of data: products on one side and information on the other. Worth reading is the article on GigaOm.com “Amazon simple DB101 and why it matters“. Data is no longer catalogued in the classic sense, but stored in a simpler and easier way […]

JD Gauchat

Hi. I’m working with SimpleDB because I needed a simple database structure. It has worked pretty well so far, but I had problems understanding the interface, so I designed a script that takes MySQL statements and sends them to the SimpleDB server.
Here you can find the script and instructions:
http://editorialconquer.com/supersimpledb/

JD

Gert Schmeltz Pedersen

PSPS! And you would still need to know sound database design.

Gert Schmeltz Pedersen

Database design, a perspective

Hi SimpleDB users,

I am an oldtimer in databases, who happened to come across the SimpleDB pages, and so out of curiosity started to read about it. It looked good … at first, then I realized some implications of the simplicity, then I looked at some of the threads here and elsewhere dealing with the relationship to relational databases and how to solve more complicated database problems. Then I realized how history repeats itself and how 38 years of accumulated wisdom of database modelling and design have been disregarded or overlooked or misunderstood or neglected.

In the summer of 1975 I read the first edition of Chris Date’s textbook on database technology, then I studied E.F. Codd’s papers on the relational model, starting in June 1970, for which he was given the Turing Award in 1981. These two guys are behind the tremendous success of relational database technology. Date’s book in many editions, together with other textbooks on database technology and various forms of The Entity-Relationship Model, originally created by Peter Pin-Shan Chen in 1976, has educated countless computer science students in database design. All computer science students, all developers, all programmers should get familiar with relational database design: it is simple, it is powerful, it can be done in a one-semester course, and then you can disregard it with open eyes. It is a scandal if you were not taught relational database design alongside basic programming skills.

What is it that you do with SimpleDB? You put your application logic into procedural code, where you query each domain and combine the resulting items by coding loops and comparisons and what have you; this is how you implement the equivalent of joins in SQL queries; this is the old procedural versus non-procedural debate, where your procedural version is much, much costlier to maintain, and much, much harder for others to understand, you bury the application logic in tons of hopeless code. If you want to avoid joins, you probably put everything into one domain, or as few domains as possible; if you know a bit of normalization of database design, you know about the anomalies that you have introduced, costly. The claimed advantage of SimpleDB that you may have more than one value in the same field is one aspect of an unnormalized database, therefore harmful. And by the way, it is not true that union and intersect can do joins for you.
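The procedural versus non-procedural contrast above can be sketched in Python. The first function joins two lists of items with hand-written loops and comparisons, as application code over a store without joins would have to; the second states the same join declaratively in SQL, using the standard-library `sqlite3` module as a stand-in for any RDBMS. The sample data and the field names `dept_id` and `title` are invented for illustration.

```python
import sqlite3

employees = [
    {"name": "alice", "dept_id": "eng"},
    {"name": "bob", "dept_id": "sales"},
    {"name": "carol", "dept_id": "eng"},
]
departments = [
    {"dept_id": "eng", "title": "Engineering"},
    {"dept_id": "sales", "title": "Sales"},
]

def procedural_join(employees, departments):
    """Join in application code: nested loops plus an explicit comparison."""
    rows = []
    for emp in employees:
        for dept in departments:
            if emp["dept_id"] == dept["dept_id"]:
                rows.append((emp["name"], dept["title"]))
    return sorted(rows)

def sql_join(employees, departments):
    """Same result stated declaratively; the engine decides how to execute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept_id TEXT)")
    conn.execute("CREATE TABLE department (dept_id TEXT, title TEXT)")
    conn.executemany("INSERT INTO employee VALUES (:name, :dept_id)", employees)
    conn.executemany("INSERT INTO department VALUES (:dept_id, :title)", departments)
    rows = conn.execute(
        "SELECT e.name, d.title FROM employee e "
        "JOIN department d ON e.dept_id = d.dept_id "
        "ORDER BY e.name, d.title"
    ).fetchall()
    conn.close()
    return rows

print(procedural_join(employees, departments))
print(sql_join(employees, departments))
```

Both calls return the same pairs here. The difference the comment points at is where the join logic lives: in loops that every reader of the application must follow, or in a single query.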

Once we had semantic nets and logic databases, they included the attribute-value pairs of SimpleDB, and did not require schemas or predefined fields, and in addition they provided powerful non-procedural queries. But where are they today? Maybe in student projects, maybe in research projects, but not in serious, important applications.

If I were paying your salary, I would never allow you to implement my important applications in SimpleDB or the like. You should instead use MySQL or another free, powerful, yet simple to use, RDBMS. Use it with EC2 and S3, that is fine.

You have other options, though, XML databases in the first place, where you also have non-procedural queries available. As a database developer, you should be able to judge, when and how to use XML databases for a given purpose. If your application needs full-text indexing or integrated storage of all types of files and documents, then take a look at things like the object-based, web service-based Fedora repository system, it has RDF triple storage with a non-procedural query language also.

In conclusion, use SimpleDB only if your application’s database needs amount to one normalized relational table, else invest your time in today’s powerful technologies.

PS! What if SimpleDB implemented joins, that is, “field1” = “field2”? That would be a great step forward. Then you would need indexing behind the scene, so that performance of joins could be as good as in RDBMS.
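A minimal sketch of what such an indexed join could look like, in Python. This is only an illustration of the general technique (an equi-join that probes a hash index instead of comparing every pair), not a description of anything SimpleDB offers; the function names and sample items are hypothetical.

```python
from collections import defaultdict

def build_index(items, field):
    """Map each value of `field` to the items carrying it (skips items lacking it)."""
    index = defaultdict(list)
    for item in items:
        if field in item:
            index[item[field]].append(item)
    return index

def indexed_join(left, left_field, right, right_field):
    """Equi-join on left_field = right_field by probing an index on the right side."""
    index = build_index(right, right_field)
    return [
        (l, r)
        for l in left
        if left_field in l
        for r in index.get(l[left_field], [])
    ]

employees = [
    {"name": "alice", "dept": "eng"},
    {"name": "bob", "dept": "sales"},
    {"name": "dave"},                      # schemaless: no "dept" attribute at all
]
departments = [
    {"id": "eng", "title": "Engineering"},
    {"id": "sales", "title": "Sales"},
]

pairs = indexed_join(employees, "dept", departments, "id")
print([(e["name"], d["title"]) for e, d in pairs])
```

With the index built once, each left-hand item costs one dictionary lookup rather than a scan of the other side, which is the kind of behind-the-scenes support the comment is asking for.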

dave

That was a nice comment exactly expressing my thoughts and feelings. It feels like relational databases 30 years ago and the whole thing like reinventing the wheel.

Peter

It’s reinventing the wheel (or better, going back to pulling sledges) because it turns out there’s a scale where the wheel stops scaling. And the pain of scaling is then harder than the pain of not having the relational stuff. You give up one for the other.

Most sites will never hit that pain point in scaling, but plenty of sites do. And many sites don’t need the relational stuff. So it’s a trade-off.

Web2NewYork (beta) | Blog

[…] is unsuitable for web 2.0 applications that have to scale. Amazon’s SimpleDB and Google’s BigTable have kicked off a paradigm shift away from the relational […]

likejazz.COM · Google Web Application Platform: App Engine

[…] was confirmed through App Engine. Amazon’s SimpleDB is another service that provides a datastore, but because App Engine bundles this into a single package, it has greater […]

ivbeg: On distributed search engines, Enabot and HyperTable

[…] its own distributed database. I was reminded of one of the review articles on Simple DB: despite the heavily simplified interfaces, it is one […]

Иван Бегтин | On distributed search engines, Enabot and HyperTable

[…] its own distributed database. I was reminded of one of the review articles on Simple DB: despite the heavily simplified interfaces, it is one […]

Nati Shalom

Nitin

First of all thanks for the great writeup.
Looking at this thread, I think that the announcement around SimpleDB provoked an interesting discussion related to the role of databases in web 2.0 architecture. While I think that this discussion is very interesting and very relevant, I also think that it is important to emphasize that SimpleDB is not yet another database and shouldn’t be measured as such.

I wrote a summary (Amazon SimpleDB is not a database!) that aims to clarify my point on this matter. I’d appreciate your comments in this regard.

Nitin Borwankar

Hi folks,

While we wait for Amazon SimpleDB to be publicly available, here’s something to think about re: disruptive technologies and how they nibble their way to the center of the market.

http://tinyurl.com/24xoey
