Who Protects Your Cloud Data?

Back in April, we speculated about one of the hidden dangers of depending on web services to store your data: the possibility that no one was doing backups. Now that possibility may have turned to reality for users of Omnidrive (once touted as the “clear leader” in the online storage field by TechCrunch). The service has been offline for some days, with its servers currently not responding at all. A December article at ReadWriteWeb contains serious allegations of fraud from the company’s ex-CTO (as well as a defense from the CEO).

My sympathies at this point are with Omnidrive’s users, particularly those who have their only copies of documents on an unreachable server. I can think of plenty of times when a days-long outage (let alone a permanent loss) of my own document storage would be devastating. The larger question, though, is what you, as a user, can (or should) do about it. Online document storage is certainly attractive to the web worker; being able to access and share your work easily from any browser is a killer feature. But how do you weigh that against the possibility that your documents could simply vanish overnight?

One possible approach is simply to choose your storage vendor very carefully. Backup vendor Mozy, for example, is owned by storage giant EMC; Jungle Disk stores your files in your own Amazon S3 account (so your data remains available even if Jungle Disk itself goes under); and Google Documents is, well, Google. Some smaller vendors also have serious backup policies of their own to guard against hardware failures.

Yet in a world of imperfect hardware and software, not to mention regulatory and legal issues, choosing a single company for storage is still ultimately a gamble. It may be unthinkable that EMC, Amazon, or Google could fail, but it’s not impossible. No matter how carefully you choose, entrusting your data to a single online storage vendor is equivalent to storing it on a single hard drive: it introduces a single point of failure into the system.

For hard drives, of course, we’ve long had two answers to this problem: backups and RAID. If disks are unreliable, make a copy of the data elsewhere. If one disk is unreliable, store your data on three or five or seven disks, with a scheme that allows perfect data recovery even if one or two of them should suddenly be reduced to iron filings by hardware failure. What the disappearance of Omnidrive suggests to me is that, now that there is more than enough competition in the market for simple storage, it’s time for the next step in the evolution of online file storage. We need the online equivalent of backups and RAID.
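To make the RAID idea concrete, here is a minimal sketch of single-parity recovery in the style of RAID 5, written in Python for illustration. One extra "disk" holds the XOR of the data blocks, and any one lost block can be rebuilt by XOR-ing the survivors with that parity. This only tolerates a single failure; surviving two simultaneous failures (RAID 6) needs a second, differently computed parity that is not shown here. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """The parity block stored on the extra disk: XOR of all data blocks."""
    return xor_blocks(data_blocks)


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    disks = [b"ABCD", b"EFGH", b"IJKL"]  # three data "disks" with equal-size blocks
    parity = make_parity(disks)
    lost = disks.pop(1)                  # simulate disk 1 turning into iron filings
    print(rebuild(disks, parity))        # b'EFGH'
```

The appeal of parity over plain copying is cost: protecting three disks against one failure takes one extra disk rather than three.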

This doesn’t mean that online storage services need to use backups and RAID on their own servers; whatever they do internally doesn’t protect me, as a consumer, against the vendor itself failing. Rather, I’d like to see products that automatically back up, say, a Box.net account to Amazon S3. Or an API that writes copies of my data simultaneously to Amazon and the fabled GDrive, and allows retrieval from either service if the other disappears. Or even a way to mirror my online storage, overnight, down to a desktop drive for safekeeping.
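The second idea, an API that writes to two vendors at once and reads back from whichever one survives, fits in a few lines. This is a minimal sketch, not any existing product: `StorageBackend`, `InMemoryBackend`, and `MirroredStore` are hypothetical names, and the in-memory backends merely stand in for real services such as Amazon S3. A real adapter would wrap each vendor's own SDK.

```python
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical minimal interface; a real adapter would wrap one vendor's SDK."""

    name: str

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Stand-in for one online storage vendor, with a flag to simulate it vanishing."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: dict[str, bytes] = {}
        self.online = True

    def _check(self) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")

    def put(self, key: str, data: bytes) -> None:
        self._check()
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        self._check()
        return self.objects[key]  # KeyError if this vendor never received the file


class MirroredStore:
    """Write each file to every backend; read from the first backend that answers."""

    def __init__(self, backends: list[StorageBackend], min_copies: int = 2) -> None:
        self.backends = backends
        self.min_copies = min_copies

    def put(self, key: str, data: bytes) -> list[str]:
        stored = []
        for backend in self.backends:
            try:
                backend.put(key, data)
                stored.append(backend.name)
            except ConnectionError:
                continue  # a real tool would queue a retry for this vendor
        # Copies that did land are kept; we only refuse to report success.
        if len(stored) < self.min_copies:
            raise RuntimeError(f"only {len(stored)} copies of {key!r} were written")
        return stored

    def get(self, key: str) -> bytes:
        failures = []
        for backend in self.backends:
            try:
                return backend.get(key)
            except (ConnectionError, KeyError) as exc:
                failures.append(f"{backend.name}: {exc!r}")
        raise LookupError(f"{key!r} is unavailable on every backend: {failures}")


vendor_a, vendor_b = InMemoryBackend("vendor-a"), InMemoryBackend("vendor-b")
store = MirroredStore([vendor_a, vendor_b])
store.put("report.doc", b"draft 1")
vendor_a.online = False          # vendor A disappears overnight
print(store.get("report.doc"))   # b'draft 1', served by vendor B
```

The `min_copies` check is the design choice that matters: a write that reaches only one vendor is reported as a failure, because a single copy is exactly the single point of failure the mirror is meant to remove.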

Until products like these are available (and if I’ve just missed them, please let me know in the comments), storing your documents online will remain a gamble. Perhaps a safe gamble, but it could be made far safer with more vendor independence.

21 Responses to “Who Protects Your Cloud Data?”

  1. Mike Stankavich wrote:

    “Maybe I’m paranoid or a pre-web Luddite, but I can’t stand the thought of not having an offline backup of any data that has personal or business significance to me. Vendors going out of business and extended connectivity outages are risks that I am not willing to accept.

    External drives are so cheap now that there’s really no excuse left to not have everything backed up at least two or three times locally. My local Costco has 250gb Western Digital passport drives for $139.99.”

    Something like Amazon S3 with multiple backup sites is *far* less risky than your system of backing up to external HDs.

    Don’t get me wrong, I also back up to external HDs and then swap between home and office. I also regularly back up my data files to optical media.

    But Amazon is likely much more reliable than both of those, especially in the case of natural disasters such as Hurricane Katrina, where having multiple backups in different locations isn’t all that helpful if the entire area turns into a disaster zone.

  2. Perhaps this applies more to medium and large businesses, but they really need to take responsibility for their own data. There are plenty of web-based file systems out there (I won’t mention my own… just click the link :) and really, that’s what businesses should be looking at if they have no confidence in outsourcing their data storage.

  3. angry omnidrive user

    So that someone can fall back on some “high TTL” excuse.

    Anyway, did they change the IP address? It’s the same 75.126.5.64 before and after.

  4. No Responses to Website Server Back Up
    Your User Says: Your comment is awaiting moderation.

    January 14th, 2008 at 8:34 pm
    No RAID? So unreliable!

    In response to:
    http://www.omnidrive.com/blog/2008/01/13/website-server-back-up/

    Posted on January 13th, 2008 by nik
    The main server that hosts this website (which is separate from the application and storage servers that run Omnidrive) had a hard drive fail, went offline and was down for a period. We had a new server up within hours and we restored backups of the site, and it is up-and-running again. I am really sorry for the inconvenience to our users of not being able to access the main site, although the actual Omnidrive servers were not affected at all and you would not have had any problems accessing your accounts. We are creating a second instance of the website now so that we have a failover. The website has been up almost every minute since we launched, and is rather basic (WordPress). Update: The server is at a new IP address and the old DNS record had a high TTL, so it is taking some time to propagate. It should be all done by now, but it did result in some clients not being able to access the server for a while.

  5. One thing that is a bit annoying for me is that both BingoDisk and Strongspace are down at the moment. Apparently Joyent is having issues with ZFS, which means users cannot access their backups or static content right now.

  6. Great article… I was one of those angry, mistreated Omnidrive users for a long time, but I was lucky: I copied and removed all my data from Omnidrive and, after constant persistence, got a refund about two days before they went down!

    Now I have moved to Jungle Disk with Amazon, and so far I am very pleased. I have much more confidence in Amazon, but I plan on using a secondary source for backup as well, just in case (probably Mozy), not to mention daily backups to a NAS. I also keep a Box.net account for my Word/Excel documents. I have been with Box.net for over 2 years and I have had zero problems, but I don’t use them for my bigger backups.

    I learned the hard way, others should not have to.

  7. Mike Stankavich

    Maybe I’m paranoid or a pre-web Luddite, but I can’t stand the thought of not having an offline backup of any data that has personal or business significance to me. Vendors going out of business and extended connectivity outages are risks that I am not willing to accept.

    External drives are so cheap now that there’s really no excuse left to not have everything backed up at least two or three times locally. My local Costco has 250gb Western Digital passport drives for $139.99.

    At some point in the future I may consider using cloud storage as an additional form of offsite redundancy. But for now, I just rotate two external drives between home and a locked file cabinet at my office.

  8. Amazon doesn’t disclose a lot of internals about S3 for security and competitive reasons, but they have stated before that all data is stored in at least 3 different datacenters in at least 2 geographic areas (e.g. east/west/central). They are pretty serious about data security as well as availability.

  9. Using online storage providers for primary storage is asking for trouble. Even if it is backed by a large company, priorities change and that large company may decide to close down the service. I use online storage providers to store encrypted off-site backups and nothing else.

    This issue applies more broadly to Web 2.0 applications in general. If my business relies on a web-based service, what happens if that service goes out of business? Always create a contingency plan and always create your own (local!) backups of any data stored on the web.

  10. thorgersen

    I use Mozy as well as daily (or more frequent) backups to a removable disk drive, stored onsite. It’s doubtful (though not impossible!!!) that the computer, onsite backups, and offsite backups would all be inaccessible at the same time.

  11. There’s a difference between a backup storage provider and one where you’re creating the master copies of your documents in the cloud. If the online storage service is merely your backup provider, you’ve lost nothing if it goes away; you just have to find a new online backup service. A hassle, maybe, but going a week or so without backups is a minimal risk, and there are several other options out there.

    If you’re creating master copies of your documents in the cloud… well you STILL need to have a backup strategy. Not so much because your files might be stored on one drive and lost, but because they might become inaccessible. It’s the exact same issue as backing up local data – your data is all in one place, what if that place suddenly is not accessible?

  12. You don’t need to look at a relatively small vendor like OmniDrive. Remember the repeated outages Salesforce.com had a few months back? This is mission-critical information for a lot of people.
    That said I would still argue that keeping your data on the cloud is many times safer than on your local hard disk.
    Perhaps we need services such as Pingdom that measure and rank the various cloud storage providers in terms of reliability and up-time.

  13. “Cloud RAID” is a good idea and I would certainly look with interest at someone offering such a solution.

    One thing to be very careful about, however, is which vendor is supplying the physical backend storage service for the multiple RAID providers.

    Why? The company you are paying for your web hosting is probably outsourcing the actual hardware infrastructure to another (larger, more efficient) back-end supplier. This will increase in future as economies of scale lead to fewer, but larger, utility computing centers.

    In the “Cloud RAID” model, you must check carefully that what appear to be several independent RAID storage providers do not, when you look at their physical implementations, turn out to be hosted by the same back-end storage utility.

    We would then be running the same risk as before! One hardware failure (or network outage) could lead to data loss. What is worse, the RAID approach would give us a false sense of security about our data.

    I have been using Jungle Disk for offsite backups since November 2006 without any problems. I am confident in the reliability of Amazon S3 for this. Even so, it would be nice to have at least some idea of:

    (1) How many separate copies Amazon is storing of my data

    (2) Where my data is (roughly) located.

    At the moment my offsite backups are being stored somewhere in the “Amazon cloud”. Where, exactly, I have no idea …

  14. This is the Achilles’ Heel of online data storage. Once it’s off my computer (or my network) I no longer have control over my data.

    Irate ex-worker decides to take a few servers down on the way out the door? Local internet provider suffers a service outage? Even planned maintenance downtime can interfere with our ability to retrieve our files and get our work done. And if we depend on web-based apps we can lose the very tools needed to work with our documents.

    As convenient as online data storage and web apps can be, they introduce too many (uncontrollable) points of failure to be relied upon as a primary solution.
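The caveat about “Cloud RAID” raised in comment 13 can be reduced to a simple check: a set of mirrors is only as independent as the number of distinct back-ends underneath it. A minimal sketch, assuming you can somehow learn which back-end utility hosts each provider (which, as that commenter notes, is often not disclosed); all provider and back-end names here are hypothetical.

```python
from collections import defaultdict


def shared_backends(hosting: dict[str, str]) -> dict[str, list[str]]:
    """Map each back-end utility to the providers it hosts, keeping only shared ones."""
    by_backend: dict[str, list[str]] = defaultdict(list)
    for provider, backend in hosting.items():
        by_backend[backend].append(provider)
    return {b: sorted(ps) for b, ps in by_backend.items() if len(ps) > 1}


def independent_copies(hosting: dict[str, str]) -> int:
    """Copies that can fail independently: one per distinct back-end, not per provider."""
    return len(set(hosting.values()))


# Hypothetical mirror set: three providers, but two share one back-end utility.
hosting = {
    "provider-a": "utility-x",
    "provider-b": "utility-x",
    "provider-c": "utility-y",
}
print(shared_backends(hosting))     # {'utility-x': ['provider-a', 'provider-b']}
print(independent_copies(hosting))  # 2
```

Three paid subscriptions here buy only two independent copies, which is exactly the false sense of security the commenter warns about.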