As we increasingly struggle to manage our data spread, both on our devices and in the cloud, even Google is going offline, for remote access and online storage aren’t enough. There is a need for sync with local duplicates, one the likes of SugarSync, Dropbox, Apple’s MobileMe and Microsoft’s LiveMesh are aiming to fulfill.

I worked at SugarSync for almost three years (I no longer have any financial ties to the company), so I know the double-edged sword of sync firsthand. When you sell a sync product, you sell magic. (We free the data from their physical devices; just forget where you last edited the file, it’s gonna be everywhere.) But once it’s implemented, there’s no magic anymore, and the engineer is left to deal with asynchrony, slow bandwidth, third-party applications, and file systems that have different semantics.

Here’s why:

Push sync is deeply asynchronous

Conceptually, you have redundant copies of your data on various devices, and a service that keeps the copies in sync. That means that when a change is made on one device, you replay the change to the others. So when a file is edited on one device, it needs to be updated on the others. In the meantime, you can’t guarantee that the old version of the file remains untouched on the other devices. It could be edited, moved or deleted, which is when conflicts arise. It’s well known that concurrent programming is difficult; sync is just an extreme example.
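
To make the conflict case concrete, here is a minimal sketch in Python of how a client might decide whether a replayed change can be applied safely. It assumes each change carries the content hash the sender started from; the names `Change` and `replay` are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """A remote edit to replay locally (hypothetical structure)."""
    path: str
    base_hash: Optional[str]  # hash the sender edited from (None = file did not exist)
    new_hash: Optional[str]   # hash after the edit (None = file was deleted)


def replay(change: Change, local_hash: Optional[str]) -> str:
    """Decide what to do with a remote change.

    local_hash is the hash of the local copy right now, or None if the
    file is absent on this device.
    """
    if local_hash == change.new_hash:
        return "noop"      # both sides already agree
    if local_hash == change.base_hash:
        return "apply"     # local copy untouched since the sender's base
    return "conflict"      # local copy was edited or deleted in the meantime
```

The check is cheap; the hard part is what to do once `"conflict"` comes back, since moves and renames complicate even this simple picture.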

Sync matches different data models

You can’t actually sync identical duplicates of your data because they live on different devices. So you have to translate the data to the local models. File names alone don’t sync properly from a Mac to Windows without careful Unicode transformations, so imagine what becomes of extended attributes, resource forks, ACLs, etc. Even if you’re not cross-platform, many file systems can be configured as either case-sensitive or case-insensitive. So you have to come up with an extensible strategy to deal with the different models, and there is seemingly endless testing involved.
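
As an illustration of the file-name problem, the sketch below uses Python’s standard `unicodedata` module to build a comparison key that ignores both Unicode normalization form and case. The helper names `canonical_key` and `find_collisions` are hypothetical, and the folding rules are only an approximation: real file systems apply their own normalization and case tables.

```python
import unicodedata
from collections import defaultdict
from typing import Iterable, List


def canonical_key(name: str) -> str:
    """Approximate key for comparing names across file systems.

    Normalizes to NFC, case-folds, then normalizes again because
    case folding can produce non-normalized text.
    """
    folded = unicodedata.normalize("NFC", name).casefold()
    return unicodedata.normalize("NFC", folded)


def find_collisions(names: Iterable[str]) -> List[List[str]]:
    """Group names that would clash on a case-insensitive, normalizing target."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

A client could run a check like this before writing files to a device, and rename or flag the colliding entries rather than silently overwriting one with the other.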

Sync messes with third-party applications

Applications tend to misbehave in various ways with their documents. When a sync product attempts to update a file with a newer version from another device, it can’t always know whether or not the file is open for reading or editing. In that case, the application may become unstable. Syncing app data is also dangerous.
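
One common mitigation, sketched here in Python, is to never write into the live file: write the new version to a temporary file in the same directory, then swap it in with a single rename. This is an assumption about how a client could reduce the risk, not a description of any specific product, and the function name `replace_atomically` is hypothetical. It does not tell you whether an application has the file open; on some platforms the swap may simply be refused, which the sketch treats as a signal to retry later.

```python
import os
import tempfile


def replace_atomically(path: str, data: bytes) -> bool:
    """Swap in a new version of a file with one rename.

    Returns False if the operating system refuses the swap (for example
    because another program holds the file), so the caller can retry.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same volume for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".sync-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk first
        os.replace(tmp_path, path)
        return True
    except PermissionError:
        return False
    finally:
        # Clean up the temp file if the swap did not happen.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

Readers of the file then see either the old version or the new one, never a half-written mix, though an application that already loaded the old contents can still overwrite the update when it saves.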

Sync is hard to test

Sync maintains redundant copies of the user’s data through incremental updates. The devices are in sync as long as the redundant copies are consistent. Developers and testers will usually assume in their testing that the initial state is in sync; they will make a change on one side and see that the other side changes accordingly. So as soon as an error is introduced in the system (you’re out of sync to start with), you’re in a non-tested scenario. The system needs to recover from its own errors.
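
One way to test beyond the happy path is to start from a deliberately inconsistent state and check that the replicas converge. The Python sketch below shows the kind of full-state comparison such a test (or a periodic self-check in the client) could rely on; `fingerprint` and `diff_replicas` are hypothetical helpers, and replicas are modeled as plain dictionaries for simplicity.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each path to a SHA-256 hash of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff_replicas(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str]]:
    """List every path on which two replicas disagree, in sorted order."""
    problems = []
    for path in sorted(set(a) | set(b)):
        if path not in a:
            problems.append((path, "missing on A"))
        elif path not in b:
            problems.append((path, "missing on B"))
        elif a[path] != b[path]:
            problems.append((path, "content differs"))
    return problems
```

An empty result means the two sides are consistent. Because the check compares complete states rather than replaying incremental updates, it can also catch drift that the incremental path introduced itself.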

Several of these problems are on the client side, which is why building a sync client is hard — even on the best sync platforms out there, such as Sync Services or Live Framework.

Some of these difficulties are just inherent to sync, but the technology is maturing. Sync so far has meant something different to everybody. Even the industry’s main players still sell byproducts of sync rather than sync itself, which then compete with backup, online storage, photo-sharing or music-streaming services. Sync is a great enabler for all these connected services, one that’s becoming central to the personal cloud story of more and more companies.

Jean-Gabriel Morard is a software engineer living in Paris.

  1. Timely article as I was just bemoaning the slow progress of Live Mesh.

  2. Richard Farleigh Sunday, May 10, 2009

    You can solve this all quite easily.

    Don’t have multiple places to store your data, just have one cloud/server type storage that all your devices can access.

    Eliminate local storage and store everything in the cloud.

    1. To go along with the trust and privacy issues associated with all your data “in the cloud”, there is also the real problem that bandwidth, whether it be cable/DSL, 3G, 4G, etc., is nowhere near ready to replicate local load times, particularly as it relates to working with images and video.

    2. Yes, this works! I am developing software with a centralized server that stores all the data. This data can be accessed from mobile, web or desktop.

  3. Dan Cornish Sunday, May 10, 2009

    We solved this problem by doing away with sync altogether. Version control is a far better way to go. We have developed a way to version contacts and tasks, therefore eliminating the need for sync between Outlook and the cloud. We have also integrated this into our new iPhone application which is being sent to Apple this week. Please check us out at http://www.cosential.com

    1. Jean-Gabriel Morard Dan Cornish Sunday, May 10, 2009

      @Dan Cornish Version control is actually a very interesting approach. It’s a mature technology, it’s very safe with a very granular control left to users. But it has its limits as well: it pushes the complexity of dealing with conflicts onto the users. While version control is perfectly adapted to advanced users, it isn’t a great fit for people who are mostly looking for a seamless productivity tool.

  4. I have been using rsync for the last 4 years with no issues.

  5. Dagbok för 10 May 2009 | En sur karamell Sunday, May 10, 2009

    [...] Why Sync Is So Difficult — 18:30 via Google [...]

  6. As far as I can tell, Dropbox has actually solved the desktop file-sync problem. It’s the first sync product of its kind that I’ve used that works out-of-box.

    Yes, sync is hard. Where I think things fail is mainly around user experience. You just have to make smart decisions and let users recover if they see something they don’t expect. Dropbox does this quite well.

    I learned this when I was at Microsoft on the ActiveSync team — we actually were trounced by RIM not because our sync was worse (it was probably superior) but because we initially made the mistake of OVER-reporting status.

    Sync should be a silent, no/very little UI experience — a utility that just works in the background. Any attempt to make it more than that will cause the product to fail miserably.

    1. Jean-Gabriel Morard Garry Tan Sunday, May 10, 2009

      @Garry Tan I totally agree that users should not need to know that we’re solving these problems for them. Ideally you’d want no UI at all, but when people rely on the sync product to push a file onto a device before they hit the road, they also want to know that it made it there before they turn off their computer. So some feedback is needed. File sync is also expensive in bandwidth and CPU, and you need to account for that to the user. It doesn’t have to jump out of the screen, but it should be there to reassure users who do want to know.

      Again, syncing files just makes it more likely that you’ll run into the problems you may already have when you sync PIM data. You can assume that a PIM-data sync is almost always carried through to completion, but file sync is very often interrupted (because transferring files across the network is slow!) and people will then wonder where their data is. Communication is important in that case.

  7. Don’t forget about Livedrive; they seem to have syncing down to a fine art. It works perfectly.

  8. Daniel Larsson Sunday, May 10, 2009

    Hello,
    Great article. I just wanted to add that SpiderOak Inc (https://spideroak.com) offers a FREE (2GB) Online Backup and Smart Sync solution for both companies and consumers.

    SpiderOak is available for Linux, Mac and Windows and incorporates 100% zero-knowledge online backup, sync, storage, access and share.

    Try out SpiderOak today at https://spideroak.com

    Best,
    Daniel Larsson
    SpiderOak Inc

  9. Victor Panlilio Sunday, May 10, 2009

    @Richard Farleigh wrote: “Eliminate local storage and store everything in the cloud.”

    And when the backhoe accidentally cuts multiple fiber cables, suddenly you have no access to your data. FAIL.

  10. Jean-Gabriel Morard Sunday, May 10, 2009

    @Richard Farleigh You’re right that a remote-access based system works around these problems.

    Online storage is great if you live in a fully connected world with unlimited bandwidth, or with a small number of small files, so that you can always find network access and download them on demand. But as soon as you try to actually solve the problem of the multiplication of devices, where people really want r/w access to all their files all the time, it’s too limited. Also, with remote access-based solutions, users have to think ahead and upload the files they are going to need on the road.

    One could come up with a hybrid system, such as remote access with a cache. If you cache some of your files locally then you solve some of the network issues (flaky, slow or no network.) But if you give people write access to their files in their cache then you introduce the problem of merging the local and the remote edits when the user goes back online, and you’re back in sync land.

  11. Sixth SenseS » Why Sync Is So Difficult Monday, May 11, 2009

    [...] Read more:  Why Sync Is So Difficult [...]

  12. Dropbox rocks! it just works

  13. Sync is Zen Magic Monday, May 11, 2009

    [...] Morard – formerly of SugarSync – has a great post up on GigaOm titled Why Sync Is So Difficult.  It was one of the things I read before my run yesterday and it was in my head the entire [...]

  14. With some effort I was able to parse this opening sentence, but even so an actual editor should probably have fixed it:

    “As we increasingly struggle to manage our data spread, both on our devices and in the cloud, even Google is going offline, for remote access and online storage aren’t enough.”

    Unfortunately, GigaOm doesn’t have editors.

    1. JG, he’s frenchie! Could you write a sentence of equivalent complexity in French? If not, take a chill pill.

  15. Collaboration and SaaS Monday, May 11, 2009

    The problem of sync might be eliminated if people look at their devices only as portals to their data, which resides in the cloud.

  16. What about a SyncML product, like Funambol? With a common sync platform, can we avoid the problems with syncing app data? Any thoughts?

    1. Jean-Gabriel Morard Kevin Monday, May 11, 2009

      SyncML is not a workaround. It’s a great idea to use an open standard for the communication layer, because it gives you interoperability with third-party sync clients or connectors. But you still have to solve all the problems in the article.

      Note that Apple uses SyncML as well: iSync “uses a plug-in architecture based on the SyncML open standard to support virtually all modern handsets” http://www.apple.com/macosx/features/isync/

      But most SyncML clients only sync PIM data (not files), so they don’t have to deal with filesystem weirdness. And since they sync structured data that makes sense to them, they can actually resolve conflicts (though that is not trivial either), whereas file sync apps generally cannot decide how a conflict on binary files should be resolved.

  18. Dan Cornish Monday, May 11, 2009

    We have blended version control with dupe checking. The big problem with sync is conflicts. The user gets to decide who wins. In our world, the enterprise software space, the admin should always win, or at least be able to win. This is why version control works because the admin can roll back to any point in time.

  19. Maybe you should check out Synkia.com for syncing with mobile devices; it is based on SyncML. Synkia currently syncs mobile content such as contacts, calendar, notes, tasks and SMS to the cloud. File upload and backup will be launched soon.

    Synkia AS, Norway

    ‘We care, so you don’t need to’

  20. Sync Difficulties – Oh the Irony Tuesday, May 12, 2009

    [...] and services is a great big pain in the ass … So I was naturally curious in the latest at GigaOm on the topic though note the complete fail (now on my second machine) from xmarks that decided to [...]

  21. Top Posts « WordPress.com Tuesday, May 12, 2009

    [...] Why Sync Is So Difficult As we increasingly struggle to manage our data spread, both on our devices and in the cloud, even [...]

  22. Maxim Sokhatsky Wednesday, May 13, 2009

    Taking the opportunity I want to add that Synrc Research Center provides ONE BUTTON Buddhist-style free Desktop Application that is handy and easy-to-use tool for organizing, importing/exporting, syncing your Address Books with Google Contacts, Microsoft Outlook, NOKIA Phones, Windows Contacts local folder, LDAP Directory, Yahoo! Contacts and Windows Live Contacts.

    http://synrc.com/contact-manager.htm

    Maxim Sokhatsky
    Synrc Research Center

  23. Sync is a Holy Grail | Tom Keller Wednesday, May 27, 2009

    [...] Sync is one of personal computing’s holy grails.  Here’s an interesting article on why it’s difficult. [...]

  24. Online Backup/File Sync Services | Zalmoxis Blog Friday, August 14, 2009

    [...] Why Sync Is So Difficult (gigaom.com) [...]
