Blog Post

Why Sync Is So Difficult

As we increasingly struggle to manage our data spread, both on our devices and in the cloud, even Google is going offline, for remote access and online storage aren’t enough. There is a need for sync with local duplicates, one the likes of SugarSync, Dropbox, Apple’s MobileMe and Microsoft’s LiveMesh are aiming to fulfill.

I worked at SugarSync for almost three years (I no longer have any financial ties to the company), so I know the double-edged sword of sync firsthand. When you sell a sync product, you sell magic. (We free the data from their physical devices; just forget where you last edited the file, it’s gonna be everywhere.) But once it’s implemented, there’s no magic anymore, and the engineer is left to deal with asynchrony, slow bandwidth, third-party applications, and file systems that have different semantics.

Here’s why:

Push sync is deeply asynchronous

Conceptually, you have redundant copies of your data on various devices, and a service that keeps the copies in sync. That means that when a change is made on one device, you replay the change to the others. So when a file is edited on one device, it needs to be updated on the others. In the meantime, you can’t guarantee that the old version of the file remains untouched on the other devices. It could be edited, moved or deleted, which is when conflicts arise. It’s well known that concurrent programming is difficult; sync is just an extreme example.
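To make the conflict case concrete, here is a minimal sketch of replaying a change on another device. It assumes each file carries a simple version counter and that every change records the version it was based on; the `Change` and `Replica` types and the `apply_change` function are hypothetical names for illustration, not how any particular sync product works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """An edit made on one device (hypothetical structure for illustration)."""
    path: str
    base_version: int          # version the editing device started from
    content: Optional[bytes]   # None means the file was deleted


@dataclass
class Replica:
    """State of one file on the receiving device."""
    version: int
    content: Optional[bytes]


def apply_change(local: Replica, change: Change) -> str:
    """Replay a remote change, or report a conflict.

    Returns "applied" when the local copy is still at the version the
    remote edit was based on, and "conflict" otherwise.
    """
    if local.version != change.base_version:
        # The local copy moved on while the change was in flight.
        return "conflict"
    local.content = change.content
    local.version += 1
    return "applied"


laptop = Replica(version=3, content=b"draft")
print(apply_change(laptop, Change("notes.txt", 3, b"draft v2")))    # applied
print(apply_change(laptop, Change("notes.txt", 3, b"other edit")))  # conflict
```

The second change was based on version 3, but by the time it arrives the laptop is already at version 4, so it cannot be replayed blindly. Deciding what to do next (keep both copies, ask the user, pick a winner) is the hard part this sketch leaves out.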

Sync matches different data models

You can’t actually sync identical duplicates of your data because they live on different devices. So you have to translate the data to the local models. File names alone don’t sync properly from a Mac to Windows without careful Unicode transformations, so imagine what becomes of extended attributes, resource forks, ACLs, etc. Even if you’re not cross-platform, most file systems can be set to either case-sensitive or case-insensitive. So you have to come up with an extensible strategy to deal with the different models, and there is infinite testing involved.
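As a rough illustration of the file-name problem, the sketch below uses Python's standard `unicodedata` module to compare names that look identical but are encoded differently, and to spot names that would clash on a case-insensitive target. The helper names are hypothetical, and `casefold()` only approximates the case rules of a real file system, so treat this as a starting point rather than a complete strategy.

```python
import unicodedata


def canonical_name(name: str) -> str:
    """Normalize to NFC so composed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", name)


def collision_key(name: str) -> str:
    """Approximate key under which names clash on a case-insensitive target."""
    return canonical_name(name).casefold()


def find_collisions(names):
    """Group names that would map to the same file on a case-insensitive target."""
    groups = {}
    for name in names:
        groups.setdefault(collision_key(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


composed = "caf\u00e9.txt"      # "é" as a single code point
decomposed = "cafe\u0301.txt"   # "e" followed by a combining acute accent
print(composed == decomposed)                                  # False
print(canonical_name(composed) == canonical_name(decomposed))  # True
print(find_collisions(["Report.doc", "report.doc", "notes.txt"]))
# [['Report.doc', 'report.doc']]
```

Extended attributes, resource forks and ACLs have no such one-line mapping, which is where the extensible strategy comes in.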

Sync messes with third-party applications

Applications tend to misbehave in various ways with their documents. When a sync product attempts to update a file with a newer version from another device, it can’t always know whether or not the file is open for reading or editing. If it replaces a file that an application still has open, the application may become unstable. Syncing app data is also dangerous.

Sync is hard to test

Sync maintains redundant copies of the user’s data through incremental updates. The devices are in sync as long as the redundant copies are consistent. Developers and testers will usually assume in their testing that the initial state is in sync; they will make a change on one side and see that the other side changes accordingly. So as soon as an error is introduced in the system (you’re out of sync to start with), you’re in a non-tested scenario. The system needs to recover from its own errors.
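One way to avoid relying on the "already in sync" assumption is to compare full snapshots of both sides from time to time, instead of trusting the incremental history. The sketch below assumes each side can produce a mapping of path to file content; `manifest` and `find_drift` are hypothetical helpers, and a real client would hash files on disk rather than hold them in memory.

```python
import hashlib


def manifest(files: dict) -> dict:
    """Map each path to a SHA-256 digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def find_drift(local: dict, remote: dict) -> dict:
    """Compare two manifests without assuming they started in sync."""
    drift = {}
    for path in sorted(set(local) | set(remote)):
        if path not in remote:
            drift[path] = "missing_remote"
        elif path not in local:
            drift[path] = "missing_local"
        elif local[path] != remote[path]:
            drift[path] = "content_differs"
    return drift


laptop = {"a.txt": b"1", "b.txt": b"2"}
phone = {"a.txt": b"1", "b.txt": b"two", "c.txt": b"3"}
print(find_drift(manifest(laptop), manifest(phone)))
# {'b.txt': 'content_differs', 'c.txt': 'missing_local'}
```

A check like this can also seed tests with deliberately inconsistent starting states, which is exactly the scenario that incremental tests tend to miss.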

Several of these problems are on the client side, which is why building a sync client is hard — even on the best sync platforms out there, such as Sync Services or Live Framework.

Some of these difficulties are just inherent to sync, but the technology is maturing. Sync so far has meant something different to everybody. Even the industry’s main players still sell byproducts of sync rather than sync itself, which then compete with backup, online storage, photo-sharing or music-streaming services. Sync is a great enabler for all these connected services, one that’s becoming central to the personal cloud story of more and more companies.

Jean-Gabriel Morard is a software engineer living in Paris.

30 Responses to “Why Sync Is So Difficult”

  1. Taking the opportunity, I want to add that Synrc Research Center provides a ONE BUTTON Buddhist-style free Desktop Application that is a handy and easy-to-use tool for organizing, importing/exporting, and syncing your Address Books with Google Contacts, Microsoft Outlook, NOKIA Phones, Windows Contacts local folder, LDAP Directory, Yahoo! Contacts and Windows Live Contacts.

    Maxim Sokhatsky
    Synrc Research Center

  2. Maybe you should check out for syncing with mobile devices; based on SyncML. Synkia currently syncs to the cloud mobile content such as contacts, calendar, notes, tasks and SMS. File upload and backup soon to be launched.

    Synkia AS, Norway

    ‘We care, so you don’t need to’

  3. We have blended version control with dupe checking. The big problem with sync is conflicts. The user gets to decide who wins. In our world, the enterprise software space, the admin should always win, or at least be able to win. This is why version control works because the admin can roll back to any point in time.

    • SyncML is not a workaround. It’s a great idea to use an open standard for the communication layer, because it gives you interoperability with third-party sync clients or connectors. But you still have to solve all the problems in the article.

      Note that Apple uses SyncML as well: iSync “uses a plug-in architecture based on the SyncML open standard to support virtually all modern handsets”

      But most SyncML clients only sync PIM data (not files) so they don’t have to deal with filesystem weirdness. And since they sync structured data that makes sense to them, they can actually resolve conflicts (though that is nontrivial either) – file sync apps cannot generally decide how a conflict on binary files should be resolved.

  4. Ken B

    With some effort I was able to parse this opening sentence, but even so an actual editor should probably have fixed it:

    “As we increasingly struggle to manage our data spread, both on our devices and in the cloud, even Google is going offline, for remote access and online storage aren’t enough.”

    Unfortunately, GigaOm doesn’t have editors.

  5. @Richard Farleigh You’re right that a remote-access based system works around these problems.

    Online storage is great if you live in a fully connected world with unlimited bandwidth, or with a small number of small files, so that you can always find network access and download them on demand. But as soon as you try to actually solve the problem of the multiplication of devices, where people really want r/w access to all their files all the time, it’s too limited. Also, with remote access-based solutions, users have to think ahead and upload the files they are going to need on the road.

    One could come up with a hybrid system, such as remote access with a cache. If you cache some of your files locally then you solve some of the network issues (flaky, slow or no network.) But if you give people write access to their files in their cache then you introduce the problem of merging the local and the remote edits when the user goes back online, and you’re back in sync land.

  6. @Richard Farleigh wrote: “Eliminate local storage and store everything in the cloud.”

    And when the backhoe accidentally cuts multiple fiber cables, suddenly you have no access to your data. FAIL.

  7. As far as I can tell, Dropbox has actually solved the desktop file-sync problem. It’s the first sync product of its kind that I’ve used that works out-of-box.

    Yes, sync is hard. Where I think things fail is mainly around user experience. You just have to make smart decisions and let users recover if they see something they don’t expect. Dropbox does this quite well.

    I learned this when I was at Microsoft on the ActiveSync team — we actually were trounced by RIM not because our sync was worse (it was probably superior) but because we initially made the mistake of OVER-reporting status.

    Sync should be a silent, no/very little UI experience — a utility that just works in the background. Any attempt to make it more than that will cause the product to fail miserably.

    • @Garry Tan I totally agree that users should not need to know that we’re solving these problems for them. And ideally no UI is what you’d like, but when people rely on the sync product to push a file onto a device before they hit the road, they also want to know that it made it there before they turn off their computer. So some feedback is needed. File sync is also expensive in bandwidth and in CPU, and you need to account for that to the user. It doesn’t have to jump out of the screen, but it should be there to reassure users who do want to know.

      Again, syncing files just makes it more likely that you run into any of the problems you may have when you sync PIM data. Indeed, you can assume that PIM-data sync is almost always carried through to completion; but file sync is very often interrupted (because transferring files across the network is slow!) and people will then wonder where their data is. Communication is important in that case.

  8. We solved this problem by doing away with sync altogether. Version control is a far better way to go. We have developed a way to version contacts and tasks, therefore eliminating the need for sync between Outlook and the cloud. We have also integrated this into our new iPhone application which is being sent to Apple this week. Please check us out at

    • @Dan Cornish Version control is actually a very interesting approach. It’s a mature technology, it’s very safe with a very granular control left to users. But it has its limits as well: it pushes the complexity of dealing with conflicts onto the users. While version control is perfectly adapted to advanced users, it isn’t a great fit for people who are mostly looking for a seamless productivity tool.

  9. Richard Farleigh

    You can solve this all quite easily.

    Don’t have multiple places to store your data, just have one cloud/server type storage that all your devices can access.

    Eliminate local storage and store everything in the cloud.

    • Shane

      To go along with the trust and privacy issues associated with all your data “in the cloud”, there is also the real problem that bandwidth, whether it be cable/DSL, 3G, 4G, etc., is nowhere near ready to replicate local load times, particularly as it relates to working with images and video.