Microsoft Wants to Build Its Business With Data

Everyone likes to talk about big data, but few know how to make use of it. Thanks to cloud computing and the efforts of several companies, however, the ability to access and make sense of huge chunks of information is here. The question is whether there’s a business in providing intelligible data sets to information workers, application developers and analysts in a world where turn-by-turn directions and real-time financial quotes — which used to be expensive — are now free.

Microsoft (s msft) is hoping there is, and to that end has built a storefront, codenamed Project Dallas, for data sets that range from geolocation data to weather information. The project, which will become commercially available in the second half of the year, aims to provide access to data from information providers like InfoUSA, Zillow and Navteq so that developers can use it to build applications and information services. Other potential users of the information are researchers, analysts and information workers, from buyers at retail stores to competitive intelligence officers at big companies. Microsoft will take a cut of the fee charged by the information providers, but Dallas isn't about profiting from data brokerage so much as it's about showcasing Microsoft's Azure cloud and making its Office products more compelling.
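To make the developer story concrete, here is a minimal sketch of the kind of workflow a marketplace like Dallas implies: compose a query against a hosted data set using an account key, then parse the returned rows into something an application can use. The endpoint, parameter names and CSV format below are all illustrative assumptions, not Dallas's actual API.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical marketplace endpoint -- illustrative only, not the real Dallas API.
BASE_URL = "https://api.example-datamarket.com/v1/datasets"

def build_query(dataset_id: str, account_key: str, **filters) -> str:
    """Compose a marketplace query URL: dataset path plus key and filters."""
    params = {"accountKey": account_key, **filters}
    return f"{BASE_URL}/{dataset_id}?{urlencode(params)}"

def parse_rows(csv_payload: str) -> list:
    """Parse a CSV response body into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_payload)))

# Example: a weather data set filtered by city (response body mocked here,
# since the endpoint above is fictional).
url = build_query("weather-daily", "DEMO-KEY", city="Miami")
sample_response = "date,temp_f\n2010-02-01,68\n2010-02-02,71\n"
rows = parse_rows(sample_response)
print(url)
print(rows[0]["temp_f"])  # -> 68
```

The point of the sketch is the shape of the transaction, not the specifics: a keyed request the provider can meter (and Microsoft can bill against), returning clean tabular data the developer never has to host or scrub.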

“The indirect monetization is potentially bigger than the direct monetization,” said Moe Khosravy, general product manager of Project Dallas, in a conversation last week. “That will cover some bandwidth and compute and the credit card surcharges for the transactions, but the real opportunity is that more developers will use Azure and Office because we’ve made it easy and will build support for Dallas into Office.”

I explore Microsoft’s efforts as well as those of a startup called Infochimps, which is also building a data marketplace, in a research note over on GigaOM Pro (sub req’d) called Big Data Marketplaces Put a Price on Finding Patterns. In it, I lay out how the ability to host and process large data sets on compute clouds has changed the way people can access and profit from data.

And while I spend a lot of time in the research note talking about business models and how to charge for data by the slice, Infochimps and Microsoft will both provide some data for free, much like Amazon (s amzn) and a startup called Bixo Labs are doing. Specifically, Khosravy said Microsoft may try to provide some municipal and federal data as a public service, or at least refrain from charging the governments for hosting the data on Azure.

Figuring out how to get public information on data marketplaces is difficult. Governments have a lot of access to data, but it’s generally on paper or in old databases that may not translate automatically to the cloud. There’s a clear public interest in providing that data in a clean format for developers and citizens, but the costs could quickly add up — and governments don’t tend to have a lot of taxpayer dollars floating around to transfer their data to the cloud. That’s why Microsoft’s volunteering to host “a percentage” of public data for free might help.

And the benefits of such easy accessibility and the ability to mash up different data sets could be huge. As an example, Microsoft is working with the City of Miami on a new 3-1-1 line that uses mapping data and inputs from the city’s existing 3-1-1 hotline to create a map of where potholes and street problems are, so city officials can tackle the issues in an organized way.
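At its core, the pothole map is a simple aggregation job: bucket incoming 3-1-1 reports by location and rank the hotspots. A toy sketch of that idea follows; the report format and grid size are my assumptions, not details of Miami's actual system.

```python
from collections import Counter

def hotspots(reports, grid=0.01):
    """Bucket (lat, lng) reports into grid cells, ranked by report count.

    Each cell key is an integer (lat, lng) grid index; `grid` is the cell
    size in degrees (0.01 degree is on the order of a city block or two).
    """
    cells = Counter((round(lat / grid), round(lng / grid)) for lat, lng in reports)
    return cells.most_common()

# Example: five pothole reports, three clustered near the same intersection.
reports = [
    (25.7743, -80.1937),
    (25.7741, -80.1939),
    (25.7744, -80.1936),
    (25.7907, -80.1300),
    (25.7617, -80.1918),
]
ranked = hotspots(reports)
print(ranked[0])  # the busiest cell comes first
```

Joining those ranked cells against a commercial mapping data set is exactly the kind of mashup a marketplace makes cheap: the city supplies the reports, the provider supplies the geography.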

As data marketplaces grow, questions about who owns the data and privacy issues will get resolved, because the financial incentive to address them is huge. Then folks can focus on what they can build using huge swaths of demographic, geographic, financial and even personal data. Read my full analysis.

4 Responses to “Microsoft Wants to Build Its Business With Data”

  1. Stacey, your post and report provide a very timely and useful discussion of important developments. As you point out, through AWS and other sources, the tools and infrastructure with which to capture and analyze Big Data are now available to virtually anyone, not just the Googles and Yahoos of the world. Still, I would estimate that 99.9% of the data generated go out the “data exhaust” and are not currently being examined, much less mined at a granular level. The emergence of a “big data marketplace” creates new opportunities for data owners and developers alike. In addition to privacy and other issues you note, two others need to be considered: (i) data, particularly from disparate sources (as is typical in mashups), are often “messy” and not easily linked – consider, for instance, the many different ways in which location can and is represented. As a result, disambiguating and joining data at the entity level from disparate datasets are critical tasks – however, few “Big Data” solutions are very good at either; (ii) latency – like wine, data age. Unlike wine, it may be important to continuously compare the “newest” data with historical data, particularly if the objective is to detect and respond in near real-time. Organizations and users should carefully consider both the quality and currency of the data as well as the ability of the tool(s) they’re using to accommodate and disambiguate data from disparate sources. Some of these issues will be addressed in the session on Big Data that I’m moderating at the upcoming GigaOm Structure Conference (June 23-24;
    Dr. Phil Hendrix, immr