In 2004, Tim O’Reilly’s famous Web 2.0 manifesto suggested that “data would be the next Intel Inside,” and that any Internet service of significance would be underpinned by specialized datasets, such as Amazon’s (s amzn) product database or Foursquare’s places.
However, although we’ve seen online office suites added to the portfolio of web worker productivity tools, database apps have been curiously absent from the mix. Even suites like Google Apps lack a dedicated application for managing, publishing and sharing specialized data, leaving users to create crude spreadsheet-based approximations. The average web worker may not need an online equivalent of Access as much as an online equivalent of Excel, but it seems strange that a collaborative database tool is missing from online app suites like Google Apps (s goog) and Microsoft’s Office Web Apps (s msft).
Fortunately, a new generation of tools is providing just that kind of functionality. Emerging “data-as-a-service” providers enable users to create, manage and publish specialized datasets, offering both authoring tools and opportunities to participate in a web of data, not just of pages.
When Factual launched a few months ago, I wondered if it was a “Flickr for data.” Indeed, the company pitches itself as an “open data repository” where users can upload and create datasets, as well as add data hosted by Factual to their own sites and apps.
Factual currently hosts datasets as diverse as videogame cheats, hiking trails and U.S. presidents. Interestingly, each dataset also includes a history of changes, which provides a level of accountability.
Users can create new datasets by importing files, parsing web pages or using Factual’s extraction tools. Data can be accessed manually through a browser or via a public API.
InfoChimps is similar to Factual in many respects, but positions itself as a “data marketplace” that enables publishers and owners of datasets to charge for their usage. Publishers can offer free and paid datasets, charging either for API access or for making them downloadable.
Interestingly, some datasets are organized into collections from particular organizations, such as Wikipedia and Data.gov, indicating that InfoChimps has become a useful means for organizations to outsource the management of their open data.
It’s no surprise that Google (s goog) is also experimenting in this area, though the company’s Google Squared service takes a slightly different approach, enabling its search data to serve as a source for creating smaller subsets of data.
Squared can take any set of Google’s search results and format them into a structured table, or “square,” which can be edited as well as exported to CSV or Google’s Spreadsheets format. For example, a search for “british cities” creates a two-column square that can be expanded with additional user-created columns (e.g., population size) and saved for use elsewhere.
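To make the “square” idea concrete, here is a minimal Python sketch of the workflow described above: start with a small table of items, add a user-created column, and export the result as CSV. It does not call Google Squared or any real API; the function names (`make_square`, `add_column`, `to_csv`) are hypothetical, and the population value is a labeled placeholder, not a real figure.

```python
import csv
import io


def make_square(items, columns):
    """Build a table with one row per item and one empty cell per column."""
    return [{"item": item, **{col: "" for col in columns}} for item in items]


def add_column(square, name, values=None):
    """Add a user-created column; items without a supplied value stay blank."""
    values = values or {}
    for row in square:
        row[name] = values.get(row["item"], "")
    return square


def to_csv(square):
    """Serialize the square to CSV text, using the first row's keys as headers."""
    if not square:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(square[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(square)
    return buf.getvalue()


# A two-column square (item + description), as in the "british cities" example.
square = make_square(["London", "Manchester"], ["description"])

# The user adds a "population" column; the value is a placeholder, not real data.
add_column(square, "population", {"London": "placeholder-1"})

csv_text = to_csv(square)
print(csv_text)
```

The table is kept as a plain list of dictionaries so that adding a column is just adding a key to each row, which loosely mirrors how a spreadsheet-style tool lets you bolt new attributes onto an existing list of things.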
Where is the “YouTube for Data”?
Though Factual, InfoChimps and Google Squared show the promise and value of web-based database authoring and hosting services, the demise of solutions such as Swivel and Dabble DB indicates that this application category has yet to mature. Perhaps it will take a company of Google’s scale to offer a “data-as-a-service” application that can truly rival the tools we’ve long used on the desktop and, more importantly, enable us to share and monetize that data in ways we haven’t been able to so far.
Do you have unique datasets in your business that could be valuable to others?