Websites are often full of good, structured data. It’s just buried in, well, a website. Unless you’re going to manually enter that data into a spreadsheet or learn enough programming to build a web scraper, all that info on homes for sale, job listings or whatever else suits your fancy isn’t going anywhere.
That’s the problem London-based startup import.io, one of the 10 startups competing in our LaunchPad contest at Structure: Europe (Sept. 18 and 19 in London), is trying to solve. “We allow people to turn websites into data,” explained Co-founder and Chief Data Officer Andrew Fogg. “… For us, the technology is all about people who don’t know how to code but still need data.”
At its essence, import.io’s service lets users train what Fogg calls a “data browser” to learn what they’re looking for and create tables and even an API out of that data. In the (eerily silent) demo video embedded below, for example, someone wants to get data from a realty website about the various attributes of homes for sale. The user dictates what attributes will comprise the rows and columns on the table, highlights them, and import.io’s technology fills in the rest.
Import.io also lets users record queries so they can get to the desired data on websites that require clicking or entering something into a search box in order to get anywhere, and it can handle multiple pages of results. Users can also bring together data from multiple sources into one place so they don’t need to manage multiple outputs.
The service might be useful for a one-off scrape of a website to gather the data you want, but the creation of an API is what’s really valuable. That lets users tie connectors, as import.io calls them, right into their own applications. Fogg cited one example where the British Red Cross was able to get data from hospital websites onto its mobile app for people to search, and another where a laptop manufacturer was able to track prices of its computers across all its partners’ websites to ensure they didn’t fall below minimum price thresholds.
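To give a rough sense of what tying an API into an application looks like in practice, here is a minimal sketch of consuming tabular JSON the way a connector-backed app might. The payload and field names below are invented for illustration; they are not import.io’s actual response schema.

```python
import json

# Hypothetical payload illustrating the kind of tabular JSON a
# connector-style API might return. The field names ("results",
# "address", "price", "bedrooms") are assumptions for this sketch.
SAMPLE_RESPONSE = """
{
  "results": [
    {"address": "12 High St", "price": 250000, "bedrooms": 3},
    {"address": "7 Mill Lane", "price": 310000, "bedrooms": 4}
  ]
}
"""

def rows_from_response(raw_json, columns):
    """Turn an API response into spreadsheet-style rows, one list per result."""
    data = json.loads(raw_json)
    return [[item.get(col) for col in columns] for item in data["results"]]

rows = rows_from_response(SAMPLE_RESPONSE, ["address", "price"])
print(rows)  # -> [['12 High St', 250000], ['7 Mill Lane', 310000]]
```

The point is that once the data is behind an API, an app developer works with ordinary JSON rows rather than with the quirks of the source website’s HTML.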
Some journalists use import.io to monitor county court websites through which many legal matters for specific companies flow, or certain websites that don’t have RSS feeds but are regularly updated with new content. I sense there’s a potential use for investigative journalism, too, whether that’s a one-off task to format data so it can be visualized or an ongoing effort to monitor specific sites.
The most illustrative example of how import.io’s business might evolve (its service is free, although presently in developer preview mode) might be the work it did on behalf of executive-recruitment firm Robert Half International. The firm wanted to proactively monitor its clients’ websites and other job boards to see when new job listings were posted. Even Fogg acknowledged that building thousands of connectors with import.io would be a lot of work.
So the company outsourced the creation of 4,000 connectors to a team in India. That cost only $1,000, Fogg said, adding that one person relatively skilled with import.io could create one connector every four minutes, or about 100 during an eight-hour workday. That’s compared with about four hours to create just one web scraper, Fogg estimated, and import.io doesn’t even require those outsourced workers to have coding skills.
I’m a big proponent of better tools for finding and producing data as well as for just analyzing it, and import.io seems valuable for this purpose. I plan to experiment with it in the near future, and I expect data scientists aiming to build applications in the image of something like LinkedIn’s University pages will experiment with it, too (if they’re not already). Aside from the LaunchPad contest, Fogg will also be on our data science panel at Structure: Europe, so expect him to address import.io’s data science implications then.