Food and data, it turns out, have a lot in common. Eating a meal is easy, and so is consuming pre-prepared data. But anyone who has ever had to cook a meal from scratch or build a big, usable database probably has a better appreciation of the work that goes into creating those finished products. Consumers might take them for granted, but chefs and database architects don’t.
The team at Food Genius certainly doesn’t. The Chicago-based startup has built a massive interactive database of restaurant menus across the United States — by price, ingredients, preparation, descriptive terms and more — but it wasn’t easy. It’s hard enough to build accurate, searchable datasets in normal fields. In a field as complicated as the culinary industry, it’s a near-Herculean chore.
Whatever industry or topic someone is trying to shed light on with data, though, it might make sense to think about product development the way a chef thinks about developing a new recipe. Imagination is great, but what’s available, what’s possible and what’s advisable will have a big impact on the finished product.
Here’s a step-by-step guide to building a data-based product from scratch, informed with insights from Food Genius Co-founder and CEO Justin Massa.
1. Figure out what dish you’re gonna make
Building a new data product is a lot like a chef creating a new recipe. It might seem simple, but there’s a lot more to developing a new menu item than just deciding to make a Korean-fusion burrito and throwing some kimchi in a tortilla. And there’s a lot more to building a data product than just throwing some data in a spreadsheet. (Done right, by the way, kimchi in a burrito is delicious.)
For starters, you have to make sure the idea is something people actually want, or will want again once you convince them to try it. If it’s an entirely new product, you have to figure out how to present it in a way that will be both visually appealing and useful (or tasty). If it’s a new take on an old standard, you have to find a spin that actually adds to the meal (like a recruitment tool that proactively surfaces candidates based on social network activity) rather than relying on extraneous novelties (as in, “Our health care app recommends diets based on what’s hot among Twitter users.”).
For Food Genius, the idea behind its database was something entirely new: providing insights into the business of restaurants by letting users figure out how items are prepared and presented in restaurants, and described on their menus.
2. Figure out where to get the main ingredient
You can’t make a dish without food, and you can’t build a database without data. Once you have an idea, the next step is figuring out where you’ll get your data.
Think about it like a chef coming up with a signature dish at a new restaurant. He could go in any number of directions, but the best choice is one that fits nicely with the overall theme and, ideally, that’s relatively cheap and easy to keep in stock. If it’s a vegetarian dish, maybe he grows his own vegetables. If it’s beef and the restaurant is nowhere near a cattle ranch, perhaps he just decides to go with a commercial distributor.
The choices are similar when it comes to finding the right data. Companies with active users and behavior or transaction data (think Spotify’s data on what users listen to or the Gap’s data on what people buy) will probably base their efforts on what they’re generating in-house. An individual developer or a startup with little more than an idea might have to look elsewhere, perhaps at the myriad open datasets available online or via API access to services such as Twitter, to find something that provides the data their service will depend on, or that at least acts as a reasonable proxy for it.
In Food Genius’s case, it needed data about what’s on restaurants’ menus. The company could have scraped the web or tried to manually enter data from paper menus, but it had a better idea. As Massa explained, his company partnered with online-ordering service GrubHub to get access to its menu data — a deal that now gives it access to the AllMenus database, as well — and with MenuPages. That’s lots of good data and relatively little work to get it.
3. Figure out where to get the other ingredients
It’s not often someone leaves a restaurant proclaiming a piece of unseasoned, unaccompanied fish as the best meal of her life, and the same holds true for data. It’s the sauces and the spices that make a dish stand out, and it’s the additional data that makes a service better.
Sometimes, that’s because the extra data is actually necessary to build the product, and sometimes it’s just a great addition. This is in some ways what spurred much of the data science movement over the past decade, as web companies sought to fill in the holes in their foundational data (e.g., search activity at Google) by finding other avenues to get a holistic view of their users.
In the case of Food Genius, it’s not yet trying to predict anything about the menus it parses, but it does want to supplement that data with useful information. For example, Massa said, Food Genius shows users menu items at local restaurants related to the food items they’re investigating, and it’s data from Factual’s location-data API (which covers more than 1.3 million restaurants) that lets Food Genius show a lot more about where those businesses are located, how much they cost and the URLs of their websites.
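Mechanically, this kind of enrichment usually amounts to a join between the menu data and the location data. The sketch below is a minimal illustration, not Food Genius’s or Factual’s actual schema: the `MenuItem` and `Place` types, their fields and the shared `restaurant_id` key are all hypothetical, and it assumes the two sources have already been matched to a common ID (in practice, matching restaurants by name and address is its own hard problem).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MenuItem:
    restaurant_id: str  # hypothetical key shared with the location source
    name: str
    price: float


@dataclass
class Place:
    restaurant_id: str  # hypothetical; assumes entity matching already happened
    city: str
    price_tier: int  # hypothetical 1 (cheap) to 4 (expensive) scale
    website: Optional[str]


def enrich(items, places):
    """Left-join menu items to location records on restaurant_id."""
    by_id = {p.restaurant_id: p for p in places}
    enriched = []
    for item in items:
        place = by_id.get(item.restaurant_id)  # None if the location source lacks it
        enriched.append({
            "name": item.name,
            "price": item.price,
            "city": place.city if place else None,
            "price_tier": place.price_tier if place else None,
            "website": place.website if place else None,
        })
    return enriched


# Hypothetical sample data
items = [MenuItem("r1", "Kimchi burrito", 9.5), MenuItem("r2", "Panini", 8.0)]
places = [Place("r1", "Chicago", 2, "https://example.com")]
rows = enrich(items, places)
```

A left join keeps menu items whose restaurant is missing from the location source, with empty location fields, rather than silently dropping them.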
Food Genius also pulls in data from recipe sites in order to show users a sampling of dishes they could make based on the types of food they’re interested in. One user, Massa noted, was analyzing data about panini sandwiches when the recipe widget inspired him to spice up his menu by using his panini press to make waffle cones.
4. Get busy with the prep work (this could take a while)
Okay, so you have all your ingredients — you even have the plating figured out — and now you’re ready to dive headfirst into cooking, right? Wrong. Vegetables need to be cut, that fish needs to be scaled and filleted, and the counter needs to be cleaned. If you’re running a large kitchen, the whole process needs to be fast and repeatable.
It’s the same thing with data, which, unless you’re using a service like Factual that prides itself on clean data, usually doesn’t come ready to consume. The process might be relatively simple if you’re using nice, relational data and attacking a field with accepted metrics and attributes, but otherwise it can be a bear. This is where extracting, transforming and loading comes into play (yay for Hadoop!), as well as everything else that goes along with turning raw data into usable data.
The problem is that you might have to find a common way to represent different types of data, from different sources, about the same general thing. We’ve covered the difficulties in other areas, such as the SumAll Foundation trying to develop key performance indicators for non-profits, and a DataKind volunteer having to turn administrative data about New York City’s trees into a model for predicting the benefits of tree-pruning efforts.
Food Genius’s problem was arguably hairier still. Restaurants don’t follow any specific standards when it comes to putting together menus, Massa explained, and there are no widely accepted delineations between, say, quick-service, casual and fine dining. When it comes to analyzing various dishes, Food Genius has to be able to discern, for example, between “buffalo” as a type of meat and “buffalo” as a preparation method for chicken. It has to understand that “hamburger with fries or salad” is both a meal and a collection of distinct menu items, and that the “or” makes “with fries” just one option.
“The data about menus was never intended for this purpose,” Massa lamented.
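To make the problem concrete, here is a deliberately tiny sketch of the two cases Massa describes. The rules are illustrative assumptions, not Food Genius’s parser: it treats “with” as introducing sides and “or” as separating alternatives, and it calls “buffalo” a preparation only when it precedes one of a few hand-picked words (`BUFFALO_PREP_TARGETS` is hypothetical).

```python
import re

# Hypothetical rule: "buffalo" followed by one of these words is a
# preparation style; otherwise it is treated as a meat.
BUFFALO_PREP_TARGETS = {"chicken", "wings", "wing", "shrimp", "cauliflower"}


def parse_menu_line(text):
    """Split a menu line into a main item plus alternative sides.

    "hamburger with fries or salad" -> main "hamburger" with one side
    choice among ["fries", "salad"].
    """
    text = text.lower().strip()
    main, _, sides_text = text.partition(" with ")
    # "or" separates alternatives, so the sides form a single choice group
    options = [s.strip() for s in re.split(r"\s+or\s+", sides_text) if s.strip()]
    return {"main": main.strip(), "side_options": options}


def classify_buffalo(text):
    """Return "preparation", "protein", or None if "buffalo" is absent."""
    words = text.lower().split()
    for i, word in enumerate(words):
        if word == "buffalo":
            nxt = words[i + 1] if i + 1 < len(words) else ""
            return "preparation" if nxt in BUFFALO_PREP_TARGETS else "protein"
    return None


parsed = parse_menu_line("Hamburger with fries or salad")
```

Rules this simple break quickly on real menus: “buffalo mozzarella,” for instance, is neither a buffalo-style preparation nor a cut of meat, which is why this kind of logic tends to lean on a curated vocabulary rather than a handful of patterns.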
And just how do you categorize a seemingly endless variety of different foods, menu items and preparation styles, as well as their myriad aliases? Massa said the lion’s share of normalization work goes into Food Genius’s 23,000-word dictionary of food terms and relationships. It’s arranged hierarchically (e.g., from “protein” down to “salami”), but it also gets the semantics of food — it lumps arugula and rocket together, for example, as well as frankfurters and hot dogs.
It helps to know something about the industry you’re working in, too. Although corn is technically a grain, most food and menu planners consider it a vegetable, so Food Genius does, too, Massa said.
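A dictionary like this can be modeled as a tree of terms plus an alias table. The sketch below uses a handful of hypothetical entries (the real dictionary has some 23,000 terms, per Massa) to show how aliases collapse to one canonical term and how each term rolls up to broader categories, with corn deliberately filed under vegetables.

```python
# Hypothetical excerpt of a food-term hierarchy: term -> parent (None = root)
PARENT = {
    "protein": None,
    "meat": "protein",
    "cured meat": "meat",
    "salami": "cured meat",
    "sausage": "meat",
    "hot dog": "sausage",
    "produce": None,
    "vegetable": "produce",
    "leafy green": "vegetable",
    "arugula": "leafy green",
    "corn": "vegetable",  # botanically a grain, but menus treat it as a vegetable
}

# Aliases map surface forms found on menus to one canonical term
ALIASES = {
    "rocket": "arugula",
    "frankfurter": "hot dog",
    "frankfurters": "hot dog",
    "hot dogs": "hot dog",
}


def canonical(term):
    """Normalize case and whitespace, then resolve aliases."""
    t = term.lower().strip()
    return ALIASES.get(t, t)


def ancestors(term):
    """Return the path from a term up to its root category."""
    node = canonical(term)
    if node not in PARENT:
        raise KeyError(f"unknown food term: {term!r}")
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path
```

Because “rocket” and “arugula” resolve to the same canonical term, counts and searches for either one land in the same bucket, and any item can be rolled up to “vegetable” or “protein” for broader analysis.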
5. Cook the darn thing!
It’s only after all the prep work is done that you can finally get down to cooking. If everything tastes and looks good, the recipe is a success. If not, it’s time to add some more spice, work on the texture, or maybe scrap what you have and start again with fewer ingredients.
At this step, building an application based on data might not be much different from building any other web application. If the problems are just a matter of poor design or the need for some additional features, those are probably remedied easily enough. But if a product isn’t accurately analyzing users’ data or surfacing the right content, there might be some deeper algorithmic work to do.
If you’re building a database like Food Genius’s and users can’t make heads or tails of your data structure, that might require a fundamental redesign of the whole project. For example, if Food Genius’s index conflated preparation styles (e.g., fried or grilled) with sensory terms (e.g., spicy or crispy) and then labeled the whole category “Style,” that would be confusing. As it turns out, though, Food Genius makes it pretty easy to find what you want and to make sense of how the data is presented.
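One way to avoid that kind of conflation is to give each kind of descriptor its own facet and to check that no term lands in two facets. This is a minimal sketch with a hypothetical vocabulary (`FACETS` and both functions are illustrative); it is not how Food Genius actually structures its index.

```python
# Hypothetical facet vocabulary: each term belongs to exactly one facet,
# so "fried" (how it's cooked) and "spicy" (how it tastes) never share a bucket.
FACETS = {
    "preparation": {"fried", "grilled", "roasted", "braised"},
    "sensory": {"spicy", "crispy", "smoky", "tangy"},
}


def check_disjoint(facets):
    """Raise ValueError if any term appears in more than one facet."""
    seen = {}
    for facet, terms in facets.items():
        for term in terms:
            if term in seen:
                raise ValueError(f"{term!r} is in both {seen[term]!r} and {facet!r}")
            seen[term] = facet


def tag_description(description, facets=FACETS):
    """Sort the words of a menu description into facets (unknown words are ignored)."""
    words = set(description.lower().replace(",", " ").split())
    return {facet: sorted(words & terms) for facet, terms in facets.items()}


check_disjoint(FACETS)
tags = tag_description("Crispy fried chicken, spicy")
```

Keeping the facets disjoint means a filter on “preparation” can never return items that merely taste a certain way, which is exactly the confusion a single “Style” bucket would create.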
6. Keep it fresh (without going overboard)
Times change, and so do dishes and menus. A little variation is a good way to keep the chefs, staff and diners interested, as well as to fend off competition from that new, hot restaurant down the street.
This might be even more true on the web, where there are always new sources of data popping up that could be useful, and where there’s always a new startup — or 10 — putting its own spin on your idea. That’s why the Googles, Facebooks, Twitters and LinkedIns of the world are always adding new products and features. Often, these features involve new ways of presenting the data the companies have gathered about users, which is a major source of distinction from one platform to another.
Food Genius knows it can’t remain just a database forever and expect to thrive in the long run. That’s why, Massa said, it’s already planning its next few iterations. First up is surfacing trends in menu items over time, followed by understanding consumer behavior and then predicting the next hot (or not) ingredients. The work on consumer behavior might be the most interesting, if only because visualizing trends could prove relatively easy, while Massa acknowledges that any predictive algorithms will risk reading too much into a limited amount of data until Food Genius can gather years’ worth of reliable data on which to train its models.
Plus, he added, there’s a need for better data on what consumers are doing. He cites a study from a firm called NPD as the gold standard in analyzing consumers’ behavior in restaurants, even though it surveys only a few thousand people. “You want to understand behavior in restaurants? The best you can do right now is a survey of 5,000 people,” Massa said.
Right now, the focus of Food Genius’s consumer behavior work is on extracting the demographic data its customers want and on figuring out when people eat certain things. He thinks data like the information location-data startup Placed is collecting about where consumers go could help generate some of that valuable demographic data. Another avenue to get this info might be an aggregation of census data, sales data and perhaps Foursquare check-in data. The latter two could also help Food Genius figure out whether certain restaurants, dishes or types of food over- or under-index among various groups or at different times.
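“Over-index” and “under-index” usually refer to a simple ratio from audience analysis: a group’s rate divided by the overall rate, times 100, so 100 means the group matches the average. The sketch below computes it from hypothetical check-in counts; the function name and the numbers are made up for illustration and say nothing about Food Genius’s actual method.

```python
def index_score(group_hits, group_total, all_hits, all_total):
    """Return 100 * (group rate) / (overall rate).

    100 means the group behaves like everyone else; above 100 it
    over-indexes, below 100 it under-indexes.
    """
    if group_total <= 0 or all_total <= 0 or all_hits <= 0:
        raise ValueError("totals and overall hits must be positive")
    group_rate = group_hits / group_total
    overall_rate = all_hits / all_total
    return 100 * group_rate / overall_rate


# Hypothetical: 50 of a group's 200 check-ins are at taco restaurants,
# versus 100 of 1,000 check-ins overall.
score = index_score(50, 200, 100, 1000)
```

Here the group’s rate (25 percent) is 2.5 times the overall rate (10 percent), giving an index of about 250, so the group strongly over-indexes on tacos. With small groups the ratio can swing wildly, which is one more reason the size and quality of the underlying data matter.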
7. Start selling it in supermarkets
Some might call it selling out, but folks like Wolfgang Puck and Emeril Lagasse probably call their lines of prepackaged goods a main ingredient of getting stinking rich as a chef. If your food is so good you can sell it outside the restaurant and make money doing it, it’s probably not a bad idea to do it. At the very minimum, you should probably publish a cookbook.
With the advent of data sharing via API, this type of thing is becoming very common on the web, too. Everyone wants access to the Twitter API to power social features, and there’s nary an enterprise IT startup that doesn’t connect to the Salesforce.com API to pull in data from the CRM service. Whether or not that API access is free depends on the companies involved, but either way there’s a lot of benefit that can come from becoming a platform by building an ecosystem around your data.
Massa acknowledges Food Genius isn’t quite at this point yet, but he does foresee a future where the company’s unique dataset could become very valuable to a broader audience. Whereas most ad exchanges today have around four food-related categories, he explained, Food Genius can slice and dice menus into about 8 billion different categories. And although no one today probably wants to target ads against hundreds of thousands of unique consumer segments around eating preferences, they very well might a few years down the road.
A recent study by Google and Nielsen found that a significant share of search activity on mobile phones is about food and restaurants, and that the majority of people searching for restaurants either clicked on a link or visited a store as a result of their searches. Thirty percent actually converted (made a purchase), and more than half of those conversions took place within an hour. Someone will eventually figure out how to exploit these behaviors with micro-targeting, and when that happens, Massa said, “We will promptly be there to be able to power and target that kind of stuff.”
Feature image courtesy of Shutterstock user wavebreakmedia.