Jeffrey Dean, who developed or co-developed some of Google’s biggest infrastructure innovations, including MapReduce and BigTable, told attendees at GigaOM’s Structure conference in San Francisco that the best approach to infrastructure is to focus on one problem at a time. Google was forced to build its own software and hardware, Dean said, because it was growing so quickly and had such huge data needs. That pressure kept it focused on the problems that had to be solved right away and pushed it toward some innovative answers.
MapReduce, for example, came about because the company needed software that would scale, that would be robust and that could run “across as many machines as we wanted to throw at the problem,” he said, and that requirement led to a system designed around abstractions that could scale. Similarly, the company’s BigTable database software came about because Google had many datasets with a number of different attributes (such as its web-crawl index of URLs, combined with each page’s language, its PageRank and so on) and needed a better way to manage them.
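To make the BigTable idea more concrete: the published Bigtable design describes its data model as a sparse, sorted map indexed by row key, column key and timestamp, and its “webtable” example stores pages under reversed host names (such as `com.cnn.www`) so that pages from the same domain sort together. The toy sketch below models only that data model in memory. The class `TinyTable`, the helper `row_key` and the column names `lang:` and `pagerank:` are hypothetical illustrations of the attributes Dean mentioned, not Google’s schema or API, and the PageRank numbers are made up.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, Optional, Tuple


def row_key(host: str) -> str:
    """Reverse a host name ('www.example.com' -> 'com.example.www') so that
    pages from the same domain sort next to each other."""
    return ".".join(reversed(host.split(".")))


class TinyTable:
    """Hypothetical, in-memory toy of a Bigtable-style sparse sorted map:
    (row key, column key, timestamp) -> value. Not Google's API."""

    def __init__(self) -> None:
        # row -> column -> {timestamp: value}; cells never written are simply absent.
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, value: Any, timestamp: int) -> None:
        self._cells[row][column][timestamp] = value

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Any:
        """Newest value at or before `timestamp` (newest overall if None), else None."""
        versions = self._cells.get(row, {}).get(column, {})
        eligible = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan(self, start: str, end: str) -> Iterator[Tuple[str, Dict[str, Any]]]:
        """Yield (row, {column: newest value}) for rows in [start, end), in key order."""
        for row in sorted(self._cells):
            if start <= row < end:
                yield row, {col: self.get(row, col) for col in self._cells[row]}


# Example: one web page with made-up language and PageRank attributes.
table = TinyTable()
key = row_key("www.example.com")
table.put(key, "lang:", "en", timestamp=1)
table.put(key, "pagerank:", 0.42, timestamp=1)
table.put(key, "pagerank:", 0.47, timestamp=2)
print(table.get(key, "pagerank:"))               # 0.47 (newest version)
print(table.get(key, "pagerank:", timestamp=1))  # 0.42 (version as of t=1)
```

Because rows are kept in key order, a range scan such as `table.scan("com.example", "com.example~")` returns every page under one domain together, which is the point of reversing host names.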
A more recent example of designing something to fit a specific problem, Dean said, is Spanner, software that lets programmers replicate data across the company’s data centers and specify exactly where copies of that data should be stored, so that they can work with it more effectively. “So we have one global namespace for data and you can specify how you would like that replicated at a fairly high level,” he said. “You could say you want two copies in Europe and one in North America, and so on.”
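The talk does not describe Spanner’s actual configuration syntax, so the following is only a hypothetical illustration of what specifying replication “at a fairly high level” can mean: the caller states how many copies it wants per region, and the system picks concrete data centers. The inventory, the data-center names and the `place_replicas` function are all invented for this sketch, and the selection rule (first N data centers in each region) is deliberately the simplest possible one.

```python
from typing import Dict, List

# Hypothetical inventory: region -> data centers available in that region.
DATACENTERS: Dict[str, List[str]] = {
    "europe": ["eu-dc-1", "eu-dc-2", "eu-dc-3"],
    "north_america": ["na-dc-1", "na-dc-2"],
    "asia": ["asia-dc-1"],
}


def place_replicas(policy: Dict[str, int],
                   inventory: Dict[str, List[str]] = DATACENTERS) -> List[str]:
    """Turn a high-level policy such as {"europe": 2, "north_america": 1}
    into concrete data centers, one replica per distinct data center."""
    placement: List[str] = []
    for region, copies in policy.items():
        available = inventory.get(region, [])
        if copies > len(available):
            raise ValueError(f"{region!r} has only {len(available)} data centers, "
                             f"cannot hold {copies} copies")
        # Simplest possible choice: take the first `copies` data centers in the region.
        placement.extend(available[:copies])
    return placement


# Dean's example: two copies in Europe and one in North America.
print(place_replicas({"europe": 2, "north_america": 1}))
# ['eu-dc-1', 'eu-dc-2', 'na-dc-1']
```

A real system would also weigh load, failure domains and latency when choosing data centers; the sketch only shows the separation between a declarative policy and the concrete placement.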
Dean added that with any infrastructure problem, “there’s always this tension… you could try to solve all problems for all people, but that usually ends up not being good for anyone.” The best approach, he said, is to focus on one problem and work closely with the team that most needs what you are building. Newer companies also have the luxury of using Amazon’s AWS and other cloud services to scale, he said, instead of having to build those physical resources themselves.
As for what he is working on now, Dean said he has been building machine-learning systems that are “biologically inspired,” in the sense that they are built up layer by layer. An image-recognition system, for example, has lower layers that do simple things like recognizing an edge or a corner, and higher layers that build on those to form more abstract representations. Dean said the system can now recognize images that contain cats without ever having been taught what a cat is.
This session was part of our Structure 2013 live coverage.