Under the covers of eBay’s big data operation

For online auction powerhouse eBay (s ebay), big data is serious business. The company has 100 million active users globally, 300 million live listings at any time (and it archives them all), receives 2 billion page views daily, and handles 250 million search queries and 75 billion database calls a day. How does eBay make sense of all this activity? With Hadoop, of course.

What a customer (or engineer) wants

Hugh Williams is VP of experience, search and platforms at eBay. His team is responsible for the entire eBay experience, from the moment users hit the site until the moment they make a purchase, and its work spans everything from code to data center automation to building new picture-hosting platforms. If it has to do with driving traffic to eBay and improving the customer experience, Williams’ team builds it. But to know what to build and how to build it, the team needs insight into what customers want and what they’re doing.

To figure this out, eBay first has to give its analysts and engineers the tools they want. It does this with a two-pronged big data attack: a massive Teradata (s tdc) data warehouse and a fast-growing Hadoop environment. Financial analysts prefer SQL and more of a WYSIWYG experience, Williams said, which is why Teradata is so important. The majority of his engineers, however, love Hadoop, which stores and processes unstructured data such as server logs, click-throughs and search queries, and they make “enormous use” of it.
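Hadoop processes raw data like those server logs with MapReduce: a map step turns each raw record into key/value pairs, the framework groups the pairs by key (the shuffle), and a reduce step aggregates each group. The following is a minimal, single-process Python sketch of that flow, counting how often each search query appears. It is not Hadoop’s API and not eBay’s code; the tab-separated log format and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw log lines in an assumed "<timestamp>\t<user_id>\t<query>" format.
LOG_LINES = [
    "2012-01-05T10:00:01\tu1\tiphone case",
    "2012-01-05T10:00:03\tu2\tVintage Camera",
    "2012-01-05T10:00:09\tu3\tiphone case",
    "malformed line",
]


def map_phase(line):
    """Map: emit (query, 1) for a well-formed line; emit nothing for unparseable input."""
    parts = line.split("\t")
    if len(parts) != 3:
        return []
    return [(parts[2].strip().lower(), 1)]


def shuffle(pairs):
    """Shuffle: group values by key, which Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate all values seen for one key."""
    return key, sum(values)


def run_job(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(LOG_LINES))  # query -> number of times it was searched
```

In a real cluster, the map and reduce functions run in parallel on many machines and the shuffle moves data across the network; Hadoop Streaming also lets scripts like these act as the mapper and reducer. Tolerating malformed lines in the map step matters because unstructured logs rarely parse cleanly.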

Huge data

Whichever platform you’re talking about, Williams says eBay’s traffic volumes produce huge data, not just big data. In late 2010, eBay predicted its Teradata deployment would grow from about 10 petabytes to 20 petabytes (20,000 terabytes, equivalent to about 266 years’ worth of HD video) within a year. Its Hadoop environment currently stores between 9 and 10 petabytes, according to Williams, and that figure keeps growing. In fact, the Hadoop environment doubled in size in the past year, partly because more user data is streaming in and partly because analysts run lots of Hadoop jobs that create new, larger data sets, which also remain in the system.
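As a quick back-of-the-envelope check on those growth figures, assuming decimal units (1 PB = 1,000 TB) and reading “doubled” as exactly a factor of two, the numbers work out as follows:

```latex
\begin{aligned}
\text{Hadoop (doubled in the past year):}\quad & S_{\text{year ago}} = \tfrac{1}{2}\,S_{\text{now}} \approx \tfrac{1}{2}\,(9\text{ to }10\ \text{PB}) = 4.5\text{ to }5\ \text{PB}\\
\text{Teradata (projected):}\quad & 20\ \text{PB} = 20 \times 1{,}000\ \text{TB} = 20{,}000\ \text{TB} = 2 \times 10\ \text{PB}
\end{aligned}
```

In other words, both deployments were on a roughly year-over-year doubling trajectory: observed for Hadoop, projected for Teradata.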

“What we really use Hadoop for is to understand our customers and their needs,” Williams said. This happens both at a broad scale, such as improving the accuracy of eBay’s search engine, and more narrowly, in building specific features the data suggests customers would want. For example, Williams explained, Hadoop has proven helpful in deciphering patterns of misspelled words, so eBay’s search engine now knows to look instead for the actual word or product when users type certain queries incorrectly. Between those broad improvements and narrow data-driven features, Williams said, Hadoop lets his team churn through petabytes of unstructured data to uncover trends that show how eBay is different and how it can set itself apart further.
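The article doesn’t describe how eBay actually mines misspellings. One common approach, sketched here purely as an assumption, is to look for query reformulations: a user types a query and, within a few seconds, retypes a close variant. Counting those (original, retyped) pairs across many users and keeping only frequent ones yields candidate corrections. The session format, thresholds (`max_distance`, `max_gap_s`, `min_support`) and function names are hypothetical; at eBay’s scale, the pair counting would be a MapReduce job rather than an in-memory loop.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def mine_corrections(sessions, max_distance=2, max_gap_s=30, min_support=2):
    """sessions: user -> time-ordered list of (timestamp_seconds, query).

    Counts quick retypes into a close variant, then keeps the most frequent
    correction for each misspelling that was seen at least min_support times.
    """
    pair_counts = Counter()
    for events in sessions.values():
        for (t1, q1), (t2, q2) in zip(events, events[1:]):
            if q1 != q2 and t2 - t1 <= max_gap_s and edit_distance(q1, q2) <= max_distance:
                pair_counts[(q1, q2)] += 1
    corrections = {}
    for (wrong, right), n in pair_counts.most_common():  # highest counts first
        if n >= min_support and wrong not in corrections:
            corrections[wrong] = right
    return corrections


# Hypothetical sessions: two users fix the same typo; one fix is seen only once;
# one pair of queries is too far apart in time to count as a retype.
SESSIONS = {
    "u1": [(0, "ipohne case"), (5, "iphone case")],
    "u2": [(0, "ipohne case"), (8, "iphone case")],
    "u3": [(0, "vintage camra"), (4, "vintage camera")],
    "u4": [(0, "lego"), (300, "lego technic")],
}

if __name__ == "__main__":
    print(mine_corrections(SESSIONS))
```

Requiring a minimum support filters out one-off retypes that are really new searches rather than fixes; raising `min_support` trades recall for precision.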

More than MapReduce

Beyond Hadoop’s sweet spot as a batch-processing engine built on its native MapReduce framework (i.e., processing large data sets in bulk), Williams said eBay is expanding its Hadoop usage heavily into HBase, the NoSQL database that is also an Apache Software Foundation project and is built on the Hadoop Distributed File System (HDFS). HDFS, the default storage layer for Hadoop, also serves as the storage layer for HBase. HBase doesn’t process data the way MapReduce does; instead, it lets users quickly read from and write to large unstructured data sets.
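What makes those quick reads and writes possible is HBase’s data model: rows are kept sorted by row key, each cell is addressed by a column family and qualifier (such as `info:price`), and each cell can hold several timestamped versions. HBase’s Java client exposes this through `Put`, `Get` and `Scan` operations. The Python class below is a toy, in-memory illustration of that data model only, with no persistence, HDFS, or distribution; the class name and the `item#…` row-key scheme are hypothetical and are not eBay’s schema.

```python
import bisect
import time


class ToyTable:
    """In-memory toy of HBase's data model: rows sorted by key, cells addressed
    by 'family:qualifier', each cell keeping timestamped versions, newest first."""

    def __init__(self, max_versions=3):
        self._keys = []   # sorted row keys, which is what makes range scans cheap
        self._rows = {}   # row key -> {column -> [(ts, value), ...]}
        self.max_versions = max_versions

    def put(self, row, column, value, ts=None):
        """Write one cell version (like a Put)."""
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        versions.append((time.time_ns() if ts is None else ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first
        del versions[self.max_versions:]                 # keep only the newest N

    def get(self, row, column):
        """Random read of one cell's newest value (like a Get), or None."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_row, stop_row):
        """Yield (row, newest cells) for keys in [start_row, stop_row), in key order."""
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        for key in self._keys[lo:hi]:
            yield key, {col: vs[0][1] for col, vs in self._rows[key].items()}


if __name__ == "__main__":
    table = ToyTable()
    table.put("item#0001", "info:title", "Vintage camera", ts=1)
    table.put("item#0001", "info:price", "45.00", ts=1)
    table.put("item#0001", "info:price", "40.00", ts=2)  # newer version wins on read
    table.put("user#42", "prefs:lang", "en", ts=1)
    print(table.get("item#0001", "info:price"))
    print(list(table.scan("item#", "item$")))  # all "item#..." rows, in key order
```

Because rows are sorted, choosing a row-key prefix (here `item#` versus `user#`) decides which records sit next to each other and can be read back in a single scan, which is why row-key design matters so much in HBase.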

HBase is already a piece of eBay’s new search engine, and Williams said few sites use it in production at eBay’s scale; Facebook is one other site already making major use of HBase. Williams said HBase is fantastic, but it’s also the area of the Hadoop ecosystem where he’d most like to see improvement. It’s fundamentally real-time, he explained, which is great, but eBay had to do a lot of work to make HBase scale and to make it fault-tolerant. Building a self-healing system out of Hadoop subprojects was very challenging.

More broadly, Williams is excited about NoSQL, the umbrella term for non-relational database technologies, as a way to handle the high-traffic data at eBay that isn’t necessarily a good fit for traditional databases. “Cassandra and MongoDB are other great examples of the latest, innovative technologies for managing large data sets that we’re excited about at eBay,” he said.

Open source all the way … probably

For all its benefits, Williams acknowledges Hadoop can be a tough technology to learn, but the blood, sweat and tears are worth it to ensure his team truly understands the data platform that underpins so much of eBay. “[T]o put it to its full potential, we have to be experts in it,” Williams said. That level of expertise can really only come from open-source software that lets engineers “roll up [their] sleeves and [get] into the source code.”

Still, any decision about the platform is the result of collaboration between the business team and the technology team, so Williams says he keeps an open mind about how eBay’s big data environment might evolve. Right now it’s Teradata and Hadoop, but “I can imagine that landscape changing,” Williams said.

In October, we covered comments from eBay Senior Director of E-commerce Darren Bruntz, who said he would like to move to a single data platform and that he’d like to see “more focus and energy” from the Hadoop community. Asked at the time about whether such a platform is possible, Teradata Labs President Scott Gnau told me it’s not possible now — at least if you want all the advanced SQL analysis features of a product like Teradata for structured data — but that it might be in the future.

And although Teradata now has a product in Aster Data Systems that is something of a replacement for Hadoop, Gnau said “Hadoop or son of Hadoop or something else” will always be a big piece of the big data space because it has so much momentum and such a sweet spot around search and batch processing of unstructured data.

EBay’s Williams, though, maintains that the sentiment of his team members will remain a major factor in any decision about the company’s data platform. “For a new platform to succeed, our technologists would have to be passionate about the platform, and the platform would have to enable us to innovate faster to build products for eBay’s customers,” he said. “If a new technology helps us achieve that goal, we would certainly evaluate the benefits.”

We’ll be talking a lot more about Hadoop, NoSQL and where they’re headed at our Structure: Data conference, which takes place March 21-22 in New York City. Speakers include some of the biggest names and brightest stars in the space, all of whom are trying to push the limits of what organizations can do with all the data they collect.

6 Responses to “Under the covers of eBay’s big data operation”

  1. As far as buying/saving $ and eBay goes:

    If you want to ask the seller a question about an item, find another of their listings and send the question from that item’s page rather than from the one you actually want. This adds a little work for the seller if they want to add the question and answer to the description of the item you are actually interested in.

    If you see an item you want listed in auction format, send the seller a message asking whether they will accept $x to end the auction early and sell the item to you. Maybe point out that they would not have to wait as long to get their money (they probably know that, but it still might help). If that does not work, use a sniping service to bid for you. It’ll bid in the last few seconds, helping you save money and avoid shill bidding.

    Use a saved-search site to set up saved searches, so you get an e-mail whenever a match is listed. This is especially good for “Buy It Now” listings that are priced right.

    If the item you are looking for is difficult to spell, try a misspelling search site to find deals on items whose main keywords are misspelled in the title; other interested buyers might never see them. Then, if the item is listed in auction format and (hopefully) goes a few days without bids, send the seller an offer to end the auction early and sell the item to you. They may worry that no one is interested and take whatever they can get.

  2. Philip Charles Cohen

    “When Do We Start Calling eBay A Payments Company?”

    A picture is worth a thousand words, so they say. This linked “Business Insider” article contains a graph of eBay revenues since 2003. It shows quite starkly how eBay’s Marketplace revenue has stagnated since 2008, about the time that the headless turkey from Bain & Co, John Donahoe, got hold of the tiller and started his “destructive renovations”, and eBay’s share price has moved little in the same period; ergo the eBay Marketplace has effectively been in decline since 2008.

    It should be obvious, even to the simplest of analysts, that as time passes, the Amazon River flows ever more strongly, whereas the eBay Creek now consists of only a line of stagnant ponds covered in slimy green algae—and isn’t that a couple of rusting Chinese-made shopping trolleys that I can see dumped therein?

    The graph also shows the eBay-underpinning increases in revenue eBay has received from PreyPal during the same period, that is, from roughly when the “eBafia Don” effectively mandated PreyPal’s use on the eBay Marketplace. Some analysts therefore think that eBay’s future lies in PreyPal.

    Well, if anyone thinks that the retail banks are going to let such a clunky, parasitic, flea-sized, upstart, middleman, “merchant of sorts” such as PreyPal—who after all does no more than ride precariously on the back of those banks’ own payments processing systems—continue to nibble away at one of the banks’ principal areas of business for any length of time, all I can say is, dream on …

    PreyPal is little more than a clumsy, fraud-enabling middleman that nullifies the statutory protections that usually apply to the use of the likes of Visa/MasterCard.

    Then there is PreyPal’s current testing of “mobile payments” at POS in Home Depot stores. What most worries me is, are people actually leaving their funds “on deposit” with this clunky, unlicensed, prudentially unregulated PayPal “non-bank” that is itself not even licensed to provide credit? Otherwise, how are the funds for such mobile payments being sourced in any dynamically guaranteed way from the payer’s real bank? Hopefully not with the usual non-guarantee of payment that PreyPal serves up to its online merchants.

    And, unfortunately for eBay’s chief headless turkey, Visa’s professional online offering, when it is up and running later this year, should put paid to whatever success the clunky PreyPal has had with online merchants outside of its mandated use on the eBay Marketplace, and soon thereafter both these unscrupulous and clunky entities should commence their long-deserved journeys down the gurgler.

    Scott Thompson saw the writing on the wall; John Donahoe remains delusional, that fact confirmed by the many reported sightings of him waving his mobile phone about and mumbling about UFO sightings over San Jose.

    Scott Thompson abandons the struggling eBay for the struggling Yahoo

    “How secure is PayPal for sellers?”—UK “Guardian”

    And an interesting follow up to this UK “Guardian” article at:

    PayPal claims PayPal not a debit card or payment network!

    eBay / PayPal / Donahoe: Dead Men Walking