
Why Apple, eBay, and Walmart have some of the biggest data warehouses you’ve ever seen


In an age of Hadoop and a general analytics revolution, it’s easy to poke fun at legacy data warehouse vendors such as Teradata. Some people might even call it fun. After all, they sell expensive appliances and weren’t built from the ground up to handle the unstructured data that most people think of when they think of “big data.”

But whatever you think about Teradata’s approach to handling big data workloads, make no mistake about the company’s clout: It has been around for decades, and it’s still analyzing boatloads of data for some of the biggest names in business. I spent a day in February touring the Teradata Labs facility in San Diego, and although I heard all about the technology and the company’s vision for a Teradata-Hadoop-Aster analytics super-environment, what stuck out most were the users. Walmart, eBay, Continental … Apple.

Here’s how they’re all using Teradata and at what scale (try not to faint when you think of the bill):

  • Apple: Apple (s aapl) is operating a multiple-petabyte Teradata system (that became apparent during its iCloud launch in 2011) and, I learned, was Teradata’s “fastest ever customer to a petabyte.” Apple uses the data warehouse to get a better understanding of its customers across product groups. Now every piece of identifiable information goes into the system (and those iTunes interactions generate a lot of data), so the company knows who’s who and what they’re up to.
Rows of Teradata appliances.
  • Walmart: The retail giant deployed Teradata’s first-ever terabyte-scale database in 1992, and it has grown, uh, a bit since then. Its operational system was at 2.5 petabytes as of 2008, and is certainly leaps and bounds bigger by now — likely well into the double digits when you consider it operates separate ones for Walmart (s wmt) and Sam’s Club as well as a backup system. The analytics efforts have essentially helped Walmart become a massive consignment shop. It tells suppliers, “You have three feet of shelf space. Optimize it.” And then it gives them any data they could possibly need to determine what’s selling, how fast and even whether they should redesign their packaging to fit more on the shelves.
  • eBay: eBay (s ebay) has two systems in place, and they’re both big. Its primary data warehouse is 9.2 petabytes; its “singularity system” that stores web clicks and other “big” data is more than 40 petabytes. It has a single table that’s 1 trillion rows. Yes, this is smaller than the 50 petabytes worth of Hadoop capacity eBay added last year, but Teradata is quick to point out that all of its systems support data into and out of Hadoop, so it’s not as if eBay is operating two entirely distinct data environments.

Of course, Teradata has lots of other petabyte-scale customers, with Verizon, AT&T and Bank of America among them. Here are a few more interesting use cases:

  • Harrah’s (now part of the Caesars Entertainment casino empire) understands how much money particular gamblers can afford to lose in a day before they won’t come back the next day.
  • Disney (s dis) is rolling out new bracelet tickets equipped with GPS and NFC that track everything visitors do while inside Disney’s amusement parks. The New York Times detailed the privacy implications of this move in a January article.
  • A manufacturing customer generates 20 terabytes of data per hour while testing products, although that volume is ultimately reduced to about 1 terabyte after the valuable data is filtered out.
  • At some point, Continental Airlines decided it wanted to keep its customers happy. It began assessing them by lifetime value (which, it turns out, is often inversely related to frequent-flyer status) and making alternative arrangements for them as soon as it realized flights would be delayed.
  • A luxury car company used Aster Data to analyze the pattern of failures for various components inside its cars. It found out that lighting, seats and infotainment often failed together (they’re on the same circuit) and began inspecting all three whenever a customer came in for service on any of them.
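
For a sense of what that last kind of analysis looks like in miniature, here is a rough Python sketch that counts which components fail together on the same service visit. To be clear, this is not how the car company or Aster Data did it; the records, component names and the `co_failure_counts` helper are all invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical service records: each set lists the components that failed
# on one service visit. The data is invented for illustration only.
visits = [
    {"lighting", "seats", "infotainment"},
    {"lighting", "seats", "infotainment"},
    {"brakes"},
    {"lighting", "infotainment"},
    {"seats", "lighting", "infotainment"},
    {"brakes", "wipers"},
]


def co_failure_counts(records):
    """Count how often each pair of components fails on the same visit."""
    counts = Counter()
    for failed in records:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(failed), 2):
            counts[pair] += 1
    return counts


pair_counts = co_failure_counts(visits)

# Show the pairs that fail together most often.
for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

In this toy data, lighting, seats and infotainment dominate the top pairs, which is the kind of signal that would prompt a mechanic to check all three at once. A production system would run something similar over millions of records and would likely normalize for how common each component failure is on its own.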


None of this means Teradata is destined to continue being a huge name in analytics (Scott Yara, co-founder of rival EMC (s emc) Greenplum, recently called data warehouses this generation’s mainframe), but it’s still interesting to learn how big companies are analyzing their data, regardless of what they’re running on. And with exabytes worth of data no doubt residing in customer systems across the world, Teradata isn’t going anywhere soon.

3 Responses to “Why Apple, eBay, and Walmart have some of the biggest data warehouses you’ve ever seen”

  1. Patrick Pitre


    You’re doing nothing to help the Big Data cause by making comments at the beginning of your article about making fun of legacy data warehouse vendors, with their expensive appliances. Those expensive appliances came at the demands of business users that want their reports (not necessarily data) as fast as possible. Traditional Data Warehousing doesn’t play in the same space as Big Data, and it shouldn’t. They are complementary, and as the companies you’ve highlighted are doing, work together in a referential architecture. Big Data is not the replacement of data warehousing, and won’t be, until it can provide sub-second query response.

    • Derrick Harris

      I would say you’re right and wrong. Clearly, now, there’s not much of a comparison, but the big data ecosystem is putting a lot of effort into closing the gap. The end goal is analyzing data where it sits, which could very well mean operational databases and Hadoop.

  2. Good article. Cool click thru on Disney. Singularity? I heard they got that name from Ray Kurzweil’s book. Also cool story!

    The data warehouse is like OLTP: it’s a force of nature, a concept that doesn’t grow old. The current renaissance of analytics blossoming is all goodness. But it doesn’t stop the simple need to consolidate, clean, and integrate data for analytics. THAT future is 10-12 years.

    And mainframes continue to sell in the billions of dollars even though they supposedly died in 1993. Yara’s comment is marketing hype. When Yara/GP makes their first billion in revenue, then we might listen.