Treasure Data has taken Hadoop to the cloud (for now just Amazon Web Services) and built a data warehousing service on top of it, aimed squarely at disrupting the big, expensive data warehouse boxes sold by Oracle, IBM, Teradata and the like. Fortune 500 companies are taking note. According to Treasure Data, more than ten enterprises are in production on its service, including a large retailer and a giant automaker. Startups including ContextLogic, Cookpad and Splurgy are also customers.
Founder and CEO Hiro Yoshikawa says it’s not Treasure Data’s technology that gets the attention of CIOs, although it helps that the team are Hadoop pros. He says CIOs are more interested in the economic story and the fact that Treasure Data is selling a service. Traditional data warehouses typically cost millions of dollars just to get started, whereas Treasure Data has a freemium pricing model. Users can host up to 500 GB of their data on Treasure Data at no cost. Above that, the service costs $1,500 per month for 8 guaranteed TDCU cores (based on AWS ECUs).
Second, it’s a service, meaning it’s much faster to get started with than buying an Exadata or trying to learn Hadoop, neither of which is particularly appealing to IT pros. Treasure Data claims it beat Microsoft Azure and IBM Netezza on a deal with a major car manufacturer that is collecting and analyzing sensor data from its cars. The carmaker wanted to know the average distance travelled by its vehicles from the moment the key was turned on to when it was turned off. Treasure Data was able to turn around an answer in fewer than five days, beating its competition. The company says this was only possible because it is based in the cloud and can scale fast.
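To make the carmaker’s question concrete, here is a minimal sketch of the calculation in Python. It is not Treasure Data’s query or the manufacturer’s data model: the `IgnitionEvent` record, its field names and the odometer-based distance are all assumptions made for illustration, and the events are assumed to arrive in time order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class IgnitionEvent:
    """Hypothetical sensor record; the field names are illustrative assumptions."""
    vehicle_id: str
    kind: str           # "on" when the key is turned on, "off" when turned off
    odometer_km: float  # odometer reading at the moment of the event


def average_trip_distance(events: Iterable[IgnitionEvent]) -> Optional[float]:
    """Mean distance between each key-on and the next key-off of the same vehicle.

    Assumes events are already sorted by time. Returns None if no complete
    trip (an "on" followed by an "off") is found.
    """
    trip_start: Dict[str, float] = {}  # vehicle_id -> odometer at key-on
    total_km = 0.0
    trips = 0
    for event in events:
        if event.kind == "on":
            trip_start[event.vehicle_id] = event.odometer_km
        elif event.kind == "off" and event.vehicle_id in trip_start:
            total_km += event.odometer_km - trip_start.pop(event.vehicle_id)
            trips += 1
    return total_km / trips if trips else None


# Made-up sample data: two vehicles, three complete trips (12, 8 and 10 km).
SAMPLE_EVENTS = [
    IgnitionEvent("car-a", "on", 100.0),
    IgnitionEvent("car-b", "on", 50.0),
    IgnitionEvent("car-a", "off", 112.0),
    IgnitionEvent("car-b", "off", 58.0),
    IgnitionEvent("car-a", "on", 112.0),
    IgnitionEvent("car-a", "off", 122.0),
]

if __name__ == "__main__":
    print(average_trip_distance(SAMPLE_EVENTS))
```

On one machine this is a simple loop. The hard part at a carmaker’s scale is running the same pairing over sensor data from an entire fleet, which is where the ability to add capacity quickly matters.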
There are other hosted Hadoop services, but Treasure Data goes several steps further, rounding out the service with a data collection agent, fast analytic capability, SQL access and visualization tools. It also created its own columnar database to replace HDFS. There are other open-source columnar databases out there, such as Infobright and InfiniDB, but none met the mark for scalability, availability and durability, according to Yoshikawa. He wasn’t joking when he said they had distributed architecture skills.
There are other cloud-based analytics services too, like Google BigQuery, but Treasure Data says it has improved on these as well. Google does not address the data collection aspect (which Treasure Data does with its TD-agent). And Google BigQuery users have to fix the data schema upfront, whereas Treasure Data lets users change the schema after data is imported.
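The schema point is easier to see with a small example. The Python sketch below illustrates the general idea of applying a schema at query time rather than at load time. It is not Treasure Data’s or Google’s implementation, and the record fields and the `project` helper are hypothetical.

```python
import json
from typing import Any, Dict, List

# Raw events stored as-is at import time. The second record carries a "price"
# field that the first one lacks; nothing had to be declared up front.
RAW_EVENTS = [
    '{"user": "a", "action": "view"}',
    '{"user": "b", "action": "buy", "price": 9.5}',
]


def project(lines: List[str], columns: List[str]) -> List[Dict[str, Any]]:
    """Apply a column list at read time; fields missing from a record become None."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({column: record.get(column) for column in columns})
    return rows


if __name__ == "__main__":
    # The "schema" is chosen per query and can change after the data is loaded.
    print(project(RAW_EVENTS, ["user", "price"]))
```

With a schema fixed upfront, the `price` field would have to be declared before any record containing it could be loaded. Here it is simply picked up by whichever query asks for it.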
Despite all these innovations, the startup will face the usual uphill battle of convincing enterprises that their data is safe in the cloud. Doubts about data safety are still the number one reason companies won’t use cloud services.