Deep learning is one of the hottest trends in big data right now and is currently underpinning the cutting edge in areas such as natural language processing and image recognition. Here’s a brief guide to what it is and who’s doing it. Read more »
Teradata’s CEO addressed the impact of Hadoop on its earnings call and, according to this report from ZDNet, downplayed its effect. In fact, he said only 4 to 8 percent of Teradata workloads might ever move to Hadoop. Even if that’s true for workloads, what about the data itself? It might not need to live in those pricey appliances.
Dropbox has hired Kevin Park as its new head of technical operations and IT. Park was at Facebook from 2006 until 2011, where he was a director of technical operations. This isn’t the first time Dropbox has brought on former Facebook employees to help grow its engineering team — in 2012 it bought a startup called Cove that was started by Aditya Agarwal (now VP of engineering) and Ruchi Sanghvi (formerly VP of operations), who built Search and Newsfeed, respectively, during their time at Facebook.
Correction: This post has been updated to clarify that Ruchi Sanghvi is no longer with Dropbox.
This is a pretty interesting benchmark study, although the headline is a bit misleading because Hadoop isn’t really optimized for graph analysis. When you look at comparisons to Spark, GraphLab and other platforms, it seems the decision of what to choose might come down to data volume, acceptable latency and cost, especially when considered against the value of that graph workload. Projects like Giraph and other YARN-enabled engines might make Hadoop look better, too.
It was another busy news day in the big data world. Here are some of the more interesting items you might have missed if you blinked. Read more »
Cloudera is now positioning itself less as a company selling Hadoop and more as a company selling what it calls an “enterprise data hub.” It’s not about the technologies, the company argues, but what they can do for users. Read more »
An MIT professor has created an algorithm he says can work in conjunction with rangefinders and adaptive cruise control systems to keep cars moving at the ideal speeds to limit traffic jams. Read more »
Cloudera has partnered with a startup called Databricks to integrate and support the Apache Spark data-processing platform within Cloudera’s Hadoop software. Spark, which is designed for speed and usability, is one of several technologies pushing Hadoop beyond MapReduce. Read more »
Rackspace is now doing Hadoop, and Cloudera just announced a handful of partners: Hadoop is everywhere in the cloud these days. Here’s a quick breakdown of which cloud providers are offering which distributions of Hadoop as managed services. Read more »
After nearly 18 months in relative stealth mode, ClearStory Data is finally available. It’s a pretty novel way of doing business analytics that tries to let lay users do more by automating much of the hard work. Read more »
A new startup called Paxata wants to make business analysts’ lives easier by automating the process of going from raw data to something that an analytics product like Tableau can actually understand. Read more »
Rackspace has opened its Hortonworks-powered Hadoop service for early access customers, about a year after announcing it would be building the offering. It’s neither the first nor the last managed Hadoop service we’ll see this week. Read more »
Backblaze CEO Gleb Budman came on the Structure Show this week to talk about everything from building open source storage pods to dealing with the CIA to how easy it can be to waste $1 million marketing to the wrong people. Read more »
Cloudera CEO Tom Reilly says his company doesn’t really think of its peers Hortonworks and MapR as competitors, deciding instead to focus its efforts on winning bigger and broader deals. Read more »
Technology buyers in some sectors drool over the promise of things like cloud computing and big data, but those words don’t mean a whole lot in places like warehouses or manufacturing plants, where how something works is far less important than that it works. Read more »
Facebook’s decision to include status updates and wall posts in Graph Search could be called great or creepy depending on the user, but it wasn’t an inconsequential decision technologically. In fact, it put a significant strain on the feature’s database infrastructure. Read more »
I apologize if I’m late to the game on this, but someone just tweeted me about Apache Tajo, a potentially interesting new SQL query engine for Hadoop. I’m not sure how much traction it can possibly gain given the glut of other options out there (take a look at this now extremely outdated roundup from February), but I guess more options are better for users, to a point. SK Telecom, a Korean carrier, is already a big fan. Also, some of Tajo’s contributors’ employers are kind of interesting.
It’s product-announcement season in the big data and Hadoop world, and this week was full of them. Here are some of the more interesting items you might have missed if you blinked. Read more »
Airbnb open sourced a new tool called SmartStack that automates the management of the company’s various services. If they’re healthy, SmartStack facilitates communications. If they’re not, it knows and it takes them down. Read more »
Facebook has one of the largest, if not the largest, MySQL installations in the world, and has created a tool to keep that system online with as little human intervention as possible. It’s called MySQL Pool Scanner and, Facebook’s Shlomo Priymak wrote in a post on Monday describing it, it’s designed to automate “nearly everything a conventional MySQL Database Administrator (DBA) might do so that the cluster can almost run itself.” Not only does MPS handle availability but, Priymak noted, it also lets administrators do things such as copy the entire Facebook dataset with a single command.
A Boston-area startup called Nutonian is taking its machine learning software originally developed for scientific research into the mainstream. To the degree software can ever tell users how and why data are connected, Nutonian says its product is the best thing around. Read more »
Streamy launched in 2007 and officially closed in 2010, but the social news reader application is back in a stripped-down version — not to make its founders boatloads of money on its own, but to prove the capabilities of their new Hadoop application platform. Read more »
Sales intelligence startup Infer is racking up customers and transactions for its software that analyzes thousands of data points in order to score potential leads on the likelihood they’ll convert. For some customers, the scores have become instrumental metrics. Read more »
Cooladata has raised a $7.4 million series A round from Greylock IL and Carmel Ventures. The company has developed an interesting behavioral analytics service that’s powered by Google’s various cloud services. Read more »
Google’s new real-time map showing DDoS attacks across the world is both awesome and scary — especially if you’re running a website in the United States. Read more »
Teradata is expanding out of the appliance world by offering a fully managed version of its data warehouse software as a cloud service. The company is also dipping a toe into the NoSQL world — the internet of things — with support for JSON files. Read more »
Greylock Partners’ Jerry Chen has seen enough in the past nine years building the cloud computing business at VMware to know how to spot an opportunity. Here’s where he thinks entrepreneurs and investors should focus their attention. Read more »
Google spent nearly $2.3 billion on infrastructure in the third quarter — nearly 50 percent more than last quarter and nearly three times the third quarter of 2012. Read more »
A new data-analysis startup called Mode has launched, thanks to a $550,000 angel round led by Yammer Founder David Sacks. Mode targets data analysts with a GitHub-like service for sharing their scripts, data and other work. Read more »
A new firm called Data Elite is trying to help accomplished engineers and scientists launch their startups. It features some prominent investors and advisers, and promises at least $150,000 in capital to accepted companies. Read more »
The mobile landscape doesn’t consist of just the iPhone 5s, Samsung Galaxy S4 and Moto X. Thousands of different devices access an average mobile site, and they all have different processors, batteries, sensors and networks. That means an opportunity for personalization like never before. Read more »
The latest traffic report from publishing analytics startup Parse.ly shows Google still dominating in terms of referring traffic to publishers’ sites — but that referral data now comes without associated search terms 87 percent of the time. Read more »
How important is the industrial internet to the world’s economy? According to GE’s Bill Ruh, even small improvements in some areas could save billions of dollars per year and significantly reduce problems like flight delays. Read more »
If you’ve ever wanted to use the Couchbase NoSQL database but didn’t feel like managing servers, a San Mateo, Calif.-based startup called KuroBase says it has you covered with its new service. Cloud databases are already pretty popular with web developers running MongoDB, Postgres and even CouchDB (kind of, technically), but I believe this is a first for Couchbase. It could be popular, though, especially if developers are keen on Couchbase’s new ability to sync data between mobile devices and a central database.
We have been hearing about things like YARN and high availability for a few years, and they’ve even been incorporated into some commercial Hadoop distributions, but now they’re finally part of the official Apache Hadoop code base. The release is technically version 2.2.0. “The project’s latest release marks a major milestone more than four years in the making, and has achieved the level of stability and enterprise-readiness to earn the General Availability designation,” according to an Apache Software Foundation press release.
I think this is more about Hadoop and other emerging technologies than the analysts quoted here are willing to admit. Why do you think Teradata is pushing its Hadoop story so much lately? There is, for example, crazy excitement around big data and Hadoop in China. Customers with blank slates center their efforts around Hadoop, while big existing customers are trying to offload more to Hadoop. Teradata sales are fairly flat right now even in the U.S. because big existing customers are getting bigger but fewer new ones are signing up.
IBM has shared some details about a new project called WatsonPaths that lets doctors actually interact with the system to understand how it came to its conclusions, and to tell it whether its “thinking” was right. This type of interaction is critical in any type of machine learning system where speed isn’t the primary objective, because it lets humans see things they might not have and also train the machines to be more accurate in the future. WatsonPaths is a GUI-based tool and is being developed along with doctors at the Cleveland Clinic.
Hortonworks is working to integrate the Storm stream-processing engine with its Hadoop distro, and hopes to have it ready for enterprise apps within a year’s time. It’s the latest non-batch functionality for Hadoop thanks to YARN, which lets Hadoop run all sorts of processing frameworks. Read more »
A startup called Aviate has built an app that its creators say makes Android better by, essentially, taking it over. Aviate gives users a pared-down home screen, categorizes their apps, and automatically surfaces the ones they need depending on where they are. Read more »
Premise Data wants to change the way decision makers think about macroeconomic indicators by changing the way they consume that data. Thanks to a glut of e-commerce data and a network of smartphone-equipped price watchers worldwide, it’s making information a real-time affair. Read more »