It’s product-announcement season in the big data and Hadoop world, and this week was full of them. Here are some of the more interesting items you might have missed if you blinked.
Airbnb open sourced a new tool called SmartStack that automates service discovery among the company’s various services. When a service is healthy, SmartStack routes communications to it. When it isn’t, SmartStack detects the problem and takes it out of rotation.
Facebook has one of the largest, if not the largest, MySQL installations in the world, and has created a tool to keep that system online with as little human intervention as possible. It’s called MySQL Pool Scanner and, as Facebook’s Shlomo Priymak wrote in a post on Monday describing it, it’s designed to automate “nearly everything a conventional MySQL Database Administrator (DBA) might do so that the cluster can almost run itself.” Not only does MPS handle availability but, Priymak noted, it also lets administrators do things such as copy the entire Facebook dataset with a single command.
A Boston-area startup called Nutonian is taking its machine learning software, originally developed for scientific research, into the mainstream. To the degree software can ever tell users how and why data are connected, Nutonian says its product is the best thing around.
Streamy launched in 2007 and officially closed in 2010, but the social news reader application is back in a stripped-down version. The point isn’t to make its founders boatloads of money on its own, but to prove the capabilities of their new Hadoop application platform.
Sales intelligence startup Infer is racking up customers and transactions for its software, which analyzes thousands of data points to score potential leads on how likely they are to convert. For some customers, the scores have become key metrics.
Cooladata has raised a $7.4 million series A round from Greylock IL and Carmel Ventures. The company has developed an interesting behavioral analytics service that’s powered by Google’s various cloud services.
Google’s new real-time map showing DDoS attacks across the world is both awesome and scary, especially if you’re running a website in the United States.
Teradata is expanding beyond the appliance world by offering a fully managed version of its data warehouse software as a cloud service. The company is also dipping a toe into the NoSQL world, and the internet of things, with support for JSON data.
Greylock Partners’ Jerry Chen has seen enough in his nine years building the cloud computing business at VMware to know how to spot an opportunity. Here’s where he thinks entrepreneurs and investors should focus their attention.
Google spent nearly $2.3 billion on infrastructure in the third quarter, nearly 50 percent more than in the previous quarter and nearly three times what it spent in the third quarter of 2012.
A new data-analysis startup called Mode has launched, backed by a $550,000 angel round led by Yammer founder David Sacks. Mode targets data analysts with a GitHub-like service for sharing their scripts, data and other work.
A new firm called Data Elite is trying to help accomplished engineers and scientists launch their startups. It features some prominent investors and advisers, and promises at least $150,000 in capital to accepted companies.
The mobile landscape doesn’t consist of just the iPhone 5s, Samsung Galaxy S4 and Moto X. Thousands of different devices access the average mobile site, and they all have different processors, batteries, sensors and networks. That means an opportunity for personalization like never before.
The latest traffic report from publishing analytics startup Parse.ly shows Google still dominating in terms of referring traffic to publishers’ sites, but that referral data now comes without the associated search terms 87 percent of the time.
How important is the industrial internet to the world’s economy? According to GE’s Bill Ruh, even small improvements in some areas could save billions of dollars per year and significantly reduce problems like flight delays.
If you’ve ever wanted to use the Couchbase NoSQL database but didn’t feel like managing servers, a San Mateo, Calif.-based startup called KuroBase says it has you covered with its new service. Hosted databases are already pretty popular with web developers running MongoDB, Postgres and even CouchDB (kind of, technically), but I believe this is a first for Couchbase. It could catch on, especially if developers take to Couchbase’s new ability to sync data between mobile devices and a central database.
We have been hearing about things like YARN and high availability for a few years (they’ve even been incorporated into some commercial Hadoop distributions), but now they’re finally part of the official Apache Hadoop code base, in version 2.2.0. According to an Apache Software Foundation press release, “The project’s latest release marks a major milestone more than four years in the making, and has achieved the level of stability and enterprise-readiness to earn the General Availability designation.”
I think this is more about Hadoop and other emerging technologies than the analysts quoted here are willing to admit. Why do you think Teradata is pushing its Hadoop story so much lately? There is, for example, crazy excitement around big data and Hadoop in China. Customers starting with a blank slate center their efforts on Hadoop, while big existing customers are trying to offload more work to Hadoop. Teradata sales are fairly flat right now, even in the U.S., because big existing customers are getting bigger but fewer new ones are signing up.
IBM has shared some details about a new project called WatsonPaths that lets doctors actually interact with the system to understand how it came to its conclusions, and to tell it whether its “thinking” was right. This type of interaction is critical in any machine learning system where speed isn’t the primary objective, because it lets humans see things they might not have noticed otherwise and also train the machines to be more accurate in the future. WatsonPaths is a GUI-based tool being developed together with doctors at the Cleveland Clinic.
Hortonworks is working to integrate the Storm stream-processing engine with its Hadoop distro, and hopes to have it ready for enterprise apps within a year. It’s the latest non-batch functionality to arrive in Hadoop, thanks to YARN, which lets Hadoop run all sorts of processing frameworks.
A startup called Aviate has built an app that its creators say makes Android better by, essentially, taking it over. Aviate gives users a pared-down home screen, categorizes their apps, and automatically surfaces the ones they need depending on where they are.
Premise Data wants to change the way decision makers think about macroeconomic indicators by changing the way they consume that data. Thanks to a glut of e-commerce data and a network of smartphone-equipped price watchers worldwide, it’s making information a real-time affair.
According to a complaint filed by Zettaset, Intel’s Hadoop management software is so similar to Zettaset’s flagship Orchestrator product that “trying to run Zettaset on top of Intel is akin to trying to put a key into a lock already occupied by a key.”
Law professor and blogger Eric Goldman drops some knowledge on the ineffectiveness and, one could argue, the innovation-hindering effects of these types of privacy laws. I think regulation is a good idea, but it must be flexible and it should be paired with better public education so consumers can make informed choices. I’d rather websites spend money protecting my data or asking me at the time of collection whether they can use data for ads.
Zettaset has sued Intel over its Intel Manager for Apache Hadoop product, claiming it misappropriates Zettaset’s trade secrets. The complaint was filed last week in California.
Microsoft says it has seen the light in terms of designing software that business users actually want to use. For example, the team behind its new Q&A feature for visualizing Excel data spent six weeks thinking about the UI before even thinking about the technology.
Booz Allen Hamilton serves clients in areas such as national security, financial services and life sciences, and data science is an increasingly important part of the job. A VP in its Strategic Innovation Group talks about how he approaches hiring data scientists and analyzing clients’ data.
Google, along with its partners at NASA and D-Wave, has released a short video explaining its new quantum computer and the things, many of them as yet unimagined, that it might be able to do.
Car service startup Uber has produced some graphics showing the effect of the government shutdown on its ridership in the Washington, D.C., area. Certain routes, like between downtown and Capitol Hill, have shown a reduction that the company calls “significant.”
Cloud developers and engineers have probably heard about Netflix’s Chaos Monkey before, and now the company has turned the tool on its production Cassandra database clusters, as this post explains. Chaos Monkey isn’t just about spotting weaknesses in cloud architectures; the real goal is figuring out fixes. Netflix has improved its Cassandra clusters with real-time monitoring and automatic replacement of failed nodes.
Two new research partnerships, whose participants range from pharmaceutical companies to IT vendors, are taking aim at improving disease treatment via data analysis. They’re targeting a handful of specific diseases, heart disease and cancer among them, but they point toward a data-driven health care future.
A new research project from Carnegie Mellon University, funded by a $2.6 million grant from the National Science Foundation, aims to make microchips smarter and more efficient by analyzing the data they collect about themselves. The Statistical Learning in Chip project is focused on developing an integrated machine learning engine that can help chips dynamically manage their resource consumption and keep it at optimum levels. This would make the chips, and the devices built on them, more energy-efficient, resulting in longer battery life and cooler operating temperatures.
IBM has opened a new research lab in San Jose, Calif., called the Accelerated Discovery Lab. Its purpose is to bring together subject matter experts in key areas — the company cites drug discovery, social analytics and predictive maintenance (aka the industrial internet) — with the data and tools they need to make new discoveries in their fields. For IBM, which has billions in revenue riding on these industries, the more it can prove its worth to them, the better.
Dropcam has released a new monitoring camera called the Dropcam Pro that’s remarkably high-resolution and also very smart. A new user experience enables advanced zooming from a smartphone, and cloud-based machine learning algorithms let users filter their video feeds.
IBM has been awarded a patent for moving virtual machines across physical servers in a cloud in order to ensure applications are receiving the bandwidth they need. It’s an interesting solution to a problem that has plagued some cloud users.
San Juan Capistrano, Calif.-based startup Cirro is betting that there’s real value in the piles of data scattered across corporate data stores, and it has closed an $8 million series A round from Toba Capital, Frost Venture Partners and Miramar Venture Partners to help test its hypothesis. Its platform uses a SQL-based analytic engine that reaches into all of a company’s various data stores, including big data stores such as Hadoop and NoSQL databases, when carrying out queries.
TransLattice, a Santa Clara, Calif.-based startup selling a geographically distributed relational database system, has acquired Red Bank, N.J.-based cloud-database startup StormDB. Both companies are pushing production-grade, distributed OLTP systems, and the Postgres-based StormDB has some of its own IP around MPP analytics and geospatial data. It appears StormDB will stop taking new customers, but according to an FAQ on its site, “TransLattice will honor commitments to current StormDB customers.”
Time-series data is proliferating like mad in the era of the internet of things and the industrial internet, and Chicago-based startup TempoDB wants to capture it all. The company has $3.2 million to help it try to pull this off.
Remember when “polyglot PaaS” was the new thing? Five years after launching, App Engine now supports PHP, Python, Java and Google’s own Go programming language. Kidding aside, App Engine actually has matured quite a bit, has attracted some relatively big users and is part of an ever-impressive cloud platform at Google.