Cloudera has raised a $160 million round of venture capital led by T. Rowe Price. This conflicts with an earlier report that the Hadoop vendor was raising more than $200 million with Intel leading the way. Read more »
A Facebook research paper details a new method for recognizing the people in images by combining deep learning techniques with a method for recomposing angled images as straight-on ones. It’s the latest in a series of advances web companies have made in this field. Read more »
Hadoop pioneer Cloudera is reportedly raising “at least $200 million” from a group of investors that includes Hadoop competitor Intel. If true, it raises some interesting questions about how the two companies might decide to co-exist. Read more »
Databricks, the company behind the commercialization of the Apache Spark data-processing framework, is certifying third-party software to run on the platform. Spark is gaining popularity as a faster, easier alternative to MapReduce in Hadoop environments. Read more »
Machine learning startup Wise.io, whose founders are University of California, Berkeley, astrophysicists, has raised a $2.5 million series A round of venture capital. The company claims its software, which was built to analyze telescope imagery, can simplify the process of predicting customer behavior. Read more »
Declara Co-founder and CEO Ramona Pierson knows a lot about overcoming adversity and the process of learning as an adult. She also knows a lot about algorithms. On the Structure Show podcast this week, she explained how the two have intersected with her company’s platform. Read more »
Facebook has open sourced the code it uses to measure the energy and water consumption of its data centers, as well as the code for the dashboard that visualizes those readings in real time. Facebook first publicly shared the dashboard in April 2013 for its Prineville, Ore., and Forest City, N.C., data centers and it has been available online since. Facebook says cloud provider Rackspace helped get the code ready for open source and is considering implementing it for its data centers.
Another study is reporting on the inaccuracy of the Google Flu Trends project, which predicts seasonal flu rates based on search data. However, Google’s algorithms don’t constitute the “big data” approach to this issue; they’re just one piece of a smart big data approach. Read more »
Facebook doesn’t have a shortage of IT gear or smart engineers, but it does have limits on power and how many hours those engineers have in a day. On Thursday, the company explained how its Look Back videos pushed both to the limits. Read more »
Premise, the company trying to reinvent macroeconomic indicators in developing countries, has raised an $11 million series B round led by Social+Capital along with Google Ventures, Harrison Metal, Andreessen Horowitz and Bowery Capital. As we explained when the company launched in October, it uses smartphone-armed agents around the world who snap strategic photos that Premise then analyzes to determine the economic health of a region. Co-founder and CEO David Soloff is speaking at our Structure Data conference next week in New York, where the company will also receive a Structure Data Editor’s Choice award.
Forget about how much data a disk can store or whether companies will use Hadoop. The questions for big data going forward are how they’ll use Hadoop, how intelligent our systems can actually become and how we’ll keep them in check. Read more »
For fun, I decided to turn my iTunes library into a network graph and compare the language in Edward Snowden’s recent SXSW interview to Gen. Keith Alexander’s Black Hat talk in July. Just because you’re not a data scientist doesn’t mean you can’t enjoy data. Read more »
DataRPM is one of a handful of companies trying to move business intelligence into its next generation by incorporating natural language processing and a search-like experience. InterWest Partners led its series A round. Read more »
Basho, a NoSQL startup whose Riak database competes against the likes of Cassandra in scale-out environments, has lost its CEO Greg Collins, CTO Justin Sheehy and Chief Architect Andy Gross. In an interview with the Register, Sheehy said the departures aren’t as bad as they look and that the company is in good hands. Perhaps, although whoever replaces Collins will be the company’s fourth CEO since it was founded in 2007, and neither of the company’s co-founders remains. Basho has raised more than $31 million in venture capital, with its last funding round of $11.1 million coming in July 2012.
A Major League Baseball team is reportedly the proud owner of a Cray Urika graph-processing appliance that helps the team make in-game decisions by analyzing lots and lots of data. It might be a first, but it’s where sports are headed. Read more »
Eucalyptus CEO Marten Mickos has been around the private cloud space since its inception. Nearly five years after the company’s launch, Mickos shares his thoughts on a market that rose fast, fell hard and appears to be on the rebound. Read more »
As it has been doing a lot lately, the Facebook data science team released another study on Friday highlighting a particular facet of the social science treasure trove that is its collection of wall posts. It might be cool to see this kind of data in the hands of non-corporate researchers, but it’s still interesting to see things like how polarized political parties are or how much more positive women seem toward each other than men. Also, “Damn Canadians!”
Even Booz Allen Hamilton has dollar signs in its eyes when it thinks about sports data. The company is getting started on a new venture to apply its data science mastery to the piles of sensor and statistical data teams are generating. Read more »
Facebook is building its second data center in Luleå, Sweden, using “rapid deployment data center” techniques that will speed construction and simplify design by prebuilding certain parts and creating standardized kits for others. Read more »
As a new study about sex trafficking during the Super Bowl highlights, advances in data analysis are underpinning some powerful new ways of tackling very tough problems. Among all the stones hurled at the tech sector lately, this is an area in which it can take pride. Read more »
One blog post says, “Not only is Data Science not a science, it’s not even a good job prospect.” Another says, “[T]here will always be a place for those who excel at solving ambiguous technological & business problems. And they’ll cost more than $30/hr.” Who’s right? Read more »
This year’s Structure Data conference has a few new wrinkles, including a trivia night at a nearby pub and a series of Data Lab talks about using new types of data. Here are the details. Read more »
Streaming music service Spotify has acquired The Echo Nest and its graph of musical data spanning more than 35 million songs and 2 million artists. It’s an easy way for Spotify to match companies like Google and Pandora on the data science front. Read more »
A new study may help confirm that D-Wave Systems’ quantum computer chip is actually what it claims to be. Working at the University of Southern California, where the D-Wave system owned by aerospace contractor Lockheed Martin is based, a team of scientists has concluded that the 128-qubit processor “behaved in a way that agrees with a model called ‘quantum Monte Carlo,’ yet disagreed with two candidate classical models.” In two weeks at our Structure Data conference, D-Wave CEO Vern Brownell will talk about what quantum computers can do and how they’ll be available as cloud services.
Tableau and Splunk, two of the more successful (and ubiquitous) data startups turned public companies over the past several years, have partnered on a new connector that lets Tableau users access Splunk as a data source within the analytics software. However, it’s not just the existence of a connector that’s valuable for users, but what it means — that they can now combine Splunk data with other data within Tableau to visually analyze all of it together. As Tableau grows more popular, partnering with it is becoming a popular move for everyone from large software vendors to small startups such as BigML.
A Huntsville, Ala., company is moving from the machine-to-machine world into cloud platforms and big data. Here’s how it did it and how it thinks its work could actually end up saving lives. Read more »
An e-commerce startup called Reflektion has raised an $8 million series B round of venture capital for its technology that helps retailers personalize the online shopping experience for consumers. Intel Capital led the round, and Nike and several private investors also pitched in. This seems like the latest thing in marketing — not just targeted advertising but entire tailored experiences for individual shoppers. It does add an Amazon-like recommendation experience, although one might fairly question whether most product catalogs are large enough to warrant it.
An Irish startup called Aylien is getting into the natural-language processing space with a set of APIs for text analysis. It’s not the first company to do this, but it might be the most distinctive. Read more »
Uber has published a blog post explaining the difference that median income makes on the company’s service in Chicago. Beauty might be in the eye of the beholder here, but the study itself reinforces how much today’s data-driven companies know about their businesses. Read more »
Sqrrl co-founder and VP of business development Ely Kahn came on the Structure Show this week to break down the state of cybersecurity and the cutting edge of data analysis within the Department of Defense. Read more »
Cloudera is working on an open source project called Oryx that aims to bring machine learning to Hadoop in a way that previous attempts such as Apache Mahout could not. Read more »
Apache Spark, an in-memory data-processing framework, is now a top-level Apache project. That’s an important step for Spark’s stability as it increasingly replaces MapReduce in next-generation big data applications. Read more »
NoSQL startup DataStax announced on Wednesday that it has added an in-memory option to its commercial version of the Cassandra key-value database. Cassandra is seeing an uptick in adoption right now because of its scalability and ability to span data centers, and the ability to serve data from memory instead of disk will make it a lot faster, too. If the approaches of startups like DataStax, MemSQL and others are any indication, it looks like databases of the future will feature broad ranges of capabilities, data formats and storage options.
Machine learning algorithms can do a lot of things if they have enough data — recommend products, identify fraud and even help women get pregnant. Fertility startup Ovuline says its 60 million data points from users have helped 50,000 of them get pregnant. Read more »
If the thought of tea gets you thinking of British women, fine china and doilies, Tealet might take some getting used to. The Las Vegas startup has built its business so far on the backs of Reddit and Bitcoin, and it hopes to take on Starbucks. Read more »
IBM announced its Watson Mobile Developers Challenge on Wednesday. The company is pushing hard on Watson as a cloud service because it knows it has its work cut out to win developers away from startups and large companies like Google that are also pushing AI via API. Read more »
Although techniques such as machine learning are taking off in the e-commerce and retail spaces as a way to display better recommendations or optimize product presentation, the smart money is still on humans getting the final say in what customers see. Read more »
RunKeeper tracked what its users were up to in Sochi during the Olympics and found they ran the equivalent of about 78 marathons. It’s an interesting nugget, but part of a much larger picture about learning how, when and where people exercise. Read more »
Website performance and security startup CloudFlare has acquired an anti-malware startup called StopTheHacker. The deal makes the popular CloudFlare that much more useful and also gives the company a new business to take advantage of the global infrastructure it’s building out. CEO Matthew Prince recently suggested it would get into the anti-malware space because it often has spare computing capacity that could be put to work scanning networks rather than sitting idle. Although it plans to integrate the two services more tightly, CloudFlare says it will continue operating and investing in the StopTheHacker service.
A company called Carrier IQ is trying to help mobile carriers serve their customers better by using machine learning algorithms to diagnose problems with customers’ smartphones, such as poor battery performance or call quality. A smart use of the technology would be for carriers to get proactive in helping customers resolve their problems before they get annoyed enough to call customer service or, in an increasingly non-contractual industry, just go elsewhere without letting a carrier know they’re leaving. The holy grail of big data, after all, is being able to act proactively.