Rayid Ghani might be best known for leading the Obama for America data science team, but his latest mission is to bring that experience to bear on the nonprofit world through a research director role at the University of Chicago and a startup called Edgeflip. Read more »
Execs are talking about measuring tweet volume and the reach of those tweets, but isn’t the real value in figuring out what people think? It’s not worth touting that 200,000 people tweeted and 4 million people saw those tweets if the overall sentiment is that the show sucks. But given the history of shows such as “Arrested Development,” 20,000 of the right people tweeting about how great something is might be worth noting even if ratings aren’t high.
HP is all about the enterprise cloud and all about OpenStack, although its approach might seem very different for devotees of open source software or Amazon Web Services. Here’s how HP’s Margaret Dawson explains the company’s strategy. Read more »
Cloudera, Hortonworks, MapR and others are battling to lock down market share for commercial Hadoop software, but they’re inherently limited when it comes to innovation. Why not take advantage of the work already done by big Hadoop users like Facebook, Twitter and LinkedIn? Read more »
GE is pushing new technology that uses hundreds of sensors and advanced analytics to make its fleet of gas turbines run more efficiently. GE is a big fan of big data, investing more than $100 million in tech companies this year alone. Read more »
Twitter’s IPO filing is full of nuggets about the company’s revenue and overall business, including our first real look into the company’s data centers. We still don’t know where they are, but we know what they cost. Read more »
A group of researchers from Stanford has been working on deep learning models that can make sense of whole sentences at a time, and has recently trained its models on a large collection of online movie reviews. Read more »
Hadoop startup WibiData has updated Kiji, its open source project that aims to make HBase a better (or easier) database for serving real-time applications. Among the updates in its latest SDK is an improved version of the KijiScoring feature. “Developers can now pass per-request settings to producer functions, greatly expanding the flexibility of real-time predictive model scoring. For example, a user’s current geolocation from mobile application can be factored in when re-computing which offers or recommendations to serve a user,” explains a press release.
IBM is teaming with MIT, Carnegie Mellon University, New York University and the Rensselaer Polytechnic Institute to advance the state of the art in building smarter computer systems. Their research ranges from automatically classifying text and images to human-computer interaction. Read more »
Guavus, a San Mateo, Calif.-based startup that specializes in analyzing the data coming off carrier networks, has hired former NetApp EVP Manish Goel as CEO. Goel replaces Anukool Lakhina, who founded the company and will stay on board to help drive its technology strategy, among other things. Guavus has raised $87 million in capital and claims some major wireless carriers as customers of its software that helps tie customer data to network activity.
Yelp has announced the winners of its inaugural Yelp Dataset Challenge, and the four entries it chose actually seem pretty useful. They run the gamut from a technique to highlight key words so users can read reviews faster to helping businesses predict whether they’ll see an uptick in activity on Yelp. Having read countless reviews giving restaurants low ratings even though the food was good, I think the entry that extracts subtopics (e.g., food, service, ambience) from restaurant reviews might be my favorite.
PayPal is a finalist in the Netflix OSS Cloud Prize contest for a project called Aurora, which is Netflix’s Asgard cloud-management system rebuilt for OpenStack. Netflix is famously a big Amazon cloud user, so seeing its technology retooled for OpenStack is an interesting turn. Read more »
Machine learning startup BigML now supports text data in its cloud-based prediction service. It has always analyzed numerical fields in complex datasets to determine the relationship between them and any given outcome, and now it will consider the importance of words, too. Read more »
A scientist writing for Politico has equated government data mining with atomic bombs and is calling for disarmament. But if citizens are going to have a voice in this debate, we probably need to solve web privacy first. Read more »
IBM is going to acquire a Dublin, Ireland-based company called The Now Factory, which specializes in providing customer and network analytics for wireless carriers. The idea is that better, faster data about their networks can help carriers optimize performance and better serve (or target) customers based on their usage behavior. The Now Factory seems similar in vision to the San Mateo, Calif.-based Guavus, and it seems logical the two will cross paths more often thanks to IBM’s global reach.
Splunk is furthering its evolution beyond IT search with a new set of features that make it easier for business users to create, analyze and visualize machine-generated data sets. With lots of competition popping up every day, Splunk can’t rest on its laurels. Read more »
Fantasy football is a big business that thrives on data, making it a great way to prove out a new technology and possibly earn a few bucks. A startup called SkyPhrase, for example, is putting its natural-language processing technology to use on NFL statistics. Read more »
Startup Dataguise has closed a $13 million series B investment round “led by Toba Capital with additional capital coming from the investment arm of a leading electronic conglomerate,” according to a press release. Dataguise’s biggest selling point might be its product designed to secure data within Hadoop. Aside from standard authentication, Fremont, Calif.-based Dataguise actually uses big data techniques to analyze data, determine what’s sensitive and then mask or encrypt it.
Cloudera will be integrating with the Apache Accumulo database and, according to a press release, “devoting significant internal engineering resources to speed Accumulo’s development.” The National Security Agency created Accumulo and built in fine-grained access controls to ensure only authorized individuals could see any given piece of data. Cloudera’s support could be bittersweet for Sqrrl, an Accumulo startup made up of former NSA engineers and intelligence experts, which should benefit from a bigger ecosystem but whose sales might suffer if Accumulo makes its way into Cloudera’s Hadoop distribution.
EqualLogic and now DataGravity Co-founder Paula Long is very smart about storage technology. Right now, she’s looking at things like flash and cloud storage with a skeptical eye. They’re valuable and will become more valuable, she says, but only when they’re done right. Read more »
Shares of Violin Memory stock closed their first day of public availability at $7.02, down 22 percent from the morning’s initial asking price of $9 per share. However, CEO Don Basile is confident the market will come around in time. Read more »
It might have priced in the lower range of its purported value, but enterprise tech stocks have done pretty well recently and Violin has been one of the bigger companies in a red-hot flash market. More interesting in the long run might be how Violin’s IPO affects, or is affected by, planned IPOs for smaller flash vendors like Pure Storage and Nimble Storage. Expect an update on the Violin public offering on Friday.
Google has released another paper showing off the power of its deep learning techniques for text analysis. It shows how models can detect similar usage of words across different languages, meaning it can accurately translate words and concepts from one language to another. Read more »
This article from Klint Finley at Wired Enterprise raises some good questions about the ideal integration of big data into nonprofits. I rather prefer the efforts of DataKind and the SumAll Foundation, which try to help nonprofits solve problems rather than harvest email addresses. The flipside, of course, is that individual donors are what keep the lights on in many cases, so access to more of them is good.
Gnip, one of a handful of companies with direct access to the Twitter firehose, is now letting its customers query 30 days worth of tweets via a new search API. CEO Chris Moody describes it as fast delivery on small data. Read more »
A team of professors behind the open source Spark and Shark in-memory big data projects has raised $13.9 million to commercialize the products via a company called Databricks. Spark and Shark are designed to be much faster and more flexible than Hadoop MapReduce and Hive. Read more »
Microsoft is working some impressive new features into Power BI, its Excel add-on for Office 365 that’s focused on making analytics easier. Among the capabilities announced on Wednesday were natural-language search and visualizations, and new and improved maps. Read more »
This seems like good advice from Hortonworks’ Ofer Mendelevitch. Python? Check. Java? Check. Hadoop? Check. SQL? Check. Stats? Check. But his closing remark — “The road to data science is not a walk in the park. … This takes time, effort and a personal investment.” — might be the most important. We often talk about democratizing some of the data science tools, but the really good ones can do it all.
Hadoop startup MapR has released a new version of its commercial HBase database, called M7. According to a press release, “HBase applications can now benefit from MapR’s high performance platform to address one of the major issues for on-line applications, consistent read latencies in the less than 20 millisecond range across varying workloads.” MapR released M7 in May and claims its architectural improvements over open source HBase result in a faster, easier experience.
Publishing analytics startup Parse.ly moved its production application off of Rackspace in 2011 to save costs. Two years later, it has watched Amazon Web Services costs drop precipitously, and now CTO Andrew Montalenti says it’s probably time to head back to the cloud. Read more »
Modular data center manufacturer IO has filed a confidential S-1 form and plans to go public in the near future. The company has made a name for itself selling fully contained data centers that take up only 462 square feet of floor space. Read more »
Is there a line beyond which people are no longer mere Quantified Selfers but something much more annoying? Could data really be used as “success theater” to make someone seem more successful than he really is? Of course. You know who you are …
NGDATA has raised a $3.3 million venture capital round led by Capricorn Venture Partners. The company sells a software product called Lily that stores and indexes wide varieties of customer data using HBase and other open source technologies, and then layers various analytic functions and applications on top of it. The data layer of Lily is available as an open source download.
When U.S. lawmakers and policy experts get tired of fighting ideological battles over the past, they might want to put a little effort into helping improve the country’s future. Here are four technology issues that could help improve the economy and outline Americans’ digital rights. Read more »
This is an interesting patent application, in part because of its techniques and in part because — like many technology-related patent applications — it’s hard to see how it’s particularly novel. The idea of using someone’s social graph to find influential connections that could inform mobile-app recommendations is pretty good, but at the core aren’t we just talking about the decision to value one variable more than another in a recommendation system?
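To make that last question concrete, here is a minimal sketch of what "valuing one variable more than another" looks like in a recommender. It is not taken from the patent application; the field names (`popularity`, `influential_friends`) and the `influence_weight` parameter are hypothetical and exist only for illustration.

```python
def rank_apps(candidates, influence_weight=2.0):
    """Rank apps by a weighted sum of two signals.

    Both signals and the weight are hypothetical: 'popularity' stands in
    for any baseline score, and 'influential_friends' for a count of
    influential social-graph connections who use the app.
    """
    def score(app):
        return app["popularity"] + influence_weight * app["influential_friends"]

    # Highest score first.
    return [app["name"] for app in sorted(candidates, key=score, reverse=True)]


candidates = [
    {"name": "app_a", "popularity": 5.0, "influential_friends": 0},
    {"name": "app_b", "popularity": 2.0, "influential_friends": 3},
]

ranking_weighted = rank_apps(candidates, influence_weight=2.0)
ranking_unweighted = rank_apps(candidates, influence_weight=0.0)
```

With the social signal weighted, `app_b` outranks `app_a`; with the weight at zero, the order flips. The whole difference comes down to one coefficient.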
Hadapt, a startup that has been pushing SQL on Hadoop since 2011, is rolling out a new technology it calls “schema-less SQL.” Essentially, the SQL portion of Hadapt’s platform will automatically form columns from the keys of JSON and other data types, thus making the associated values queryable like values in a standard relational database. This sort of joint SQL-NoSQL support is likely to become a lot more normal for analytic databases. Curt Monash has a good technical breakdown of the new Hadapt feature.
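The general idea of forming columns from JSON keys can be sketched in a few lines. This is not Hadapt's implementation; it is a toy illustration using Python's built-in `sqlite3` module, and the helper name `load_json_records`, the `events` table and the sample records are all made up for the example. It handles only flat, top-level keys.

```python
import json
import sqlite3


def load_json_records(conn, table, records):
    """Create a table whose columns are the union of the records' JSON keys.

    Hypothetical helper for illustration; handles flat objects only.
    """
    docs = [json.loads(r) for r in records]

    # The union of top-level keys, in first-seen order, becomes the column list.
    columns = []
    for doc in docs:
        for key in doc:
            if key not in columns:
                columns.append(key)

    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')

    # Keys missing from a record are stored as NULL.
    placeholders = ", ".join("?" for _ in columns)
    for doc in docs:
        conn.execute(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [doc.get(c) for c in columns],
        )
    return columns


records = [
    '{"user": "alice", "city": "Denver"}',
    '{"user": "bob", "device": "mobile"}',
]

conn = sqlite3.connect(":memory:")
columns = load_json_records(conn, "events", records)

# The JSON values are now queryable with ordinary SQL.
rows = conn.execute(
    'SELECT "user" FROM events WHERE "device" = ?', ("mobile",)
).fetchall()
```

No schema was declared up front: the columns `user`, `city` and `device` were derived from the data itself, which is the sense in which the SQL is "schema-less."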
The Comparing Constitutions Project has launched a new web tool called Constitute, which lets users search their way through the world’s constitutions by keyword or theme. Not only is the tool handy for gathering info on international laws, but it’s also indicative of how the web can ease access to valuable data via nice interfaces masking lots of complicated data-prep work. The organization’s website has lots of other constitutional data and visualizations, too.
At Structure: Europe 2013, New Relic Founder Lew Cirne, Kleiner Perkins General Partner Michael Abbott (former Twitter engineering VP) and North Bridge General Partner Jonathan Heiliger (former Facebook engineering VP) spoke about the business opportunities around next-gen analytics. Read more »
Structure: Europe was about many things (cloud computing, privacy, how to build a global business), but it might have been most about scale. The goal of any tech company is to handle untold millions of users and their data, and many speakers are doing just that. Read more »
A Denver-based startup called AlchemyAPI is close to rolling out deep-learning-based image recognition via its API service. The company has made something of a name for itself in the text-analysis world, and it says it can do image recognition as well as Google. Read more »