Randall Munroe, the man behind the webcomic xkcd, also runs a series called What If, in which he answers questions using physics and data gleaned from the web. On Tuesday he tackled the question “If all digital data were stored on punch cards, how big would Google’s data warehouse be?” The result is a speculative blog post that estimates Google’s server count (between 1.8 million and 2.4 million) and total storage (10 exabytes), and tells you how to find the search giant’s secret data center locales (go read it to find out).
Recommind, a San Francisco-based company that sells machine learning software optimized for e-discovery in the legal industry, has raised $15 million from SAP Ventures. The new money will go toward growing the company’s footprint outside the legal space via enterprise software that lets humans and machines work closely with one another around data analysis — something Recommind CTO Jan Puzicha discussed with me in March at Structure: Data.
DataSift, one of the two companies (along with Gnip) granted real-time access to the Twitter firehose, now offers real-time and historical analysis of Tumblr data. While it’s best-known for Twitter, DataSift actually analyzes dozens of social media and commenting platforms, which is pretty handy if you want to compare sentiment, engagement or whatever else across platforms where people behave quite differently.
An MIT professor has conducted some handy research that could help applications run faster and use less energy by overcoming an inherent drawback of multicore processors. The problem is that although a chip’s local caches spare its cores the latency of accessing RAM, the hardwired algorithms managing those caches often assign data to cache locations randomly, without considering which core is trying to access it. The new software-based technique, called Jigsaw, tracks which cores are accessing which data — and how much — and places data accordingly. The paper detailing Jigsaw is available here.
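The core placement idea can be sketched in a few lines. To be clear, this is an illustrative toy under my own assumptions (a flat access log and one cache bank per core), not Jigsaw’s actual hardware-aware algorithm:

```python
from collections import defaultdict

def plan_placement(access_log):
    """Toy access-aware placement: given an iterable of (core_id, data_id)
    access events, map each data item to the core whose cache bank should
    hold it, i.e. the core that accessed it most often. A random placement,
    by contrast, would ignore the access pattern entirely."""
    counts = defaultdict(lambda: defaultdict(int))
    for core, data in access_log:
        counts[data][core] += 1
    # For each item, pick the core with the highest access count.
    return {data: max(per_core, key=per_core.get)
            for data, per_core in counts.items()}

# Core 0 touches item "A" three times, so "A" lands near core 0;
# "B" lands near core 1, its first-seen heaviest user.
log = [(0, "A"), (0, "A"), (0, "A"), (1, "B"), (1, "A"), (2, "B")]
print(plan_placement(log))  # {'A': 0, 'B': 1}
```

The point of the sketch is only that tracking who accesses what turns placement from a random assignment into an informed one; the real system has to do this continuously and cheaply in the cache hierarchy.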
New research out of Carnegie Mellon University shows that analyzing fans’ tweets can help gamblers make better bets on NFL games. Sometimes. The researchers’ technique wasn’t very effective at picking winners or betting the over/under, but it was 55 percent accurate on bets against the spread (and then only during the middle of the season). I doubt anyone will undertake this effort themselves for such a slight edge, but there might be a business here if someone can figure out a consistently accurate model.