Data

Hadoop startup MapR has released a new version of its commercial HBase database, called M7. According to a press release, “HBase applications can now benefit from MapR’s high performance platform to address one of the major issues for on-line applications, consistent read latencies in the less than 20 millisecond range across varying workloads.” MapR released M7 in May and claims its architectural improvements over open source HBase result in a faster, easier experience.

NGDATA has raised a $3.3 million venture capital round led by Capricorn Venture Partners. The company sells a software product called Lily that stores and indexes a wide variety of customer data using HBase and other open source technologies, and then layers various analytic functions and applications on top of it. The data layer of Lily is available as an open source download.

Hadapt, a startup that has been pushing SQL on Hadoop since 2011, is rolling out a new technology it calls “schema-less SQL.” Essentially, the SQL portion of Hadapt’s platform will automatically form columns from the keys of JSON and other data types, thus making the associated values queryable like values in a standard relational database. This sort of joint SQL-NoSQL support is likely to become a lot more normal for analytic databases. Curt Monash has a good technical breakdown of the new Hadapt feature.
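Hadapt hasn't published the internals of the feature, but the core idea (promoting the keys of nested JSON documents to dotted column names whose values can then be queried relationally) is easy to sketch. The short Python snippet below is a minimal, hypothetical illustration of that key-flattening step; the `flatten` helper and the sample document are invented for the example and are not Hadapt's actual code or API.

```python
import json

def flatten(record, prefix=""):
    """Recursively turn nested JSON keys into dotted column names."""
    columns = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(flatten(value, name))  # recurse into nested objects
        else:
            columns[name] = value  # leaf value becomes a column cell
    return columns

# A hypothetical incoming document:
doc = json.loads('{"user": {"name": "Ada", "plan": "pro"}, "clicks": 3}')
print(flatten(doc))
# -> {'user.name': 'Ada', 'user.plan': 'pro', 'clicks': 3}
```

Flatten every document this way and the union of the observed column names acts as an implicit schema that SQL queries can target, which is roughly what "schema-less SQL" promises.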

The Comparative Constitutions Project has launched a new web tool called Constitute, which lets users search their way through the world's constitutions by keyword or theme. Not only is the tool handy for gathering info on constitutional provisions around the world, but it's also indicative of how the web can ease access to valuable data via nice interfaces masking lots of complicated data-prep work. The organization's website has lots of other constitutional data and visualizations, too.

Randall Munroe, the man who writes the web comic xkcd, also runs a series called What If in which he answers questions using data gleaned from the web and physics. On Tuesday he tackled the question "If all digital data were stored on punch cards, how big would Google's data warehouse be?" The result is a speculative blog post that estimates Google's server count (between 1.8 million and 2.4 million) and total storage (10 exabytes), and tells you how to find the search giant's secret data center locales (go read it to find out).

Recommind, a San Francisco-based company that sells machine learning software optimized for e-discovery in the legal industry, has raised $15 million from SAP Ventures. The new money will go toward growing the company's footprint outside the legal space via enterprise software that lets humans and machines work closely together on data analysis — something Recommind CTO Jan Puzicha discussed with me in March at Structure: Data.

DataSift, one of the two companies (along with Gnip) granted real-time access to the Twitter firehose, now offers real-time and historical analysis of Tumblr data. While it's best known for Twitter data, DataSift actually analyzes dozens of social media and commenting platforms, which is pretty handy if you want to compare sentiment, engagement or whatever else across platforms where people behave quite differently.

An MIT professor has conducted some handy research that could help make applications run faster and use less energy by overcoming an inherent drawback of multicore processors. The problem is that although the local caches on chips save them the latency of having to access RAM, the hardware-wired algorithms powering those caches often assign data to cache locations randomly, without considering which core is trying to access it. The new software-based technique, called Jigsaw, tracks which cores are accessing what data — and how much — and assigns data locations accordingly. The paper detailing Jigsaw is available here.
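Jigsaw itself is a hardware/software co-design and the paper is the authoritative description, but the placement principle (count which core touches each piece of data most often, then place that data in the cache bank local to that core) can be modeled in a few lines. The toy Python sketch below is purely illustrative; the function names, the notion of a "page" and the access trace are all invented for the example.

```python
from collections import Counter, defaultdict

def record_accesses(trace):
    """trace: iterable of (core_id, page_id) access events."""
    counts = defaultdict(Counter)
    for core, page in trace:
        counts[page][core] += 1  # tally which cores touch which pages
    return counts

def place_pages(counts):
    """Assign each page to the bank local to its most frequent accessor."""
    return {page: cores.most_common(1)[0][0] for page, cores in counts.items()}

# A made-up access trace: core 0 hammers page A, core 1 favors page B, etc.
trace = [(0, "A"), (0, "A"), (1, "A"), (1, "B"), (1, "B"), (2, "C")]
print(place_pages(record_accesses(trace)))
# -> {'A': 0, 'B': 1, 'C': 2}
```

The actual technique also accounts for how much data each core touches when dividing up cache capacity; this sketch only captures the who-touches-what part of the idea.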
