More data Stories
On The Web

This article from Klint Finley at Wired Enterprise raises some good questions about the ideal integration of big data into nonprofits. I rather prefer the efforts of DataKind and the SumAll Foundation, which try to help nonprofits solve problems rather than harvest email addresses. The flipside, of course, is that individual donors are what keep the lights on in many cases, so access to more of them is good.

On The Web

This seems like good advice from Hortonworks’ Ofer Mendelevitch. Python? Check. Java? Check. Hadoop? Check. SQL? Check. Stats? Check. But his closing remark — “The road to data science is not a walk in the park. … This takes time, effort and a personal investment.” — might be the most important. We often talk about democratizing some of the data science tools, but the really good ones can do it all.

In Brief

Hadoop startup MapR has released a new version of its commercial HBase database, called M7. According to a press release, “HBase applications can now benefit from MapR’s high performance platform to address one of the major issues for on-line applications, consistent read latencies in the less than 20 millisecond range across varying workloads.” MapR released M7 in May and claims its architectural improvements over open source HBase result in a faster, easier experience.

On The Web

Is there a line beyond which people are no longer mere Quantified Selfers but something much more annoying? Could data really be used as “success theater” to make someone seem more successful than he really is? Of course. You know who you are …

In Brief

NGDATA has raised a $3.3 million venture capital round led by Capricorn Venture Partners. The company sells a software product called Lily that stores and indexes wide varieties of customer data using HBase and other open source technologies, and then layers various analytic functions and applications on top of it. The data layer of Lily is available as an open source download.

photo: U.S. House of Representatives

When U.S. lawmakers and policy experts get tired of fighting ideological battles over the past, they might want to put a little effort into helping improve the country’s future. Here are four technology issues that could help improve the economy and outline Americans’ digital rights. Read more »

On The Web

This is an interesting patent application, in part because of its techniques and in part because — like many technology-related patent applications — it’s hard to see how it’s particularly novel. The idea of using someone’s social graph to find influential connections that could inform mobile-app recommendations is pretty good, but at the core aren’t we just talking about the decision to value one variable more than another in a recommendation system?
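
To make that last point concrete, here is a minimal sketch of what "valuing one variable more than another" can look like in a toy recommender. It is not the method in the patent application; the feature names (`popularity`, `influential_friend_installs`), the sample apps and the weights are all hypothetical.

```python
# Toy weighted-sum recommender. All feature names, apps and weights below
# are hypothetical and chosen only to illustrate the weighting decision.

def score(features, weights):
    """Weighted sum of an app's feature values."""
    return sum(weights[name] * value for name, value in features.items())

def rank(apps, weights):
    """Return app names ordered from highest to lowest score."""
    return sorted(apps, key=lambda name: score(apps[name], weights), reverse=True)

apps = {
    "maps": {"popularity": 0.9, "influential_friend_installs": 0.1},
    "notes": {"popularity": 0.3, "influential_friend_installs": 0.6},
}

# Same data, two different choices about which variable matters more.
equal_weights = {"popularity": 1.0, "influential_friend_installs": 1.0}
social_weights = {"popularity": 1.0, "influential_friend_installs": 3.0}

ranking_equal = rank(apps, equal_weights)
ranking_social = rank(apps, social_weights)
```

With equal weights the more popular app ranks first; tripling the weight on installs by influential friends flips the order, without any change to the underlying data.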

In Brief

Hadapt, a startup that has been pushing SQL on Hadoop since 2011, is rolling out a new technology it calls “schema-less SQL.” Essentially, the SQL portion of Hadapt’s platform will automatically form columns from the keys of JSON and other data types, thus making the associated values queryable like values in a standard relational database. This sort of joint SQL-NoSQL support is likely to become a lot more normal for analytic databases. Curt Monash has a good technical breakdown of the new Hadapt feature.
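
The general idea of forming columns from JSON keys can be sketched in a few lines. This is not Hadapt's implementation; it is a rough illustration using Python's built-in `sqlite3` module, and the helper name `load_json_as_table` and the sample records are hypothetical. It assumes flat records with scalar values and trusted key names.

```python
import json
import sqlite3

def load_json_as_table(conn, table, json_lines):
    """Create `table` with one column per top-level JSON key, then load the records.

    Hypothetical helper: assumes flat records, scalar values and trusted key names.
    """
    records = [json.loads(line) for line in json_lines]
    # The union of all keys, in first-seen order, becomes the column list.
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    quoted = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({quoted})')
    placeholders = ", ".join("?" for _ in columns)
    # Keys missing from a record are stored as NULL.
    rows = [tuple(rec.get(c) for c in columns) for rec in records]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return columns

conn = sqlite3.connect(":memory:")
events = [
    '{"user": "alice", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',
    '{"user": "alice", "action": "view", "ms": 45, "page": "/home"}',
]
columns = load_json_as_table(conn, "events", events)

# The JSON values are now queryable like ordinary relational columns.
result = conn.execute(
    'SELECT "user", SUM("ms") FROM events GROUP BY "user" ORDER BY "user"'
).fetchall()
```

Records that lack a key simply get NULL in that column, which is why the schema never has to be declared up front.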

In Brief

The Comparing Constitutions Project has launched a new web tool called Constitute, which lets users search their way through the world’s constitutions by keyword or theme. Not only is the tool handy for gathering info on international laws, but it’s also indicative of how the web can ease access to valuable data via nice interfaces masking lots of complicated data-prep work. The organization’s website has lots of other constitutional data and visualizations, too.

Search is evolving to fit the needs of users who don’t just want a web site, but the actual answer to the question driving the search. To stay on top, semantic search technologies are key. Read more »

Users have grown accustomed to a real-time web, but now they want an easier-to-implement real-time integration between web services. REST Hooks seems to be the emerging standard for such integration. Read more »

In Brief

Randall Munroe, the man who writes the web comic xkcd, also runs a series called What If in which he answers questions using data gleaned from the web and physics. On Tuesday he tackled the question “If all digital data were stored on punch cards, how big would Google’s data warehouse be?” The result is a speculative blog post that estimates Google’s server count (between 1.8 million and 2.4 million) and total storage (10 exabytes), and tells you how to find the search giant’s secret data center locales (go read it to find out).

In Brief

Recommind, a San Francisco-based company that sells machine learning software optimized for e-discovery in the legal industry, has raised $15 million from SAP Ventures. The new money will go toward growing the company’s footprint outside the legal space via enterprise software that lets humans and machines work closely with one another around data analysis — something Recommind CTO Jan Puzicha discussed with me in March at Structure: Data.

In Brief

DataSift, one of the two companies (along with Gnip) granted real-time access to the Twitter firehose, now offers real-time and historical analysis of Tumblr data. While it’s best-known for Twitter, DataSift actually analyzes dozens of social media and commenting platforms, which is pretty handy if you want to compare sentiment, engagement or whatever else across platforms where people behave quite differently.

On The Web

The NSA may have found a way to monitor some credit card transactions, according to a Snowden-derived report from Germany’s Der Spiegel. The agency said in leaked documents that it found a way to access Visa transactions in Europe, the Middle East and Africa, but the financial services company denies the tapping of its networks. The report highlights an NSA financial database called Tracfin, into which SWIFT international transfer information also flows through the interception of “SWIFT printer traffic from numerous banks.”
