data exhaustion

Making sense of big data can be hard enough without spending untold hours having to write code or manually clean datasets that simply won’t work with existing BI tools. Trifacta is trying to automate that process with a new software product it announced on Tuesday.

In Brief

Cloud backup provider Backblaze has moved into a new data center in Sacramento capable of storing 500 petabytes, or half an exabyte, of data. It’s not full yet (the company was storing 75 petabytes as of November), but the pace is picking up and it probably will be sooner than some might expect. The crazy part is that Backblaze isn’t even that big a company or that widely used a service. Facebook alone is building enough capacity to house 3 exabytes of data in each of its 3 cold storage facilities. Sometimes, I can’t help but think that we’re just digitally hoarding.

In Brief

A pair of MIT graduate students is working on an interesting system they think can help speed the process of analyzing data without putting it on expensive DRAM. The project uses a cluster of flash drives to store the data, with each one connected to a field-programmable gate array, or FPGA. The FPGA is really the key because it can perform calculations on the data in place before it’s sent over the network to the main processor. The architecture could potentially underpin a functional interactive database system for budget-conscious, data-heavy fields such as science.

On The Web

This is a really thoughtful post from Trifacta Co-founder and CEO Joe Hellerstein (who’s clearly ramping up for the big unveil of Trifacta’s product soon) about the transformation of data science skills. As someone who sometimes tries to do work with data, and often speaks with people who really do work with data, I couldn’t agree more. There are tools that let “business users” or even journalists do valuable stuff, but they’ll always be many steps behind what folks trained in math and computer science can do. And data transformation sucks for everyone.

In Brief

Altiscale, the Hadoop-as-a-service startup co-founded by former Yahoo CTO Raymie Stata that launched in June, is now offering its Data Cloud platform to the public. It’s a cloud service in the same vein as Amazon Elastic MapReduce, although it’s probably more similar to fellow startup Qubole. Altiscale is custom-built to run Hadoop workloads (or Spark, or most anything that can run easily on YARN), is fully managed and automatically scales resources to meet the demands of a job. “There hasn’t been a customer yet that we haven’t been able to improve reliability for,” Stata told me recently, primarily by improving efficiency and eliminating failures.
