Data

Cloud backup provider Backblaze has moved into a new data center in Sacramento capable of storing 500 petabytes, or half an exabyte, of data. It’s not full yet (the company was storing 75 petabytes as of November), but the pace is picking up and it will probably fill up sooner than some might expect. The crazy part is that Backblaze isn’t even that big a company or that widely used a service. Facebook alone is building enough capacity to house 3 exabytes of data in each of its three cold storage facilities. Sometimes, I can’t help but think that we’re just digitally hoarding.

A pair of MIT graduate students is working on an interesting system they think can help speed the process of analyzing data without putting it on expensive DRAM. The project uses a cluster of flash drives to store the data, with each one connected to a field-programmable gate array, or FPGA. The FPGA is really the key because it can perform calculations on the data in place before it’s sent over the network to the main processor. The architecture could potentially underpin a functional interactive database system for budget-conscious, data-heavy fields such as science.
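
To make the idea of computing "in place" concrete, here is a toy sketch in Python. It is not the MIT students' design: the `StorageNode` class, its methods and the record counts are all hypothetical, and a plain Python object stands in for a flash drive with an FPGA attached. It only illustrates why filtering and reducing next to the storage cuts the amount of data that has to cross the network.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StorageNode:
    """Hypothetical stand-in for one flash drive plus its attached FPGA."""
    records: List[int]

    def read_all(self) -> List[int]:
        # Baseline path: ship every record to the host unchanged.
        return list(self.records)

    def filter_and_sum(self, predicate: Callable[[int], bool]) -> int:
        # Near-data path: filter and reduce next to the storage,
        # so only a single number leaves the node.
        return sum(r for r in self.records if predicate(r))


def host_side_sum(nodes: List[StorageNode],
                  predicate: Callable[[int], bool]) -> Tuple[int, int]:
    """Pull all records to the host, then compute. Returns (total, values sent)."""
    total, transferred = 0, 0
    for node in nodes:
        data = node.read_all()
        transferred += len(data)
        total += sum(r for r in data if predicate(r))
    return total, transferred


def near_data_sum(nodes: List[StorageNode],
                  predicate: Callable[[int], bool]) -> Tuple[int, int]:
    """Compute on each node, send only partial sums. Returns (total, values sent)."""
    partials = [node.filter_and_sum(predicate) for node in nodes]
    return sum(partials), len(partials)


def is_even(record: int) -> bool:
    return record % 2 == 0


# Four hypothetical nodes holding 1,000 integer records each.
nodes = [StorageNode(list(range(i * 1000, (i + 1) * 1000))) for i in range(4)]

host_total, host_sent = host_side_sum(nodes, is_even)
near_total, near_sent = near_data_sum(nodes, is_even)
print(f"host-side: total={host_total}, values sent={host_sent}")
print(f"near-data: total={near_total}, values sent={near_sent}")
```

Both paths produce the same answer, but the host-side version moves every record across the network while the near-data version moves one partial result per node. A real system would also have to account for what the FPGA can and cannot compute, which this sketch ignores.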

Altiscale, the Hadoop-as-a-service startup co-founded by former Yahoo CTO Raymie Stata that launched in June, is now offering its Data Cloud platform to the public. It’s a cloud service in the same vein as Amazon Elastic MapReduce, although it’s probably more similar to fellow startup Qubole. Altiscale is custom-built to run Hadoop workloads (or Spark, or most anything that can run easily on YARN), is fully managed and automatically scales resources to meet the demands of a job. “There hasn’t been a customer yet that we haven’t been able to improve reliability for,” Stata told me recently, primarily by improving efficiency and eliminating failures.

The U.S. uses its digital surveillance capabilities to commit industrial espionage, Edward Snowden has claimed in an interview with German network NDR, broadcast on Sunday night. The NSA whistleblower suggested German industrial giant Siemens was a target, with information being taken by the intelligence agency even when it had nothing to do with national security. When the agency was previously shown to have spied on Brazil’s Petrobras, U.S. Director of National Intelligence James Clapper insisted it never used that information to give U.S. firms an unfair advantage. Australia’s intelligence agency, an NSA partner, has reportedly spied on Japanese firms for the benefit of Australian companies, and France is generally seen as a world leader in that regard.

The Princeton research that used a disease model to suggest Facebook would lose 80 percent of its users in 3 years deserved the hammering it got – it’s simply a bad analogy for the subject. But now a bunch of data scientists from Facebook itself have stepped up to the plate, dryly using the researchers’ own methodology to prove that Princeton enrollment will have depleted entirely by 2021, and the air around us by 2060. Luckily I’m done with my studies, but I’m pretty annoyed about the breathing thing. Damn you, badly-chosen search data and your extrapolations!

The National Football League and General Electric announced on Thursday a list of 16 projects that will each receive $300,000 to advance their research in the field of diagnosing and preventing head injuries. Among the selected projects is a collaboration between the University of California, San Francisco, and machine learning startup Ayasdi to analyze CAT scan data to predict which players might have persistent symptoms. Another involves the Purdue Neurotrauma Group and a company called BrainScope that uses machine learning algorithms to power a device that it hopes can detect head injuries on the sidelines. As everything from algorithms to computing power improves, machine learning is actually becoming fairly common in medical research.
