More Hadoop Stories

Storage giant EMC is adding more muscle to its Hadoop strategy with a 1,000-node cluster for testing new Apache Hadoop releases and a new analytics appliance combining EMC’s Hadoop distribution with the EMC Greenplum Database. Read more »

DataStax has created the first commercial distribution of the Apache Cassandra database and has just closed an $11 million Series B round. Neither piece of news should come as a shock because as NoSQL products have been maturing over the past year, money has always followed. Read more »

Building on President Obama’s social media success in 2008, his reelection campaign looks set to make big data a driving force. To that end, his team is taking to the streets to find data scientists and engineers, including at an event Tuesday at Stanford. Read more »

Balancing an open-source community with commercial interests can be difficult, which is why HPCC Systems sought the help of Bruce Perens before open-sourcing its eponymous big-data-processing software. Essentially, the company either ensures the existence of a free version or pulls contributed code. Read more »

Predictive analytics provider Opera Solutions has raised $84 million from equity investors in its first-ever funding round, but that amount shouldn’t be surprising for anyone familiar with the company. I’ve called Opera the big data expert you’ve never heard of, but that’s about to change. Read more »

HPCC Systems has released the open source code of its data-processing software that it’s positioning as a better version of Hadoop. The code is available on Github, and it marks the commencement of HPCC Systems’ quest to build a community of developers underneath Hadoop’s expansive shadow. Read more »

Hadoop-based startup Platfora has raised $5.7 million from Andreessen Horowitz and military intelligence–focused strategic investor In-Q-Tel. Investors are excited because Platfora promises big things around making big data analytics obtainable by anyone needing to parse large volumes of unstructured data, not just data scientists. Read more »

MapR Technologies, the San Jose, Calif.-based startup that sells its own Hadoop distribution for analyzing large volumes of unstructured data, has raised a $20 million Series B round, which will help its positioning as a worthy alternative in a space that Cloudera has dominated since 2009. Read more »

Hopper wants to make searching for travel options a more complete experience using big data tools, and it has raised millions to do it. Hopper lets users enter keyword searches, but it provides results far beyond those typically found in a keyword search. Read more »

Dell’s Crowbar installation-and-configuration tool now works with VMware’s Cloud Foundry. With servers fast becoming low-margin commodities thanks to the push toward micro servers, Dell is doing its best to make deploying the software that inspired the new generation of servers a breeze. Read more »

Greylock Partners, the VC firm backing companies such as Cloudera, Airbnb and ZipCar, has hired DJ Patil, formerly the chief product officer at Color, as a data scientist in residence. Greylock is trying to help its companies monetize the rich vein of user data on the web. Read more »

Todd Papaioannou at Structure 2010 cocktail event at GigaOM offices. Photo by Om Malik

Todd Papaioannou, VP and chief cloud architect at Yahoo, has left the company for a role as entrepreneur in residence at Battery Ventures. At Yahoo, he drove the strategic directions for both the cloud computing and Hadoop teams and helped define the company’s overall IT strategy. Read more »

Attention, webscale aficionados: Twitter plans to open source its Hadoop-like real-time data processing tool known as Storm. The social service nabbed the code through its acquisition last month of BackType, and says it’s a better tool for processing streams of data. Read more »

Servers? We don't need no stinkin' servers!

The great thing about open source software stacks is that they’re free and they work. The not-so-great thing is that, like many open source projects, they can be difficult to configure and manage. Luckily, hardware vendors are stepping in to fill the void. Read more »

These things are expensive.

The face of high-performance computing is changing. That means new technologies and new names, but also familiar names in new places. Anyone who doesn’t have a cloud computing story to tell, and possibly a big data one too, might start looking really old really quickly. Read more »

The open-source, data-processing tool Hadoop is already popular for a variety of use cases that can benefit from clusters of machines churning through unstructured data — such as search engines and social-media analysis — and now it’s turning its attention to security data. Read more »

Appistry, a St. Louis–based software company, has closed a $12 million Series D round for its family of distributed computing products. The company also appears to have changed its corporate messaging — from that of a cloud-computing vendor to that of a big-data vendor. Read more »

Subscriber Content

Web companies like Google and Facebook gain business advantage by analyzing large volumes of rapidly changing data about their users, but they are far from alone. A recent infographic from Get Satisfaction charts the volume of data stored in 17 key industry sectors, illustrating that most ... Read more at GigaOM Pro »

Big data has the potential to cut operating costs by nearly 50% across all sectors of manufacturing. Get Satisfaction makes several interesting claims about opportunities for big data in an infographic released this month. Market segments such as manufacturing are generating far more data (966 petabytes […] Read more »

For anyone who didn’t know, Facebook is a huge Hadoop user, and it does some very cool things to stretch the open source big data platform to meet Facebook’s unique needs. Today, it detailed how it migrated its 30-petabyte cluster from one data center to another. Read more »

Concurrent, the company providing the Cascading data workflow API, has raised a $900,000 seed round to capitalize on the newfound excitement around Hadoop. Cascading is an open-source API for creating and running data workflows atop Hadoop clusters. Read more »

All the speculation about how Yahoo’s Hadoop spinoff company, Hortonworks, will affect Cloudera and other companies providing Hadoop-based products might have been overblown. The company is still figuring out its strategy around offering a Hadoop distribution, which could be good news for competitors such as Cloudera. Read more »

[OpenStack] looks not only like an open-source alternative to Amazon Web Services and VMware vCloud in the public Infrastructure as a Service space, but also a democratizing force in the private-cloud software space. As my colleague Derrick Harris suggests, the open-source cloud-computing project OpenStack has come a […] Read more »

Alpine Data Labs, a predictive analytics startup that incubated within Greenplum (now part of EMC), is expanding its support beyond the Greenplum Database and into Oracle’s Exadata appliance and the open-source Postgres database. Alpine tries to distinguish itself by running entirely within companies’ analytic databases. Read more »

Something I thought about a lot while writing about OpenStack yesterday is how much it democratized access to cloud computing in just a year. But OpenStack is just one example of how information technology, overall, is undergoing a period of arguably unprecedented democratization. Read more »

The size of Hadoop deployments appears to have tripled since October, according to statistics that Cloudera is sharing. If accurate, they help prove assumptions that Hadoop usage grows quickly once organizations wrap their heads around how it is used. Read more »

Twitter announced Tuesday it has acquired BackType, an analytics platform aimed at helping companies and brands gauge their social media impact. The possible rationale for the deal is BackType’s Storm real-time big data processing platform that could help Twitter offer well-defined analytics. Read more »

Big data — as in managing and analyzing large volumes of information — has come a long way in the past couple of years. Among the greatest innovations might be the advent of real-time analytics, which allow the processing of information in real time to enable instantaneous decision-making. Read more »

The fight for Hadoop dominance is officially on. While Hortonworks is busy answering questions about its product strategy, Cloudera and MapR will demonstrate new versions of their distributions overflowing with bells and whistles. And there are several other competitive products lurking in the background. Read more »

Tim Moreton CEO of Acunu

Big data can sometimes mean big infrastructure to run everything on, or perhaps it can mean slower performance as the hardware struggles to read from or write to a database, which is why we picked Acunu as one of our Structure 2011 Launchpad companies. Learn more. Read more »

Hadoop is a very valuable tool, but it’s far from perfect. While Apache, Cloudera, EMC, MapR and Yahoo focus on core architectural issues, there is a group of vendors trying to make Hadoop a more-fulfilling experience by focusing on business-level concerns such as applications and utilization. Read more »

Yahoo will be spinning off a separate company, called Hortonworks, focused on the development and commercialization of Apache Hadoop. The official announcement likely will come tomorrow or Wednesday to coincide with Yahoo’s annual Hadoop Summit, but rumors have been circulating for months. Read more »

Being able to crunch terabytes of data is great, but having someone else do it for you is even better. HPCC Systems, which launched last week to challenge Hadoop’s big data dominance, is planning to do just that with a cloud service for big data processing. Read more »

Subscriber Content

Cloud computing has grown from a pie-in-the-sky vision to a major IT movement over the past few years. As its promise has grown, though, so too has its scope. This report covers six key sectors in cloud computing: commodity Infrastructure-as-a-Service (IaaS), enterprise IaaS, Platform-as-a-Service (PaaS), Software-as-a-Service (SaaS), cloud storage and private clouds. We highlight the current state of each and provide informed insights into where they — and cloud computing in general — are headed. Much like any market in a still-evolving state, the infrastructure of the cloud-computing transition is still being built by startups, practitioners and even a big-name company or two. Companies mentioned in this report include VMware, Amazon, Nasuni, Terremark and Heroku. For a full list of companies, and to read the full report, sign up for a free trial. Read more at GigaOM Pro »

SeaMicro's SM10000-64 server.

Online dating service eHarmony is using SeaMicro’s specialized Intel Atom-powered servers as the foundation of its Hadoop infrastructure, demonstrating that big data applications such as Hadoop might be a killer app for low-powered micro servers. Read more »
