Hadoop through the years: A GigaOM retrospective
We were there very early on for the birth of Hadoop and its maturation into a vital data analysis tool. Here’s a look back at some of our best stories. Read more »
In Part III of our look at all things Hadoop, we examine the trends driving Hadoop’s future. At the end of the day, everything is pushing Hadoop toward being just generally faster and easier to consume. Read more »
Facebook has developed a new data cache called McDipper that’s essentially memcached rewritten to run on flash memory instead of DRAM, thus saving money while still delivering higher performance than disk. Read more »
eBay has released a trove of information about the efficiency of its data centers, and plans to do so quarterly as part of a mission to continuously track computing resources and tie them to bigger business goals. Read more »
How big an impact has Hadoop had on the technology world? Check out our infographic on the reach of the most important big data tool of our time. Read more »
In the first of our four-part multi-media series on Hadoop, the people who helped build Hadoop talk about its birth, its promise and the challenges in moving it from webscale to just large-scale. Read more »
Five years ago, LinkedIn was a shell of the technology company it is today. Here’s an inside look at where it came from, what it’s become and where it’s going. Read more »
Just when you thought spam was under control, a new breed of spammers is taking up new methods to infiltrate our inboxes, search results and social media feeds. Data science could make them very effective. Read more »
There has been a lot of data news already this week — some big, some interesting, and some both. Here’s a collection of the stuff you shouldn’t, or don’t want to, miss. Read more »
EMC Greenplum rolled out a new Hadoop distribution that fuses the popular big data platform with its flagship MPP database technology. Co-founder Scott Yara thinks the company’s huge investment puts it in the catbird seat among Hadoop vendors. Read more »

New research from the University of Michigan highlights the potential for tackling tough problems like crime by getting creative with data, so we can fight the disease instead of the symptoms. Read more »

More and more companies and open source projects are trying to let users run SQL queries from inside Hadoop itself. Here’s a list of what’s available and, on a high level, how they work. Read more »
Red Hat is the latest company offering an alternative to the Hadoop Distributed File System, only this one is open source and ties into Red Hat’s bigger vision of hybrid cloud computing. Read more »
Backblaze pioneered the concept of open source storage hardware in 2009, and its designs have caught on. Hundreds of institutions — including Netflix and Shutterfly — use the designs, which have just entered their third generation. Read more »

Citus Data has expanded its high-speed, analytic database called CitusDB beyond Postgres and into Hadoop. Up next, MongoDB and just about anything else you can think of. Read more »
Facebook Director of Engineering Lars Rasmussen held an Ask Me Anything session on Reddit on Thursday to talk about Graph Search. Here’s what he had to say about the infrastructure underlying it. Read more »
Google Flu Trends significantly overestimated the number of Americans afflicted with flu-like symptoms during the season’s peak a couple months ago, but assuring accuracy is a big part of the puzzle any time we’re talking about web data. Read more »

HStreaming has raised $1 million and is ready to take its message of real-time processing on Hadoop mainstream. In a world tired of batch processing only, that message should be well received. Read more »
Google has announced the seven winners of its App Engine Research Awards, which gave $60,000 in computing credits to projects ranging from neighborhood-centric machine learning to free computer-vision software. Read more »

Two Indiana University researchers have developed a computer model they say can identify significantly better and less-expensive treatments than can doctors acting alone. It’s just the latest evidence that big data will have a profound impact on our health care system. Read more »

F5 has bought LineRate Systems for an undisclosed amount of money. The rationale seems pretty clear: F5 is a legacy hardware vendor trying to ride the wave of software disruption by purchasing one of the startups leading it. Read more »
Box VP of Engineering Sam Schillace talks about building the service that became Google Docs, then fighting for the service’s life, and now rethinking collaboration for the mobile web. Read more »
IBM is turning Watson loose on lung cancer, offering up a cloud-based service designed to let doctors from around the country find the best-possible treatments for their patients. Read more »
First, it was semantic search and knowledge graphs surfacing information related to our keyword searches. But there’s a handful of companies working to make relevant content come to us, whatever we’re doing. Read more »
MasterCard Advisors, the division of the credit card company dedicated to analyzing transaction data and providing data consulting services, has partnered with data science firm Mu Sigma, and MasterCard has acquired an undisclosed equity stake in the company. Read more »

Not all data analysis is created equal, and understanding the difference is critical as our society places a greater value on listening to the data. Using big data to cure disease is one thing; using statistics to ruin my sports-watching is quite another. Read more »
Causata is really good at helping companies identify consumers and, thanks to new machine learning features, helping them predict behavior many steps down the line. Does all this personal data create a privacy concern? Depends who you ask. Read more »

Free from the scrutiny of public markets, Dell should let its freak flag fly and take some real risks to distinguish itself from the server-vendor pack. I think that means doubling down on next-generation software and server design. Read more »
Microsoft vet and data platform VP Ted Kummert is joining Madrona Venture Group as a venture partner. As VCs aim to boost their enterprise investments, they’re snatching up talent from big IT. Read more »

As part of its new big-data-focused XDATA initiative, DARPA has invested $3 million in a startup called Continuum Analytics. The company’s aim is to extend Python’s prowess in scientific computing into the world of big data and analytics. Read more »
The secret to Amiigo’s intelligent fitness tracker is a collection of sensors and a reference database full of information about hundreds of activities. The more data users feed it, the smarter it gets. Read more »

For small business lender Capital Access Network, finding worthy borrowers is about a lot more than their credit scores. It’s taking real-world data into account in order to distinguish financial fools from savvy entrepreneurs. Read more »
Gravity, a startup that personalizes reader content for web publishers, is opening up its recommendation engine to anyone that wants to use it. Considering the increasing importance of personalization online, this could be a good deal. Read more »
Not everyone is drowning in big data, or would have the know-how to deal with it if they were. Here are six free web services that help mere mortals analyze and visualize their own data. Read more »
IBM is giving Rensselaer Polytechnic Institute in New York its own Watson system similar to the one that crushed its human competitors on Jeopardy!. The goal is to give Watson new skills and push it into new industries. Read more »

In order to recommend new events for its members, online event-management company Eventbrite must build what it calls “implicit social graphs.” It’s just one of many approaches to figuring out what content users want to see. Read more »
Numenta, the latest startup from Palm creator Jeff Hawkins, aims to help us make sense of fast-flowing machine-to-machine data by recognizing patterns and building models. Its latest customer is smart-grid efficiency expert EnerNOC. Read more »
A group of Stanford researchers recently ran a complex fluid dynamics workload across more than a million cores on the Sequoia supercomputer. It’s an impressive feat and might foretell a future where parallel programming becomes commonplace even on our smartphones. Read more »
Is there really a model for predicting the success of Kickstarter campaigns? A new interactive model from machine learning service BigML offers a fun way to try, although the dearth of public data from Kickstarter might affect its accuracy. Read more »
GoDaddy has been undergoing a transformation lately as it tries to become more valuable to its customers by providing higher-level services than just web hosting. Its latest product is a CDN that it claims can help significantly decrease pageload times for its small-business customers. Read more »
Follow @derrickharris or @gigaom for more stories like this.