More big-data Stories

Alfred Spector, Google, at Structure Big Data 2011

Google may have more distributed data than any other company, but it still takes user input to create smarter machines. Google’s Voice Search speech recognition, for example, began to improve once the service started training itself on end-user data. Read more »

Cloudera's Amr Awadallah, Pervasive Software's Mike Hoskins, 10gen's Dwight Merriman, Yahoo's Todd Papaioannou, and DataStax's Ben Werther

During an afternoon panel entitled “The Many Faces of MapReduce — Hadoop and Beyond,” moderator Gary Orenstein compared the two primary Hadoop components — MapReduce and the Hadoop Distributed File System — to the meat and bread of a sandwich. Read more »

Kevin Krim, Bloomberg, at Structure Big Data 2011

Mining terabytes of data isn’t just for service providers — media companies are also trying to make use of the oceans of information they have about their users to come up with better ways of recommending news to them, says Bloomberg Digital head Kevin Krim. Read more »

NoSQL startup DataStax officially entered the pantheon of Hadoop providers today, introducing its own distribution called “Brisk.” Brisk utilizes the open source NoSQL database Cassandra as a replacement for Apache’s Hadoop Distributed File System, as well as Cassandra’s built-in MapReduce engine and Hive. Read more »

Jim Baum, IBM Netezza, at Structure Big Data 2011

Data isn’t the solution to business problems. Pulling data into applications and using it to make decisions and improve the user experience is the way to solve business problems, said Jim Baum, the CEO of Netezza, at Structure Big Data. Read more »

Jeff Jonas, IBM at Structure Big Data 2011

As the amount of captured data grows, how can businesses make more sense of it, use it for accurate predictions and better understand their customers? The answer may lie in the world of physics: the concept of space-time paired with data improves predictions through context. Read more »

Terry Jones of Fluidinfo, Hilary Mason of bit.ly, Bill McColl of Cloudscale, and Bassel Ojjeh of nPario at Structure Big Data 2011

Joyent Founder and Chief Scientist Jason Hoffman redefined the concept of big data in a panel on data science with bit.ly Chief Scientist Hilary Mason, Cloudscale Founder and CEO Bill McColl, Fluidinfo Founder and CEO Terry Jones, and nPario President and CEO Bassel Ojjeh. Read more »

A Yale computer science project has turned into a company giving Hadoop the ability to perform analytics on both structured and unstructured data. Hadapt launched today with an undisclosed amount of funding and the goal of making Hadoop more broadly applicable for analytics. Read more »

Subscriber Content

Business and IT leaders now face significant opportunities and challenges with big data — that is, data sets so large that they are difficult to store, manage and analyze. This report explores the rapidly evolving big data business and technology ecosystem. It examines big data in the context of several different industries: financial services, health care, sports, travel and media. We explore the different big data technologies — from Hadoop and NoSQL derivatives to cloud-based collaboration tools — and their various benefits for enterprises. And we examine some of the existing challenges big data poses, and what enterprise IT leaders can do to overcome them. Companies mentioned in this report include Amazon Web Services, Google, Teradata, IBM and Cloudera. For a full list of companies, and to read the full report, sign up for a free trial. Read more at GigaOM Pro »

Netflix is taking a bold step, licensing the first exclusive show to stream through its service before appearing on broadcast or cable TV. But is the move as risky as some might think? Thanks to a large amount of viewing data, Netflix doesn’t think so. Read more »

I met with a cool startup called DueDil, which is trying to provide a Lexis-Nexis-meets-Google service that aggregates public data on public and private companies from a variety of databases and uses that to create new financial metrics to determine success. Read more »

Using Hadoop to process data for targeted web advertising efforts is nothing new, but this week, two companies in the video advertising space also stepped forward to highlight how Hadoop is helping them deliver the right ads to the right viewers for their clients. Read more »

Consumer electronics recommendation engine Retrevo launched a new feature this morning that challenges the Amazon Marketplace. However, for Retrevo to meet its lofty goals of dethroning Amazon even in this single category, it will have to rely on the accuracy of its machine-learning algorithms. Read more »

Just over a month after discontinuing its Hadoop distribution to focus on the flagship Apache Hadoop project, Yahoo is proposing some changes to the Hadoop MapReduce component that could significantly improve processing performance. The proposal illustrates just how beneficial Yahoo’s renewed focus could be. Read more »

Yesterday, HP CEO Leo Apotheker laid out his vision for the company’s cloud computing future, but given HP’s all-but-non-existent cloud strategy until this point, it’s difficult to believe the company can be a real competitor until it actually starts to deliver what Apotheker is promising. Read more »

IBM and Revolution Analytics have brought together SQL queries and predictive analytics by integrating R Enterprise statistical analysis software with IBM’s Netezza TwinFin data warehouse appliance. It’s part of a significant evolution in analytics strategies as big data becomes a big issue for all types of organizations. Read more »

Microsoft is developing a new big data tool called Dryad. Dryad and the associated programming model, DryadLINQ, simplify the process of running data-intensive applications across hundreds, or even thousands, of machines running Windows HPC Server. Dryad builds upon lessons learned from Hadoop, but differs in some significant ways. Read more »

We’re in the midst of a computing implosion: a re-centralization of resources driven by virtualization, many-core CPUs, GPU computing, flash memory, and high-speed networking. We have a lot to watch over the next few years: what I like to call the coming of the Super Server. Read more »

Data warehousing giant Teradata today agreed to acquire Aster Data, a data analytics provider, proving that it’s no longer enough to be able to store and access a lot of data quickly; one must also be able to analyze it quickly. But now, who’s left? Read more »

Infochimps is attempting to build a data market, and in doing so, the company is wading into some of the messiest and most unstructured data around, attempting to clean it up and put it up for sale. I talk to co-founder Flip Kromer about the challenges. Read more »

After all the talk over the past few weeks about IBM’s Watson, it’s becoming clear that Watson is not HAL of big-screen notoriety. But when used in concert with predictive analytics software, technologies like Watson can become part of a very complete big-data architecture. Read more »

Big Data software company Acunu said it has closed £2.2 million ($3.6 million) in Series A financing. The startup’s software helps bridge the gap between expensive in-memory storage and cheap-but-slow hard drives by offering a rewrite of the storage stack optimized for solid state drives. Read more »

Every attendee of SXSW Interactive is used to the yoga, the HTML5, the gaming, and the death of journalism panels, but for 2011, the conference has fastened onto two new trends: data as a double-edged sword and a lack of women in technology and startups. Read more »

According to one machine-learning expert, one key takeaway from Watson’s “Jeopardy!” victory is simple: humans are very smart. That a system such as Watson can understand natural language is a huge step forward, but it’s still only as good as its data and algorithms. Read more »

Terracotta is trying to bring real-time analytics to the masses (of Java users, at least) by letting Ehcache users query data stored in the product’s in-memory cache. With Ehcache Search, customers can perform real-time queries against terabytes of data stored in their transactional caches. Read more »

In a new Forrester report, authors James Staten and Lauren E. Nelson advise infrastructure and operations (I&O) leaders to encourage their data analysts to get hip to cloud-based analytics tools and to consider making their organizational data available to the public as a cloud resource. Read more »

HP announced this morning that it has signed an agreement to acquire analytical database provider Vertica for an undisclosed amount, a decision that finally puts HP into the data warehouse market and analytics space that is becoming more important by the day. Read more »

The interesting story behind OkCupid, the online dating site recently acquired by Match.com, is OkTrends, its blog that analyzes the site’s wealth of data to shed light on our love lives. But the interesting story behind OkTrends is its use of R to power those analytics. Read more »

Taming Twitter’s stream of endless data can be daunting, especially the more people you follow. But start-up My6sense is bringing some order to the chaos with a new Chrome browser extension that prioritizes a user’s Twitter stream, making it relevant to their tastes and interests. Read more »

New start-up BillGuard is looking to build a crowd-sourced anti-virus billing protection system that digests a consumer’s transactional history and pulls in alerts from banks, existing members and the web. The system uses big data analysis and machine learning to help users spot fraud and errors. Read more »

Like most social games, Tribal Crossing applications have a very high database write rate: changes to the game state must be stored so the user doesn’t lose her game score, “loot” or location. Tribal Crossing migrated from MySQL to Membase to support a higher write rate. Read more »

It was a big week for big data, with two key trends adding fuel to claims that data management and analysis will never be the same. Even laggards will be tempted to give big data tools a try to see what all the hype is about. Read more »

Netflix offers rent-by-mail and streaming movies. The shift from mail-order to streaming video had fairly significant implications for Netflix’s application infrastructure. Netflix realized it would need multiple geographically dispersed data centers and far more processing capacity, so it turned to Amazon Web Services. Read more »

Call it a personal cloud or a digital locker. People like keeping stuff online.

With companies looking to leverage your data trail, why not take possession of that information and find innovative uses for it yourself? That’s the question that Jeremie Miller – the developer behind the open-source protocol that powers many instant messaging programs – is trying to answer. Read more »

Few would argue that Hadoop doesn’t have a bright future as a foundational element of big data stacks, but Piccolo, a new project out of New York University, is moving data in-memory in an attempt to improve parallel-processing performance beyond what Hadoop and/or MapReduce can do. Read more »

With enterprise data volumes growing, business and IT leaders face significant opportunities and challenges from big data. The space, of course, is not without its obstacles — including plenty of privacy concerns — but in 2011, there are numerous sales-growth opportunities and new business models finally surfacing. Read more »
