Big Data is still in its early stages of life; to get to the next stage, its integration with core enterprise technologies…
Movin' on up the data stack
Hadoop vendor Cloudera is moving closer to the business intelligence space by acquiring a startup called Xplain.io. The company’s software analyzes users’ offline…
Hip developers love it
A new survey from startups Databricks and Typesafe revealed some interesting insights into how software developers are using the Apache Spark data-processing…
First: scoring models at scale
The data team at Netflix is open sourcing some of the tools it uses to analyze data stored in Hadoop. The overall open…
Don't call it a data warehouse
Treasure Data, a Mountain View, California, startup offering a cloud-based big data platform, has raised a $15 million series B round of…
Structure Data 2015
Big data never really went anywhere, but as a business, it did get a little boring over the past couple years. Big…
Chipping away at AWS
Altiscale, the Hadoop-as-a-Service startup started by ex-Yahoo CTO Raymie Stata, has raised a $30 million series B round of venture capital led…
A Boston-area startup called Cazena launched on Monday with backing from Andreessen Horowitz and North Bridge Venture Partners. The company’s founder and board have strong ties to Netezza, which they think will help them meld big data and cloud computing in a way enterprises will buy.
A Mountain View, California, startup called Waterline Data Science has raised $7 million from Menlo Ventures and Sigma West for its software…
Cloudera has acquired a data-visualization startup called DataPad, the founding team of which specializes in data analysis using the Python programming language. As Hadoop competition heats up, Cloudera might be ramping up its Python tooling in order to attract more data scientists and developers.
According to statistics from job site Indeed.com, postings including the phrase “data scientist” are dropping like a rock, while those including the phrases “data science” and “big data” are on the rise. It might signal a realization by employers that those unicorns don’t exist.
Dell, Cloudera and Intel are working together on an appliance designed to speed the performance of Hadoop environments by moving a lot…
Big data startup Continuuity has teamed with AT&T Labs on an open source project called jetStream that pairs a high-throughput SQL database with a real-time data-processing engine. The goal is to underpin applications that can handle multiple levels of latency, consistency and analysis on streaming data.
Big data startup Concurrent has raised a $10 million series B round of venture capital from Bain Capital Ventures, Rembrandt Ventures and True Ventures…
Investors are betting that companies will take to Trifacta’s interactive, visual management capabilities that allow users to organize and clean data without having to deal with coding.
Raymie Stata, current co-founder and CEO of Hadoop startup Altiscale and ex-Yahoo CTO, came on the Structure Show this week to talk about why Hadoop matters, why who’s building it matters and where it’s headed. Here are the highlights.
Big data experts Andy Palmer and Michael Stonebraker launched a new data automation cleanup tool as part of the duo’s company Tamr. The software tool helps big companies organize sloppy data sets with the aid of machine learning and human guidance.
A startup called Pepperdata launched on Tuesday along with $5 million in series A venture capital from Signia Venture Partners and Webb Investment…
Cloudera CEO Tom Reilly came on the Structure Show this week to talk about why the company entered into a deep partnership with Intel, just how much cash it raised and when it might go public.
Apache Tajo, a relational database warehouse system for Hadoop, has graduated to top-level status within the Apache Software Foundation. It might be easy…
Hadoop pioneer Cloudera has said it closed on a $900 million round of financing that gives Intel an 18 percent stake in the company. Rumors had Intel’s investment at around $100 million, but it’s likely much more.
The Apache Mahout project will now support Apache Spark and another data engine called H2O as it tries to retain its status as the go-to set of machine learning libraries for Hadoop.
Even the biggest media companies are being buffeted by the task of collecting and using mass amounts of consumer and audience information — even as the media business itself continues to shift.
Cloudera is working on an open source project called Oryx that aims to bring machine learning to Hadoop in a way that previous attempts such as Apache Mahout could not.
As Hadoop moves from the early adopter phase into the mainstream, IT organizations across all industries are asking how to make the…
On this week’s Structure Show, Mike Olson talks about why Cloudera keeps a sharp eye on new consumer internet technology, and we wonder about Rackspace’s CEO planning now that Microsoft has finally settled down.
Making sense of big data can be hard enough without spending untold hours having to write code or manually clean datasets that simply won’t work with existing BI tools. Trifacta is trying to automate that process with a new software product it announced on Tuesday.
Altiscale, the Hadoop-as-a-service startup co-founded by former Yahoo CTO Raymie Stata that launched in June, is now offering its Data Cloud platform…
Since Hadoop hit the scene almost a decade ago, IT shops have been quietly funneling enormous amounts of data into it for…
Data-munging specialist Trifacta has raised another $12 million for its mission to speed the process of going from raw data to usable data. As data volumes and types keep piling up, faster tools will mean a lot less wasted time.
McLaren CIO Stuart Birrell says the company’s technology ends up helping build a better shoe, monitoring sick kids, and mitigating athletic injury.
Hadoop startup Datameer is selling a $49 “charity edition” of its spreadsheet-based Hadoop analytics software, with all proceeds this month going to help elephants injured by poaching.
Cloudera is now positioning itself less as a company selling Hadoop and more as a company selling what it calls an “enterprise data hub.” It’s not about the technologies, the company argues, but what they can do for users.
A new startup called Paxata wants to make business analysts’ lives easier by automating the process of going from raw data to something that an analytics product like Tableau can actually understand.
I apologize if I’m late to the game on this, but someone just tweeted me about Apache Tajo, a potentially interesting new…
Streamy launched in 2007 and officially closed in 2010, but the social news reader application is back in a stripped-down version — not to make its founders boatloads of money on its own, but to prove the capabilities of their new Hadoop application platform.
We have been hearing about things like YARN and high availability for a few years — they’ve even been incorporated into some commercial Hadoop…
NGDATA has raised a $3.3 million venture capital round led by Capricorn Venture Partners. The company sells a software product called Lily…
Ron Bodkin of Think Big Analytics discusses the best and worst practices for adopting big data technologies and actually getting results. Companies must beware of dangerous decisions, charlatans and disastrous missteps.
Big data startup HStreaming is now part of Swiss advertising firm Adello Group. HStreaming had standout technology by all accounts, but the business never scaled enough to survive in a tough market.
Hadoop-based analytics startup Tresata last week open sourced a set of machine learning libraries built on Scalding and designed to run in…
https://www.facebook.com/photo.php?v=10151890165548109&set=vb.9445547199&type=2&theater This is a good presentation about Facebook’s graph-processing engine, Giraph, from a big data event held at the company’s Menlo Park…
Cleversafe, a Chicago-based provider of object-storage systems for housing massive amounts of data, has raised a $55 million series D round led…
A recent New York Times article casts some doubt on the economic impact of big data. Here’s why I think we haven’t seen anything yet when it comes to big data and the global economy.
Facebook has detailed its extensive improvements to the open source Apache Giraph graph-processing platform. The project, which is built on top of Hadoop, can now process trillions of connections between people, places and things in minutes.
Microsoft has developed a big data technology that sits on top of Hadoop’s new YARN resource manager. Called REEF, it’s designed to let users build jobs that can maintain state even after they’re done, and that can grab data from wherever they need it.
Hadoop-in-the-cloud startup Qubole says its customers used more than 100,000 nodes to run more than 350,000 jobs and process more than a petabyte of data in July. Those aren’t Facebook numbers, but they seem to signal an appetite among smaller users.
While some big data startups are thriving, others are shutting down or searching for buyers because it doesn’t look like a second round of venture capital is coming. Here are a few lessons I think I’ve gleaned from watching the space over the past few years.
IT services and consulting specialist CSC has acquired Infochimps, a startup that sells a big data query and processing platform. Infochimps had raised about $5 million in equity and debt financing since launching in 2009.
A newcomer called Treasure Data has raised $5 million in its quest to take on the big boys of big data — names like Teradata, Cloudera and Amazon Web Services.
Hadoop vendor Cloudera has acquired its first company, a London-based machine learning startup called Myrrix. Machine learning is becoming a big use case for big data, and Cloudera is wise to get some expertise in-house.
A hot startup called Ayasdi has raised a $30.6 million Series B round from IVP, Citi Ventures and GE Ventures for its technology that takes billions of data points and puts them on a map.
Continuuity founding CEO Todd Papaioannou left the company several weeks ago, putting fellow co-founder and former CTO Jonathan Gray in charge.
Netflix has open sourced its software to make running Hadoop jobs on the Amazon Web Services cloud as easy as possible.
Google and national laboratories want different things out of their infrastructure, although it looks like there’s room for them to learn from each other.
Raymie Stata spent seven years working on the guts of Hadoop as a VP, chief architect and CTO at Yahoo. His new Hadoop startup, called Altiscale, has raised $12 million from some prominent investors.
Cloudera’s new search feature, based on the Apache Solr project, is the latest move by the company to expand the utility of its Hadoop distribution. It’s also far from the last.
Hadoop startup Mortar Data is offering to build recommendation systems for 10 companies, with help from Hilary Mason, Drew Conway and Max Shron. It’s part of a bigger plan to democratize the science behind online recommendations.
Analytics startup Precog is on a mission to make analytics on unstructured data as simple as possible with a new line of targeted appliances.
The advent of big data is affecting Ford Motor Co. in some significant ways, from how it analyzes its supply chain to the features it puts into its cars.
Data scientist might be the sexiest job of the 21st century, but it’s hardly an easy gig to land. Here is some advice from practitioners at Netflix, Orbitz and Hortonworks on how to get hired and even do the hiring.
The new version aims to provide a simpler interface for wrangling hundreds of data points per site visit. Qubit has also released research about browser user value, with IE users coming out on top.
IBM announced a new PureData appliance for Hadoop and technology for speeding up analytic databases. The announcements come at a good time, with data sets growing and enterprises hankering for easy and fast analysis capability.
The strategic partnership will see Cloudera’s enterprise Hadoop distribution, along with its Impala real-time query engine, running on top of T-Systems’ extensive cloud infrastructure in Europe and beyond.
Cascading proprietor Concurrent has secured $4 million in venture capital in order to advance its efforts toward easing the development of big data applications.
Red Hat is the latest company offering an alternative to the Hadoop Distributed File System, only this one is open source and ties into Red Hat’s bigger vision of hybrid cloud computing.
Netflix is at it again, this time showing off its homemade architecture for running Hadoop workloads in the Amazon Web Services cloud. It’s all about the flexibility of being able to run, manage and access multiple clusters while eliminating as many barriers as possible.
I recently spent 11 days in Beijing meeting lots of companies trying to make it in cloud computing and big data. Here are seven with which I had a chance to sit down and learn about their businesses and how to sell cloud computing in China.
A lot happened in the world of data analysis this year. Here’s a list of the most-popular and generally most-interesting things I’ve had the fortune to cover in 2012 — from Hadoop to the Supreme Court to Bollywood stars.
Hadoop is nothing without applications, and Continuuity aims to deliver those apps by making Hadoop something developers can work and innovate with. Its efforts haven’t gone unnoticed — the company just closed a $10 million Series A round from a who’s who of big data VCs.
Rackspace is busy building a Hadoop service, giving the company one more avenue to compete with cloud kingpin Amazon Web Services. However, the two services — along with several others on the market — highlight just how different seemingly similar cloud services can be.
Rather than rely on Hadoop or any other popular data-management tools to build a platform for democratizing data science, Precog decided to build its own system from scratch. That makes Precog stand out from the crowd, but it also means there’s little room for error.
Carter S. won his first-ever Kaggle competition — our own GigaOM WordPress Challenge — using a brute force method of data science he calls overkill analytics. Rather than spend untold hours perfecting complex models, Carter used simple algorithms and let powerful microprocessors do the rest.
How much of your data is Facebook collecting every day? Some new stats from the company reveal just how large its user base is, and what big data means to a company with 950 million users.
To say there are a lot of companies involved in the Hadoop ecosystem would be an understatement. To say partnership strategies are broad would be one, too. The folks at Datameer created this infographic to show just how expansive and interconnected the Hadoop ecosystem is.
The results of a recently released survey from Hadoop-focused startup Karmasphere show that while Hadoop use is picking up among mainstream (read “non-web”) companies, it’s still far from the all-powerful and ubiquitous insight engine its supporters (myself included) believe it will become.
Netflix’s algorithms for recommending movies to customers might not be perfect, but it isn’t for lack of trying. The company is capturing and analyzing incredible amounts of data, even from the videos themselves, to try and figure out what you want to watch next.
It’s no secret that Facebook stores a lot of data in Hadoop, but how it keeps that data available whenever it needs it isn’t necessarily common knowledge. Today at the Hadoop Summit Facebook Engineer Andrew Ryan highlighted that solution, which Facebook calls AvatarNode.
VMware is launching a new open source project, called “Serengeti,” that aims to let the Hadoop data-processing platform run on the virtualization leader’s vSphere hypervisor. VMware apparently smells a lucrative opportunity in Hadoop and isn’t about to miss out on getting a piece of the pie.
Online genealogy service Ancestry.com is trying to become like the Amazon or Netflix of family trees. Much like those companies use customer data to recommend products or movies customers might like, Ancestry.com is using machine learning to make learning about ancestors a lot less work.
Karmasphere CEO Gail Ennis told me recently she thinks “2013 is going to be the year when we see [Hadoop adoption] go a lot more mainstream and [turn] into a tornado.” I like the prediction, as much for its imagery as for its near-term certainty.
If you just pay attention to largest Hadoop users, you might think the platform is just a way of powering search engines or analyzing customer behavior for ad-serving. Of course that’s not the case, but finding those broader use cases can still be kind of difficult.
Yahoo is looking to leverage its big data prowess with a new tool for marketers called Genome. It looks like an acknowledgement that while Yahoo might not rule the web anymore, it knows a heck of a lot about analytics.
When your business is to insure farmers against the effects of bad weather, you’d better have some seriously accurate data on your side. Mother Nature, after all, can be somewhat unpredictable. The Climate Corporation thinks the answer is lots of data and lots of computing power.
VMware has acquired Cetas, a startup that provides analytics atop the Hadoop platform. Terms of the deal haven’t been disclosed, but Cetas is an 18-month-old company with tens of paying customers that didn’t need to rush into an acquisition. So, why did VMware buy it?
Skybox Imaging, a startup that wants to capture and analyze high-resolution photos and videos of the Earth, has raised $70 million in Series C funding. The money will help Skybox grow its lineup of software engineers and data scientists that might be its secret sauce.
If you’re an amateur poet and love big data, high-performance system vendor AMAX has a deal for you. The company is conducting a contest to find the best haiku on big data. But I’m sharing my poems right here.
TempoDB, a startup out of Chicago, has built a database-as-a-service offering specifically for time-series data thrown off by thermostats, servers and automotive telematics. Does the world (or the Internet of Things) need a specialty time-series database hosted in the cloud?
Managed hosting provider Sungard is getting into the big data space with a new Hadoop service that gives users on-demand access to the popular data-storage and processing platform. Called Unified Analytics Service, Sungard’s new offering joins the growing ranks of cloud-based Hadoop offerings.
Raghu Ramakrishnan, who was the top scientist for several of Yahoo’s key technology efforts, is now a technical fellow with Microsoft’s server and tools unit. This is the latest sign that Yahoo is struggling to retain key technologists.
Seemingly overnight, big data became the behemoth to conquer. But the truth is, tried and true technologies have been tackling the problem for years. Versant’s Robert Greene gives respect to three unsung heroes of big data.
It’s no secret that Yahoo analyzes a lot of user data, but today it’s giving the world a striking peek into how all that data is used. A new tool lets visitors work their way through demographic data to see which news stories are the most popular.
If you like the idea of your analytics system’s getting more accurate with each piece of data it ingests, it looks like you are in for an exciting run, because machine learning appears to be catching fire across the ecosystem of big data vendors.
LexisNexis is pressing MarkLogic’s technology into service for its just-launched Lexis Advance legal service. MarkLogic’s document storage, search and analytics technology replaces legacy home-built code as part of a platform modernization and big data push.
HPCC Systems, the division of LexisNexis that’s pushing a big-data processing-and-delivery platform to compete with Hadoop, has tuned its software to run on Amazon’s cloud computing platform. Interested developers can now experiment with the open source software without having to wrangle physical servers.
So the big data backlash begins. The Hadoop framework does a lot, but some experts — including those who push non-Hadoop options — say it’s not enough for many specialized apps where a build-your-own Hadoop implementation costs too much to be a real contender.
It appears as if Apple users’ willingness to shell out a little more cash for a premium experience doesn’t stop at computers. Orbitz’s data-crunching has found that Mac users also spend about $20 more a night on hotels than do Windows users.
Splunk has integrated its product with Apache Hadoop to enable large-scale batch analytics on top of the product’s existing capabilities around real-time search, analysis and visualization of machine-generated data. Users can bring Hadoop data back into Splunk for visualization or run MapReduce jobs from Splunk.
Hadoop isn’t the only thing going in big data, but it’s driving the bus at this point and it seems to have a reverse Midas touch: everything that touches it turns to gold. The latest to experience this is Cloudera, which has raised another $40 million.
Cloudera founder Christophe Bisciglia launched a new company today called Odiago, whose WibiData product utilizes Hadoop and HBase to let businesses make the most of online user data. Big-name investors aside, under the covers WibiData shows the future of how Hadoop-based products will look.
IBM joined the big-data-in-the-cloud fray, announcing Monday that its Hadoop-based InfoSphere BigInsights product will be available as a service on the IBM SmartCloud platform. Big Blue’s timing is good, as Hadoop will likely have a far greater presence across public clouds within the next year.
The federal government has been gung ho over cloud computing in the past few years, but is it ready to do big data in the cloud? Federal contractor GCE Federal is offering a cloud service based on Hadoop and designed for federal agencies to outsource analytics.
Amazon has become the cloud king, with its AWS offerings providing cloud-based storage and processing that take a lot of the cost out of deploying new products and applications. Netflix, Dropbox and Yelp are all AWS clients, but the most important user might be Amazon itself.
Oracle customers have lots of questions for the database giant. If you’re one of the 50,000 people Oracle expects to converge on the Moscone Center starting Sunday–or even if you’re not–here are some key things to look out for at the big Oracle OpenWorld 2011 Conference.
Much like everyone has some product or strategy to optimize on “the cloud,” momentum is already gathering around the next big technology trend to drive buzzwords: big data. VMware is no exception, so I spoke with Steve Herrod, the company’s CTO, to find out more.
Hadoop is all the rage in analytics, but it still isn’t easy for mere mortals to utilize the big data framework. A handful of companies are trying to solve this problem, including Karmasphere with the latest version of its Analyst Big Data product.
We all know about the consumer Web innovations in the last ten years, created by crunching massive amounts of consumer data for personalization. But how are companies in other industries leveraging the big data that’s erupting from social media services?
Balancing an open-source community with commercial interests can be difficult, which is why HPCC Systems sought the help of Bruce Perens before open-sourcing its eponymous big-data-processing software. Essentially, the company either ensures the existence of a free version or pulls contributed code.
Concurrent, the company providing the Cascading data workflow API, has raised a $900,000 seed round to capitalize on the newfound excitement around Hadoop. Cascading is an open-source API for creating and running data workflows atop Hadoop clusters.
The size of Hadoop deployments appears to have tripled since October, according to statistics that Cloudera is sharing. If accurate, they help prove assumptions that Hadoop usage grows quickly once organizations wrap their heads around how it is used.
Twitter announced Tuesday it has acquired BackType, an analytics platform aimed at helping companies and brands gauge their social media impact. The possible rationale for the deal is BackType’s Storm real-time big data processing platform that could help Twitter offer well-defined analytics.
For anyone concerned about the difficulty of doing advanced analytics tasks with Hadoop, the future might be just around the corner. A stealth-mode Palo Alto, Calif.–based startup called Platfora is working to make Hadoop usable even for the non-data scientists among us.
Ravel now offers an open-source graph database that looks to bring the benefits of Google’s Pregel project to the masses. Graph databases don’t get the attention of other big-data technologies such as Hadoop or NoSQL, but every Twitter user is familiar with what they can do.
EMC is throwing its weight behind Hadoop. Today, at EMC World, the storage giant announced a slew of Hadoop-centric products, including a specialized appliance for Hadoop-based big data analytics and two separate Hadoop distributions. EMC’s entry is going to shake up the Hadoop market.
Data-integration specialist Syncsort is releasing two new Hadoop tools that it says will give Hadoop users a better, faster experience than they can achieve using Apache Hadoop alone. Unlike some other recent announcements, however, Syncsort is looking to improve Hadoop rather than replace aspects of it.
If Yahoo plans to spin off its white-hot Hadoop business, it would make Yahoo the third vendor operating alongside Cloudera and IBM — fighting for what, right now, are only speculative customer dollars. Would Yahoo’s spinout have what it takes to compete?
Hadoop is the talk of the town when it comes to big data, but it’s not without faults that have some users…
Just like every vendor now has a cloud product and every company has a cloud strategy in place, big data efforts also will become ubiquitous over the next couple years, and the two very well might merge in the near future.
Cloudera released version 3.0 of its distribution of Apache Hadoop (CDH3) Tuesday. CDH3 is a big reason why, despite a recent spate of Hadoop-based big data products either on the market or about to be there, Cloudera says it isn’t sweating all the new competition.
A handful of new releases and partnerships this week — as well as a big award — illustrate just how versatile the data-processing tool Hadoop is and how widespread its use might become. Hadoop is becoming a more viable tool for everyone from business users to journalists.
Hardware rarely comes up in discussions about big data, save for those centered on data warehouse appliances. But the omission hardly means hardware is irrelevant. In fact, big gear might become a big deal as companies look to bolster the performance of their big data systems.
One of the statements that struck me most from Structure: Big Data was CA CTO Donald Ferguson’s notion that big data represents a “very promising” opportunity for startups, particularly those targeting specific use cases. I think he’s right, particularly with regard to the latter part.
As organizations strive to analyze more data than ever and to do it faster than ever, the results they’re getting might actually be worse than those in the pre-big-data and real-time world — at least temporarily.
When it comes to social data, one of the biggest firehoses around is the one that comes from Twitter. Trying to make sense of 140 million tweets a day in something close to real-time is a significant challenge, says Tap11 chief technology officer Braxton Woodham.
Using Hadoop to process data for targeted web advertising efforts is nothing new, but this week, two companies in the video advertising space also stepped forward to highlight how Hadoop is helping them deliver the right ads to the right viewers for their clients.
Just over a month after discontinuing its Hadoop distribution to focus on the flagship Apache Hadoop project, Yahoo is proposing some changes to the Hadoop MapReduce component that could significantly improve processing performance. The proposal illustrates just how beneficial Yahoo’s renewed focus could be.
Facebook is working on a real-time analytics dashboard to let users determine which content is getting the most attention from visitors. As described in an educational session on Wednesday night in Facebook’s Seattle office, the service is built atop HBase and tracks about 100 metrics.
Two popular big data startups, Karmasphere and 10gen, made management changes this week, which might signal that the companies’ boards feel they’re poised to make runs at the big time and need seasoned leadership to take them to the next level.
It was a big week for big data, with two key trends adding fuel to claims that data management and analysis will never be the same. Even laggards will be tempted to give big data tools a try to see what all the hype is about.
Few would argue that Hadoop doesn’t have a bright future as a foundational element of big data stacks, but Piccolo, a new project out of New York University, is moving data in-memory in an attempt to improve parallel-processing performance beyond what Hadoop and/or MapReduce can do.
With enterprise data volumes growing, business and IT leaders face significant opportunities and challenges from big data. The space, of course, is not without its obstacles — including plenty of privacy concerns — but in 2011, there are numerous sales-growth opportunities and new business models finally surfacing.
Yahoo is ceasing development of its Yahoo Distribution of Hadoop and will be folding it back into the Apache Hadoop project. The company cites a goal “to make Apache Hadoop THE open source platform for big data” as a driving force behind its new strategy.
Today’s links offer further proof that technologies like Hadoop and NoSQL aren’t going anywhere — and might even be expanding — and that choosing the right cloud computing solution really should be about what’s best for the individual business (e.g., public vs. private, or available vs. reliable).
Hadoop startup Cloudera has rounded out its support of the Apache Software Foundation by becoming a Silver-level sponsor. Cloudera already contributes code and personnel to the Apache Hadoop project, and Cloudera’s Doug Cutting (also Hadoop’s creator) is the ASF chairman.
On Friday, Microsoft’s HPC division opened up the company’s Dryad parallel-processing technologies as a Community Technology Preview (CTP). Dryad could be a rousing success, in part because Hadoop — which is written in Java — is not ideally suited to run atop Windows or support .NET applications.
There was much talk about cloud computing today, all of it hitting different aspects — from how IT organizations will adopt it to what makes a “niche” cloud to how AT&T’s spotty network helped drive the need for it. Hadoop and Cassandra news also caught my eye.
Web infrastructure is a hot topic today, after Amazon Web Services experienced an outage over the weekend, and after Facebook released some interesting details about its Hadoop cluster on Friday. Even LinkedIn is making headlines by expanding into a new Los Angeles data center.
Chalk another one (two, actually) up for Hadoop. Among the big news today is Apple stepping up its Hadoop development efforts, and Datameer targeting social-gaming companies for its Hadoop-powered spreadsheet application. Elsewhere, data center spending is still high, and IBM is looking to revolutionize high-end processors.
Matthew Aslett at The 451 Group posted some Google Trends graphs showing that searches for “Hadoop” far exceed searches for “big data.” I ran some of my own to dig deeper. Users, it seems, are just concerned with tools to help them ride the big data wave.
It’s not always good news with cloud computing, and we saw that today with someone calling out Enomaly’s new SpotCloud, someone else detailing the difficulties of developing a mobile app in Windows Azure, and the Cloudscaling boss calling out the traditional definition of cloud computing.
It was a big year for NoSQL and big data, but now those vendors need to buckle down on their revenue models and make a head-on charge to the enterprise. Because, let’s face it: while the web leads the innovation, the enterprise leads the economy.
Hadoop startup Cloudera has raised another $25 million, bringing its total funding to $36 million. The new funding bolsters Cloudera’s position as the hub of the commercial Hadoop world, and the belief that Hadoop will become the centerpiece of many Big Data efforts.
Hadoop World is taking place today, and, indicative of the general momentum around Hadoop, there is plenty of news coming from the event. As one should expect, Cloudera is driving the action, but it brings vendors and service providers of all stripes into the mix.
The Hadoop hoopla is generating increasing numbers of announcements from more and more vendors. From startups to large established players, new products and partnerships are emerging which confirm the emergence of a vibrant Apache Hadoop. Hall explains the three emerging layers in the “Hadoop stack.”
Commercial Hadoop startup Karmasphere today released the results of a survey of 102 Hadoop developers regarding adoption, use and future plans. The results provide some interesting insights into how Hadoop grows within organizations and underscore its status as an extremely valuable, but none-too-simple analytics tool.
As Big Data gathers steam within the consumer web, Cloudera is making it possible for mainstream IT to tap into this trend through its distribution of Hadoop, as the company’s customer growth suggests. Lower costs and improved ease of use are making Hadoop a reality for the enterprise.
While settling on a standard big data stack is deeply important to the big data industry as a whole, I’m nonetheless questioning the operational and competitive consequences for companies who choose to buy into this standard without first considering the value of building a proprietary solution.
My company, SlideShare, has been using cloud computing for almost everything we do. But if comic books have taught us anything at all, it’s that with great power comes great responsibility, and we’ve made our share of blunders. Here are a couple of the more notable ones.
Hadoop, the big data analytics software, is so hot right now. Heck, anything big data is so hot right now. Today’s links offer insights into Hadoop alternatives, how to use Hadoop, and an endorsement of Microsoft’s platform-as-a-service strategy.
Hadoop, thanks to the growing importance of Big Data Analytics, is gaining traction inside the enterprise. What’s been missing for Big Data Analytics has been a LAMP-like stack. Fortunately, a stack for Big Data aggregation, processing and analytics is on its way.
A few months ago, I posited that additional funding for Cloudera and Karmasphere signifies a large market opportunity for solutions that utilize the open-source analytics tool Hadoop. The news generated this week by Yahoo’s third annual Hadoop Summit has only affirmed that belief.
Nearly six years after first applying for it, Google has finally received a patent for its MapReduce parallel programming model. The question now is how this will affect the various products and projects that utilize MapReduce, such as Apache’s MapReduce-inspired Hadoop project.
Berkeley Labs has been working on an open source version of a system for demand response services for the power grid (called…
Can an open source data management system do for the smart grid what Google’s open mobile operating system Android is…
Cloudera, a startup based in Burlingame, Calif., today announced the release of its first commercial product, Cloudera Desktop. It’s a graphical interface…
At the Hadoop Summit in Silicon Valley today, Yahoo announced the availability of the Yahoo Distribution of Hadoop, a source-only…
Updated: Hadoop, the open-source software framework, is one of the technologies we have been following closely. If you are equally interested in…
Hadoop, an open-source software program that helps process incredibly large data sets, has been generating plenty of buzz. The upcoming Hadoop Summit on…
At first glance it’s hard to see how the open-source software framework Hadoop, which was developed for analyzing large data sets generated…
Cloudera, a Burlingame, Calif.-based company offering services around the open source software framework Hadoop, has raised $5 million in Series A funding…
Last week, OStatic noted the rumor, first reported by VentureBeat, that Microsoft intended to buy Silicon Valley semantic search engine Powerset for…
Parascale, a Cupertino, Calif.-based start-up that has developed a storage file system for a cloud of computers, announced that it had attracted…
We are only ten days away from Structure’08, our web infrastructure conference. As part of our preparation for this event, our team…
As part of our renewed focus on technologies that matter, we are launching a series of events called GigaOM PM, occasional meetups…