Facebook has open sourced a new embedded database called RocksDB that’s meant to take advantage of all the performance flash has to offer, from right on the application server. It might be a sign of best practices to come. Read more »
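RocksDB, like the LevelDB project it grew out of, is built on a log-structured merge-tree, which is what makes it flash-friendly: writes accumulate in memory and flush to disk as sequential, sorted runs. A toy Python sketch of that core idea (a simplification for illustration, not RocksDB's actual API):

```python
# Minimal sketch of the log-structured merge idea behind RocksDB:
# writes land in an in-memory "memtable" and are flushed to immutable
# sorted runs ("SSTables"); reads check the newest data first.
class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []          # flushed runs, newest last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # flush: sort once, write sequentially (flash-friendly)
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest flush wins
            if key in table:
                return table[key]
        return None

db = TinyLSM()
db.put("user:1", "alice")
db.put("user:2", "bob")    # triggers a flush
db.put("user:1", "carol")  # newer value shadows the flushed one
print(db.get("user:1"))    # carol
print(db.get("user:2"))    # bob
```

The real thing adds write-ahead logging, background compaction and bloom filters, but the write path above is why sequential-write-heavy designs shine on SSDs.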
Dropbox acquired computer vision startup Anchovi Labs and its Ph.D. founders in September 2012 to very little fanfare. But the skillset they bring could be integral as Dropbox seeks to grow into a platform and competitors like Google and Yahoo beef up their image-recognition capabilities. Read more »
This is a good blog post from Gartner analyst Alessandro Perilli about some of the problems facing vendors selling OpenStack as private-cloud software. You should read it. My two cents: If OpenStack vendors really are at a loss for how to describe their products, perhaps they should look at how the Hadoop market has been able to (seemingly) thrive thanks to a strong community and clear product visions among the vendors involved, beyond the open source code.
You didn’t think all the research Microsoft has done around deep learning was just for show, did you? The company’s deep learning models are now powering voice commands on the Xbox One platform, thanks to a direct connection to Bing. Read more »
The days of the cold call might be gone for salespeople. Actually, the days of the not-too-promising call might soon be gone, too. On Tuesday, a company called InsideSales introduced a new capability that infuses neural network technology (the basis of deep learning) into its products to help identify the best leads and even the best ways to approach them. Indeed, scoring sales leads is becoming the new black. We recently covered a company called Infer that delivers a similar service, and companies such as Intel are even doing some of this internally.
HP released a new version of its Vertica database that easily connects with other systems to bring in unstructured data. It’s a big update for a database built for analytic SQL workloads that needs to find a way to play with today’s data formats. Read more »
Intel is using big data to improve everything from manufacturing efficiency to sales, and is increasingly looking toward technologies such as Hadoop and machine learning to create new opportunities. Read more »
Anyone wondering how Amazon Web Services is able to roll out so many new features to its cloud platform each year might just want to read the new biography on Amazon CEO Jeff Bezos, whose management style touches everything within the company. Read more »
Cycle Computing CEO Jason Stowe dives deep into the economic and innovative benefits of running massive scientific workloads in the cloud. When researchers aren’t constrained by the systems they can afford, they can ask bigger questions and get better results. Read more »
This post from the New York Times’ Open blog talks about the architecture and algorithms underpinning its content-personalization engine. Its experience speaks to some larger trends around companies moving from batch to stream processing and to cloud services overall. The Times’ recommendation engine used to rely on MapReduce jobs that ran every 15 minutes, but now relies on a homegrown real-time system. It used to run on Cassandra, but now runs on Amazon’s DynamoDB service.
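The batch-to-stream shift described here boils down to folding each event into live state as it arrives instead of recomputing everything on a timer. A simplified Python illustration (not the Times’ actual code):

```python
# Illustrative contrast (not the Times' actual system): periodic batch
# recompute vs. incremental stream updates of per-article click counts.
from collections import Counter

events = ["a1", "a2", "a1", "a3", "a1"]  # a stream of article clicks

# Batch style: wait for the window to close, then recompute from scratch.
def batch_counts(window):
    return Counter(window)

# Stream style: fold each event into running state as it arrives.
def stream_counts(event_iter):
    state = Counter()
    for article in event_iter:
        state[article] += 1
        yield dict(state)           # a fresh, queryable result per event

final_batch = batch_counts(events)
*_, final_stream = stream_counts(events)
print(final_batch == final_stream)  # True: same answer, lower latency
```

Both paths converge on the same counts; the difference is that the streaming version has an answer ready after every event rather than every 15 minutes.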
Finnish researchers have devised an algorithm that accurately determines mobile phone users’ modes of transportation by analyzing data from their phones’ accelerometers. Useful? Absolutely! Annoying? Possibly … Read more »
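Transport-mode detection of this sort generally comes down to statistics computed over the accelerometer signal. Here’s a deliberately simplified, hypothetical sketch; the thresholds and labels are invented for illustration and are not the researchers’ algorithm:

```python
import statistics

# Hypothetical sketch (not the researchers' algorithm): classify
# transport mode from the variance of accelerometer magnitude samples
# (in m/s^2). The thresholds below are made up for illustration.
def classify_mode(samples):
    var = statistics.pvariance(samples)
    if var < 0.05:
        return "still"     # barely any deviation from gravity
    if var < 1.0:
        return "vehicle"   # smooth, low-variance motion
    return "walking"       # strong periodic swings from footsteps

print(classify_mode([9.8, 9.81, 9.79, 9.8]))       # still
print(classify_mode([8.0, 12.0, 7.5, 11.5, 8.2]))  # walking
```

Real systems use richer features (frequency content, GPS speed) and trained classifiers, but variance alone already separates sitting from walking surprisingly well.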
Amazon Web Services VP and Distinguished Engineer James Hamilton explained during a session at the AWS re:Invent conference how the cloud provider keeps costs as low as possible and innovation as high as possible. It’s all about being the master of your infrastructure. Read more »
It was a good day for anyone invested in the greater NoSQL market, as Riak creator Basho and Couchbase both announced big customer wins. Basho highlighted The Weather Company, which is running and replicating Riak across multiple global data centers, while travel-industry technology provider Amadeus is working with Couchbase to deploy that database across its customer-facing applications. It’s good news for the NoSQL space because large companies choosing databases other than MongoDB validate that those alternatives matter and signal they’ll be around for a while.
Amazon Kinesis is a new service for capturing and processing streaming data, and it’s also about the only thing of its ilk available as a cloud service. Will other cloud providers ever catch up with AWS? Read more »
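Under the hood, Kinesis routes each incoming record to a shard by MD5-hashing its partition key into one shard’s hash-key range. A rough Python sketch of that routing (the shard layout and keys here are simplifying assumptions):

```python
import hashlib

# Sketch of Kinesis-style record routing: the MD5 hash of a record's
# partition key (a 128-bit integer) is mapped into one shard's
# hash-key range. Shard count and key names are hypothetical.
def shard_for(partition_key, num_shards):
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    hash_key = int(digest, 16)            # 128-bit hash key
    range_size = 2 ** 128 // num_shards   # equal-width hash ranges
    return min(hash_key // range_size, num_shards - 1)

for key in ["sensor-1", "sensor-2", "sensor-3"]:
    print(key, "-> shard", shard_for(key, 4))
```

Because the hash is deterministic, all records with the same partition key land on the same shard, which is what lets consumers process a key’s events in order.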
IBM has upped the ante in the API game by making its Watson question-answering system available as a service. That’s right, Watson could soon power your smartphone app. Read more »
Amazon Web Services announced a new service called Amazon WorkSpaces during its re:Invent conference on Wednesday. If it can deliver VDI and gain traction where others have not, it could be a big boon for the company. Read more »
Machine learning startup Ayasdi has teamed up with Lawrence Livermore National Laboratory, as well as the Texas Medical Center, to help advance data analysis in a variety of complex fields. Read more »
Amazon Web Services is now offering up free access to three NASA datasets from the NASA Earth Exchange project about the world’s weather, geology and vegetation. The cloud is a natural place to house large datasets that many people or institutions might want to analyze without requiring everyone to download, store and analyze the data locally. Scientific data has proven particularly appealing early on, with numerous cloud providers already hosting various datasets, often in the fields of genomics and biology.
IBM’s Steve Mills has been with the company for decades, and during that time has seen lots of technologies and trends come and go. Here are his thoughts on how the company approaches selling software in a changing IT world. Read more »
This survey from State Street and the Economist Intelligence Unit is a pretty good look at the opportunities and challenges of using data in the financial services industry. Many respondents noted the challenge of integrating lots of data sources, which is understandable and probably only going to get harder. It seems there’s a lot of promise in new services/data sources such as Dataminr and Premise Data, but they also represent a pretty big divergence from tradition.
Backblaze has shared the designs of its 180-terabyte storage pods, and now it’s sharing some details about how long the drives inside those boxes last. According to the company, nearly three-fourths of all the drives it has deployed are still running. Read more »
Rackspace revenue continued to rise during the third quarter, but growth was slow and profits were down year over year. The company chalks up the latter to increased forward-looking investments, but the elephant in the room is Amazon. Read more »
The Facebook-led Open Compute Project is set to vote on four new specifications that would make open source networking switches and OS software a reality in the near future. Read more »
A Dallas-based startup called Servergy, which makes low-power servers about half the size of traditional servers, has raised a $20 million series C round of venture capital. The company’s servers run on 8-core 1.5 GHz Freescale Power Architecture processors and, although 1U high, are only 14 inches deep and 8.25 inches wide. Servergy appears to have raised just under $30 million so far, according to SEC filings, although it has not named its investors.
Correction: This post was corrected at 3:15 p.m. to correct the manufacturer of Servergy’s processors, which is Freescale and not IBM.
If there was a NoSQL storm brewing earlier this decade, Hummer Winblad’s Mitchell Kertzman thinks it has all but died down. People thought NoSQL would blow up the SQL world, he said on this week’s Structure Show, but it might just be a nice complement. Read more »
Amazon Web Services’ second-annual user conference is around the corner, but its scale is as much about AWS’s platform as it is about the ecosystem of developers and applications it has enabled. Read more »
Splunk is switching CTOs, as co-founder Erik Swan is stepping down to be replaced by former Yahoo exec and Continuuity co-founder Todd Papaioannou. Read more »
Facebook has open sourced Presto, a SQL engine it says is on average 10 times faster than Hive for running queries across large data sets stored in Hadoop and elsewhere. Read more »
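Presto’s pitch is interactive ANSI SQL over data wherever it lives, and the flavor of query it accelerates is ordinary interactive aggregation. Purely for illustration, here’s that kind of query run against an in-memory SQLite table; the schema and data are invented:

```python
import sqlite3

# The kind of interactive SQL aggregation Presto targets, shown here
# against an in-memory SQLite table purely for illustration (the
# page_views schema is made up; Presto would run this over HDFS-scale
# data instead).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("US", 120), ("US", 80), ("FI", 30)],
)
rows = conn.execute(
    "SELECT country, SUM(views) AS total "
    "FROM page_views GROUP BY country ORDER BY total DESC"
).fetchall()
print(rows)  # [('US', 200), ('FI', 30)]
```

The point of Presto is that a query like this comes back in seconds over petabyte-scale tables, where Hive’s MapReduce translation would take minutes.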
For workspace designer Jennifer Magnolfi, tackling a crumbling downtown Las Vegas and turning it into a place that inspires interaction and creativity was a whole new experience. What she saw, though, was that smart design can have amazing effects even in unlikely places. Read more »
A Seattle-based startup called Seeq has raised $6 million to help companies capitalize on the Industrial Internet by letting them use the streams of data their business processes are generating. Read more »
Dataminr, a startup dedicated to analyzing the Twitter firehose of real-time tweets, is using today’s BlackBerry news as proof of its value. The company claims it gave users a three-minute head start in which to begin selling BlackBerry shares. Read more »
HGST is now selling helium-filled hard drives that can hold more capacity (and more disks) while using less energy than traditional hard drives. One early user of the tech is CERN, which is impressed even if helium won’t solve all its capacity problems. Read more »
In Part 2 of my look at the issue of web privacy, I address the likely reality that no one inside Google, Facebook or the NSA cares about any of us on an individual level. Read more »
Venture capitalist Chris Lynch has disrupted the database industry before as CEO of Vertica Systems, but now he’s watching Hadoop take it to the next level. Here are his thoughts on the challenges legacy vendors face and who’s positioned to ride the big data wave. Read more »
This is the first of two posts in which I try to come to terms with the privacy concerns inherently tied to the digital era. Should I feel powerless, indifferent or take a laissez faire attitude and just go along for the ride? Read more »
Hadoop startup Datameer is selling a $49 “charity edition” of its spreadsheet-based Hadoop analytics software, with all proceeds this month going to help elephants injured by poaching. Read more »
Deep learning is one of the hottest trends in big data right now and is currently underpinning the cutting edge in areas such as natural language processing and image recognition. Here’s a brief guide to what it is and who’s doing it. Read more »
Teradata’s CEO addressed the impact of Hadoop on its earnings call and, according to this report from ZDNet, downplayed its effect. In fact, he said only 4 to 8 percent of Teradata workloads might ever move to Hadoop. Even if that’s true for workloads, what about the data itself? It might not need to live in those pricey appliances.
Dropbox has hired Kevin Park as its new head of technical operations and IT. Park was at Facebook from 2006 until 2011, where he was a director of technical operations. This isn’t the first time Dropbox has brought on former Facebook employees to help grow its engineering team — in 2012 it bought a startup called Cove that was started by Aditya Agarwal (now VP of engineering) and Ruchi Sanghvi (formerly VP of operations), who built Search and Newsfeed, respectively, during their time at Facebook.
Correction: This post has been updated to clarify that Ruchi Sanghvi is no longer with Dropbox.
This is a pretty interesting benchmark study, although the headline is a bit misleading because Hadoop isn’t really optimized for graph analysis. When you look at comparisons to Spark, GraphLab and other platforms, it seems the decision of what to choose might come down to data volume, acceptable latency and cost, especially when considered against the value of that graph workload. Projects like Giraph and other YARN-enabled engines might make Hadoop look better, too.
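Giraph brings Google’s Pregel model to Hadoop: graphs are processed "vertex-centrically," in synchronized supersteps where each vertex handles incoming messages and sends new ones. A bare-bones Python sketch of that superstep loop, computing hop distances from a source vertex (a simplification for illustration, not Giraph’s actual Java API):

```python
# Sketch of Pregel-style vertex-centric computation (the model Giraph
# implements): each superstep, every vertex with pending messages
# updates its state and messages its neighbors; the job halts when no
# messages remain. Here the state is hop distance from a source.
def hop_distances(edges, source):
    dist = {v: None for v in edges}
    messages = {source: 0}                 # superstep 0: seed the source
    while messages:
        next_messages = {}
        for vertex, proposed in messages.items():
            if dist[vertex] is None or proposed < dist[vertex]:
                dist[vertex] = proposed
                for neighbor in edges[vertex]:   # message all neighbors
                    d = proposed + 1
                    if neighbor not in next_messages or d < next_messages[neighbor]:
                        next_messages[neighbor] = d
        messages = next_messages           # barrier, then next superstep
    return dist

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(hop_distances(graph, "a"))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

The iterative, message-heavy shape of this loop is exactly what plain MapReduce handles poorly, and why purpose-built engines fare better on graph benchmarks.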