Google has released another paper showing off the power of its deep learning techniques for text analysis. It shows how models can detect similar usage of words across different languages, meaning they can accurately translate words and concepts from one language to another. Read more »
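To make the idea concrete, here is a minimal sketch of one way similar word usage across languages can be turned into translation: learn a linear map between two embedding spaces from a few known word pairs, then translate an unseen word by nearest neighbor. The blurb does not describe the paper's actual procedure, so treat the approach as an assumption; the 2-D vectors, the word lists and the function names are all made up for illustration.

```python
import math

# Toy 2-D "embeddings" (invented numbers, not real model output).
# The Spanish space is the English space rotated by 90 degrees,
# standing in for "similar geometry across languages".
english = {"one": (1.0, 0.0), "two": (0.0, 1.0), "three": (1.0, 1.0), "four": (2.0, 1.0)}
spanish = {"uno": (0.0, 1.0), "dos": (-1.0, 0.0), "tres": (-1.0, 1.0), "cuatro": (-1.0, 2.0)}

# Small seed dictionary; "four" is held out on purpose.
seed_pairs = [("one", "uno"), ("two", "dos"), ("three", "tres")]


def fit_linear_map(pairs):
    """Least-squares 2x2 matrix W such that W x is close to z for each seed pair."""
    a = [[0.0, 0.0], [0.0, 0.0]]  # sum of x x^T
    b = [[0.0, 0.0], [0.0, 0.0]]  # sum of z x^T
    for src, dst in pairs:
        x, z = english[src], spanish[dst]
        for i in range(2):
            for j in range(2):
                a[i][j] += x[i] * x[j]
                b[i][j] += z[i] * x[j]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv = [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]
    # W = B A^-1 (normal equations for the 2-D case)
    return [[sum(b[i][k] * inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def cosine(u, v):
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))


def translate(word, w):
    """Map an English vector into the Spanish space and pick the closest word."""
    x = english[word]
    mapped = (w[0][0] * x[0] + w[0][1] * x[1], w[1][0] * x[0] + w[1][1] * x[1])
    return max(spanish, key=lambda cand: cosine(mapped, spanish[cand]))


W = fit_linear_map(seed_pairs)
print(translate("four", W))
```

Real embeddings have hundreds of dimensions and the fit is only approximate, but the shape of the computation is the same.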
This article from Klint Finley at Wired Enterprise raises some good questions about the ideal integration of big data into nonprofits. I rather prefer the efforts of DataKind and the SumAll Foundation, which try to help nonprofits solve problems rather than harvest email addresses. The flipside, of course, is that individual donors are what keep the lights on in many cases, so access to more of them is good.
Gnip, one of a handful of companies with direct access to the Twitter firehose, is now letting its customers query 30 days' worth of tweets via a new search API. CEO Chris Moody describes it as fast delivery on small data. Read more »
A team of professors behind the open source Spark and Shark in-memory big data projects has raised $13.9 million to commercialize the products via a company called Databricks. Spark and Shark are designed to be much faster and more flexible than Hadoop MapReduce and Hive. Read more »
Microsoft is working some impressive new features into Power BI, its Excel add-on for Office 365 that’s focused on making analytics easier. Among the capabilities announced on Wednesday were natural-language search and visualizations, and new and improved maps. Read more »
This seems like good advice from Hortonworks’ Ofer Mendelevitch. Python? Check. Java? Check. Hadoop? Check. SQL? Check. Stats? Check. But his closing remark (“The road to data science is not a walk in the park. … This takes time, effort and a personal investment.”) might be the most important. We often talk about democratizing some of the data science tools, but the really good data scientists can do it all.
Hadoop startup MapR has released a new version of its commercial HBase database, called M7. According to a press release, “HBase applications can now benefit from MapR’s high performance platform to address one of the major issues for on-line applications, consistent read latencies in the less than 20 millisecond range across varying workloads.” MapR released M7 in May and claims its architectural improvements over open source HBase result in a faster, easier experience.
Publishing analytics startup Parse.ly moved its production application off of Rackspace in 2011 to save costs. Two years later, it has watched Amazon Web Services costs drop precipitously, and now CTO Andrew Montalenti says it’s probably time to head back to the cloud. Read more »
Modular data center manufacturer IO has filed a confidential S-1 form and plans to go public in the near future. The company has made a name for itself selling fully contained data centers that take up only 462 square feet of floor space. Read more »
Is there a line beyond which people are no longer mere Quantified Selfers but something much more annoying? Could data really be used as “success theater” to make someone seem more successful than he really is? Of course. You know who you are …
NGDATA has raised a $3.3 million venture capital round led by Capricorn Venture Partners. The company sells a software product called Lily that stores and indexes wide varieties of customer data using HBase and other open source technologies, and then layers various analytic functions and applications on top of it. The data layer of Lily is available as an open source download.
When U.S. lawmakers and policy experts get tired of fighting ideological battles over the past, they might want to put a little effort into helping improve the country’s future. Here are four technology issues that could help improve the economy and outline Americans’ digital rights. Read more »
This is an interesting patent application, in part because of its techniques and in part because — like many technology-related patent applications — it’s hard to see how it’s particularly novel. The idea of using someone’s social graph to find influential connections that could inform mobile-app recommendations is pretty good, but at the core aren’t we just talking about the decision to value one variable more than another in a recommendation system?
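As a rough illustration of what "valuing one variable more than another" looks like in a recommender, here is a toy weighted-score sketch. Nothing in it comes from the patent application: the app names, the two features and the weights are hypothetical.

```python
# Hypothetical per-app features, each scaled to the range 0..1.
apps = {
    "photo_app": {"popularity": 0.9, "friend_influence": 0.3},
    "run_tracker": {"popularity": 0.4, "friend_influence": 0.7},
}

# Two weightings: one neutral, one that favors influential connections.
equal_weights = {"popularity": 0.5, "friend_influence": 0.5}
social_weights = {"popularity": 0.2, "friend_influence": 0.8}


def score(features, weights):
    """Weighted sum of the features named in `weights`."""
    return sum(weights[name] * features[name] for name in weights)


def rank(candidates, weights):
    """Return app names ordered from highest to lowest score."""
    return sorted(candidates, key=lambda name: score(candidates[name], weights), reverse=True)


print(rank(apps, equal_weights))
print(rank(apps, social_weights))
```

Shifting weight toward the social-graph feature is enough to reorder the recommendations, which is the sense in which the core idea reduces to a weighting decision.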
Hadapt, a startup that has been pushing SQL on Hadoop since 2011, is rolling out a new technology it calls “schema-less SQL.” Essentially, the SQL portion of Hadapt’s platform will automatically form columns from the keys of JSON and other data types, thus making the associated values queryable like values in a standard relational database. This sort of joint SQL-NoSQL support is likely to become a lot more normal for analytic databases. Curt Monash has a good technical breakdown of the new Hadapt feature.
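A minimal sketch of the general idea, forming columns from JSON keys so the values become queryable with ordinary SQL, is below. It uses Python's built-in `json` and `sqlite3` modules purely as stand-ins; it is not Hadapt's implementation, and the sample documents, table name and helper function are invented for illustration.

```python
import json
import sqlite3

# Invented sample documents; note that they do not share the same keys.
docs = [
    '{"user": "alice", "city": "Boston", "clicks": 12}',
    '{"user": "bob", "clicks": 7, "device": "android"}',
]


def load_json_as_table(conn, table, raw_docs):
    """Create a table whose columns are the union of all top-level JSON keys."""
    rows = [json.loads(d) for d in raw_docs]
    columns = sorted({key for row in rows for key in row})
    col_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    # Keys missing from a document become NULL in that row.
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    return columns


conn = sqlite3.connect(":memory:")
columns = load_json_as_table(conn, "events", docs)
result = conn.execute('SELECT "user", clicks FROM events WHERE clicks > 10').fetchall()
print(columns)
print(result)
```

This sketch only handles flat documents and fixes the columns at load time; a production system would also have to deal with nested values, type conflicts and keys that appear later.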
The Comparing Constitutions Project has launched a new web tool called Constitute, which lets users search their way through the world’s constitutions by keyword or theme. Not only is the tool handy for gathering info on international laws, but it’s also indicative of how the web can ease access to valuable data via nice interfaces masking lots of complicated data-prep work. The organization’s website has lots of other constitutional data and visualizations, too.
At Structure: Europe 2013, New Relic Founder Lew Cirne, Kleiner Perkins General Partner Michael Abbott (former Twitter engineering VP) and North Bridge General Partner Jonathan Heiliger (former Facebook engineering VP) spoke about the business opportunities around next-gen analytics. Read more »
Structure: Europe was about many things (cloud computing, privacy, how to build a global business), but it might have been most about scale. The goal of any tech company is to handle untold millions of users and their data, and many speakers are doing just that. Read more »
A Denver-based startup called AlchemyAPI is close to rolling out deep-learning-based image recognition via its API service. The company has made something of a name for itself in the text-analysis world, and it says it can do image recognition as well as Google. Read more »
Rackspace VP of Technology Nigel Beighton shared his thoughts on the most important tools in the cloud at Structure: Europe. If you want to get the most out of the cloud, virtual servers alone won’t cut it. Read more »
Is it better for hosting providers to band together to take on Amazon Web Services or to focus on what each service provider does best? Read more »
Analytics database startup MemSQL has integrated JSON support into its big, fast in-memory SQL database. Bridging both worlds is a compelling idea, although execution isn’t always easy. Read more »
Recommind, a San Francisco-based company that sells machine learning software optimized for e-discovery in the legal industry, has raised $15 million from SAP Ventures. The new money will go toward growing the company’s footprint outside the legal space via enterprise software that lets humans and machines work closely with one another around data analysis — something Recommind CTO Jan Puzicha discussed with me in March at Structure: Data.
There are many ways to win cloud customers away from Amazon Web Services, a panel of European cloud providers said at Structure: Europe, and none of them involve trying to be Amazon Web Services. Read more »
Samza is LinkedIn’s take on Twitter’s Storm engine for stream processing, only built on top of LinkedIn’s own Kafka messaging system. It’s the latest in a growing line of open source efforts from LinkedIn, and another notch in the belt for Hadoop. Read more »
DataSift, one of the two companies (along with Gnip) granted real-time access to the Twitter firehose, now offers real-time and historical analysis of Tumblr data. While it’s best-known for Twitter, DataSift actually analyzes dozens of social media and commenting platforms, which is pretty handy if you want to compare sentiment, engagement or whatever else across platforms where people behave quite differently.
If you’ve ever wanted to see who follows you on Twitter, where they live and what they do, but don’t have a clue how to utilize the Twitter API, it’s your lucky day. Read more »
Veteran entrepreneur, investor and founding Vertica Systems CEO Andy Palmer has some thoughts about the most-important trends and promising startups in the data space. Here’s what he had to say during our Structure Show podcast this week. Read more »
The glut of research in teaching computers to analyze and understand images could prove very helpful in letting us take full advantage of the countless hours of video we’ll produce as wearable cameras go mainstream. Read more »
An MIT professor has conducted some handy research that could help make applications run faster and use less energy by overcoming an inherent drawback of multicore processors. The problem is that although the local caches on chips save them the latency of having to access RAM, the hard-wired algorithms powering them often assign data to cache locations randomly without considering the core trying to access it. The new software-based technique, called Jigsaw, tracks which cores are accessing what data, and how much, and assigns data locations accordingly. The paper detailing Jigsaw is available here.
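The placement idea can be sketched in a few lines. This is a deliberately simplified toy, not Jigsaw's actual algorithm: it assumes each core has one nearby cache bank with the same index, ignores bank capacity, and uses a hypothetical access trace.

```python
from collections import Counter, defaultdict

# Hypothetical access trace: (core_id, data_block) pairs.
trace = [
    (0, "A"), (0, "A"), (1, "A"),
    (1, "B"), (1, "B"),
    (2, "C"), (0, "C"), (2, "C"),
]


def place_blocks(accesses):
    """Assign each block to the bank next to the core that touches it most."""
    counts = defaultdict(Counter)
    for core, block in accesses:
        counts[block][core] += 1  # track who accesses what, and how much
    return {block: per_core.most_common(1)[0][0] for block, per_core in counts.items()}


placement = place_blocks(trace)
print(placement)
```

Compared with assigning blocks to banks at random, this keeps each block close to its heaviest user, which is the latency and energy saving the research is after.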
New research out of Carnegie Mellon University shows that analyzing fans’ tweets can help gamblers make better bets on NFL games. Sometimes. Their technique wasn’t very effective at picking winners or betting the over/under, but it was 55 percent accurate on bets against the spread (and then only during the middle of the season). I doubt anyone will undertake this effort themselves for such a slight edge, but there might be a business here if someone can figure out a consistently accurate model.
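For a sense of how slight that edge is, here is a back-of-the-envelope calculation. It assumes the common sportsbook pricing of risking $110 to win $100 on a spread bet; that pricing is an assumption and does not come from the research.

```python
def break_even_rate(risk, win):
    """Fraction of bets that must win for the bettor to break even."""
    return risk / (risk + win)


def expected_profit_per_bet(win_rate, risk, win):
    """Average profit per bet: wins pay `win`, losses cost `risk`."""
    return win_rate * win - (1 - win_rate) * risk


# Assumed pricing: risk 110 to win 100.
needed = break_even_rate(110, 100)
edge = expected_profit_per_bet(0.55, 110, 100)

print(f"break-even win rate: {needed:.1%}")
print(f"expected profit per $110 risked at 55%: ${edge:.2f}")
```

Under that assumption the break-even point is a little over 52 percent, so 55 percent accuracy leaves only a few dollars of expected profit per bet, and only for the part of the season where the model worked.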
Couchbase is officially opening up two new technologies to mobile developers as part of a public beta program. Couchbase Lite is a lightweight database designed specifically for iOS and Android devices, while Cloud Sync Gateway syncs local data with a bigger database in the cloud. Read more »
Aquamatix, a Structure: Europe LaunchPad company based in London, is trying to improve the world’s water networks with lots and lots of sensors. Fixing outdated infrastructure is expensive, but real-time data from deep inside can help target specific problems. Read more »
Yup. Makes me wonder if the tech companies that have been lobbying for Patriot Act reform over the past few years were doing so in part to get out from under the NSA’s thumb. Policy discussions were always couched in geopolitical language, but they must have foreseen the backlash even from U.S. customers if word ever got out about what was up.
Dallas-based enterprise-search company PureDiscovery has closed a $10 million series C funding round that should help it bring its BrainSpace platform to the masses. The idea is to build knowledge about the content of documents rather than just an index of what’s where. Read more »
A San Mateo, Calif.-based startup called Space-Time Insight has raised a $20 million series C investment round led by London-based firm Zouk Capital. Space-Time provides a platform for analyzing and visualizing streaming data, and is gaining traction in the utility sector. We profiled the company in 2011, specifically its work with California ISO to put real-time energy data on an 80-foot screen in the agency’s control room. Space-Time closed a $14 million series B investment round last September.
Narrative Science, a startup that turns complex text documents into reports or articles that are supposed to resemble something written by a human being, has raised an $11.5 million series C funding round. News organizations have already used the company’s software to turn sports stats or corporate earnings statements into articles, but it has potential anywhere someone is trying to analyze loads of text documents. CIA-backed venture capital firm In-Q-Tel invested in Narrative Science in June.
Box Founder and CEO Aaron Levie has a lot to say about just about everything in the world of IT. Here’s a sampling of his thoughts about cloud computing, mobile software, the Microsoft-Nokia deal and finding time for Twitter. Read more »
It’s not just U.S. companies such as Pinterest, Netflix and every SaaS startup under the sun that are running on cloud infrastructure. There are a lot of major European companies and organizations using cloud computing, too. Many of them will be at Structure: Europe. Read more »
Hortonworks is making progress on its mission (via a project called Stinger) to speed up SQL-like queries in Hadoop using Apache Hive. New features in the latest version of Hortonworks’ Hadoop distribution have improved Hive performance tens of times in some instances, and the company is aiming for 100x improvements soon. Hortonworks has also added support for new types of SQL data. Competitor Cloudera opted to forgo Hive in favor of its own Impala technology for interactive queries.
eBay has acquired Seattle-based price-prediction startup Decide.com, and the service will shut down on Sept. 30. The entire team will head over to eBay to help the e-commerce giant improve its experience through predictive modeling. The entire team except Co-founder and CTO Oren Etzioni, that is: the University of Washington computer science professor, Madrona Venture Group partner and former Farecast founder is heading up Paul Allen’s new Allen Institute for Artificial Intelligence.