This article from Klint Finley at Wired Enterprise raises some good questions about the ideal integration of big data into nonprofits. I rather prefer the efforts of DataKind and the SumAll Foundation, which try to help nonprofits solve problems rather than harvest email addresses. The flipside, of course, is that individual donors are what keep the lights on in many cases, so access to more of them is good.
Gnip, one of a handful of companies with direct access to the Twitter firehose, is now letting its customers query 30 days’ worth of tweets via a new search API. CEO Chris Moody describes it as fast delivery on small data. Read more »
German researchers have come up with a cheap way to spray gas sensors, based on a relatively new material, onto thin film. If health fears are cleared, this could result in smart food packaging — perhaps with sprayed-on radio antennas, too. Read more »
A team of professors behind the open source Spark and Shark in-memory big data projects has raised $13.9 million to commercialize the products via a company called Databricks. Spark and Shark are designed to be much faster and more flexible than Hadoop MapReduce and Hive. Read more »
Microsoft is working some impressive new features into Power BI, its Excel add-on for Office 365 that’s focused on making analytics easier. Among the capabilities announced on Wednesday were natural-language search and visualizations, and new and improved maps. Read more »
This seems like good advice from Hortonworks’ Ofer Mendelevitch. Python? Check. Java? Check. Hadoop? Check. SQL? Check. Stats? Check. But his closing remark might be the most important: “The road to data science is not a walk in the park. … This takes time, effort and a personal investment.” We often talk about democratizing some of the data science tools, but the really good data scientists can do it all.
Hadoop startup MapR has released a new version of its commercial HBase database, called M7. According to a press release, “HBase applications can now benefit from MapR’s high performance platform to address one of the major issues for on-line applications, consistent read latencies in the less than 20 millisecond range across varying workloads.” MapR released M7 in May and claims its architectural improvements over open source HBase result in a faster, easier experience.
Oracle leads the league in relational databases but it’s far from clear that the company can replicate its success in non-relational and in-memory categories. That’s why Larry Ellison’s no-show matters. Read more »
Is there a line beyond which people are no longer mere Quantified Selfers but something much more annoying? Could data really be used as “success theater” to make someone seem more successful than he really is? Of course. You know who you are …
NGDATA has raised a $3.3 million venture capital round led by Capricorn Venture Partners. The company sells a software product called Lily that stores and indexes wide varieties of customer data using HBase and other open source technologies, and then layers various analytic functions and applications on top of it. The data layer of Lily is available as an open source download.
When U.S. lawmakers and policy experts get tired of fighting ideological battles over the past, they might want to put a little effort into helping improve the country’s future. Here are four technology issues that could help improve the economy and outline Americans’ digital rights. Read more »
This is an interesting patent application, in part because of its techniques and in part because — like many technology-related patent applications — it’s hard to see how it’s particularly novel. The idea of using someone’s social graph to find influential connections that could inform mobile-app recommendations is pretty good, but at the core aren’t we just talking about the decision to value one variable more than another in a recommendation system?
Hadapt, a startup that has been pushing SQL on Hadoop since 2011, is rolling out a new technology it calls “schema-less SQL.” Essentially, the SQL portion of Hadapt’s platform will automatically form columns from the keys of JSON and other data types, thus making the associated values queryable like values in a standard relational database. This sort of joint SQL-NoSQL support is likely to become a lot more normal for analytic databases. Curt Monash has a good technical breakdown of the new Hadapt feature.
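To make the idea concrete, here is a minimal sketch of turning JSON keys into queryable columns. It is not Hadapt's implementation: it uses Python's built-in sqlite3 module as a stand-in SQL engine, and the sample records, the `events` table name and the `load_json_as_table` helper are all hypothetical.

```python
import json
import sqlite3

# Hypothetical sample records; the keys differ from row to row on purpose.
RAW_RECORDS = [
    '{"user": "alice", "city": "Boston", "clicks": 12}',
    '{"user": "bob", "clicks": 7, "referrer": "search"}',
    '{"user": "carol", "city": "Denver"}',
]


def load_json_as_table(conn, table, raw_records):
    """Create `table` with one column per distinct top-level JSON key."""
    rows = [json.loads(r) for r in raw_records]
    # Union of keys across all records, in first-seen order.
    columns = list(dict.fromkeys(k for row in rows for k in row))
    col_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        # Keys missing from a record are stored as NULL.
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    return columns


conn = sqlite3.connect(":memory:")
columns = load_json_as_table(conn, "events", RAW_RECORDS)

# The JSON values can now be filtered and projected with ordinary SQL.
result = conn.execute(
    'SELECT "user", "clicks" FROM "events" '
    'WHERE "city" IS NOT NULL ORDER BY "user"'
).fetchall()
print(columns)
print(result)
```

The sketch only handles flat, top-level keys and scans every record up front to build the column list. A production system would presumably also need to deal with nested objects, conflicting value types and keys that appear after the table already exists.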
The Comparing Constitutions Project has launched a new web tool called Constitute, which lets users search their way through the world’s constitutions by keyword or theme. Not only is the tool handy for gathering info on international laws, but it’s also indicative of how the web can ease access to valuable data via nice interfaces masking lots of complicated data-prep work. The organization’s website has lots of other constitutional data and visualizations, too.
RainDance Technologies, a big data genomics company, has raised $35 million to expand into new markets. Read more »
Search is evolving to fit the needs of users who don’t just want a web site, but the actual answer to the question driving the search. To stay on top, semantic search technologies are key. Read more »
The European Commission’s Viviane Reding proposes a single set of data privacy rules for the whole region and we recap Structure:Europe. Read more »
At Structure: Europe 2013, New Relic Founder Lew Cirne, Kleiner Perkins General Partner Michael Abbott (former Twitter engineering VP) and North Bridge General Partner Jonathan Heiliger (former Facebook engineering VP) spoke about the business opportunities around next-gen analytics. Read more »
Users have grown accustomed to a real-time web, but now they want an easier-to-implement real-time integration between web services. REST Hooks seems to be the emerging standard for such integration. Read more »
How will health data tracking reach the masses? By showing people that it can seamlessly improve healthcare delivery, day-to-day communication and even entertainment. Read more »
Structure:Europe was about many things — cloud computing, privacy, how to build a global business — but it might have been most about scale. The goal of any tech company is to handle untold millions of users and their data, and many speakers are doing just that. Read more »
Yummly’s first mobile app doesn’t just port its semantic food search engine over to iOS. Instead, the company has designed its iPhone app to be used in the grocery store rather than the kitchen. Read more »
Randall Munroe, the man who writes web comic xkcd, also runs a series called What If in which he answers questions using data gleaned from the web and physics. On Tuesday he tackled the question “If all digital data were stored on punch cards, how big would Google’s data warehouse be?” The result is a speculative blog post that estimates Google’s server count (between 1.8 million and 2.4 million) and total storage (10 exabytes), and tells you how to find the search giant’s secret data center locales (go read it to find out).
A Denver-based startup called AlchemyAPI is close to rolling out deep-learning-based image recognition via its API service. The company has made something of a name for itself in the text-analysis world, and it says it can do image recognition as well as Google. Read more »
Plaid, a startup seeking to give developers access to financial data, has raised $2.8 million. With services like Plaid we could see the emergence of a new breed of Quicken-like products or new connected devices. Read more »
Two top VCs talk about the big data analytics opportunity and Lew Cirne talks about doing more with the data New Relic already collects. Read more »
Analytics database startup MemSQL has integrated JSON support into its big, fast in-memory SQL database. Bridging both worlds is a compelling idea, although execution isn’t always easy. Read more »
Recommind, a San Francisco-based company that sells machine learning software optimized for e-discovery in the legal industry, has raised $15 million from SAP Ventures. The new money will go toward growing the company’s footprint outside the legal space via enterprise software that lets humans and machines work closely with one another around data analysis — something Recommind CTO Jan Puzicha discussed with me in March at Structure: Data.
Bright.com, a San Francisco-based, data-driven job search site, has raised $14 million in a Series B round. Read more »
The NSA spying scandal and general awareness of security vulnerabilities have led many businesses to start thinking about how they protect their data. Here’s how to acquire cloud services while keeping privacy in mind. Read more »
A contract, obtained by freedom-of-information request outfit MuckRock, shows the NSA buying tools that can help it attack flawed software. Previous reports have suggested the U.S. is the biggest buyer of these “zero-day” flaws. Read more »
Samza is LinkedIn’s take on Twitter’s Storm engine for stream processing, only built on top of LinkedIn’s own Kafka messaging system. It’s the latest in a growing line of open source efforts from LinkedIn, and another notch in the belt for Hadoop. Read more »
In a 70-page white paper released Monday, Facebook, Qualcomm and Ericsson tried to connect the app and cloud world with carriers as part of the internet.org effort. Even if this doesn’t bring broadband to all, it’s a necessary conversation. Read more »
Re:char’s new venture Soil IQ has teamed up with Yves Behar to make a connected soil fertilizer gadget to stream data about dirt and weather conditions. Read more »
DataSift, one of the two companies (along with Gnip) granted real-time access to the Twitter firehose, now offers real-time and historical analysis of Tumblr data. While it’s best-known for Twitter, DataSift actually analyzes dozens of social media and commenting platforms, which is pretty handy if you want to compare sentiment, engagement or whatever else across platforms where people behave quite differently.
If you’ve ever wanted to see who follows you on Twitter, where they live and what they do, but don’t have a clue how to utilize the Twitter API, it’s your lucky day. Read more »
Belgium’s federal prosecutor is looking into a claim by Belgacom that its systems were hacked into and infected with a virus. Reports say the complexity of the malware suggests an intelligence agency was to blame. Read more »
Telenav hasn’t just hired OpenStreetMap founder Steve Coast away from Microsoft; the navigation company plans to wean itself entirely off of proprietary cartography, relying solely on OSM’s collaborative, crowdsourced and freely available maps. Read more »
The NSA may have found a way to monitor some credit card transactions, according to a Snowden-derived report from Germany’s Der Spiegel. The agency said in leaked documents that it found a way to access Visa transactions in Europe, the Middle East and Africa, but the financial services company denies the tapping of its networks. The report highlights an NSA financial database called Tracfin, into which SWIFT international transfer information also flows through the interception of “SWIFT printer traffic from numerous banks.”