To get a sense of how global a phenomenon bitcoin really is, I acquired data on more than 1.3 million tweets about the cryptocurrency posted during February. Here’s a breakdown of who’s tweeting, where they’re tweeting from and what they’re sharing. Read more »
Twitter has announced the winners of its inaugural data grants program, which gives select researchers access to the entire history of tweets. The six winning projects are wide-ranging, covering topics from gastrointestinal illness to sports. Read more »
Hadoop startup Karmasphere, which launched in 2010, has sold its intellectual property to credit-scoring specialist FICO. Karmasphere appears to have been struggling for adoption and funding, so the sale of its assets is not an unexpected turn of events. Read more »
Google continued its heavy infrastructure spending in the first quarter of 2014, to the tune of more than $2.3 billion. It’s the third consecutive quarter the company has topped $2 billion in capital expenditures. Read more »
Microsoft showed off more of its big data strategy on Tuesday at an event that touched on everything from Excel to “ambient intelligence.” If the company can execute, it has a shot at repeating its desktop success in the data era. Read more »
Twitter is buying Gnip, the data startup that has full access to the Twitter firehose. Twitter now has what it needs to deliver meaningful data to companies that want to examine specific discussions and activities, rather than just the content Twitter chooses to publicize. Read more »
Kim Weins, vice president of marketing at RightScale, has a good view of where the company’s customers are deploying cloud workloads and how they intend to expand them across multiple platforms. She came on the Structure Show to talk about what’s hot, including, surprisingly, VMware. Read more »
A startup called Gridspace is trying to reinvent the meeting process by taking the note-taking out of it. Its Memo system combines hardware, software and machine learning to let a computer take notes while the people focus on ideas. Read more »
The Senate passed an amended version of the Digital Accountability and Transparency (or DATA) Act on Thursday, nearly five months after the House passed its version 388-1 in November. The bill standardizes the processes, platforms and formats federal agencies use to report how they spend their money. It had strong bipartisan support but faced opposition from the Office of Management and Budget. It could be a coup for certain technology vendors, including supporter Teradata, which stand to win more government deals as all that data becomes easier to store and analyze using commercial software.
MapR is the latest Hadoop vendor to embrace Apache Spark, adding the entire Spark stack of technologies to its distribution. It’s a smart move by MapR, and more validation that Spark might be the data-processing framework of the future. Read more »
A startup called Emerald Logic claims it uses an evolutionary process to discover the best algorithm for predicting outcomes from any dataset. It might sound too good to be true, but the company already claims some successes, and it is one of several startups trying something similar. Read more »
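The item doesn’t describe how Emerald Logic’s system works. As a rough illustration of what an “evolutionary process” means in general, here is a minimal sketch that evolves the two coefficients of a linear model through repeated mutation and selection. It is not the company’s method; the function names, sample data and parameters are all hypothetical, and a system that searches over whole algorithms would presumably evolve model structure as well, not just two numbers.

```python
import random

# Illustrative sketch only: a tiny evolutionary search, not Emerald Logic's method.


def fitness(params, data):
    """Mean squared error of the model y = a*x + b (lower is better)."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)


def mutate(params, rng, scale=0.1):
    """Return a copy of params with small Gaussian noise added to each gene."""
    return tuple(p + rng.gauss(0, scale) for p in params)


def evolve(data, generations=200, pop_size=30, elite=5, seed=0):
    rng = random.Random(seed)
    # Start from random candidate solutions.
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the best-scoring candidates as parents.
        population.sort(key=lambda p: fitness(p, data))
        parents = population[:elite]
        # Reproduction: refill the population with mutated copies of the parents.
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - elite)]
        population = parents + children
    return min(population, key=lambda p: fitness(p, data))


# Hypothetical dataset generated from y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(-5, 6)]
best = evolve(data)
print(best, fitness(best, data))
```

Because the best candidates always survive into the next generation (elitism), the best error never gets worse from one generation to the next, and the search drifts toward coefficients close to 2 and 1.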
A Mankato, Minn.-based company called Farm Intelligence is helping farmers get a sense of what’s happening in their fields so they can act fast to maximize yields. It’s already managing a million acres and nearly a petabyte of data, but it expects to amass much more. Read more »
MongoDB has released version 2.6 of its eponymous NoSQL database, complete with some significant new capabilities around monitoring and management, search, indexing, performance and pipelines. The company (formerly known as 10gen) pretty clearly has the most widely used NoSQL database — especially among web developers — so now the push is to make it more palatable for large enterprises and other users who’ll actually pay for it. MongoDB looks like the one NoSQL startup poised for an IPO at some point, and a more-mature product could help shore up revenue to get investors excited.
Sporting goods company Wilson is working with a Finnish startup called SportIQ to create a basketball that uses sensors and artificial intelligence to determine how far the ball traveled and whether the shot was made. It’s not the first application of sensors and algorithms to sports gear (we already have them in football helmets, soccer balls and basketball nets, for example), but the Wilson basketball stands out because it seems to target individual consumers, meaning anyone with the ball, a hoop and a web connection can start quantifying their game.
Teradata announced a new set of features and products on Monday that should strengthen its position as a go-to analytics vendor, even in the age of Hadoop. But as open source technologies evolve, Teradata might find it harder to attract new users. Read more »
Cloudera CEO Tom Reilly came on the Structure Show this week to talk about why the company entered into a deep partnership with Intel, just how much cash it raised and when it might go public. Read more »
Citus Data, a startup focused on turning PostgreSQL into a scale-out analytic engine, has developed a columnar data store for the popular open source database. The company is open sourcing its extension for single-node environments, while offering a distributed version as part of its CitusDB software. Citus already supported interactive SQL queries over Postgres (on which its technology is based), Hadoop and MongoDB, but columnar stores are faster for certain types of queries, particularly analytic queries that read only a few columns of a wide table. Also, the compression features of the ORC file format that CitusDB uses can cut disk space by more than half.
Facebook has released part of the code that helps its data scientists and other staff easily build, manage and verify A/B tests, an important practice for any website or application. Read more »
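As a rough illustration of why a columnar layout helps analytic queries, the sketch below pivots rows into one list per column, so an aggregate over a single column reads only that column, and then applies run-length encoding, one simple compression scheme that works well on sorted, repetitive columns. This is not the ORC format or CitusDB’s implementation; the table and function names are hypothetical.

```python
from itertools import groupby

# Hypothetical sales table, stored row by row and sorted by region.
rows = [
    {"region": "east", "product": "widget", "amount": 10},
    {"region": "east", "product": "gadget", "amount": 20},
    {"region": "east", "product": "widget", "amount": 5},
    {"region": "west", "product": "gadget", "amount": 7},
    {"region": "west", "product": "widget", "amount": 3},
    {"region": "west", "product": "widget", "amount": 15},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
# An aggregate over one column touches only that column's values.
total = sum(columns["amount"])
# Sorted, repetitive columns compress well: six values become two pairs.
region_runs = run_length_encode(columns["region"])
print(total, region_runs)
```

The unsorted `product` column compresses far less well (five runs for six values), which is one reason columnar systems often care about sort order.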
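The item doesn’t detail what Facebook’s released code does internally. As a generic sketch of two pieces most A/B testing tools need, the block below assigns users to variants deterministically by hashing the user and experiment IDs, and compares conversion rates with a standard two-proportion z-test. All names and numbers here are hypothetical, not Facebook’s API.

```python
import hashlib
import math


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically bucket a user: the same user and experiment always
    map to the same variant, with no assignment table to store."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference in conversion rates (pooled standard error)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# Hypothetical experiment results.
z = two_proportion_z(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
variant = assign_variant("user-42", "new-signup-button")
print(variant, round(z, 2))
```

With 100 of 1,000 control users and 130 of 1,000 treatment users converting, z comes out at roughly 2.1, just above the usual 1.96 cutoff for significance at the 5 percent level (two-sided).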
Twitter has released some details on its Manhattan database system, which was built to power a wide variety of applications that existing technologies can no longer handle. Twitter handles thousands of tweets per second, which means speed and scale are critical. Read more »
A new startup called ElasticBox aims to simplify application development by making it a more modular experience. Among the company’s early customers is Netflix, which is trying to make its internal IT department function more like a cloud. Read more »
EMC-and-VMware spinoff Pivotal has reworked the pricing of its big data software in order to get more customers buying into its vision of a true data platform. It’s essentially giving away its Hadoop distribution and charging one price for access to all of its database software. Read more »
Apache Tajo, a relational data warehouse system for Hadoop, has graduated to top-level status within the Apache Software Foundation. Tajo might be easy to overlook because its creators, committers and users are largely based in Korea, and because there are plenty of similar technologies, including one developed at Facebook. But the project could be a dark horse in the race for mass adoption. Among Tajo’s lead contributors are an engineer from LinkedIn and members of the Hortonworks technical team, which suggests those companies see some value in it even among the myriad other options.
GPU maker Nvidia is hoping to ride the wave of artificial intelligence. The company is already powering machine learning workloads within data centers of large companies, but now it’s targeting individuals with a cheap-but-powerful development kit aimed at robotics and the internet of things. Read more »
Hadoop pioneer Cloudera has said it closed on a $900 million round of financing that gives Intel an 18 percent stake in the company. Rumors had Intel’s investment at around $100 million, but it’s likely much more. Read more »
ClearStory Data has raised a $21 million series B round for its business intelligence service, which focuses on analyzing multiple datasets in real time. Its approach is one of several new takes on analytics that have investors excited. Read more »
Google has kicked off a cloud price-cutting war with Amazon Web Services, a turn of events that doesn’t bode well for smaller cloud providers, which will have a hard time keeping up on price, features and scale. Read more »
The Apache Mahout project will now support Apache Spark and another data engine called H2O as it tries to retain its status as the go-to set of machine learning libraries for Hadoop. Read more »
Cloudera and Intel have entered into an agreement that makes Intel Cloudera’s largest strategic investor and makes Cloudera Intel’s preferred Hadoop distribution partner. Intel will forgo its own distribution and start selling and engineering for Cloudera’s software. Read more »
The name pretty much says it all: WebScaleSQL. That’s the new open source project Facebook announced on Thursday, which includes early contributions from other web giants that have pushed MySQL to its limits. Read more »
A Seattle-based startup called Indix has raised an $8.5 million series A-1 round of venture capital for its service that keeps a real-time index of the products available across the world of online retail. Avalon Ventures and Nexus Venture Partners led the round, which brings Indix’s total funding to $14.4 million. The company’s service targets retail-industry users that want to keep track of trends and the competitive landscape, including who’s selling what and for how much.
AlchemyAPI has released a new deep-learning-based API that it says can automatically categorize content into inventory suitable for targeted advertising. It’s one of a handful of recent improvements to AlchemyAPI’s service, and to the deep learning space in general. Read more »
New research highlights a computer vision system that’s much better than humans at telling when people are faking expressions of pain. It’s the latest in a series of computer vision advances that foretell a brave, new and possibly creepy world. Read more »
On Tuesday, Facebook detailed a new cybersecurity framework called ThreatData. It’s a collection of systems for ingesting, analyzing and acting on threat data that can vary greatly in both type and frequency. Read more »
A startup called Boostable is helping individual sellers in large marketplaces like Etsy and Airbnb target potential customers on sites such as Facebook. It’s yet another company trying to package up some advanced algorithms for users with little or no IT budget. Read more »
Numenta, the machine learning company from Palm creator Jeff Hawkins, has narrowed its business model to focus solely on predicting anomalies in Amazon Web Services instances. It’s one of several big changes at the company in the past few years. Read more »
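The item doesn’t describe how Numenta’s system detects anomalies. For a sense of what flagging unusual behavior in an instance metric involves, here is a much simpler, swapped-in technique: a trailing-window z-score over a series such as CPU utilization. It is not Numenta’s algorithm; the function name, window, threshold and sample data are hypothetical.

```python
import statistics


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value sits more than `threshold` standard
    deviations from the mean of the preceding `window` values.
    (A simple stand-in technique, not Numenta's method.)"""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Flat history: any change at all counts as unusual.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilization samples (percent) from one instance.
cpu = [20, 22, 21, 19, 20, 21, 22, 95, 21, 20]
anomalies = rolling_zscore_anomalies(cpu)
print(anomalies)
```

Only the jump to 95 percent (index 7) is flagged. A fixed window and threshold like this is easy to reason about but also easy to fool, since the spike itself inflates the standard deviation of the next few windows.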
Hadoop vendor Hortonworks has closed a fourth round of venture capital worth $100 million. It follows up on news last week that competitor Cloudera had closed its own $160 million round. Everyone agrees there’s huge opportunity in Hadoop, but capitalizing on it takes capital. Read more »
Dell has bought a statistical analysis vendor called StatSoft that makes software akin to that of SAS and IBM’s SPSS business. It seems like an uninspired buy for a company trying to forge a new direction since going private. Read more »
Big data startup Continuuity has open sourced a tool called Loom that’s designed to make deploying and managing large clusters a push-button experience. Tools like this matter more as data-driven applications become more common, because the underlying infrastructure remains a challenge to manage. Read more »
Saffron Technology, which provides analytics software it calls a “natural intelligence platform,” has raised a $7 million series B round of venture capital. The company is focused on providing an advanced class of business intelligence with technology that can ingest data from myriad enterprise data sources and add intelligence and memory on top of them. Saffron claims analysts querying the system will get fast results because of how it remembers attributes of entities in the system, and intelligent results because of its ability to make connections and learn.
Backblaze has released its latest open source storage system design, which jams 180 terabytes into a single array at just 5 cents per gigabyte assembled. Read more »
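The per-gigabyte figure implies the hardware cost of one fully assembled array. Assuming decimal units (1 TB = 1,000 GB, as drive vendors count), the arithmetic works out as follows:

```latex
180\ \text{TB} \times 1{,}000\ \tfrac{\text{GB}}{\text{TB}} \times \$0.05/\text{GB}
  = 180{,}000\ \text{GB} \times \$0.05/\text{GB}
  = \$9{,}000
```

So at that rate, one 180-terabyte array costs roughly $9,000 to build.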