File-sharing service Hotfile was found guilty of copyright infringement in a U.S. federal court case decided on Wednesday. But just because Hotfile appears guilty, that doesn’t mean cyberlockers are inherently evil, regardless of what the MPAA says. Read more »
Hortonworks has released a set of icons for illustrating the roles of various Hadoop-ecosystem components in flow charts and other architectural diagrams. Earth-shattering? No. Helpful if you’re stuck trying to build a PowerPoint slide about your big data environment? Probably. Read more »
LinkedIn’s new University Pages are a case study in how to build a big data application. Ideas are great and pretty web design is great, but you also need people who can find and format the data, and the systems in place to make everything work. Read more »
Couchbase, a startup selling a NoSQL database of the same name, has raised a $25 million series D round. Adams Street Partners led the round and was joined by existing investors Accel Partners, Mayfield Fund, North Bridge Venture Partners and Ignition Partners. Couchbase doesn’t have the huge user base of MongoDB or the edginess of HBase, but it does have some big-name users (including Orbitz) and the company claims sales jumped 400 percent in the last year.
How often does the U.S. government request data from U.S. web properties? A lot. Here are eight charts showing data from Facebook, Google, Microsoft and Twitter about how many requests they get from across the globe. Read more »
MongoDB creator 10gen has changed its name to MongoDB, Inc. It’s probably not a bad idea to align the company’s name with its sole product, but it will take a little getting used to. Read more »
Violin Memory has filed for a $173 million initial public offering, although it did so without much of the hype traditionally associated with Violin news. The company is on pace for $100 million in revenue this year, but it’s now part of a crowded flash market. Read more »
Hadoop-based analytics startup Tresata last week open sourced a set of machine learning libraries built on Scalding and designed to run in Hadoop and make use of the Apache Mahout project. Tresata is calling the project Ganita, and has also written a couple of explanatory blog posts about it, including how to do k-means clustering. The barriers to doing good work on big data just keep getting lower.
Publishing analytics startup Parse.ly has raised $5 million and has released its first report showing the top sources of traffic across its customer base. It claims hundreds of them, including big-name ones like Atlantic Media, Reuters and Mashable. Read more »
Based on the data scientists I’ve met and the “how to become a data scientist” talks I’ve seen, it’s hard to disagree. But SQL and coding skills can be really helpful if you need to get stuff done beyond pure statistical analysis.
Amazon Web Services experienced a brief outage on Sunday afternoon. It lasted only about 60 minutes, but appears to have taken down popular sites such as Instagram, Flipboard and Vine for short periods. Read more »
Google cloud platform manager Greg DeMichillie was on our Structure Show podcast this week to defend Google’s position in the cloud computing market. He makes some fair points, but will they be enough to lure in developers and companies en masse? Read more »
This is a good presentation about Facebook’s graph-processing engine, Giraph, from a big data event held at the company’s Menlo Park campus in early June. The PRISM story kind of took over the news cycle that week, but the event also produced some news (for big data geeks, at least): Facebook’s Presto engine for interactive queries of its 250-petabyte Hadoop data warehouse.
Researchers have devised a method for identifying fake Twitter accounts that proved highly accurate across 27 popular black-market merchants. With Twitter’s cooperation, they spotted and deleted millions of accounts, using only data generated during the account-registration process. Read more »
The last day’s NSA headlines have been about how it broke the law and even violated the Constitution. But that’s just a small part of an opinion that raises more questions than answers, and that underscores the complex nature of data privacy. Read more »
It’s natural to hear all the hype about big data and sense a bubble is forming, but the speakers at this year’s Structure: Europe conference have proven it’s for real — and they know how to make it happen. Read more »
A database vendor called Objectivity has created a mobile app called GraphMyLife that aims to let consumers explore links between the people and content in their various social networks. I say “aims” because although the idea is pretty cool, the app is a bit laggy and confusing (at least on my phone). But cut Objectivity a break: it’s a specialized (and old) enterprise-tech company trying to humanize its graph database software.
A data science consultancy has published a report analyzing the design of retirement- and investment-industry websites, but the lessons are universal: Better design means better business. Read more »
Facebook, Ericsson, MediaTek, Nokia, Opera, Qualcomm and Samsung are launching an initiative called internet.org that aims to connect the whole world with internet access via cheaper devices, better business models and better infrastructure. Read more »
In a candid interview last week, Hortonworks CEO Rob Bearden discussed a variety of topics — including personnel, profitability and a public offering — in some detail. Hortonworks is a Hadoop startup that spun out of Yahoo in June 2011. Read more »
10gen has added some new features to its MongoDB connector for Hadoop, including support for Hive and the ability to back up MongoDB files in HDFS. Read more »
Business intelligence and analytics startup Birst has raised a $38 million Series E round led by Sequoia Capital. Birst has been very busy in the past couple years, moving from SaaS to on-prem software, rethinking the data warehouse and even launching a Hadoop-based service. It looks like Birst is positioned to test the IPO waters like Qliktech and Tableau before it.
Cleversafe, a Chicago-based provider of object-storage systems for housing massive amounts of data, has raised a $55 million series D round led by New Enterprise Associates. Apart from traditional storage workloads, Cleversafe has also made a name for itself as a replacement for HDFS in Hadoop environments. According to Crunchbase, the company has now raised $91.4 million since 2007.
A recent New York Times article casts some doubt on the economic impact of big data. Here’s why I think we haven’t seen anything yet when it comes to big data and the global economy. Read more »
Genomic-analysis startup Bina Technologies is trying to grow its footprint by giving away its appliances on a pay-per-use basis. It’s also expanding its capabilities to include analysis of exomes, a much smaller but very valuable component of human genes. Read more »
North Bridge Venture Partners’ Paul Santinelli offered up all sorts of opinions — many outspoken — on this week’s Structure Show podcast. Here are some of his thoughts on who can succeed in the cloud computing market. Read more »
Facebook has reportedly done away with its once-important EdgeRank system in favor of a system that considers about 100,000 factors in determining what content to show on users’ feeds. Read more »
Google researchers have developed new methods for analyzing language using deep learning techniques. They’ve also open sourced an implementation of their work so any researchers can experiment with it. It could be the first of many deep learning tools designed for mass consumption. Read more »
When it comes to data, soccer is the new baseball. The latest issue of the Economist has an article breaking down English Premier League soccer players using data, and a subsequent blog post includes an interactive tool from machine learning startup Ayasdi that lets readers explore the data. Earlier this week, Disney researchers presented their analysis of an entire year’s worth of ball-position data for a professional soccer league and how that can affect the outcome of games.
Todd Papaioannou is joining big data-focused venture capital firm Data Collective as an entrepreneur in residence. Papaioannou was most recently co-founder and CEO of Continuuity, and has held executive roles at companies including Yahoo and Teradata. Read more »
Almost anything you want to know about how Netflix is scaling its streaming API to support a growing number of users, devices and geographies. No matter how many times I read (or write) about it, I’m still impressed by what Netflix is able to do using an entirely cloud-based infrastructure.
Facebook has detailed its extensive improvements to the open source Apache Giraph graph-processing platform. The project, which is built on top of Hadoop, can now process trillions of connections between people, places and things in minutes. Read more »
Data scientists are in high demand, which is bound to lure some of them out of academia and into industry. Their biggest challenge won’t be finding a job, but finding the right one — and maybe opting for entrepreneurship over employment. Read more »
A Chicago-based startup called AvantCredit has raised a $20 million series B round for its personal loan service that uses machine learning algorithms to assess credit-worthiness. AvantCredit closed a $34 million Series A round earlier this year. It’s taking a page out of the ZestFinance playbook — lending to underserved markets at rates less usurious than traditional payday-loan providers — although that company now acts only as an underwriter rather than an actual lender.
Looker, a Santa Cruz, Calif.-based business intelligence startup, has raised a $16 million series A round from Redpoint Ventures and First Round Capital. In an age of data tools targeting lay users, Looker is taking a different approach by trying to empower smart data analysts with its custom modeling language. The company closed a $1.7 million seed round in March.
Microsoft has developed a big data technology that sits on top of Hadoop’s new YARN resource manager. Called REEF, it’s designed to let users build jobs that can maintain state even after they’re done, and that can grab data from wherever they need it. Read more »
Zynga has open sourced a tool called zPerfmon that collects and serves all the performance data its engineers could ever need, and all from a single server. Read more »
Spend a few days hanging around Black Hat and DEF CON, and you’ll see some creepy hacks. If you wanna lose a little sleep, dwell on the fact it’s much easier to replicate the work or even to buy data-capturing devices. Read more »
It appears investors are buying into the adage that CMOs are the new CIOs. On Friday, a Portland-based startup called Lytics announced it has raised $2.2 million in seed funding from Rembrandt Venture Partners and Voyager Capital. The company compares its big-data-meets-marketing approach with that of Causata, which is good company to be in.
Rackspace grew its public cloud revenues 36 percent year over year, to $99 million. That’s steady growth, although hardly the meteoric growth its chief rival Amazon Web Services seems to be experiencing. Read more »