New leaks reveal how the government is using a massive secret program to break into personal, business and financial communications long considered secure. Here are the most important takeaways. Read more »
Location-data startup Placed has been tracking the businesses that consumers visit for about a year, and now it’s tying that data to their TV habits and interests. Where do “The Biggest Loser” viewers hang out? Bakeries. Read more »
Big data startup HStreaming is now part of Swiss advertising firm Adello Group. HStreaming had standout technology by all accounts, but the business never scaled enough to survive in a tough market. Read more »
Curious about how much of your personal financial and other data is collected by data brokers? Check out Acxiom’s Aboutthedata.com to get a glimpse. Read more »
This post from Slate is spot on, in my humble opinion. It might be overkill, but I can say the same about my own posting habits, and did last year. (I can’t say the same about my wife, though …) There are plenty of reasons to not want a digital profile you didn’t ask for, and advances in behavioral analysis and facial recognition are only making those concerns more pressing.
Marathon is a new framework that turns Mesos — a favorite of Twitter — into a more dynamic tool for running different applications on a single set of machines. Marathon comes from a startup called Mesosphere, founded by two former Airbnb engineers who know Mesos cold. Read more »
Journalist and Anonymous spokesman Barrett Brown is accused of trafficking in stolen credit-card numbers and could face years in prison for posting a link in an Internet Relay Chat channel aimed at crowdsourcing information about defense contractors. Read more »
Will VMware be able to replicate the monster success it achieved inside corporate server rooms in the cloud at large? Maybe, but it will have to execute well on many fronts and catch a lot of breaks for that to happen. Read more »
The Stockholm outfit wants to save developers the hassle of juggling local development environments, framework installs, deployment services and hosting. Read more »
Topsy, a social search engine, said Wednesday that its searchable archive now includes every tweet since Twitter first launched in 2006, or close to half a trillion pieces of social content. Read more »
SwiftKey, a London-based startup that sells a popular “smart” keyboard for Android devices, has closed a $17.5 million series B led by Index Ventures. The company plans to spend the money on research to “fuel further innovation in the fields of Natural Language Processing and Machine Learning,” among other things, according to a press release. That’s probably not a bad idea given Google’s vested interest in keyboard dominance and its focus on cutting-edge text analysis.
Traffic management could be key to the future of the open internet, so what proportion of UK consumers takes such policies into account when choosing a broadband deal? A whopping one percent. Read more »
The internet of things is a new world for technologists and consumers, but it also represents an opportunity to change some of the things we got wrong about the web when it comes to trust and privacy. Read more »
Twitter has open sourced a “streaming MapReduce” system called Summingbird that makes Hadoop and Storm play nicer together so applications that require both batch and stream processing can do their jobs with as little complexity as possible. Read more »
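To make the batch-plus-stream idea concrete, here is a minimal Python sketch built around a simple word count. It is not Summingbird’s API: Summingbird itself is written in Scala and builds its aggregations on associative “monoid” types from Twitter’s Algebird library, and the function names below are hypothetical. What it illustrates is why associativity matters: a result computed by a batch job over historical data and a result computed by a streaming job over recent data can simply be added together.

```python
from collections import Counter


def count_words(lines):
    # Partial aggregate for one slice of the data: a historical batch
    # (the Hadoop side) or a window of recent events (the Storm side).
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(batch_counts, stream_counts):
    # Counter addition is associative and commutative, so the two partial
    # results can be combined in any order without recomputing anything.
    return batch_counts + stream_counts


historical = ["Hadoop Storm", "Storm"]
recent = ["Storm Summingbird"]
combined = merge(count_words(historical), count_words(recent))
print(combined["storm"])  # 3
```

Because the merged result equals what a single pass over all the data would produce, the batch and streaming halves can run independently and still agree once combined.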
Wearables are already deeply embedded in the sports world, so in this week’s podcast I talk to an SAP expert about what the NFL and the NBA can teach us about data derived from wearables. Read more »
Microsoft will join Google and Facebook — and show its commitment to Finland — by siting a data center in Europe’s frozen north. Also, the Nokia phone unit takeover talks have been on since February. Read more »
The New York Times continues the surveillance theme with a scoop about a project called Hemisphere, which involves the collection and long-term retention of phone metadata by AT&T in order to aid local and federal anti-drug law enforcement efforts. The length of the retention time (as much as 26 years) far outstrips anything the NSA is doing. It strikes me as notable that the biggest mass surveillance operations are being carried out in the name of unwinnable, unending wars, namely those on terror and drugs.
Facebook is hosting a Kaggle competition in order to identify candidates for a data scientist position. Résumés are so passé when you can just have applicants prove their skills first. Read more »
Around 200,000 volunteered computers donated 17,000 years’ worth of computing time in an eight-month span, aiding in the identification of 24 pulsars in the Milky Way. Read more »
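A rough back-of-the-envelope reading of those figures, assuming “17,000 years” means single-computer years and that all of the machines are counted over the full eight months (both are assumptions on our part, not stated in the report):

```latex
\[
\frac{17{,}000\ \text{computer-years}}{\tfrac{8}{12}\ \text{year}} = 25{,}500\ \text{computers running nonstop},
\qquad
\frac{25{,}500}{200{,}000} = 12.75\%
\]
```

Read that way, the volunteer pool did the work of about 25,500 machines running around the clock, so the average donated computer contributed roughly an eighth of its time, or about a month of computing.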
Researchers have released a tool that lets anyone track the whereabouts of Twitter and Instagram users who allow geotagging of their posts. They want social media users to be aware that geotagging exists and what kind of information it provides. Read more »
Many names have popped up in the long-running scandal, so we thought it would be a good idea to bring them together in one handy resource. Read more »
Foursquare is reportedly looking for investment from a large technology company, and the most obvious fit is Google — because features like its real-time recommendations would fit perfectly with Google Now. Read more »
I’d argue this is a prime example of when metadata is used correctly. If the other nearly 150,000 phone numbers were never investigated and the records were deleted once the feds found their guys, any invasion of privacy is only theoretical. There’s a big difference between this and GPS-tracking, or what the NSA is doing.
As Guardian data journalist Stijn Debrouwere points out, many media companies have an obsession with measuring things, without understanding what is important and therefore worth measuring. Read more »
A London-based startup called import.io has built a service that lets users take information from websites and turn it into structured data that can populate a spreadsheet or feed an application via API. And it doesn’t require any coding. Read more »
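import.io’s selling point is that users don’t have to write code, and its internals aren’t described here. Still, a minimal Python sketch of the underlying idea may help: pull the cells of an HTML table into rows, then write them out as CSV that a spreadsheet can open. It uses only the standard library’s `html.parser` and `csv` modules; the class and function names are hypothetical, and a real service would also have to cope with nested tables, malformed markup and pages that aren’t laid out as tables at all.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text can arrive in several chunks, so accumulate it per cell.
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_rows(html):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()


page = ("<table><tr><th>Product</th><th>Price</th></tr>"
        "<tr><td>Widget</td><td>$9.99</td></tr></table>")
rows = html_table_to_rows(page)
print(rows_to_csv(rows))
```

The same `rows` list could just as easily be serialized as JSON to feed an application, which is the other use the item mentions.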
Fruux, the open-standards back-end tool for synchronizing contacts, calendar and to-do-list data between various clients and platforms, is now rolling out front-end web apps to make it easier for users to manage that data. Read more »
TechStars Chicago kicked off its first demo day, highlighting ten companies from across the tech spectrum. Online dating startups shared the stage with data analytics companies and gadget makers. Read more »
Hortonworks has released a set of icons for illustrating the roles of various Hadoop-ecosystem components in flow charts and other architectural diagrams. Earth-shattering? No. Helpful if you’re stuck trying to build a PowerPoint slide about your big data environment? Probably. Read more »
LinkedIn’s new University Pages are a case study in how to build a big data application. Ideas and pretty web design are great, but you also need people who can find and format the data, and the systems in place to make everything work. Read more »
OnApp and Flexiant both want to help legacy hosts and telcos find their niche in the cloud, but they offer significantly different paths. The two firms will debate their strategies at Structure:Europe next month. Read more »
Couchbase, a startup selling a NoSQL database of the same name, has raised a $25 million series D round. Adams Street Partners led the round and was joined by existing investors Accel Partners, Mayfield Fund, North Bridge Venture Partners and Ignition Partners. Couchbase doesn’t have the huge user base of MongoDB or the edginess of HBase, but it does have some big-name users (including Orbitz) and the company claims sales jumped 400 percent in the last year.
Rapidus is reporting that Apple has acquired AlgoTrim, a Swedish data compression startup. This could potentially help it reduce iOS data usage and improve camera quality. Read more »
How often does the U.S. government request data from U.S. web properties? A lot. Here are eight charts showing data from Facebook, Google, Microsoft and Twitter on how many requests they get from governments across the globe. Read more »
MongoDB creator 10gen has changed its name to MongoDB, Inc. It’s probably not a bad idea to align the company’s name with its sole product, but it will take a little getting used to. Read more »
Dr Alexander Dix is Berlin’s privacy chief. With Germany being pretty hardcore about data protection law, you might think he’s Silicon Valley’s worst enemy — but he has compromise in mind. Read more »
Violin Memory has filed for a $173 million initial public offering, although it did so without much of the hype traditionally associated with Violin news. The company is on pace for $100 million in revenue this year, but it’s now part of a crowded flash market. Read more »
Hadoop-based analytics startup Tresata last week open sourced a set of machine learning libraries built on Scalding and designed to run in Hadoop and make use of the Apache Mahout project. Tresata is calling the project Ganita, and has also written a couple of explanatory blog posts about it, including how to do k-means clustering. The barriers to doing good work on big data just keep getting lower.
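For readers new to k-means, here is a minimal single-machine Python sketch of Lloyd’s algorithm, the standard way to compute it: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. This is not Ganita’s code or API (Ganita runs on Scalding and Hadoop); the function names and the tiny data set are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    # For each point, the index of its nearest centroid.
    return [min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            for p in points]


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # Move the centroid to the mean of its members, coordinate by coordinate.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Leave an empty cluster's centroid where it was.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, assign(points, centroids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids))  # two centroids, near (0.33, 0.33) and (10.33, 10.33)
```

A distributed version splits the assignment step across machines and sums per-cluster coordinates and counts before dividing, which is what makes k-means a natural fit for Hadoop-style frameworks.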
Publishing analytics startup Parse.ly has raised $5 million and has released its first report showing the top sources of traffic across its customer base. It claims hundreds of them, including big-name ones like Atlantic Media, Reuters and Mashable. Read more »
Based on the data scientists I’ve met and the “how to become a data scientist” talks I’ve seen, it’s hard to disagree. But SQL and coding skills can be really helpful if you need to get stuff done beyond pure statistical analysis.