10 ways big data changes everything


Revolutionizing Web publishing with big data

By Derrick Harris

Building a system capable of revolutionizing analytics for some of the world’s biggest Web publishers doesn’t take a team of Ph.D.s and thousands of servers. All it takes is a few smart people, cloud computing and a serious understanding of big data. Just ask Parse.ly.

The company, which officially launched in January and provides an SaaS application for drilling deep into publishing data, was doing some impressive things with a team of just eight employees as of early February, when I spoke with CEO Sachin Kamdar and CTO Andrew Montalenti. The result is a slick engine called Dash, used to see what content is driving traffic and to figure out what types of future content might catch fire.

Whereas some publishers have strict policies around tagging articles and some, like the New York Times, can hire data scientists to analyze and visualize traffic trends, many can’t or just don’t want to. Those are the customers Parse.ly is targeting.

Users can sort by authors, topics, sections, posts, trends and other metrics to get a real, historical understanding of their traffic beyond just seeing what posts or pages are hot at that moment. As Kamdar explained to me, users can see what posts do better across what topic pages, what pages perform better in what geographies, and what topics are trending and which have peaked, or they can find myriad other insights, using only a mouse.

They can also highlight trends using data (anonymous, of course) from the collective of Parse.ly users — which includes the Atlantic, the Next Web and U.S. News and World Report — to get a more comprehensive view of what is happening for specific topics. The best part: Parse.ly customers don’t have to do a thing to get this sort of granularity in their analytics.

Apart from the slick user experience, it is Parse.ly’s infrastructure that makes the service. CTO Montalenti told me Dash is hosted on the Amazon Web Services and Rackspace cloud computing platforms and that it consists of a data aggregation layer and a processing layer. The processing layer analyzes the text of Web pages using Parse.ly’s homemade natural-language-processing system to classify authors, topics and other characteristics. The aggregation layer indexes content in near real time into predefined buckets so queries can be completed as fast as possible.
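To make the "predefined buckets" idea concrete, here is a minimal Python sketch of pre-aggregating page views as they arrive so that a "top topics this hour" query becomes a lookup rather than a scan. It is an illustration only, not Parse.ly's code: the `BucketIndex` class, the `DIMENSIONS` tuple, the hourly granularity and the sample views are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical bucket dimensions; the article does not list the real ones.
DIMENSIONS = ("author", "topic", "section")


class BucketIndex:
    """Counts page views per (dimension, hour) bucket as views arrive."""

    def __init__(self):
        # (dimension, hour) -> Counter mapping a value to its view count
        self._counts = defaultdict(Counter)

    def record(self, view):
        """Add one already-classified page view to every bucket it belongs to."""
        hour = view["ts"].replace(minute=0, second=0, microsecond=0)
        for dim in DIMENSIONS:
            self._counts[(dim, hour)][view[dim]] += 1

    def top(self, dim, hour, n=3):
        """Return the n most-viewed values for one dimension in one hour."""
        return self._counts.get((dim, hour), Counter()).most_common(n)


# Made-up sample views, as if the processing layer had already tagged them.
index = BucketIndex()
hour = datetime(2012, 2, 1, 9)
for minute, author, topic in [(5, "ann", "cloud"), (20, "bob", "big data"), (40, "ann", "cloud")]:
    index.record({"ts": datetime(2012, 2, 1, 9, minute), "author": author,
                  "topic": topic, "section": "tech"})

print(index.top("topic", hour))
```

The trade-off in a design like this is that the work happens at write time: only questions that match a bucket chosen in advance are fast, which is one reason a separate batch layer for open-ended analysis is useful.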

When I spoke with Kamdar and Montalenti in early February, they told me Parse.ly was processing about 700 page views per month for its customers and had crawled about 4 million unique URLs, representing years’ worth of content. But that content isn’t only for Dash users’ eyes.

Montalenti said Parse.ly also keeps long-term stores of publisher data to run batch analyses on later, using Amazon’s Elastic MapReduce service. This way, the team can spot long-term trends and patterns that might help improve Dash’s features or suggest new categories to add to the real-time index. In theory, he added, Parse.ly could also run custom analytics for its customers to spot patterns in their specific content that might help them figure out how to market certain content to certain users or determine the shelf life of certain topics.
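As a rough illustration of the kind of batch job described above, the following Python sketch estimates a topic's "shelf life" in map/reduce style. It runs in a single process and does not use the Elastic MapReduce API. The record layout, the function names and the definition of shelf life (days from a topic's first recorded view to its last, inclusive) are assumptions made for the example, not details from Parse.ly.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def map_view(record):
    """Map step: emit one (topic, day) pair per logged page view."""
    yield record["topic"], record["day"]


def reduce_topic(topic, days):
    """Reduce step: inclusive span in days between first and last view."""
    days = list(days)
    return topic, (max(days) - min(days)).days + 1


def run_batch(records):
    """Run map, then sort (standing in for the shuffle), then reduce per topic."""
    pairs = sorted(pair for record in records for pair in map_view(record))
    return dict(
        reduce_topic(topic, (day for _, day in group))
        for topic, group in groupby(pairs, key=itemgetter(0))
    )


# Made-up stored page views.
LOG = [
    {"topic": "cloud", "day": date(2012, 1, 1)},
    {"topic": "big data", "day": date(2012, 1, 5)},
    {"topic": "cloud", "day": date(2012, 1, 10)},
]

shelf_life = run_batch(LOG)
print(shelf_life)
```

Because each topic is reduced independently, the same two functions could in principle be spread across many machines by a MapReduce framework, which is what makes this style suit long-term stores too large for one server.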

In some senses, Parse.ly is the ultimate big data application in that it is both a consumer and provider of advanced analytics. Big data powers its product, but it also provides the capabilities necessary for Parse.ly to improve the product and expand its business. And for now, at least, the right techniques are letting Parse.ly do all of this with a team you could count on two hands.

20 Comments

Edwin Ritter

Reblogged this on Ritter's Ruminations & Ramblings and commented:
As 2012 reaches the halfway mark, here is a quick view on this hot topic. This is the first of three. Posts on the other trends will follow.

So ‘big data’ is a hot topic. What is it? Simply stated, everything you do on the web is tracked and creates data. So much data is collected that 90% of the online data was created in just the last two years. This data is stored, sliced, diced and analyzed. The growth in data is due to several things such as proliferation of smart phones and tablets, lower storage costs and improved analytical tools. This article reveals 10 ways in which big data will have an impact.

Steven Brown

Big Data is a tactical problem. Content and Business Analytics and Intelligence is the logistical problem. One must pay particular attention to Business Process Models, Entity-Relationship Models, and Data Modeling to be able to use ETL and Data Integration Technologies for developing your data storage organization, retrieval, formatting, clustering, Web Caching; and backup, recovery, and archival retention strategies.

Claude Bucher

idiots…
«“We want to unlock the black box of how an artist becomes a star,” White said»
what makes the charts is good music, not $$$$$ pumped into it 8-X
just like m$$$$$ can keep wasting billion$ on WP trying to make it a success, it won’t work. its crap, it doesn’t sell
period

yugun

why do we have to click through so many pages. can you at least provide a way to read it in a single page? (like businessinsider) there is not even a print option and it doesn’t work with readability. i thought more of gigaom. disappointed.

remedy2020@gmail.com

Advertorial ! Advertorial! so fast you sold your soul!

Jonas

Typos happen.
Gil’s “more thEn a gigabyte” is just plain ignorant.

Grammar Police

If you’re going to complain about a typo, make sure you don’t have any in your immature comment. When you have full mastery of the language, then you’ll be allowed to comment.

Elevatus

I don’t get it, what does Big Data have to do with a video card…or is this some lamesauce ad post?

ty

The emergence of this so-called big data phenomenon is also fundamentally changing everything from the way companies operate
