The massive amount of data emerging from connected digital systems is fundamentally changing everything from Internet search to entertainment, disease management and energy consumption. Here are 10 case studies that highlight the power of big data.


Revolutionizing Web publishing with big data

By Derrick Harris

Building a system capable of revolutionizing analytics for some of the world’s biggest Web publishers doesn’t take a team of Ph.D.s and thousands of servers. All it takes is a few smart people, cloud computing and a serious understanding of big data. Just ask Parse.ly.

The company, which officially launched in January and provides an SaaS application for drilling deep into publishing data, was doing some impressive things with a team of just eight employees as of early February, when I spoke with CEO Sachin Kamdar and CTO Andrew Montalenti. The result is a slick engine called Dash, used to see what content is driving traffic and to figure out what types of future content might catch fire.

Whereas some publishers have strict policies around tagging articles and some, like the New York Times, can hire data scientists to analyze and visualize traffic trends, many can’t or just don’t want to. Those are the customers Parse.ly is targeting.

Users can sort by authors, topics, sections, posts, trends and other metrics to get a real, historical understanding of their traffic beyond just seeing what posts or pages are hot at that moment. As Kamdar explained to me, users can see what posts do better across what topic pages, what pages perform better in what geographies, and what topics are trending and which have peaked, or they can find myriad other insights, using only a mouse.

They can also highlight trends using data (anonymous, of course) from the collective of Parse.ly users — which includes the Atlantic, the Next Web and U.S. News and World Report — to get a more comprehensive view of what is happening for specific topics. The best part: Parse.ly customers don’t have to do a thing to get this sort of granularity in their analytics.

Apart from the slick user experience, it is Parse.ly’s infrastructure that makes the service. CTO Montalenti told me Dash is hosted on the Amazon Web Services and Rackspace cloud computing platforms and that it consists of a data aggregation layer and a processing layer. The processing layer analyzes the text of Web pages using Parse.ly’s homemade natural-language-processing system to classify authors, topics and other characteristics. The aggregation layer indexes content in near real time into predefined buckets so queries can be completed as fast as possible.
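
To make the idea of indexing into predefined buckets concrete, here is a minimal Python sketch. It is not Parse.ly's code: the `BucketIndex` class, the event fields and the choice of per-minute buckets are all assumptions made for illustration. The point it shows is that counting each page view into fixed (dimension, value, time) slots at write time lets a later query sum a handful of counters instead of scanning raw events.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical dimensions; the article mentions authors, topics and sections.
DIMENSIONS = ("author", "topic", "section")


def minute_bucket(ts: float) -> str:
    """Round a Unix timestamp down to its UTC minute, e.g. '2012-02-01T12:00'."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M")


class BucketIndex:
    """Pre-aggregates page views into (dimension, value, minute) counters."""

    def __init__(self) -> None:
        self.counts = Counter()

    def record(self, event: dict) -> None:
        # One increment per dimension present on the event (assumed schema).
        bucket = minute_bucket(event["ts"])
        for dim in DIMENSIONS:
            value = event.get(dim)
            if value is not None:
                self.counts[(dim, value, bucket)] += 1

    def top(self, dim: str, n: int = 3) -> list:
        # Sum the per-minute counters for one dimension and rank the values.
        totals = Counter()
        for (d, value, _bucket), count in self.counts.items():
            if d == dim:
                totals[value] += count
        return totals.most_common(n)


index = BucketIndex()
index.record({"ts": 1328097600, "author": "a. writer", "topic": "big data"})
index.record({"ts": 1328097660, "author": "b. writer", "topic": "big data"})
index.record({"ts": 1328097720, "author": "a. writer", "topic": "cloud"})
print(index.top("topic"))
```

A production system would keep these counters in a shared store rather than in process memory, but the trade-off is the same: more work per incoming event in exchange for faster queries.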

When I spoke with Kamdar and Montalenti in early February, they told me Parse.ly was processing about 700 page views per month for its customers and had crawled about 4 million unique URLs, representing years’ worth of content. But that content isn’t for Dash users’ eyes only.

Montalenti said Parse.ly also keeps long-term stores of publisher data to run batch analyses on later, using Amazon’s Elastic MapReduce service. This way, the team can spot long-term trends and patterns that might help improve Dash’s features or suggest new categories to add to the real-time index. In theory, he added, Parse.ly could also run custom analytics for its customers to spot patterns in their specific content that might help them figure out how to market certain content to certain users or determine the shelf life of certain topics.
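
As a rough illustration of that kind of batch analysis, the sketch below estimates a topic's shelf life with a map step and a reduce step written in plain Python. It does not use the Elastic MapReduce API, and the record layout, the function names and the definition of shelf life (days from a topic's first recorded view to its last, inclusive) are assumptions rather than anything described by Parse.ly.

```python
from collections import defaultdict
from datetime import date


def map_phase(records):
    """Emit one (topic, day) pair per stored page-view record."""
    for rec in records:
        yield rec["topic"], date.fromisoformat(rec["day"])


def reduce_phase(pairs):
    """Group days by topic, then measure the inclusive first-to-last span."""
    grouped = defaultdict(list)
    for topic, day in pairs:
        grouped[topic].append(day)
    return {
        topic: (max(days) - min(days)).days + 1
        for topic, days in grouped.items()
    }


def topic_shelf_life(records):
    return reduce_phase(map_phase(records))


# Made-up sample records, for illustration only.
sample = [
    {"topic": "elections", "day": "2012-02-01"},
    {"topic": "elections", "day": "2012-02-05"},
    {"topic": "oscars", "day": "2012-02-26"},
]
print(topic_shelf_life(sample))
```

On a real cluster the map and reduce steps would run in parallel over the long-term store; the two functions here only mirror that division of work.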

In some senses, Parse.ly is the ultimate big data application in that it is both a consumer and provider of advanced analytics. Big data powers its product, but it also provides the capabilities necessary for Parse.ly to improve the product and expand its business. And for now, at least, the right techniques are letting Parse.ly do all of this with a team you could count on two hands.

  1. Reblogged this on Dots Of Color and commented:
    Big data big money!

    1. I don’t get it, what does Big Data have to do with a video card…or is this some lamesauce ad post?

    2. The emergence of this so-called big data phenomenon is also fundamentally changing everything from the way companies operate

      1. Yes, it does. Who controls the most data wins. At least Facebook would like to think so. ;-)

  2. Is gigabytes bytes more then a gigabyte?

    1. Katie Fehrenbacher gil Monday, March 12, 2012

      nope just a typo, fixed that, thanks!

      1. Typos happen.
        Gil’s “more thEn a gigabyte” is just plain ignorant.

    2. Grammar Police gil Thursday, March 15, 2012

      If you’re going to complain about a typo, make sure you don’t have any in your immature comment. When you have full mastery of the language, then you’ll be allowed to comment.

  3. infotech ideas Monday, March 12, 2012

    Great info! Bring the expo to SFO as well!

  4. remedy2020@gmail.com Monday, March 12, 2012

    Advertorial ! Advertorial! so fast you sold your soul!

  5. DataStax, more specifically Cassandra, can solve all big data problems.
    And its open sourced.

  6. SAP HANA to the rescue!

  7. Reblogged this on cu Lì! and commented:
    great info :)

  8. why do we have to click through so many pages. can you at least provide a way to read it in a single page? (like businessinsider) there is not even a print option and it doesn’t work with readability. i thought more of gigaom. disappointed.

  9. idiots…
    «“We want to unlock the black box of how an artist becomes a star,” White said»
    what makes the charts is good music, not $$$$$ pumped into it 8-X
    just like m$$$$$ can keep wasting billion$ on WP trying to make it a success, it won’t work. its crap, it doesn’t sell

  10. Steven Brown Monday, March 19, 2012

    Big Data is a tactical problem. Content and Business Analytics and Intelligence is the logistical problem. One must pay particular attention to Business Process Models, Entity-Relationship Models, and Data Modeling to be able to use ETL and Data Integration Technologies for developing your data storage organization, retrieval, formatting, clustering, Web Caching; and backup, recovery, and archival retention strategies.

  11. Edwin Ritter Wednesday, May 9, 2012

    Reblogged this on Ritter's Ruminations & Ramblings and commented:
    As 2012 reaches the half way mark, here is a quick view on how this hot topic. This is the first of three. Posts on the other trends will follow.

    So ‘big data’ is a hot topic. What is it? Simply stated, everything you do on the web is tracked and creates data. So much data is collected that 90% of the online data was created in just the last two years. This data is stored, sliced, diced and analyzed. The growth in data is due to several things such as proliferation of smart phones and tablets, lower storage costs and improved analytical tools. This article reveals 10 ways in which big data will have an impact.

