
Summary:

I analyzed more than 5,000 posts by Gigaom writers in 2013 to identify the words and phrases we use the most. Can you guess what they are? Some of them might surprise you.


We at Gigaom write a lot about data and how it’s changing the world. But we don’t always look at a lot of data about ourselves, at least beyond which posts are performing well at any given time. We certainly don’t publish a lot of data about how we operate.

However, I thought I would change that and try to figure out — using free web services, of course — which topics we write about the most, and what types of headlines we write, and then visualize them. I share some thoughts on why some words are popular (and even dug into our coverage of Google a little deeper), but overall I think the results provide a good overview of what companies, technologies and issues mattered in 2013.

Here’s the process I used:

  • I worked with the team at import.io (check out our coverage of it here) to build a crawler and grabbed the author, headline and excerpt from about 6,800 posts published in 2013. Of those, I opted to use the 5,225 actually written by the current Gigaom editorial staff.
  • Next, I used a tool called Textalyser (it’s old, and I’m not sure if it’s still being updated) to count the 50 words used most often in headlines by each writer and by the editorial team as a whole. It gave measurements for total occurrences, as well as the frequency of each word. I ended up not using the excerpt data because it looked very similar to the headline data, but with more “ands.”
  • Finally, I visualized the data using Tableau Public and RAW, the free and super-simple service we first covered in October.
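For readers who want to try the counting step themselves, here is a minimal Python sketch. It is not what Textalyser does internally. The tokenizer, the function names and the sample posts are all hypothetical, and "frequency" is assumed to mean a word's share of all headline words, which may differ from Textalyser's definition.

```python
import re
from collections import Counter, defaultdict


def tokenize(headline):
    """Lowercase a headline and split it into words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", headline.lower())


def top_words(headlines, n=50):
    """Return (word, occurrences, frequency) tuples for the n most common words.

    Assumption: frequency = occurrences / total number of words counted.
    """
    counts = Counter()
    for headline in headlines:
        counts.update(tokenize(headline))
    total = sum(counts.values())
    return [(word, c, c / total) for word, c in counts.most_common(n)]


def top_words_by_author(posts, n=50):
    """Group (author, headline) pairs by author and rank each author's words."""
    by_author = defaultdict(list)
    for author, headline in posts:
        by_author[author].append(headline)
    return {author: top_words(hs, n) for author, hs in by_author.items()}


# Hypothetical sample data, not taken from the real crawl.
posts = [
    ("writer_a", "How Google uses data"),
    ("writer_a", "New data tools"),
    ("writer_b", "Why new Google data matters"),
]

staff = top_words([headline for _, headline in posts], n=3)
per_writer = top_words_by_author(posts, n=3)
```

A real run would also need to decide whether to drop stop words such as "the" and "to"; this sketch keeps everything.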

Here’s what I found. (Click on any of the images for larger versions.)

We like new things, Google and data

We’re a news organization by and large, so it’s not surprising that “new” was the most-used word in the sample data. I guess the fact that “data” was the third most-used word shouldn’t be surprising either. After all, the collection and analysis of data underpins many of the tech trends we cover, from the internet of things to NSA spying, and we think it’s important enough to warrant its own conference (Structure Data, March 19-20 in New York) and frequent co-starring appearances at the others.

[Chart: the 50 words used most often in Gigaom staff headlines]

Google as the second most-popular word surprised me a bit. Apple was No. 17 and Facebook was No. 26, and even combined they didn’t make it into as many headlines as Google did. Some would say that’s a sign that Google was the most important technology company of the past year; others would say we’re obsessed with Google. I dive into that a little bit below.

We’re also heavy users of explanatory headlines, which is why “how,” “why,” “you” and “your” are so popular. As the pace of innovation keeps advancing and technology markets become more complex, it’s generally good to click on an article expecting to learn how something works or why it matters, rather than just that it happened and might or might not be relevant to you.

About those Google posts

I couldn’t resist seeing whether we write about Google so much because it’s so newsworthy or because we just like it a lot. According to a sentiment classifier on etcML (which was developed by a group of Stanford machine learning researchers and students and, I should note, is probably not anywhere near 100 percent accurate), we do write slightly more positive stories about Google than negative ones, but most are neutral. And the neutral-positive-negative ratio for Google posts (94 percent neutral, 4 percent positive and 2 percent negative) roughly aligns with our ratio for all posts (91 percent, 6 percent and 3 percent).

This chart breaks down our headlines including the word “Google” by where they fall on a scale from negative to positive. As noted above, most are in the neutral middle.

google sentiment

Breaking it down, writer by writer

But who’s the most Gigaom-y writer, you ask, and did some writers skew these results by using words much more frequently than others? The answer to the first question appears to be David Meyer, at least in terms of having the most overlap with the overall top words. The answer to the second appears to be not really.

The chart below shows which of the top 20 overall words (represented by the “Staff” column) made it into each writer’s top 50 words. The darker the square, the more times a writer used that word. A white square indicates it didn’t make that writer’s top 50 words. Of the top 20 words used overall, some (“new,” “Google” and “how”) made every writer’s top 50. And although I used “data” more than anybody else, I certainly wasn’t the only one using it.
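The grid described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, writer labels and counts below are hypothetical, and `None` stands in for a white square, meaning the word missed that writer's top list.

```python
def overlap_matrix(staff_top, writer_top, staff_n=20):
    """Map each writer to {staff word: that writer's count, or None if absent}.

    staff_top: ranked list of (word, count) for the whole staff.
    writer_top: dict of writer -> ranked list of (word, count), e.g. a top 50.
    """
    staff_words = [word for word, _ in staff_top[:staff_n]]
    matrix = {}
    for writer, top in writer_top.items():
        counts = dict(top)
        matrix[writer] = {word: counts.get(word) for word in staff_words}
    return matrix


def most_overlap(matrix):
    """Return the writer whose top list contains the most staff words."""
    return max(
        matrix,
        key=lambda writer: sum(v is not None for v in matrix[writer].values()),
    )


# Hypothetical counts, not the real Gigaom numbers.
staff_top = [("new", 9), ("google", 7), ("data", 6)]
writer_top = {
    "writer_a": [("data", 5), ("new", 2)],
    "writer_b": [("google", 4), ("new", 3), ("data", 1)],
}

matrix = overlap_matrix(staff_top, writer_top)
winner = most_overlap(matrix)
```

In a heat map, each non-`None` count would set how dark the square is.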

This chart, sorted by frequency of appearance in headlines rather than total number of times used, does a better job of highlighting each writer’s tendencies. Again, darker equals more often.

frequencies

Finally, for anyone curious how all the writers look individually, the gallery below includes bubble charts showing each Gigaom writer’s top 50 headline words (again, click to see a larger version). Anyone who reads Gigaom regularly probably won’t be surprised, but it’s still interesting to see how writers’ beats and interests really do manifest themselves in headlines. Data, Chromecast, the internet of things, Twitter, Bitcoin — all present and accounted for by the people who know them best.

 


  1. Yup, I stop by almost every day but after a year of visiting I consider GigaOm to be a pro-Google website. Other than that perceived bias on my part, the design and content of GigaOm is great. Sorry for the jaundiced eye.

  2. Margaux Dela Cruz Wednesday, January 22, 2014

    Being “google” in the 2nd place… is it because it launched the google penguin and updates in google panda and that you need to cover the updates? interestingly, I didn’t see Yahoo or Samsung, both names also became hitmakers in 2013.

  3. This is interesting, but also a very good example of bad data visualization. If you make the data available in Github I can whip something together to show you how this data could be visualized in a much clearer and more useful way.
