
Summary:

If you’ve ever wanted to see who follows you on Twitter, where they live and what they do, but don’t have a clue how to utilize the Twitter API, it’s your lucky day.

[Image: twitter-location-1]

Twitter is a great service, but it’s not exactly easy for users without programming skills to access their account data, much less do anything with it. Until now.

There are already services that will let you download reports about when you tweet and which of your tweets were the most popular. Some, like SimplyMeasured and FollowerWonk, will even summarize data about your followers. If you're willing to wait hours to days (Twitter's API rate limits are just that: limiting) and play around with open source software, NodeXL will help you build your own social graph. (I started and gave up after realizing how long it would take if you have more than a handful of followers.) But you never really see the raw data, so you have to trust the services and hope they present the information you want to see.

Then, last week, someone from ScraperWiki tweeted at me, noting that the service can now gather raw data about users' accounts. (I've used the service before to measure tweet activity.) I was intrigued. But I didn't want to just see the data in a table; I wanted to do something more with it. Here's what I did.

Step 1: ScraperWiki

This literally could not be easier. Get a ScraperWiki account, choose "Create a new dataset" and select "Get Twitter followers." At that point, enter your Twitter handle (or the handle of whatever Twitter user you choose) and hit "enter." Depending on how many followers a person has, it could take anywhere from minutes to many, many hours because of rate limits. I had just over 7,000 followers at the time, and the job was done in minutes.
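The slowdown comes entirely from those windowed rate limits, so you can ballpark how long a pull will take before you start. Here's a rough sketch; the default numbers mirror the historical Twitter v1.1 followers/list limits (200 profiles per request, 15 requests per 15-minute window), which are my assumptions, not ScraperWiki's documented throughput:

```python
import math

def estimate_fetch_minutes(total_followers, per_request=200,
                           requests_per_window=15, window_minutes=15):
    """Rough rate-limit wait time for pulling full follower profiles.

    The defaults are assumed v1.1 followers/list limits; the real
    endpoint and ScraperWiki's pacing may differ.
    """
    requests_needed = math.ceil(total_followers / per_request)
    # Every exhausted window except the last forces a full wait.
    full_windows = math.ceil(requests_needed / requests_per_window) - 1
    return max(full_windows, 0) * window_minutes

# ~7,000 followers -> 35 requests -> spans 3 windows under these limits
print(estimate_fetch_minutes(7000))
```

Under 3,000 followers fits in a single window, which is why small accounts finish almost instantly.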

Once the process is complete, you can actually do a fair amount of searching and sorting within ScraperWiki itself. In the table view, you can search by name, number of followers, location or even user ID (Om has by far the most followers among my followers, and appears to be Twitter user No. 989).

[Image: ScraperWiki table view]

But unless you can write code or SQL queries, there’s not a whole lot more you can do, especially with Twitter data. So you’ll want to download the data as a spreadsheet.
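If you do end up writing a few lines of code, reproducing the table view's sorting against the downloaded spreadsheet is trivial. A sketch with Python's standard library; the column names here are my guesses, not ScraperWiki's exact export headers:

```python
import csv
import io

# Tiny stand-in for the downloaded CSV (assumed column names).
raw = """screen_name,followers_count,location
om,1400000,"San Francisco, CA"
bob,58000,"New York, NY"
alice,320,"Portland, OR"
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Same idea as the table view: most-followed followers first.
rows.sort(key=lambda r: int(r["followers_count"]), reverse=True)
for r in rows:
    print(r["screen_name"], r["followers_count"])
```

Swap `io.StringIO(raw)` for `open("followers.csv")` to run it against a real export.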

Step 2: Clean the data

This one is kind of a pain, although I'm not sure it's always necessary. I wanted to visualize my followers by where they live, so I thought it was a good idea to standardize on a common value for the "location" column in the spreadsheet. Otherwise, you'd end up with, for example, a bunch of followers in "San Francisco," some in "San Francisco, California," others in "San Francisco, CA," and a surprising number in "SFO." You get the point.

[Image: Excel spreadsheet]

I opted for the postal service format for U.S. cities (e.g., San Francisco, CA); City, Province, Country for Canadian cities; and City, Country for other international cities. Then there are the various ways that people describe their location by general geographic area. I standardized on San Francisco Bay Area, Bay Area and Silicon Valley, for example, when followers listed some variation of those as their locations, but I've since realized I should have combined them into one mega-value (probably either San Francisco Bay Area or Silicon Valley). NorCal, SoCal and any variations thereof became Northern California and Southern California.
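If you'd rather not do all of that by hand in Excel, a small lookup table gets you most of the way. This is a minimal sketch, not my actual list; note that it merges the Bay Area and Silicon Valley variants into one value, the way I wish I had originally:

```python
import re

# Illustrative mapping from messy profile values to one canonical form.
CANONICAL = {
    "san francisco": "San Francisco, CA",
    "san francisco, california": "San Francisco, CA",
    "san francisco, ca": "San Francisco, CA",
    "sfo": "San Francisco, CA",
    "bay area": "San Francisco Bay Area",
    "silicon valley": "San Francisco Bay Area",
    "norcal": "Northern California",
    "socal": "Southern California",
}

def clean_location(raw_value):
    """Map a free-text profile location to one canonical value,
    falling back to the trimmed original when it's unrecognized."""
    key = re.sub(r"\s+", " ", raw_value.strip().lower())
    return CANONICAL.get(key, raw_value.strip())

print(clean_location("SFO"))      # San Francisco, CA
print(clean_location("NorCal"))   # Northern California
```

The table only grows as fast as the distinct junk values in your data, and you can build it incrementally by eyeballing the unmatched leftovers.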

Depending on what you want to do and what tools you’re using to do visualizations, you might need to go a step further or not go through this step at all. Tableau Public, for example, seemed to want separate columns for city and state when I tried to map my followers’ locations (latitude and longitude might have worked, as well). I didn’t know any slick Excel tricks for doing this in a hurry, so I just decided to use Google Fusion Tables, which automatically geocodes location data (after you manually label the column as containing location data).
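For a tool that insists on separate city and state columns, the cleaned values split mechanically. A sketch, assuming U.S. rows were already normalized to the postal style (a two-letter country code like "UK" would be misread as a state, so it's not bulletproof):

```python
def split_city_state(location):
    """Split an already-cleaned U.S. 'City, ST' value into two fields.

    Anything that doesn't end in a two-letter uppercase code (regions,
    most foreign cities) goes whole into the city column.
    """
    parts = [p.strip() for p in location.rsplit(",", 1)]
    if (len(parts) == 2 and len(parts[1]) == 2
            and parts[1].isalpha() and parts[1].isupper()):
        return parts[0], parts[1]
    return location, ""

print(split_city_state("San Francisco, CA"))    # ('San Francisco', 'CA')
print(split_city_state("Northern California"))  # ('Northern California', '')
```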

Step 3: Pick a tool and visualize away

Once you have the data cleaned sufficiently (if at all), it’s time to visualize. The options are fairly limited if you don’t want to write code, but they’re still pretty powerful (and able to handle visualizations spanning thousands of rows, which was key). I went with the tried-and-true Tableau Public, IBM Many Eyes and Google Fusion Tables (although Datahero and Infogram could work, as well, with fewer followers). They all work a little differently, but it’s nothing that most people couldn’t figure out within a few minutes of experimentation.

Here’s what I came up with.

From Tableau

[Image: Twitter location 1]

Had I originally merged San Francisco Bay Area, Bay Area and Silicon Valley, that would have made for a large bubble.

And this:

[Image: location 2]

From Google Fusion Tables

[Image: followers global]

Zoomed on North America:

[Image: followersNA]

From IBM Many Eyes

With Many Eyes, I used the text from my followers' descriptions rather than the location field. Here's an unfiltered word cloud.

[Image: followersprofile]

This time by two-word combination:

[Image: followers2word]
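Both clouds boil down to counting tokens in the description column: single words for the first, adjacent pairs for the second. A minimal sketch with the standard library, using made-up stand-in bios:

```python
import re
from collections import Counter

# Made-up stand-ins for the 'description' column of the export.
descriptions = [
    "Tech journalist covering cloud and data",
    "Data scientist. Cloud infrastructure nerd.",
    "Founder. Building data tools for journalists.",
]

def tokens(text):
    """Lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

# Single-word counts: the raw material for the first cloud ...
words = Counter(t for d in descriptions for t in tokens(d))

# ... and adjacent-word pairs for the two-word version.
bigrams = Counter(
    " ".join(pair)
    for d in descriptions
    for pair in zip(tokens(d), tokens(d)[1:])
)

print(words.most_common(2))
print(bigrams.most_common(2))
```

A real pass would also drop stopwords ("and," "for," and so on), which is roughly what the cloud tools do for you.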

This is all far from big data or data science or any other tech-industry buzzwords, but it is good, clean fun with data. And, of course, because you have the raw data, there’s really no limit on how you can chart it or what metrics you can analyze against each other.

Comments

  1. Cool idea! Which of the data analysis tools work with a Mac?

  2. The potential for this is quite phenomenal. What are you going to do with the data when you have got it? Are you going to concentrate on specific geographical areas and put twitter content in specifically for them?

