RSS, Tiger Safari and the Bandwidth Bottleneck

42 Comments

In less than 48 hours many of us will be installing Mac OS X Tiger, and with it a brand-new Safari browser that can read and display RSS feeds in a simple, easy-to-understand manner. That upgrade, while great for consumers, could come as a big shock for those blogs whose feeds are included as part of Safari’s default starter package. In fact, it could be the biggest stress test for RSS thus far!

Most RSS readers are set to poll for updates every hour. Imagine half a million Tiger Safari users, none of whom have changed the default settings, all hitting a server at the same time to pull down RSS updates. Server meltdown? Or an unintended denial of service? Apple says that most of the default feeds are going to be major news sites like CNN, the New York Times, and the LA Times. At this time they are not including any personal blogs as part of the default list. Even for the big sites it is not going to be easy.

Let’s say one of these news operations updates its site once an hour and each update results in a nominal 5 kilobytes of RSS data. Then 500,000 simultaneous Safari users polling at the top of the hour would mean a total data transfer of about 2.5 gigabytes per hour. Times 24, and you have about 60 gigabytes of data transfer every day, just from Safari users alone. What if more than a million Tiger Safaris were on the loose? Oh boy! An additional 60 gigabytes of traffic a day, or 1.8 terabytes a month, is not that much for large sites, but it will add up.
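The back-of-the-envelope numbers can be rechecked with a few lines of Python. This is only a sketch: it assumes decimal units (1 GB = 1,000,000 KB), a 30-day month, and that every user pulls the full 5-kilobyte feed on every poll, with no conditional GET and no compression. The names are made up for illustration.

```python
# Back-of-the-envelope check of the polling load described above.
# All inputs are the article's illustrative figures, not measurements.
USERS = 500_000          # simultaneous Safari users polling
FEED_KB = 5              # nominal feed size per poll, in kilobytes
POLLS_PER_DAY = 24       # one poll per hour
DAYS_PER_MONTH = 30      # assumption: a 30-day month

KB_PER_GB = 1_000_000    # assumption: decimal units


def transfer_gb(users: int, feed_kb: float, polls: int = 1) -> float:
    """Total data transferred, in gigabytes, for `polls` rounds of polling."""
    return users * feed_kb * polls / KB_PER_GB


per_hour = transfer_gb(USERS, FEED_KB)
per_day = transfer_gb(USERS, FEED_KB, POLLS_PER_DAY)
per_month_tb = per_day * DAYS_PER_MONTH / 1000

print(f"per hour:  {per_hour:.1f} GB")
print(f"per day:   {per_day:.1f} GB")
print(f"per month: {per_month_tb:.1f} TB")
```

Every figure scales linearly, so doubling the number of users or the feed size doubles the totals.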

Admittedly, since I don’t have Tiger yet, I am not sure whether Safari RSS does a time-based check (say, every hour at :15) or schedules its checks relative to when the computer or browser is started, which is relatively random and is what other feed readers do. Clearly this is an imaginary scenario, but it could happen. So what’s the fix? “I certainly hope that Safari does conditional GET. I can’t imagine it doesn’t but I could be wrong,” says Brent Simmons, founder and the man behind the hit feed reader NetNewsWire. “With conditional GET you download the feed only if it’s different from the last time you downloaded it; this cuts way down on bandwidth use.” (More on conditional GET.) “Conditional GET, which NetNewsWire and most other aggregators support, is hugely important,” says Simmons. But even that can only go so far, since most of these news operations churn out headlines with monotonous regularity.
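For readers who have not seen conditional GET in action, here is a minimal client-side sketch in Python using only the standard library. It is not how Safari or NetNewsWire implement it; the function names are hypothetical. The client remembers the ETag and Last-Modified values from its previous download and sends them back as If-None-Match and If-Modified-Since; a server that supports conditional GET then answers 304 Not Modified with no body when nothing has changed.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the request headers that make a GET conditional."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None, etag, last_modified
        raise
```

A reader would call fetch_feed once with no validators, store the returned ETag and Last-Modified values, and pass them back in on every later poll. The request still reaches the server each time; only the body is saved.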

Long term, I think RSS is going to become a clear bandwidth hog unless the RSS people come up with an intelligent way to fix this problem. I have been tracking my own bandwidth consumption, and RSS is sucking up gigabytes like a parched man on a hot summer day. Some say that randomizing the whole RSS polling process is the answer.

How about randomizing the whole RSS polling process? Instead of pulling down RSS feeds every hour on the hour, let the feeds download at random times. Okay, that will help distribute the load on the servers more evenly, but it still doesn’t resolve the issue of inefficient use of network resources, especially for those who pay for that kind of thing. Suggestions?
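Randomized polling is simple to sketch. The Python below is an illustration only, with made-up names and an assumed jitter of plus or minus 50 percent around the hourly default; it is not taken from any particular feed reader.

```python
import random

BASE_INTERVAL = 3600    # seconds; the hourly default discussed above
JITTER_FRACTION = 0.5   # assumption: spread polls over +/- 50% of the interval


def next_poll_delay(base=BASE_INTERVAL, jitter=JITTER_FRACTION, rng=random):
    """Seconds to wait before the next poll, randomized around `base`."""
    spread = base * jitter
    return base + rng.uniform(-spread, spread)


def initial_offset(base=BASE_INTERVAL, rng=random):
    """One-off random delay so clients do not all start at the top of the hour."""
    return rng.uniform(0, base)
```

Each client would sleep for initial_offset() once at startup and then for next_poll_delay() between polls. The average polling rate stays the same, so total bandwidth does not drop; the requests just stop arriving in one burst.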

Scott Rafer, CEO of Feedster, says, “ISPs can start caching feed URLs but if they do it with cached times of more than 10 min, then people will route around the caches.”

42 Comments

Phil Boardman

Except if a feed hasn’t changed it will send the appropriate HTTP Response code and you’d only send a few bytes.

Adam Zey

The RSS bandwidth problem is totally overblown. The problem is people just don’t know how to deal with it. First off, nobody seems to compress their RSS feeds with gzip/deflate, which would save about 80% of the bandwidth usage. It’s part of the HTTP spec.
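The compression claim is easy to try out. The Python sketch below gzips a made-up feed and reports the saving. The feed content is hypothetical and far more repetitive than a real one, so the percentage it prints should not be read as confirming the 80% figure; real feeds need to be measured individually. Over HTTP, the client advertises support with an Accept-Encoding: gzip request header and the server marks the compressed response with Content-Encoding: gzip.

```python
import gzip

# Hypothetical feed content: 20 near-identical items, for illustration only.
ITEM = (
    "<item><title>Headline {n}</title>"
    "<link>http://example.com/story/{n}</link>"
    "<description>Summary text for story number {n}.</description></item>"
)
feed = (
    '<?xml version="1.0"?><rss version="2.0"><channel>'
    "<title>Example feed</title>"
    + "".join(ITEM.format(n=n) for n in range(20))
    + "</channel></rss>"
).encode("utf-8")

compressed = gzip.compress(feed)
saving = 1 - len(compressed) / len(feed)

print(f"raw: {len(feed)} bytes, gzipped: {len(compressed)} bytes")
print(f"saving: {saving:.0%}")
```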

Second is people don’t know how to get cheap bandwidth. Corporations claim that RSS is costing them enormous sums of money, when amusingly their RSS bandwidth needs could be met with a simple budget dedicated server for $100 to $200 per month; mere pennies for big companies. And if the big servers ($100 gets you about 1000GB, $200 gets you 20Mbit, or over 6000GB) are too expensive, there are cheaper solutions like low-end servers or virtual private servers that go for under $50/mth.

What does $100/mth get you? Why, only about 200 MILLION RSS downloads per month, more than enough for big corporations (And hey, if they need more, $200 per month is still not breaking the bank for a big corporation).
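These figures can be reproduced with a little arithmetic. The Python sketch below assumes a 5-kilobyte feed (the nominal size used in the article, which may or may not be what the commenter had in mind), decimal units, and a 30-day month; the function names are made up.

```python
KB_PER_GB = 1_000_000          # assumption: decimal units
SECONDS_PER_MONTH = 86_400 * 30  # assumption: a 30-day month


def downloads_per_month(quota_gb: int, feed_kb: int) -> int:
    """How many full feed downloads fit into a monthly transfer quota."""
    return quota_gb * KB_PER_GB // feed_kb


def mbit_to_gb_per_month(mbit_per_s: float) -> float:
    """Monthly transfer, in GB, of a link kept saturated at `mbit_per_s`."""
    megabytes_per_s = mbit_per_s / 8
    return megabytes_per_s * SECONDS_PER_MONTH / 1000


full_feed = downloads_per_month(1000, 5)   # 1000 GB quota, 5 KB feed
saturated_link = mbit_to_gb_per_month(20)  # the 20 Mbit plan

print(f"{full_feed:,} downloads per month on 1000 GB")
print(f"{saturated_link:,.0f} GB per month on a saturated 20 Mbit link")
```

The 20 Mbit number is an upper bound: it only holds if the link runs flat out all month, which hourly polling bursts would not achieve.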

$100 isn’t too much for bloggers either; since they don’t need nearly that much RSS capability, a bunch of bloggers could share the cost.

Anyhow, that’s my rant. There is no problem with RSS, the problem is with the people using it.

Aswath Rao

When FWD RTC Client included my blog in their RSS reader (with a 5-minute period), I had to increase my usage. Those who know my blog would set the period to be 24 hours, not even an hour. So the reader should allow for customization at different levels.

Conditional GET solves the problem at one level. But the server gets pinged nonetheless, and if randomization is not done then, as you suggested, all these requests will bunch up together.

Sina

One million users of the new MAC OS? I love Apple owners’ optimism.

Om Malik

well, i agree. when the article was posted, there was little information about who was going to be included in the list. i think i was most worried about folks like gizmodo and engadget and some of the bigger blogs, but not so big websites.

najeeb

granted, I should have read your article correctly. But I’d assume that the servers which are hit with the default settings would handle such load.

Om Malik

najeeb you are focusing on the bandwidth usage in terms of napster. when that was happening the load was generally distributed across many networks, over many servers. in this case, the rss feeds are being polled from one server, at a regular interval, and are going to pose a strain on that server.

najeeb

I think it’s a complete non-issue. RSS feeds are very small in size, and any intelligent client shouldn’t be pulling in the updates every second. It totally depends on the implementation of the Safari browser. In any case, in terms of bandwidth usage, I don’t think it is going to come anywhere close to the peak usage of Napster, when hundreds of thousands of people were exchanging mp3 files day and night.

orque

It seems people are misinformed. Yes, Safari RSS checks feeds periodically (specified in the preferences in Tiger’s Safari), using conditional GET. Yes, Safari RSS shows unread counts (in parentheses after the bookmark name, and by coloring article headers if you have that option turned on). And yes, it aggregates. You just put related feeds in the same folder then you choose “View All RSS Articles”.

Om Malik

okay here is the deal silver dragon – if you take a billion drops of water, it becomes a tank and so on and so forth. get it? it might seem small, but it all starts to add up real quick when a lot of people are accessing the same small file and that small file is traveling to a million pages. the server serving it up gets hit hard and real fast.

WCleve

OM,

It’s not that the network guys don’t talk to the app guys, it’s the other way around. I’ve been a network guy for nearly two decades and I have yet to have anyone contemplating an application that could exhaust bandwidth contact me to see if their harebrained idea will work.

JT

Using the conditional GET you return only new articles. You’d also cap the maximum number of articles to return. Then return a 304 if nothing is new. It would then be up to the aggregators to cache back articles. Of course conditional GET was designed for this purpose but it’s what we have to work with!
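The server side of that idea can be sketched in a few lines of Python. One caveat: standard conditional GET is all or nothing, so the server sends either the whole (capped) feed or a 304 with no body; sending only the new articles would be an extension that this sketch does not attempt. The item cap, the function names, and the use of a SHA-1 hash as the ETag are all assumptions made for illustration.

```python
import hashlib
from xml.sax.saxutils import escape

MAX_ITEMS = 15  # assumption: cap on how many articles the feed carries


def make_etag(body: bytes) -> str:
    """Derive an ETag from the feed bytes (quoted, as HTTP requires)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def build_feed(titles):
    """Render at most MAX_ITEMS items; `titles` is expected newest-first."""
    entries = "".join(
        f"<item><title>{escape(t)}</title></item>" for t in titles[:MAX_ITEMS]
    )
    return f"<rss><channel>{entries}</channel></rss>".encode("utf-8")


def respond(titles, if_none_match=None):
    """Return (status, headers, body) for one feed request."""
    body = build_feed(titles)
    etag = make_etag(body)
    if if_none_match == etag:
        # Client already has this exact feed: send headers only.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/rss+xml"}, body
```

A client that echoes the ETag back in If-None-Match gets the empty 304 until a new article changes the feed bytes, at which point it gets a fresh 200 with the capped feed.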

Case closed, I’m hungry… what’s there to eat? :)

Or implementing a caching type service for RSS feeds is pretty trivial. The problem is how is it paid for? (And don’t tell me ads wouldn’t begin to appear in feeds!)

Om Malik

you are clearly missing the point. when you subscribe to a feed in my yahoo, it is cached on a yahoo server. millions of people who are also subscribing to that feed get a copy of that cache. however, in case of apple, the caching doesn’t happen. it is from the server to the browser.

Steve

How could the release of Safari possibly cause more of a bandwidth problem than the inclusion of RSS in My Yahoo!? As much as I’m a devoted Apple fan, their base simply isn’t big enough to move the needle. Supporting RSS in Safari may be innovative for browsers, but it sure isn’t news.

Mike D.

pb: Ok, we’ll see, but I highly doubt any significant percentage of the Mac population, let alone the world, will be turning to Safari for RSS-reading anytime soon. Just look at how many people use Firefox’s Live Bookmarks and divide by 1000.

That said, however, I do expect Apple to make some improvements in this regard. How quickly these changes come and how much they really improve things is still up in the air though.

Also, viewing entire individual feeds one at a time is not the most popular way to consume RSS to my knowledge. Viewing unread items in aggregators (client-side or web-based) is. Without the concept of unread item aggregation, RSS is no more than an unstyled, unimaged web page.

pb

Mike D, seen it and totally disagree. All the majority are looking for is a simple way to read feeds. The majority don’t care about RSS esoterics like “notification” (does that refer to read/unread indicators?). Viewing feeds one at a time is already the most popular way to consume RSS.

It’s been shown time and time again that the dumbed down versions get the most usage.

CharlesV

What about stuff like Shrook’s distributed checking system? It offloads the server-load to a central server that just says when things are updated, rather than pulling new data. Users that are subscribed to a feed communicate with the server, and other users subscribed to the feed are updated accordingly. Users get more frequent updates, and everybody wins.

Mike D.

pb: I’m not sure you’re understanding exactly how braindead the RSS capabilities are. There’s no concept of notification at all. It’s not even really an RSS aggregator… it’s a viewer of single RSS feeds. Sorry but no one is going to use that. Trust me… try it out… you’ll see what I mean.

garam

I am not as conversant with RSS as some here seem to be but I would imagine that one of the solutions would be Pay-per-Feed! Somebody will have to pay ‘coz the likes of CNN, NYT, LAT, etc. aren’t putting out information for charity.

ev spendael

Non issue. Complete non-issue.

How much dark fiber is there?

How fast has the price of bandwidth declined?

What is its current rate of decline?

RIGHT — can we move on? All this RSS-is-a-bandwidth-hog chatter is really pointless. Leads to idiotic speculation like “RSS was invented by ISPs … ” etc.

The development community got conditional-GET religion a long time ago.

It’s a trivial thing.

If you don’t like polling as a means, that’s fine. But it happens to work, quite well, and if you don’t understand the full RSS spec then you ought to go read it before you make or repeat falsehoods about it.

It’s up to the client authors to implement the spec.

If they don’t, the clients should get banned, period, end of story.

NNTP? Please. If it hadn’t failed, things like RSS wouldn’t exist.

NEXT.

pb

Polling is good for some things. Push for others. Polling is fine for RSS. Clients just need to do a better job of figuring out when to poll. Polling every hour when the user isn’t even at his/her PC is dumb. I have FeedDemon set up so it only polls when I tell it to, which works fine. I don’t see why auto-polling frequently is necessary.

The best push mechanism, by the way, is SMTP. And even that includes polling the email account.

laird popkin

The real issue isn’t Safari — that’s just RSS getting more popular — the real issue is RSS’ inefficiency (because it’s based on polling). Ideally, syndication of news should be delivered as “push” updates when there’s new information to deliver. This was solved by the Information & Content Exchange (ICE) protocol years ago — see http://www.icestandard.org . Admittedly push delivery only works for people with routable IP addresses, but with broadband penetration over 40% (and with syndication between web sites) it could go a long way towards saving bandwidth…

pb

First, does anyone know how Safari RSS actually works?

The whole idea of time-based updating is kind of silly. I wouldn’t be surprised if it only updates feeds when the user navigates to the feed reading area.

“I don’t expect many people to actually use Tiger’s built-in feed reading functions until they are brought into parity with the leading feed readers.”

Uh…I don’t think so. Even if Safari’s feed reading is super-basic, it will gain a large number of users.

Mike D.

Coupla things:

1. Although the RSS capabilities in Tiger/Safari 2.0 look nice from a cosmetic standpoint, they are pretty braindead. There is no useful concept of notification and “feedreading” doesn’t look anything like it does in almost every other reader (Bloglines, NNW, FeedDemon, etc). I don’t expect many people to actually use Tiger’s built-in feed reading functions until they are brought into parity with the leading feed readers. It’s really not even close right now… think Firefox’s Live Bookmarks but even less “live”.

2. I personally like the server-side solution to solve the bandwidth problems created by RSS. Bloglines pulls one feed on behalf of thousands and thousands of readers… that’s efficient. I’d pay several dollars a month for Bloglines if I had to or alternatively I’d put up with a few ads in order to pay for that bandwidth.
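The server-side approach boils down to a shared cache: fetch each feed from the origin once per freshness window and serve every subscriber from that copy. The Python sketch below illustrates the idea only; it is not how Bloglines or My Yahoo work internally. The class name, the 10-minute window, and the injected fetch and clock callables are assumptions.

```python
import time


class FeedCache:
    """Fetch each feed at most once per `ttl` seconds, however many readers ask."""

    def __init__(self, fetch, ttl=600, clock=time.monotonic):
        self._fetch = fetch    # callable: url -> feed body
        self._ttl = ttl        # assumption: 10-minute freshness window
        self._clock = clock    # injectable so the cache can be tested
        self._entries = {}     # url -> (fetched_at, body)
        self.origin_hits = 0   # how often the origin server was contacted

    def get(self, url):
        """Return the feed body, refetching only if the cached copy is stale."""
        now = self._clock()
        entry = self._entries.get(url)
        if entry is None or now - entry[0] >= self._ttl:
            self.origin_hits += 1
            entry = (now, self._fetch(url))
            self._entries[url] = entry
        return entry[1]
```

With this in front of a feed, a thousand readers asking within the same window cost the origin server one request instead of a thousand. The trade-off is freshness: readers can see a copy that is up to ttl seconds old.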

Jon Gales

Om: As long as Safari can use not modified headers, it will only pull the feed down when it has been updated. So it should be no worry.

Charlie Sierra

RSS is a scam invented by the web-hosting companies to drive customers to higher rate plans, or drain their pockets with “overage” charges.

These guys must’ve studied the cellular industry.

PS. this is all slightly tic.

J. Daniel Smith

I think the “real” fix is to figure out a way to somehow merge NNTP and RSS. NNTP spreads the load out among several servers; you connect to whichever one is “closest” to you.

There are some very small steps in that direction, but far too little.

All of the other solutions are patchwork and don’t really scale well; although they can bring significant improvements.

Om

Yup, in many ways it’s the same thing all over again. I think the network guys don’t talk to the app guys and that is always going to be a major problem in the future. this is only going to escalate. as more RSS feeds come online, well, more clutter.

Venky

People have short memories. PointCast used to clog servers and bandwidth in enterprise data centers and ISPs.
RSS is an open PointCast standard, and whatever algorithms people use, the old adage still holds good: downloads expand to fill bandwidth.
