Why Polling Sucks
The way most people get feed updates on the web today is polling.
That is, the subscriber sits in a loop and asks repeatedly,
"Anything new yet?"
"Anything new yet?"
"Anything new yet?"
The server (if it's smart enough to) has to check and reply,
"No, you have the most recent version."
"No, you have the most recent version."
"No, you have the most recent version."
This sucks. It's a waste of resources, and it's impossible to schedule the polling correctly, even with adaptive rate control. The big knob you have is balancing between:
- Poll fast, e.g. every minute. This wastes server resources on both sides, and a minute isn't even that fast.
- Poll slow, e.g. every few minutes or hour or day, depending on how active that feed is. This is nice to servers, but when the feed is updated, it'll take on average minutes, hours, or days to see it. This is way too slow.
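To put rough numbers on that knob, here is a back-of-the-envelope sketch. It assumes updates land at a uniformly random point inside a polling interval (so the average wait is half the interval) and that at most one poll per update returns anything new. The function name `polling_cost` and its parameters are made up for illustration; none of this comes from the protocol itself.

```python
def polling_cost(poll_interval_s: float, updates_per_day: float) -> dict:
    """Estimate what a fixed polling interval costs (illustrative model only)."""
    seconds_per_day = 24 * 60 * 60
    polls_per_day = seconds_per_day / poll_interval_s

    # Assumption: an update arrives at a uniformly random moment within the
    # interval, so the subscriber waits half an interval on average.
    avg_delay_s = poll_interval_s / 2

    # Assumption: at most one poll per update can come back with new data;
    # every other poll is a "No, you have the most recent version."
    useful_polls = min(polls_per_day, updates_per_day)
    wasted_fraction = 1 - useful_polls / polls_per_day

    return {
        "polls_per_day": polls_per_day,
        "avg_delay_s": avg_delay_s,
        "wasted_fraction": wasted_fraction,
    }


if __name__ == "__main__":
    # A feed with 10 updates a day, polled every minute vs. every hour.
    for interval in (60, 3600):
        print(interval, polling_cost(interval, updates_per_day=10))
```

Under these assumptions, polling a 10-updates-a-day feed every minute means 1440 requests a day, almost all of them wasted, for an average delay of 30 seconds. Polling hourly cuts the waste a lot but pushes the average delay to half an hour.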
PubSubHubbub, like many other protocols, is a "push" protocol. The publisher pushes to the hub, and the hub multiplexes the push out to all the subscribers. If the publisher and hub are performing well, the subscriber could get notified within seconds (and definitely under a minute), regardless of how often the feed is updated, and using only minimal resources for all parties involved.
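The fan-out idea can be sketched in a few lines. This is a toy, in-memory model only: the class `Hub` and its `subscribe` and `publish` methods are hypothetical names for illustration, and in-process function calls stand in for the network requests a real hub and its subscribers would exchange.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber is modelled as a function taking (topic, content).
Callback = Callable[[str, str], None]


class Hub:
    """Toy hub: remembers who subscribed to what and fans out each push."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # The subscriber registers once, then just waits to be called.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, content: str) -> int:
        # The publisher pushes once; the hub multiplexes the push out to
        # every subscriber of that topic. Returns how many were notified.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, content)
        return len(callbacks)


if __name__ == "__main__":
    hub = Hub()
    hub.subscribe("example-feed", lambda t, c: print("reader A got:", c))
    hub.subscribe("example-feed", lambda t, c: print("reader B got:", c))
    hub.publish("example-feed", "new entry")
```

Nobody asks "Anything new yet?" here: work happens only when the publisher actually has something to say, which is where the resource savings come from.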
So that's what this is all about.
What's wrong with feed updates every few minutes/hours/days?
It might be fine for reading your blogs (it's obviously been fine for most people for years now), but it's not fast enough for the decentralized applications of tomorrow that people want to build.
We and others are thinking about totally decentralized social networking, with real-time updates. Don't assume any of this is about blogs just because we're using the Atom format and Atom's typically used for blog posts. Atom feeds can be used to represent any topics, including private person-to-person topics such as direct messages or even real-time moves in a game.
Is 'PubSubHubbub' the final name which will be used for the protocol?
I certainly hope so, the name's cute and meaningful, a rare combination. :)
The name is not bad at all and makes for nice jokes with people who hear it for the very first time!
I find this discussion to be quite single-minded. Certainly, if notification delivery duration is what you care about, your argument has merit. Still, there is one big plus to polling: loose coupling and no state on the server side, i.e. neither does the server have to know about the client, nor does the client depend on the server knowing about it. This is the prime reason why RSS works so nicely for many usage scenarios. Subscription introduces a large amount of tight coupling which is not easily avoided, while the extra cost of polling can be substantially reduced by plain HTTP caching.
What about the client side? Doesn't the "reader" have to be able to receive these pushes too? Wouldn't it otherwise just keep on polling and receive the hub's notifications only on the next poll?
How about using this protocol to notify Google about site updates? Spidering is a bit backwards when you think about it, a remnant from the days when search engines were not as important as they are now. Websites should be able to submit content to Google as soon as they have it, instead of waiting to be crawled.
One case for polling is subscribers that can't be connected to, e.g. browsers or subscribers behind a firewall. I'm not saying this whole approach isn't cool, just that there is a need for polling as well.
Nice to see people finally implementing what we read in "Will RSS Readers Clog the Web?" (Wired, 04.04). I feel quite vindicated now, thank you :)
I hate polling too, but polling can also make sense if there is statistically a high likelihood of getting more data.
For example, on Twitter, when following a lot of people, polls often do return new updates. On the other hand, look at how many times Twitter has gone down and all the problems they've had because of their architecture. Conversely, push-based chat services like IRC have worked reliably and scaled well for decades now.
PuSH is very cool for the web. Let's make the whole web real time :-)