Why your cloud service will eventually fail


Given the indignation and anger that rippled through the Internet, I'm sure you've heard by now that Google's Gmail went down again earlier this week, this time for a total of 100 minutes. The official Gmail blog has the technical details of exactly what happened, though I thought fellow FierceCIO editor Judi Hasson wrote a succinct report covering the important facts of the outage.

For my opinion on the debacle, feel free to check out "Gmail goes down; is it really cause for concern?" For today's commentary, though, I want to focus on the bigger picture: the very hot topic of cloud computing.

Expectations over cloud computing

There is no doubt that everyone wants a slice of the cloud computing pie. This week alone, VMware announced a new service called vCloud Express, a thinly veiled Amazon competitor. Even IBM has plans to roll out its own cloud-based desktop, which can be accessed with nothing more than a standard web browser with Java support.

So why were those 100 minutes of downtime on Gmail such a huge deal? I know, business was disrupted and all, but you can't tell me that none of your enterprise servers has ever crashed. Indeed, I never even noticed the downtime myself, as I use an auto-forward rule on my Gmail account, which effectively bypassed the affected web interface.

It is at this point that we arrive at the crux of the matter: nobody expects a cloud-based architecture to fail. It's that simple. So even though Gmail was never taken down for system maintenance, shut down temporarily for hardware "upgrades," or subjected to any of the "scheduled downtime" that is so common in the enterprise, everyone's expectation on the matter is still unanimous and unyielding. You see, cloud services are not supposed to go down.

The downside of cloud computing

Now that we are clear on our expectations for cloud computing, I want to show you why expecting perfect uptime is not realistic, in spite of everything you have read.

You see, in our haste to build the smartest, most fault-tolerant network and the best hardware, what many forget is the somber fact that the human "fallibility quotient" remains unchanged. By that, I mean the capacity of even the best of us to make mistakes, or for crucial oversights to happen.

Ironically, while human errors committed under the old paradigm are limited at most to a single corporation or department, errors in the cloud are amplified many times over. With large services like Gmail, even the smallest downtime has the potential to be felt around the globe.

In addition, consider this: a smaller company would simply have flipped the reboot switch, probably taking only 10 to 15 minutes to get things up and running again. This option is clearly not possible for Google's Gmail. Alas, the immensely powerful behemoth of a system created to host millions of Gmail users is also necessarily complex.

Just as an immense, oil-carrying supertanker needs kilometers of sea just to come to a stop, problems at this scale can take time to rectify even when the fix is applied almost immediately. This was the case with the recent outage, which lasted all of 100 minutes. - Paul Mah
