Infrastructure lessons to learn from Amazon, Google


Computerworld has an article on how Amazon and Google scale up their massive IT infrastructure. What interests me are the tips from top-notch engineers that run contrary to the conventional enterprise approach to computing infrastructure.

For example, Google (NASDAQ: GOOG) site reliability engineer Todd Underwood spilled the beans on one of the company's imperatives: controlling the cost of hardware. According to Underwood, "a lot of what Google does is about being super-cheap." Judging from some of the quotes, the focus is apparently not so much on attaining 100 percent availability as on achieving the targeted availability of 99.999 percent (five nines).

The remaining effort should be directed towards "moving as fast as you can," according to Underwood, alluding to the fast-paced datacenter environment at Google. "If you massively exceed that threshold you are wasting money," says Underwood. "Opportunity costs are our biggest competitor."
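To put the five nines target in perspective, here is a minimal sketch that converts an availability percentage into a yearly downtime budget. The function name and the 365-day year are my own assumptions for illustration; nothing here comes from Google.

```python
# Convert an availability target into the downtime it allows per year.
# Assumes a 365-day year; the function name is illustrative only.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed by the target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% availability allows {minutes:.2f} minutes of downtime a year")
```

Five nines works out to a little over five minutes of downtime a year. Each additional nine shrinks that budget by a factor of ten, which is why overshooting the target gets expensive so quickly.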

Not surprisingly, driving down infrastructure costs is a task that engineers at Amazon (NASDAQ: AMZN) focus on, too. James Hamilton, AWS' vice president and distinguished engineer, provides more concrete figures, highlighting how servers sold through regular IT hardware channels cost about 30 percent more than buying the components directly from manufacturers.

So where does this leave the enterprise, which has traditionally focused on expensive, over-engineered servers and networking equipment designed with the highest reliability in mind? Stacked on top of this setup would be yet more expensive equipment and software designed to guarantee business continuity should the servers fail anyway.

Though no enterprise in the world can claim to be even remotely close to Amazon or Google in scale, I believe there is an important lesson to be gained from studying their approach to infrastructure. For example, instead of buying a top-tier server equipped with onboard RAID storage, redundant power supplies and multiple Ethernet ports for failover, why not get two (or three) standard servers for a lower price and use virtualization to keep downtime to a minimum?
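As a rough illustration of why several ordinary servers can stand in for one premium machine, the sketch below estimates the combined availability of a group of servers. It rests on assumptions of mine rather than anything from the article: servers fail independently of one another, failover is instant, and the service stays up as long as at least one server is running. The 99 percent per-server figure is made up for the example.

```python
# Combined availability of redundant servers.
# Assumptions (illustrative only): independent failures, instant failover,
# and the service is up whenever at least one server is up.


def combined_availability(per_server: float, count: int) -> float:
    """Return the probability that at least one of `count` servers is up."""
    if not 0.0 <= per_server <= 1.0:
        raise ValueError("per_server must be a fraction between 0 and 1")
    if count < 1:
        raise ValueError("count must be at least 1")
    # The service is down only when every server is down at the same time.
    all_down = (1 - per_server) ** count
    return 1 - all_down


if __name__ == "__main__":
    per_server = 0.99  # hypothetical availability of one standard server
    for count in (1, 2, 3):
        combined = combined_availability(per_server, count)
        print(f"{count} server(s): {combined * 100:.4f}% available")
```

Under these assumptions, two servers at 99 percent each come out at about 99.99 percent, and three at about 99.9999 percent. Real deployments will do worse, since shared power, networking and software bugs cause failures that are not independent, but the arithmetic shows why the cheaper route is worth a look.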

What are your thoughts on enterprise practices that cost too much for the benefits that they actually deliver? Feel free to chip in with a tweet, an email, or simply leave a note in the comments section below. - Paul Mah  (Twitter @paulmah)