in defense of sprawling, wasteful data centers
Good stories need conflict, and if you’re going to have conflict, you need a villain. But you don’t always get the right villain in the process, as we can see with the NYT’s scathing article on waste in the giant data centers which form the backbone of cloud computing. According to the article, data centers waste between 88% and 94% of all the electricity they consume on idle servers. When they’re going through enough electricity to power a medium sized town, that adds up to a lot of wasted energy, and diesel backups generate quite a bit of pollution on top of that. Much of this article focuses on portraying data centers as lumbering, risk averse giants who either refuse to innovate out of fear alone or have no incentive to reduce their wasteful habits. The real issue, the fact that their end users demand 99.999% uptime and will tear their heads off if their servers are down for any reason at any time, especially during a random traffic surge, is glossed over in just a few brief paragraphs despite being the key to why data centers are so overbuilt.
Here’s a practical example. This blog is hosted by MediaTemple and has recently been using a cloud service to improve performance. Over the last few years, it’s been down five or six times, primarily because database servers went offline or crashed. During those five or six times, this blog was unreachable by readers and its feed was present only in the cache of the syndication company, a cache that refreshes on a fairly frequent basis. This means fewer views because for all intents and purposes, the links leading to Weird Things are now dead. Fewer views means a smaller payout at the end of the month, and when this was a chunk of my income necessary for paying the bills, it was unpleasant to take the hit. Imagine what would’ve happened if, right as my latest post got serious momentum on news aggregator sites (once I had a post make the front pages of both Reddit and StumbleUpon and got 25,000 views in two hours), the site had gone down due to another server error. A major and lucrative spike would’ve been stopped dead in its tracks.
Now, keep in mind that Weird Things is a small site that’s doing between 40,000 and 60,000 or so views per month. What about a site that gets 3 million hits a month? Or 30 million? Or how about the massive news aggregators dealing with hundreds of millions of views in the same time frame and for which being down for an hour means tens of thousands of dollars in lost revenue? Data centers are supposed to be Atlases holding up the world of on-demand internet in a broadband era and if they can’t handle the load, they’ll be dead in the water. So what if they wasted 90% of all the energy they consumed? The clients are happy and the income stream continues. They’ll win no awards for turning off a server and then taking a minute or two to boot it back up and start all the instances of the applications it needs to run. Of course each instance takes only a small amount of memory and processing capability even on a heavily used server, so there’s always a viable option of virtualizing servers on a single box to utilize more of the server’s hardware.
If you were to go by the NYT article, you’d think that data centers are avoiding this, but they’re actually trying to virtualize more and more servers. The problem is that virtualization on a scale like this isn’t an easy thing to implement and there are a number of technical issues that any data center will need to address before going into it full tilt. Considering that each center uses what a professor of mine used to call “their secret sauce,” it will need to make sure that any extensive virtualization schemes it wants to deploy won’t interfere with its secret sauce recipe. When we talk about changing how thousands of servers work, we have to accept that it takes a while for a major update like that to be tested and deployed. Is there an element of fear there? Yes. But do you really expect there not to be any when the standards to which these data centers are held are so high? That 99.999% uptime figure allows for only about 5 minutes and 15 seconds of total downtime in an entire year, and a small glitch here or there can easily get the data center to fail the service contract requirements. So while they virtualize, they’re keeping their eye on the money.
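To put numbers on that kind of uptime promise, here is a quick back-of-the-envelope sketch in Python. The function name and the list of targets are my own illustration, not anything from the NYT article or a real service contract, and it assumes a plain 365-day year with the downtime budget measured over the whole year.

```python
# Rough downtime budget implied by an uptime guarantee.
# Assumption: a 365-day (non-leap) year; real contracts may measure
# availability per month or exclude scheduled maintenance.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the minutes of downtime per year allowed by an uptime target."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Hypothetical targets, often called three, four, and five nines.
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime a year")
```

Under those assumptions, three nines (99.9%) works out to roughly 8 hours and 45 minutes a year, while five nines (99.999%) leaves a little over 5 minutes, which is why a single botched update can blow through the entire budget.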
But the silver lining here is that once virtualization in data centers becomes the norm, we will be set for a very long period of time in terms of data infrastructure. Very few, if any, additional major data centers will need to be built, and users can continue to send huge files across the web at will just as they do today. If you want to blame anyone for the energy waste in data centers, you have to point the finger squarely at consumers with extremely high demands. They’re the ones for whom these centers are built and they’re the ones who will bankrupt a data center should an outage major enough to affect their end of month metrics happen. This, by the way, includes us, the typical internet users, as well. Our e-mails, documents, videos, IM transcripts, and backups in case our computers break or get stolen all have to be housed somewhere, and all these wasteful data centers are where they end up. After all, the cloud really is just huge clusters of hard drives filled to the brim with stuff we may well have forgotten by now alongside the e-mails we read last night and the Facebook posts we made last week…