Archives For data centers


Good stories need conflict, and if you’re going to have conflict, you need a villain. But you don’t always get the right villain in the process, as we can see with the NYT’s scathing article on waste in the giant data centers which form the backbone of cloud computing. According to the article, data centers waste between 88% and 94% of all the electricity they consume on idle servers. When they’re going through enough electricity to power a medium-sized town, that adds up to a lot of wasted energy, and diesel backups generate quite a bit of pollution on top of that. Much of the article focuses on portraying data centers as lumbering, risk-averse giants who refuse to innovate out of sheer fear and have no incentive to curb their wasteful habits. The real issue, the fact that their end users demand 99.999% uptime and will tear their heads off if their servers are down for any reason at any time, especially during a random traffic surge, is glossed over in just a few brief paragraphs despite being the key to why data centers are so overbuilt.

Here’s a practical example. This blog is hosted by MediaTemple and has recently been using a cloud service to improve performance. Over the last few years, it’s been down five or six times, primarily because database servers went offline or crashed. During those outages, this blog was unreachable by readers and its feed was present only in the cache of the syndication company, a cache that refreshes fairly frequently. This means fewer views because, for all intents and purposes, the links leading to Weird Things are dead. Fewer views mean a smaller payout at the end of the month, and when this was a chunk of the income I needed to pay the bills, it was an unpleasant hit to take. Imagine what would’ve happened if the site went down with another server error right as my latest post was gaining serious momentum on news aggregator sites (once, a post of mine made the front pages of both Reddit and StumbleUpon and got 25,000 views in two hours). A major and lucrative spike would’ve been dead in its tracks.

Now, keep in mind that Weird Things is a small site doing between 40,000 and 60,000 views per month. What about a site that gets 3 million hits a month? Or 30 million? Or the massive news aggregators dealing with hundreds of millions of views in the same time frame, for which being down for an hour means tens of thousands of dollars in lost revenue? Data centers are supposed to be Atlases holding up the world of on-demand internet in a broadband era, and if they can’t handle the load, they’ll be dead in the water. So what if they waste 90% of the energy they consume? The clients are happy and the income stream continues. They’ll win no awards for turning off a server, then taking a minute or two to boot it back up and restart all the instances of the applications it needs to run. Since each instance takes only a small amount of memory and processing capability even on a heavily used server, there’s always the option of virtualizing several servers on a single box to utilize more of its hardware.
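To get a rough sense of the consolidation math, here’s a back-of-the-envelope sketch. All the per-server numbers are illustrative assumptions, not figures from any real data center, but they show why running mostly idle boxes wastes so much capacity:

```python
import math

# Back-of-the-envelope consolidation estimate.
# Every figure below is an illustrative assumption.
physical_servers = 1000    # boxes in the racks, each mostly idle
avg_utilization = 0.08     # ~8% busy, in line with the 88-94% waste figure
target_utilization = 0.60  # post-virtualization target, leaving spike headroom

# Total work being done, in "fully busy server" units
busy_equivalent = physical_servers * avg_utilization  # 80.0

# How many boxes that work fits on at the target utilization
needed = math.ceil(busy_equivalent / target_utilization)

print(f"{physical_servers} servers -> {needed} after virtualization "
      f"({physical_servers - needed} can be powered down)")
```

Under these made-up numbers, over 85% of the hardware could in principle be powered down, which is exactly why virtualization is so attractive despite the deployment risk.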

If you were to go by the NYT article, you’d think that data centers are avoiding this, but they’re actually trying to virtualize more and more servers. The problem is that virtualization on this scale isn’t easy to implement, and there are a number of technical issues any data center will need to address before going into it full tilt. Considering that each center runs on what a professor of mine used to call its "secret sauce," it will need to make sure that any extensive virtualization scheme it deploys won’t interfere with that secret sauce recipe. When we talk about changing how thousands of servers work, we have to accept that a major update like that takes a while to test and deploy. Is there an element of fear there? Yes. But do you really expect there not to be any when the standards to which these data centers are held are so high? That 99.999% uptime figure allows for just over five minutes of total downtime in an entire year, so a small glitch here or there can easily make the data center fail its service contract requirements. So while they virtualize, they’re keeping their eye on the money.
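The "nines" arithmetic is easy to check for yourself: allowed downtime is just (1 − availability) × one year. A quick sketch:

```python
# Allowed annual downtime for common availability targets.
YEAR_MINUTES = 365 * 24 * 60  # 525,600 minutes in a non-leap year

for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    downtime = (1 - availability) * YEAR_MINUTES
    print(f"{label} ({availability:.3%}): {downtime:.1f} minutes/year")
```

Three nines works out to about 8 hours and 45 minutes per year; five nines leaves barely 5.3 minutes, which is why contracts written to that standard make operators so cautious.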

But the silver lining here is that once virtualization in data centers becomes the norm, we will be set for a very long period of time in terms of data infrastructure. Very few, if any, additional major data centers will need to be built, and users can continue to send huge files across the web at will just as they do today. If you want to blame anyone for the energy waste in data centers, you have to point the finger squarely at consumers with extremely high demands. They’re the ones for whom these centers are built, and they’re the ones who will bankrupt a data center should an outage major enough to affect their end-of-month metrics happen. This, by the way, includes us, the typical internet users, as well. Our e-mails, documents, videos, IM transcripts, and backups in case our computers break or get stolen all have to be housed somewhere, and these wasteful data centers are where they end up. After all, the cloud really is just huge clusters of hard drives filled to the brim with stuff we may well have forgotten by now, alongside the e-mails we read last night and the Facebook posts we made last week…

Surely by now you’ve heard of the electronic cloud, the magical place where all your data lives, beaming down to whatever wireless device you happen to be holding at any given moment. No matter where you go, you can dial up your contacts, e-mail that presentation, or update your to-do lists. Lost on the way somewhere, or just forgot the number of the building? Need to call and say you’ll be late to a meeting but don’t remember who to call thanks to an inopportune brain fart? No worries, just tap into the cloud on your tablet or smartphone and the data you need is there in seconds. It’s great, it’s incredibly useful, it’s the next logical abstraction of a wireless world taking advantage of web-based data access, and it’s quickly become the modern standard for managing the streams of information, both useful and useless, being generated every day. And you shouldn’t trust it. Not in the slightest. Why? Because your data in the cloud doesn’t belong to you, and should its current owner go out of business and the servers storing it be sold off, you have few assurances it will stay there.

Granted, you could look at companies like Google and Microsoft and declare that they will either never falter, or if they do, they’ll just transfer the data to the new owners, as would any other cloud storage company. But the problem here is that new owners often mean new infrastructure and the need to do migrations, updates, and server swaps. Individual accounts may get locked out or their data lost somewhere in the clutter. As long as a large company is convinced that it can make good money by studying your personal data for keywords which can be used by advertisers, it will offer free cloud storage of some sort, but if these companies fail to grow or are abandoned in favor of something new, expect them to either start charging or discontinue their services altogether, since all those servers could be put to other, more lucrative uses. Remember, you want that data to be accessible for years, and over years a lot of things can happen. What if a hacker attacks and your data is lost or mined for financial information or passwords? What if the company maintaining the servers is gone? If your data is in danger of being lost or inaccessible, someone will do something, right?

Well, it would be nice to think that they would offer to let you download all your data and place it somewhere new, or handle the migration for you, but really, that would be a courtesy some companies may not extend if they go belly up fast enough. They have no obligation to deal with your data beyond trying to keep it secure, and you have very little standing to demand it back because you handed it over and agreed to the litany of legalese holding the cloud storage provider innocent of anything and everything that could possibly happen to your data on their servers, maybe even a robot uprising which turns the data center into the AI Overlord of the Cybernetic New World Order. I don’t know whether that’s really covered or not, but when was the last time you read those terms and conditions? Do you know for sure that you can get your content back if it gets gobbled up by an evil, cloud-based artificial intelligence? Of course there’s a rather easy way to deal with this while enjoying all the benefits of the cloud. Just keep recent personal backups. That way, if the worst comes to pass, you’ll have the most current copy of your data ready to be re-uploaded somewhere new. After all, it’s your data. Why let some company have it and trust that your digital stuff will never get lost no matter what happens?
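A minimal sketch of that "keep recent personal backups" advice, assuming your cloud service syncs to a local folder (the function name and paths here are hypothetical examples, not part of any provider’s tooling):

```python
# Snapshot a locally synced cloud folder into a dated zip archive.
# Point src at your own sync folder; both paths below are examples.
import shutil
from datetime import date
from pathlib import Path

def backup_sync_folder(src: str, dest_dir: str) -> Path:
    """Create dest_dir/backup-YYYY-MM-DD.zip from the contents of src."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive_base = dest / f"backup-{date.today().isoformat()}"
    # shutil.make_archive appends ".zip" and returns the final path
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=src))
```

Run something like `backup_sync_folder("~/CloudSync", "~/backups")` on a weekly schedule (cron, Task Scheduler) and you always have a current local copy, whatever happens to the provider.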