In the last two years, as cloud computing has expanded, it has become an article of faith among many tech gurus that cloud computing is the future. Startups have become particularly enamored with the cloud computing model. They just need computers with a good broadband connection, and all their work can be done in the cloud. There is no need to set up and run expensive datacenters of their own.
The ongoing Amazon outage has knocked out or partially shut down popular websites such as Foursquare, Reddit and Quora. The trouble started when Amazon's Service Health Dashboard began reporting connectivity problems that affected its Relational Database Service. That service is used to manage relational databases across multiple zones in the eastern U.S.
The outage has affected more than startups and websites hosted on the cloud. Even well-established companies that were increasingly reliant on Amazon's services have been hit. The problem occurred at Amazon's northern Virginia EC2 cluster. Although all the clusters are supposed to be backed up and have redundancy, those safeguards did not work as intended. This is not the first time a cloud service provider has been affected by downtime. Microsoft's Azure faced a similar problem on April 15, when it suffered a six-hour outage.
Amazon's service will soon be online again, and no doubt all the cloud computing companies will build more redundancy into their systems. But this outage does raise some issues that cloud users have to address. The first is that users shouldn't depend exclusively on the cloud for everything. Companies that have backups, either on their own premises or with different cloud providers, will fare better. Clients will also have to look into their service level agreements (SLAs) more closely. A 99.9% uptime guarantee may not cover all the services that you are signing up for. Even now, Amazon's EC2 SLA has not been breached.
The almost four-day outage has not resulted in any SLA breaches because the agreement does not cover the EBS and RDS services, which are the ones affected. This may seem like a technicality to the businesses that have been hit. But it shows that you should not take your provider for granted. If all your critical processes run on a cloud service, then either create backups with other clouds or keep some data on your local servers. Building that resilience comes at a cost, though, in terms of manpower and hardware resources. And this is exactly the cost that several companies, especially startups, wanted to avoid.
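To put the 99.9% figure in perspective, here is a rough back-of-the-envelope sketch of how much downtime such a guarantee allows. It assumes uptime is measured over a 365-day year and treats the "almost four-day" outage as a full 96 hours; the function name and both of those choices are illustrative assumptions, not terms taken from Amazon's actual SLA.

```python
def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period while still meeting the uptime target."""
    return period_hours * (1 - uptime_percent / 100)


# Assumption: the guarantee is measured over a 365-day year.
HOURS_PER_YEAR = 365 * 24

# Assumption: round the "almost four-day" outage up to four full days.
outage_hours = 4 * 24

yearly_budget = allowed_downtime_hours(99.9, HOURS_PER_YEAR)

print(f"Downtime allowed per year at 99.9%: {yearly_budget:.2f} hours")
print(f"Outage length: {outage_hours} hours")
print(f"Outage is {outage_hours / yearly_budget:.1f} times the yearly allowance")
```

Under these assumptions, a 99.9% guarantee leaves room for about 8.76 hours of downtime per year, so an outage of this length would exceed the allowance roughly eleven times over if the affected services were covered. That is why the scope of an SLA matters as much as the percentage it quotes.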