Massive Amazon cloud service outage disrupts sites

Amazon didn’t quite break the Internet on Tuesday, but a problem lasting more than four hours at one of the main storage systems for its AWS cloud computing business did cause headaches for hundreds of thousands of websites across the United States.

A big portion of Amazon Web Services’ S3 storage system, a service used by 148,213 sites according to SimilarTech, went offline Tuesday afternoon.

The outage appeared to have begun around 12:35 pm ET, according to Catchpoint Systems, a digital experience monitoring company. It involved a storage system for Amazon’s S3 service in its East Coast region, US-EAST-1. Operations were fully recovered by 4:49 pm ET, Amazon said.

That system was the first of what now are three AWS regions in the United States. It is still the largest and is also where AWS rolls out new features, “so it’s disproportionately big,” said Lydia Leong, a cloud analyst with Gartner.

AWS provides cloud-based storage and web services for companies so they don’t have to build their own server farms, allowing them to rapidly deploy computing power without having to invest in infrastructure. For example, a business might store its video or images or databases on an AWS server and access it via the Internet.

Companies that use AWS include Pinterest, Airbnb, Netflix, Slack, Buzzfeed, Spotify and some Gannett systems. While not all were affected by the outage, some experienced slowdowns.

AWS began as a profitable sideline to Amazon’s main online sales business but has since grown to become the major player in the arena as well as a major money-maker in its own right for Amazon. In the fourth quarter of 2016 the division accounted for 8% of Amazon’s total revenue.

“This is a pretty big outage,” said Dave Bartoletti, a cloud analyst with Forrester. “AWS had not had a lot of outages and when they happen, they’re famous. People still talk about the one in September of 2015 that lasted five hours,” he said.

S3 has “north of three to four trillion pieces of data stored in it,” Bartoletti said.

AWS S3 is used by businesses both large and small. “More than anything else, S3 customers need to be able to get at their data, because often S3 is used to store images. So no S3, no nice picture or fancy logo on your website,” said Leong.

That was exactly the problem faced by Lewis Bamboo, a small, family-owned bamboo nursery in Oakman, Alabama.

“As our business is in bamboo plants, pictures are a very important part of selling our product online. We use Amazon S3 to store and distribute our website images. When Amazon’s servers went down, so did the majority of our website,” said the company’s chief technology officer Daniel Mullaly.

“Thankfully we also store the images locally and I was able to serve the images directly from our server instead,” he said.
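For readers curious what that kind of fallback can look like, here is a minimal sketch in Python. It is not Lewis Bamboo's actual code: every name in it is hypothetical, and the two sources are stand-ins rather than real storage or file-system calls. The idea is simply to try the remote copy first and serve the local copy if the remote one fails.

```python
from typing import Callable, Iterable, Optional

# Hypothetical names throughout; an illustration, not any site's real code.
Fetcher = Callable[[str], bytes]


def fetch_with_fallback(key: str, sources: Iterable[Fetcher]) -> bytes:
    """Return the image from the first source that responds."""
    last_error: Optional[Exception] = None
    for fetch in sources:
        try:
            return fetch(key)
        except Exception as err:  # a real site would catch narrower errors
            last_error = err
    raise RuntimeError(f"no source could serve {key!r}") from last_error


def remote_store_down(key: str) -> bytes:
    # Stand-in for a cloud storage request made during an outage.
    raise ConnectionError("remote storage unavailable")


def local_copy(key: str) -> bytes:
    # Stand-in for reading the same image from the site's own server.
    return b"local:" + key.encode()


# The remote source fails, so the local copy is served instead.
image = fetch_with_fallback("bamboo.jpg", [remote_store_down, local_copy])
```

The order of the list sets the preference: the cloud copy is tried first and the local copy is used only when that request fails. If every source fails, the function raises an error instead of returning an empty image.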

The effects of the outage varied depending on the site and how it used AWS. Modern websites usually pull data from multiple databases in the cloud, which can be stored all over the world, so a photo might come from one place, a price list from another and a customer database from a third.

For that reason, entire websites rarely go down, but various parts of them may take a long time to load or not load at all, leaving broken links or images.

Companies have been steadily moving storage to the cloud because it is cheaper, easily accessible and more resilient. But the downside is that when there are problems, there’s a cascade effect.

“There are lots of people having a not very good day at the moment,” said Leong.

It’s possible to contract with multiple companies to avoid potential problems but it’s pricey, so many companies make peace with the knowledge that on rare occasions they’re going to have that very bad day.

“Only the most paranoid, and very large companies, distribute their files across not just AWS but also Microsoft and Google, and replicate them geographically across regions, but that’s very, very expensive,” she said.