Amazon explains what took a part of the internet down


NEW DELHI: Earlier this month, an Amazon Web Services (AWS) outage took parts of the internet offline for several hours. The websites that went down included several of Amazon’s own, as well as Netflix and Disney+.
“An automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network. This resulted in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks. These delays increased latency and errors for services communicating between these networks, resulting in even more connection attempts and retries. This led to persistent congestion and performance issues on the devices connecting the two networks,” said the company in a post on its website.
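The feedback loop the company describes, in which failed connections trigger retries that add to the load, can be illustrated with a toy model. The Python sketch below is not a representation of AWS's systems: the function name `simulate`, the capacity and load figures, and the rule that every failed attempt is retried a fixed number of times on the next tick are all assumptions made purely for illustration.

```python
def simulate(capacity, base_load, surge, surge_ticks, ticks, retries_per_failure=2):
    """Toy model of a retry storm (hypothetical, for illustration only).

    Each tick, the network devices can handle at most `capacity` connection
    attempts. Every attempt that fails is retried `retries_per_failure`
    times on the following tick. Returns the number of failed attempts
    recorded at each tick.
    """
    backlog = 0          # retry attempts carried over from the previous tick
    failures = []
    for tick in range(ticks):
        # Normal traffic, plus the one-off surge while it lasts, plus retries.
        extra = surge if tick < surge_ticks else 0
        offered = base_load + extra + backlog
        served = min(offered, capacity)
        failed = offered - served
        # Failed attempts come back as retries on the next tick.
        backlog = failed * retries_per_failure
        failures.append(failed)
    return failures


if __name__ == "__main__":
    # A surge lasting a single tick, with two different retry behaviours.
    print(simulate(100, 80, 100, 1, 6, retries_per_failure=1))
    print(simulate(100, 80, 100, 1, 6, retries_per_failure=2))
```

In this toy model, a single retry per failure lets the backlog drain within a few ticks of the surge ending. With two retries per failure, the number of failures keeps growing even though the surge itself lasted only one tick, which mirrors the "persistent congestion" described in the statement.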
“We have taken several actions to prevent a recurrence of this event. We immediately disabled the scaling activities that triggered this event and will not resume them until we have deployed all remediations… We have also deployed additional network configuration that protects potentially impacted networking devices even in the face of a similar congestion event,” added the company in the post.
Amazon also apologised to its customers for the disruption. “Finally, we want to apologize for the impact this event caused for our customers. While we are proud of our track record of availability, we know how critical our services are to our customers, their applications and end users, and their businesses. We know this event impacted many customers in significant ways. We will do everything we can to learn from this event and use it to improve our availability even further,” said the company while concluding the post.