On Tuesday, May 16, Asana was down for approximately 30 minutes and partially available for another hour. In the spirit of transparency, we want to let our customers know what caused this outage and how we resolved it.
On Tuesday morning at approximately 8:40AM PDT, one of our databases became unreachable. The database’s automatic failover did not kick in, so the database stayed unreachable until an on-call engineer triggered a failover manually. Even after the failover completed, many of Asana’s API servers failed to reconnect to the new primary, so they also had to be restarted manually.
We’ve since discovered a regression that stopped the API from recovering gracefully. That bug has been fixed and is making its way through our Continuous Integration pipeline. We’re also working with AWS to understand why the database failed and why it didn’t automatically recover. Finally, we’re working to understand any other reasons why our API servers did not correctly recover once the database was back up.
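To illustrate the kind of graceful recovery described above, here is a minimal sketch of retry-with-reconnect logic that lets a server pick up a database's new primary after a failover instead of holding a dead connection. All names here (`query_with_retry`, `TransientDBError`, the `connect` callable) are hypothetical for illustration, not Asana's actual code:

```python
import time

class TransientDBError(Exception):
    """Stand-in for a driver's 'connection lost' exception (hypothetical)."""

def query_with_retry(connect, sql, attempts=5, base_delay=0.5):
    """Run a query, reconnecting with exponential backoff on transient
    failures so a failover's new primary is eventually reached."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            conn = connect()          # open a fresh connection each attempt
            return conn.execute(sql)  # a new connection resolves to the new primary
        except TransientDBError:
            if attempt == attempts:
                raise                 # surface the error after the final attempt
            time.sleep(delay)
            delay *= 2                # back off before retrying
```

Without logic like this, a server that cached a connection to the failed database would keep erroring until restarted, which is the failure mode the regression reintroduced.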