Date: Wednesday, 22 August 2018 22:06 - 22:22
All Times in UTC
At 22:06, a networking event inside one of our datacenters triggered an automated database failover. The failover did not complete, leaving specific database clusters in a nonfunctional state. Engineers intervened and completed the failover at 22:22. The impacted databases handle traffic for American Express, Pay with PayPal, and some other transaction processing for a select group of merchants. As a result, impacted merchants may have observed the following between 22:06 and 22:22:
Failed PayPal and American Express transactions
Authorized transactions where the request to Submit for Settlement failed
Authorized transactions that do not appear in the gateway
A planned maintenance operation on a failed core network switch resulted in an unexpected loss of connectivity to one datacenter rack that contained the primary databases for some of Braintree's production database clusters. This loss of connectivity triggered an automated failover process for those clusters. However, management equipment in the same rack was unreachable, which left the failover in an incomplete state. New primary members were promoted for each cluster, but database clients could not connect to the affected clusters because the subsequent steps in the automation did not complete. On-call engineers intervened to complete the failover process, restoring service. The affected databases served American Express, Pay with PayPal, and one member of our transaction database pool (which impacted the group of merchants assigned to that cluster).
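
The failure mode described above, where a multi-step failover promotes a new primary but stalls on a later step, can be illustrated with a minimal sketch. This is not Braintree's actual automation: the step names (`promote_new_primary`, `contact_management_equipment`, `redirect_clients`), the `ManagementUnreachable` exception, and the `run_failover` helper are all hypothetical, chosen only to show how one unreachable dependency can leave a failover partially applied.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


class ManagementUnreachable(Exception):
    """Hypothetical error: a step depends on equipment that cannot be reached."""


@dataclass
class FailoverResult:
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    error: Optional[str] = None

    @property
    def is_complete(self) -> bool:
        # The failover is complete only if no step failed.
        return self.failed_step is None


def run_failover(steps: List[Tuple[str, Callable[[], None]]]) -> FailoverResult:
    """Run steps in order and stop at the first failure, recording progress."""
    result = FailoverResult()
    for name, action in steps:
        try:
            action()
        except ManagementUnreachable as exc:
            result.failed_step = name
            result.error = str(exc)
            break  # later steps never run, so the cluster is left half-migrated
        result.completed.append(name)
    return result


def promote_new_primary() -> None:
    # Illustrative only: promotion succeeds, as it did in the incident.
    pass


def contact_management_equipment() -> None:
    # Illustrative only: the equipment sits in the rack that lost connectivity.
    raise ManagementUnreachable("management equipment in the affected rack is unreachable")


def redirect_clients() -> None:
    # Illustrative only: never reached, so clients cannot connect.
    pass


STEPS = [
    ("promote_new_primary", promote_new_primary),
    ("contact_management_equipment", contact_management_equipment),
    ("redirect_clients", redirect_clients),
]

if __name__ == "__main__":
    outcome = run_failover(STEPS)
    print("completed:", outcome.completed)
    print("stalled at:", outcome.failed_step)
```

In this sketch the result records which steps finished and where the run stopped, which is the kind of information an on-call engineer would need in order to resume the remaining steps by hand.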