Between 8:44 UTC and 9:46 UTC on 13 June 2019, impacted merchants encountered timeouts when attempting to reach Braintree’s API endpoint. Some Control Panel users may also have had difficulty using features such as downloads.
Braintree routes traffic to the Gateway API through multiple internet service providers (ISPs) across multiple datacenters. An upstream issue at one ISP serving one of our datacenters prevented inbound traffic from reaching our servers. Although engineers were alerted to the connectivity errors quickly, it took some time to determine which ISP was affected. Once engineers confirmed the impacted ISP, they removed both that ISP and the datacenter from serving public traffic, after which traffic levels and error rates recovered.
We are auditing and refining our automation for removing ISPs and datacenters from serving traffic (de-peering)
We are introducing higher-precision ISP-level monitoring into primary system dashboards
We are improving application resiliency during prolonged outbound connectivity issues