Production Networking Issue
Incident Report for Braintree

Impact Description

Between 8:44 UTC and 9:46 UTC on 13 June 2019, impacted merchants encountered timeouts when attempting to reach Braintree’s API endpoint. Some Control Panel users may also have had difficulty using features such as downloads.

Root Cause

Braintree routes traffic to the Gateway API via multiple internet service providers (ISPs) across multiple datacenters. One ISP servicing one of our datacenters had an upstream issue that prevented inbound traffic from reaching our servers. Although engineers were alerted to the connectivity errors quickly, it took some time to determine which ISP was having issues. Once engineers confirmed the impacted ISP, they removed both the ISP and the datacenter from serving public traffic, after which traffic levels and error rates recovered.

Corrective Actions & Preventative Measures

  • We are auditing and refining our automation around removing ISPs and datacenters from serving traffic (de-peering)

  • We are introducing higher-precision ISP-level monitoring into primary system dashboards

  • We are improving application resiliency during prolonged outbound connectivity issues
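
To illustrate what ISP-level monitoring could look like in principle, here is a minimal Python sketch. It is not Braintree's tooling: the probe format (an ISP label paired with a success flag), the function names, the ISP labels, and the threshold values are all assumptions made for this example. The idea is to break connectivity results down per ISP so that a single failing provider stands out, rather than being averaged into an aggregate error rate.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical probe record: (ISP label, whether the probe succeeded).
Probe = Tuple[str, bool]


def error_rates_by_isp(probes: Iterable[Probe]) -> Dict[str, float]:
    """Return the fraction of failed probes for each ISP label."""
    totals: Counter = Counter()
    failures: Counter = Counter()
    for isp, succeeded in probes:
        totals[isp] += 1
        if not succeeded:
            failures[isp] += 1
    return {isp: failures[isp] / totals[isp] for isp in totals}


def degraded_isps(
    probes: Iterable[Probe],
    threshold: float = 0.5,   # assumed cutoff, chosen only for illustration
    min_samples: int = 5,     # ignore ISPs with too few probes to judge
) -> List[str]:
    """Return ISPs whose error rate meets the threshold, sorted by name."""
    probes = list(probes)
    totals = Counter(isp for isp, _ in probes)
    rates = error_rates_by_isp(probes)
    return sorted(
        isp
        for isp, rate in rates.items()
        if totals[isp] >= min_samples and rate >= threshold
    )


if __name__ == "__main__":
    sample = (
        [("isp-a", True)] * 10
        + [("isp-b", False)] * 8
        + [("isp-b", True)] * 2
    )
    print(error_rates_by_isp(sample))
    print(degraded_isps(sample))
```

The minimum sample count is a design choice to avoid flagging an ISP on the basis of one or two failed probes; real thresholds would depend on traffic volume and probe frequency.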

Posted Jun 14, 2019 - 15:45 UTC

This incident has been resolved. The full incident impact window was 8:44 to 9:46 UTC.
Posted Jun 13, 2019 - 11:12 UTC
A fix has been implemented and we are monitoring the results.
Posted Jun 13, 2019 - 10:59 UTC
We are continuing to work on a fix for this issue.
Posted Jun 13, 2019 - 10:44 UTC
The issue has been identified and a fix is being implemented.
Posted Jun 13, 2019 - 10:15 UTC
We are continuing to investigate this issue.
Posted Jun 13, 2019 - 09:45 UTC
We are continuing to investigate this issue.
Posted Jun 13, 2019 - 09:33 UTC
We're investigating a networking issue that may be impacting the availability of the API or Control Panel for some merchants.

Timeouts or increased latency when making API calls.

An upstream networking issue with a third-party Internet Service Provider (ISP).
Posted Jun 13, 2019 - 09:16 UTC
This incident affected: Transaction Processing (United States Processing, Canadian Processing, European Processing, APAC Processing), Control Panel, and Production API (Gateway API).