Production Networking Issue
Incident Report for Braintree

RCA: Production Networking Issues

Date: Monday, 26 March 2018 12:33 - 13:43

All Times in UTC

Executive Summary
Beginning at 12:33, engineers observed a small decrease in traffic to the Braintree production API endpoints. After investigating our network paths, engineers determined that the decrease stemmed from an upstream problem at one of our Internet Service Providers (ISPs). Engineers then re-routed traffic away from the affected ISP, which reduced latency and brought traffic back to normal levels as of 13:43.


  • We are working with the affected ISP to determine what caused the issue and what they are doing to prevent this type of incident in the future.
  • We plan to explore adding more ISPs to improve redundancy and provide additional ingress routes.
  • We are reviewing internal procedures to improve the process for removing ISPs as soon as we detect a drop in traffic.
  • We are actively working on several initiatives to improve latency, uptime, and redundancy by leveraging failover processing paths as well as routing API traffic through alternate services.
Posted Mar 26, 2018 - 23:36 UTC

Traffic is now flowing normally and we have not seen additional latency or timeouts since 13:43 UTC.
Posted Mar 26, 2018 - 14:21 UTC
We have re-routed traffic away from one of our ISPs. Latency has decreased and engineers are monitoring.
Posted Mar 26, 2018 - 13:52 UTC
We have detected a problem routing inbound requests to the production gateway and Control Panel via one of our Internet Service Providers (ISPs). Merchants may experience timeouts or increased latency when connecting to the Braintree Production API or Control Panel. Engineers are investigating.
Posted Mar 26, 2018 - 13:31 UTC
This incident affected: API.