Production Networking Issue
Incident Report for Braintree

Impact Description

Between 17:56 and 18:18 UTC on 12 December 2019, an elevated share of merchant requests to the Gateway API failed with connection timeout errors. This included approximately 100% of Pay with Venmo traffic. Merchants hosted within US AWS regions may have been unable to reach the Gateway API.

Root Cause

At 17:56 UTC, a load balancer serving one of our data centers suffered a hardware failure and became unable to handle traffic. A failure of this kind is normally resolved quickly by automatic failover mechanisms. In this case, the automatic failover mechanisms did not trigger, and engineers intervened manually, mitigating the issue at 18:18 UTC.
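
This report does not describe how the failover mechanism is implemented. As a purely illustrative sketch of the general idea, the Python below promotes a standby after a run of consecutive failed health checks. The class name FailoverMonitor, the "primary" and "standby" labels, and the threshold of 3 are all hypothetical and are not taken from the systems involved in this incident.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Hypothetical monitor: switch to the standby after `threshold`
    consecutive failed health checks of the primary."""

    threshold: int = 3        # assumed value, not from the report
    active: str = "primary"   # which balancer should currently serve traffic
    failures: int = 0         # consecutive failed checks seen so far

    def record(self, healthy: bool) -> str:
        """Record one health-check result and return the active balancer."""
        if healthy:
            # Any successful check resets the failure streak.
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold and self.active == "primary":
                # Promote the standby; this sketch never fails back on its own.
                self.active = "standby"
        return self.active
```

In a design like this, a failover that never triggers would typically mean the failed checks were not being recorded or the threshold was never reached, which is why such paths are worth exercising in tests.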

Corrective Actions & Preventative Measures

  • The impacted load balancer will remain out of service until engineers are able to ensure a successful return to service.
  • Automated failover mechanisms are being checked and improved to ensure full functionality during future hardware failures.
Posted Dec 13, 2019 - 18:40 UTC

We have seen no further issues, and this incident is now resolved. A complete root cause analysis will be published here in the coming days.

Failed transactions can be safely retried.
Posted Dec 12, 2019 - 18:30 UTC
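
For merchants acting on the note above that failed transactions can be safely retried, one common approach is to retry with exponential backoff. The Python below is a minimal sketch under stated assumptions: GatewayTimeout and retry_on_timeout are hypothetical names, not part of any Braintree client library, and the sketch assumes the failure was a connection timeout like those seen in this incident.

```python
import random
import time


class GatewayTimeout(Exception):
    """Hypothetical stand-in for a connection timeout raised by an API call."""


def retry_on_timeout(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run `call()` and retry on GatewayTimeout with exponential backoff.

    `call` is any zero-argument function that performs the request.
    `sleep` is injectable so the helper can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except GatewayTimeout:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Wait 0.5 s, 1 s, 2 s, ... plus up to 100 ms of random jitter
            # so that many clients do not retry at the same instant.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            sleep(delay)
```

A caller would wrap its own request function, for example retry_on_timeout(lambda: submit_transaction(order)), where submit_transaction is whatever function the integration already uses (also a hypothetical name here).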
Engineers have put a fix in place and traffic has shown signs of recovery since 18:18 UTC.
Posted Dec 12, 2019 - 18:25 UTC
We're investigating a networking issue that may be impacting the availability of the API or Control Panel for some merchants.

Merchants may experience timeouts or increased latency when making API calls.

Engineers are investigating network-related causes.
Posted Dec 12, 2019 - 18:16 UTC
This incident affected: Control Panel and Production API (Gateway API).