During the incident window, CenturyLink/Level(3), a major ISP and Internet transit provider, experienced an outage that impacted several Braintree services in addition to a significant number of other services and providers across the Internet.
Braintree utilizes several ISPs, so most services were not impacted by this issue. However, merchants utilizing the Forward API may have experienced an increase in HTTP 5xx errors between 10:02 and 12:58 UTC. Additionally, merchants using 3D Secure may have seen an increase in “authentication_unavailable” 3DS results between 10:02 and 14:23 UTC.
Braintree services utilize multiple ISPs for inbound and outbound traffic across physical data centers, and CenturyLink/Level(3) is one of them. When CenturyLink began having issues at approximately 10:02 UTC, requests utilizing the Forward API, 3D Secure, or Fraud Protection experienced intermittent packet loss. Engineers were engaged, but it was not immediately clear where in the network path the packet loss was occurring. As soon as CenturyLink was identified as a possible root cause, engineers began moving Forward API traffic to alternate ISP paths. Fraud Protection and 3D Secure rely on external providers who were themselves affected. We worked with those providers and tried multiple configurations under our control to restore connectivity to them, without effect, and ultimately relied on those providers to mitigate the issues.
Corrective Actions & Preventative Measures