Between 20:16 and 20:51 UTC on 14 May 2019, impacted merchants received an elevated rate of HTTP 5xx responses to API calls related to the Vault (i.e., customer/payment method create, update, search, or find), recurring billing, and transaction find/search. Some Control Panel users may also have had difficulty using some functions during the incident window. Transaction create calls were largely unimpacted due to transaction resiliency processes.
The root cause was determined to be related to an operation that incorrectly assigned a conflicting IP address to a newly-provisioned database server that was not yet ready to accept production traffic.
Engineers were bootstrapping new production database servers and assigning unused IP addresses to them. After finding an IP address that was thought to be unused, they ran several tests to confirm it was available. Despite these tests, an in-use IP address was assigned to one of the newly-provisioned servers: the address already belonged to the primary node of a production transaction database shard. The resulting IP conflict routed production traffic to the newly-provisioned server, which was not yet ready to accept it, and HTTP 5xx errors began at 20:16 UTC.
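The report does not say which availability tests were run. As an illustration only, one possible complement to live probing is to compare a candidate address against an authoritative record of assigned addresses before using it. The sketch below assumes such a record exists as a simple hostname-to-IP mapping. The function name, hostnames, and addresses are all hypothetical and do not describe the actual tooling.

```python
import ipaddress


def find_conflicts(candidate, inventory):
    """Return the hostnames in `inventory` that already hold `candidate`.

    `inventory` is a hypothetical mapping of hostname -> assigned IP string.
    Addresses are parsed with the standard-library ipaddress module, so they
    are compared as addresses rather than as raw strings.
    """
    addr = ipaddress.ip_address(candidate)
    return sorted(
        host
        for host, assigned in inventory.items()
        if ipaddress.ip_address(assigned) == addr
    )


# Hypothetical inventory; names and addresses are invented for illustration.
inventory = {
    "txn-shard-3-primary": "10.0.4.17",
    "txn-shard-3-replica": "10.0.4.18",
}

conflicts = find_conflicts("10.0.4.17", inventory)
if conflicts:
    print("refusing to assign: address already held by " + ", ".join(conflicts))
```

A check like this is only as reliable as the inventory it reads, so it would sit alongside network-level tests, not replace them.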
Due to some recently-implemented transaction resiliency measures, transaction create calls were balanced across the other transaction shards and processed successfully. Other API and Control Panel traffic failed because its database queries were incorrectly sent to the newly-provisioned database server.
Once engineers determined that a conflicting IP address was the cause of the errors, they forced down the network interface on the newly-provisioned server. This redirected traffic back to the intended database servers.
Improved database connection monitoring will be introduced.
Additional database resiliency will be added to rescue other critical API traffic such as customer/payment creates.
Networking capacity available to database servers will be expanded to eliminate the need to reuse IP addresses.