Elevated Client Errors
Incident Report for Braintree
Postmortem

Impact Description

During the incident timeframe, merchants using the following SDKs may have received HTTP 401 errors when attempting to obtain client authorization to load the Drop-in UI or Hosted Fields:

  • JavaScript 3.27.0+

  • Android 2.9.0+

  • iOS 4.14.0+

These errors resulted in an inability to load or display the Drop-in UI or Hosted Fields. The error message returned was “Authentication credentials are invalid.”

Summary & Root Cause

At 22:12 UTC, engineers began rolling out a configuration change to our client API hosts. The change was rolled out to one AWS region at a time, with several minutes in between to monitor the impact of the change. No issues were initially detected.

By 23:00 UTC, the change had been rolled out to all AWS regions and the rate of 401 errors had risen above normal levels. To remediate, engineers rolled back the change in each AWS region, completing the rollback at 23:39 UTC. Errors subsided to normal levels by 00:05 UTC.
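As an illustration of what "above normal levels" could mean in practice, the sketch below compares the 401 rate in a current monitoring window against a baseline window. It is not Braintree's monitoring code: the `WindowCounts` shape, the `is_elevated` function, and the `factor`, `min_requests`, and `floor` thresholds are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Request counts for one monitoring window (hypothetical shape)."""
    total: int         # all requests seen in the window
    unauthorized: int  # requests answered with HTTP 401


def rate_401(window: WindowCounts) -> float:
    """Fraction of requests in the window that returned HTTP 401."""
    if window.total == 0:
        return 0.0
    return window.unauthorized / window.total


def is_elevated(current: WindowCounts, baseline: WindowCounts,
                factor: float = 3.0, min_requests: int = 100,
                floor: float = 0.001) -> bool:
    """Return True when the current 401 rate exceeds factor x baseline.

    All thresholds are illustrative assumptions, not incident values.
    """
    if current.total < min_requests:
        return False  # too little traffic to judge reliably
    # The floor keeps a near-zero baseline from flagging a single stray 401.
    reference = max(rate_401(baseline), floor)
    return rate_401(current) > factor * reference
```

A check like this would be evaluated per region, so that a rise confined to the most recently changed region is not diluted by healthy traffic elsewhere.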

The root cause was an incorrect set of whitelist rules in the configuration change, which caused clients using the SDK versions listed above to encounter authentication errors.

Corrective Actions & Preventative Measures

  • We will increase the number of teams that audit proposed configuration changes.

  • Changes will be rolled out to each AWS region more slowly, allowing additional time to monitor for changes in the error rate in each region.

  • All client API error rate monitors and alerts will be audited, and new monitors/alerts will be added where necessary to improve our ability to detect future issues.

  • New smoke tests will be introduced for future client API operations.
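
The measures above (a slower region-by-region rollout, error rate monitoring, and smoke tests) can be combined into a single gated rollout loop. The following is a minimal sketch under stated assumptions, not a description of Braintree's tooling: `staged_rollout` and every callable passed to it (`apply_change`, `roll_back`, `smoke_test`, `error_rate_ok`) are hypothetical names, and the default `bake_seconds` is an arbitrary placeholder.

```python
import time
from typing import Callable, Iterable, List


def staged_rollout(
    regions: Iterable[str],
    apply_change: Callable[[str], None],
    roll_back: Callable[[str], None],
    smoke_test: Callable[[str], bool],
    error_rate_ok: Callable[[str], bool],
    bake_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Apply a change one region at a time; roll back everything on failure.

    Returns the regions left with the change applied (empty after a rollback).
    All callables are hypothetical hooks supplied by the caller.
    """
    done: List[str] = []
    for region in regions:
        apply_change(region)
        done.append(region)
        healthy = smoke_test(region)
        if healthy:
            sleep(bake_seconds)  # wait before judging the error rate
            # Re-check every region changed so far, not only the newest one.
            healthy = all(error_rate_ok(r) for r in done)
        if not healthy:
            # Undo in reverse order of application.
            for changed in reversed(done):
                roll_back(changed)
            return []
    return done
```

Re-checking every previously changed region after each bake period is a design choice made for this sketch. It is motivated by the timeline above, in which no issues were detected during the early stages of the rollout.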

Posted Apr 03, 2019 - 19:24 UTC

Resolved
Engineers detected an elevated rate of client-side errors and difficulty loading Braintree-hosted checkout forms beginning at 22:12 UTC. They reverted the deployment that caused the problems, and service returned to normal at approximately 00:05 UTC.

Symptoms
An elevated rate of client-side errors.
Posted Apr 01, 2019 - 23:45 UTC
This incident affected: Production API (Gateway API).