Core services outage

Incident Report for Subbly

Postmortem

We would like to apologise for any inconvenience caused by the downtime this morning.

At around 4:30am GMT the core Subbly services went offline (website hosting was unaffected). Our team were immediately notified and sprang into action. We identified the cause as a failing API request to a third party that had gone offline.

Our engineering team implemented a temporary hot fix as quickly as possible by reducing the timeout for the offending API call. This brought services back online, but with slow loading times.
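
For readers curious what a hot fix of this kind can look like, here is a minimal sketch in Python. It is not our production code: the function name, the 2-second timeout, the fallback value and the injectable `opener` parameter are all assumptions made for illustration.

```python
import socket
import urllib.error
import urllib.request


def fetch_third_party(url, timeout_s=2.0, fallback=None,
                      opener=urllib.request.urlopen):
    """Call a third-party endpoint, giving up after timeout_s seconds.

    Hypothetical helper: the name, the default timeout and the fallback
    behaviour are illustrative assumptions, not taken from the incident.
    """
    try:
        # A short timeout caps how long a request can be held up
        # while the third party is unreachable.
        with opener(url, timeout=timeout_s) as resp:
            return resp.read()
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        # The third party is down or too slow: carry on without its data.
        return fallback
```

With a shorter timeout each request still waits for the full timeout before falling back, which is consistent with services being back online but slow.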

We identified that the third party had been offline for a full 6 hours due to a major data centre outage. It didn’t affect us until 5+ hours later because of caching, which eventually expired as expected (it’s almost unheard of for something to be down for 6 hours, so we hadn’t planned for this).
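
The delayed impact can be illustrated with a small time-to-live (TTL) cache. This is a sketch under stated assumptions, not our actual caching layer: the `TTLCache` class, its parameters and the injectable clock are hypothetical.

```python
class TTLCache:
    """Cache one upstream value for ttl_s seconds (hypothetical sketch)."""

    def __init__(self, fetch, ttl_s, clock):
        self._fetch = fetch        # callable that contacts the third party
        self._ttl_s = ttl_s        # how long a cached value stays valid
        self._clock = clock        # callable returning the time in seconds
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        # While the entry is still fresh, the third party is never contacted,
        # so an outage there goes unnoticed.
        if self._expires_at is not None and now < self._expires_at:
            return self._value
        # Entry missing or expired: this call raises if the third party is down.
        value = self._fetch()
        self._value = value
        self._expires_at = now + self._ttl_s
        return value
```

A cached value keeps being served until it expires, and only the first lookup after expiry hits the unavailable third party.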

We decided to refactor the code for the API call and moved it into an asynchronous process, so it will never affect the uptime of the platform in future. We now consider this issue permanently resolved.
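
One common way to take a third-party call off the request path is to refresh its result in the background and let requests read the last known value. The sketch below shows that general pattern only; the `BackgroundRefresher` class, its method names and the 60-second interval are assumptions, and our refactor may differ in its details.

```python
import threading


class BackgroundRefresher:
    """Refresh a third-party value off the request path (hypothetical sketch)."""

    def __init__(self, fetch, interval_s=60.0, initial=None):
        self._fetch = fetch            # callable that contacts the third party
        self._interval_s = interval_s  # pause between refresh attempts
        self._value = initial          # last value fetched successfully
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = None

    def refresh_once(self):
        """Try one fetch; keep the previous value if it fails."""
        try:
            value = self._fetch()
        except Exception:
            return False
        with self._lock:
            self._value = value
        return True

    def current(self):
        """Called by request handlers; never contacts the third party."""
        with self._lock:
            return self._value

    def _run(self):
        while not self._stop.is_set():
            self.refresh_once()
            # Sleeps for interval_s, but wakes early if stop() is called.
            self._stop.wait(self._interval_s)

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

In this pattern a failed or slow fetch only affects the background thread, while `current()` returns immediately with the most recent successful result.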

Thank you for choosing Subbly, and apologies again for any inconvenience or stress caused.

Keep rocking on.

-The Subbly Team

Posted Dec 18, 2020 - 05:52 UTC

Resolved

We are continuing to optimise and will write up a postmortem once we have a permanent solution in place.

Sorry again for the inconvenience.

Posted Dec 18, 2020 - 05:14 UTC

Monitoring

We've deployed a fix. Response times will be slow while the third party is offline, but we are optimising right now to try to improve this.

Posted Dec 18, 2020 - 05:10 UTC

Identified

We have identified a third-party API that has gone offline, which has brought our services offline. We are working on a fix.

Apologies for the inconvenience!

Posted Dec 18, 2020 - 04:49 UTC

This incident affected: Core Subbly Services (Admin, Checkout, Billing Engine, Legacy API, Feeds, Webhooks, Automations & Jobs, Emails & Communication).