Unfortunately, we experienced some downtime with the admin and checkout services tonight. Websites were not affected.
Here’s what happened:
We jumped into action right away to investigate, which meant there was some delay in providing updates.
After about 20 minutes we managed to isolate the issue to one of our caching server clusters running out of memory.
One of our report-generating scripts was running as multiple simultaneous instances and, because it cached far more than it needed to, it used up all the spare memory. This eventually brought the cache layer offline, and the rest of the application with it. Ultimately, it came down to poor planning.
To avoid risking data loss, we had to free up some of the cache records manually before we could clear the unnecessary data.
The problem has been resolved, the offending script has been permanently patched, and we have learned a good lesson along with it.
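For the technically curious: the details of the patch are not covered here, but one common way to stop a script from running as several instances at once is a lock file. The sketch below is illustrative only. The names `single_instance`, `generate_report`, `run_report` and the lock path are hypothetical, chosen for this example.

```python
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock location, used only for this illustration.
DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), "report_script.lock")


class AlreadyRunning(RuntimeError):
    """Raised when another instance already holds the lock file."""


@contextmanager
def single_instance(lock_path=DEFAULT_LOCK):
    # O_EXCL makes creation fail if the file already exists,
    # so only one process can acquire the lock at a time.
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(lock_path)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        # Always release the lock, even if the report fails.
        os.close(fd)
        os.remove(lock_path)


def generate_report():
    # Placeholder standing in for the real report work.
    return "report done"


def run_report(lock_path=DEFAULT_LOCK):
    # Skip this run instead of piling up alongside a running instance.
    try:
        with single_instance(lock_path):
            return generate_report()
    except AlreadyRunning:
        return "skipped"
```

A guard like this only limits concurrency. It does not cap how much the script caches, and a crashed process can leave a stale lock file behind that needs cleaning up.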
We sincerely apologise for the awful timing of this incident. We know your trust in us is essential and that you depend on us. We will continue to fight for 100% uptime and reliability in order to earn your trust.
If you have any follow-up questions or concerns, please don’t hesitate to reach out to us.
Stefan Pretty, CEO of Subbly