It's been a rough couple of weeks for Tender customers. We've had multiple significant outages, and our developers have been working hard to find the source of the problem. Yesterday we found the culprit: a memory leak that manifested in less than 20 seconds and was causing entire servers to disappear from the cloud.
While it's debatable whether a cloud hosting platform should be susceptible to this type of failure, the net effect was that our cluster was cycling through servers at an elevated rate. Because the virtual machines were destroyed within seconds of the leak starting, without any notification, no log files were available. We had to escalate the issue to get developers at our hosting provider involved.
After our investigation uncovered the leak, we disabled the code responsible for it, made our backend code more fault tolerant, and added a ton of additional monitoring that will alert us to problems before they become visible to users. This includes statistical monitoring, such as tracking the standard deviation of round-trip email delivery times.
We understand that our customers rely on our service to communicate with their own customers, and we take downtime very seriously. This should put an end to these outages.

