Postmortem of our Amazon S3 Outage

Posted by Will on January 7th, 2011

At approximately 1:40 PM PST yesterday, we encountered an issue with our Amazon S3 account, which handles media storage for both Lighthouse and Tender.

The outage affected all files processed by our servers. We quickly contacted Amazon and resolved the issue; however, it took longer than expected for Amazon's systems to accept our changes.

At 5:36 AM PST, upload functionality was restored. This morning we began reprocessing the backlog of uploaded files that were not working because of the outage. We have now successfully restored the vast majority of the backlog.

A small number of uploads returned errors during our attempts to reprocess them. If you notice missing files, please let us know at support@lighthouseapp.com or support@tenderapp.com. For those files, you will need to contact the user and have them upload the files again.

We sincerely apologize for the outage and are working to put countermeasures in place to prevent edge cases like the uploads we have been unable to restore. When things go wrong, we hurt just as much as our customers using Lighthouse and Tender.

4 Comments

  1. Nik Wekwerth said on January 7th, 2011

    Not sure if you saw GitHub’s post mortem bit.ly/giVnHX – you should try Librato Silverline to get exact insight into application resource uses. It’s a great tool to not only understand what caused the problem but also avoid it in the future.

    Disclosure: I work for Librato.

  2. Drew McLellan said on January 8th, 2011

    What was really awkward for us was not that attachments weren’t available (that’s ok) but that Tender kept accepting attachments during that time.

    We had customers raising tickets for help and attaching files that we then couldn’t look at. That both made us look stupid AND meant that we had to frustrate already frustrated customers further by asking for the files by email. If attachments weren’t available they would have emailed or given us a URL in the first place and all would have been well.

    Tender is supposed to make the process of getting support easier for our customers, and less work for us to manage. Yesterday it made both those things harder – not because of the S3 problem, but because you left the attachment upload functionality on during it.

  3. Nick said on January 8th, 2011

    Don’t worry about it! :) Keep up the good work! :)

  4. Nick said on January 8th, 2011

    Sorry about the cross out…Keep up the good work!
