Inbound Emails Delayed
Incident Report for Help Scout
Postmortem

What happened?

Between 5:15 and 10:07pm EST on October 1, a small percentage of inbound emails failed to make it into Help Scout. Due to the seemingly random nature of the failures, it took the Operations team a while to identify the problem.

Once the problem was identified, all messages were queued for processing and delivered to Help Scout by 11:38pm EST.

Also, it seems very possible that we lost between 1,000 and 1,500 emails during this time. A collection of empty files written to one of the email servers leads us to believe that emails weren't written to the server as they usually are.

We have no way to be completely sure whether any data was lost, and we have yet to identify any emails that were not delivered. If you know of any emails that arrived between 5:15 and 10:07pm EST and never appeared in Help Scout, please let us know. We'll do everything we can to track them down.

We've never lost emails before and are hopeful no emails were lost in this case, but want to be clear that it's a possibility.

UPDATE: Thankfully, we've found no evidence of any emails being lost, which was originally a concern. This doesn't impact the changes we have planned, but it's important to pass along the good news!

What we're doing about it

This is the third issue with an inbound email server in the last month. Each time we have made changes that we think will address the problem moving forward, but have seemingly failed to identify the root cause.

Short term, we have pulled out the infrastructure that's caused these problems and put older email servers, which have worked as expected for a longer period of time, back into rotation.

Longer term, we are working on a more scalable and fault-tolerant system for processing inbound email. We understand this is the lifeblood of your business and are dedicated to getting this right. It will be a top priority moving forward.

Posted Oct 02, 2015 - 16:37 EDT

Resolved
All email processing services are working as they should, so we're closing this one out. We'll have more to share after we've had some time to investigate the root cause. Sorry for the trouble tonight!
Posted Oct 02, 2015 - 00:09 EDT
Monitoring
All delayed messages have been processed and inbound delivery is back on track. We're continuing to monitor our mail queues closely.
Posted Oct 01, 2015 - 23:39 EDT
Update
We've found some additional messages that have been delayed; processing is ongoing.
Posted Oct 01, 2015 - 22:56 EDT
Identified
We're currently processing messages that have been delayed. We're still looking through things on the backend to see what happened. We'll continue to update on processing status as we make progress.
Posted Oct 01, 2015 - 22:11 EDT
Investigating
We're currently looking into reports of inbound email delays. We'll update here as soon as we've got some more information.
Posted Oct 01, 2015 - 21:59 EDT