Delayed messages update
Incident Report for Help Scout
Postmortem

Unfortunately, it turns out yesterday's trouble with inbound email was not the end of the story. Between 10am EDT on Sept. 9 and 2pm EDT on Sept. 10, a small percentage of inbound emails were not properly processed and delivered to Help Scout.

While the database problem mentioned yesterday was a contributing factor, the real problem was a misconfigured inbound email server that was recently deployed. Since taking the server out of rotation today, all inbound email has been processed correctly.

It took us a while to acknowledge and explain this issue because we had no way to know that removing the server had fully fixed the problem until we could re-queue the failed emails for delivery. Now that we've had time to sanity check every detail, we're confident the issue is solved. All failed emails will be delivered by 7pm EDT today.

How we're going to address this moving forward

We should have caught this issue earlier, preferably before the misconfigured server was deployed. We leaned too heavily on automated server scripts and didn't double-check their work in this case. Lesson learned: we're implementing process changes to make sure this mistake isn't made again.

We've also come up with a few ideas that will simplify our email delivery pipeline, resulting in fewer possible points of failure. Those changes will be our highest Ops priority until implemented.

Hindsight being 20/20, we also should have updated this site much earlier, even though we didn't fully understand what was going on. We'll learn from this experience and do better next time.

So sorry for the troubles over the last couple of days. Please reach out if you have any other questions about this incident and we'll take good care of you.

Posted Sep 10, 2015 - 18:51 EDT

Resolved
We continued to see sporadic email delivery failures up to 2pm EDT today, then finally narrowed the problem to a single server and took it offline. All delayed emails will finish being processed in the next 40 minutes, and no data was lost. We're going to follow up with a much longer postmortem in the next hour, but wanted to send along an update!
Posted Sep 10, 2015 - 18:22 EDT