Connection issues and degraded performance
Incident Report for Help Scout
Postmortem

What happened

This morning, from about 8:20am to 9:25am EST, Help Scout performance was severely degraded, to the point where page loads were taking 10+ seconds. It took us a while to understand the root cause, which was a problem with a build deployed this morning.

The problematic build adds a few new features that monitor connectivity to Help Scout (oh the irony!) and save data locally in the event of a connection failure. For instance, if you are typing a draft or a note and lose your connection, the feature saves your work locally in your browser so that it's never lost.
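To make the idea concrete, here is a minimal sketch of saving a draft locally. It is not Help Scout's actual code: the names `KeyValueStore`, `MemoryStore`, `Draft`, `saveDraft`, `loadDraft` and `clearDraft`, and the `draft:` key prefix, are all hypothetical. It assumes the browser's standard `localStorage` as the backing store, and uses an in-memory stand-in so the example also runs outside a browser.

```typescript
// Minimal subset of the Web Storage API. In a browser, window.localStorage
// satisfies this interface. All names below are hypothetical.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// In-memory stand-in so the sketch can run outside a browser.
class MemoryStore implements KeyValueStore {
  private items: { [key: string]: string } = {};

  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.items, key)
      ? this.items[key]
      : null;
  }

  setItem(key: string, value: string): void {
    this.items[key] = value;
  }

  removeItem(key: string): void {
    delete this.items[key];
  }
}

interface Draft {
  conversationId: string;
  body: string;
  savedAt: number; // milliseconds since the Unix epoch
}

function draftKey(conversationId: string): string {
  return "draft:" + conversationId;
}

// Persist the current text so it survives a lost connection or a reload.
function saveDraft(
  store: KeyValueStore,
  conversationId: string,
  body: string,
  now: number = Date.now()
): void {
  const draft: Draft = { conversationId: conversationId, body: body, savedAt: now };
  store.setItem(draftKey(conversationId), JSON.stringify(draft));
}

// Return the saved draft, or null if none exists or the stored value is corrupt.
function loadDraft(store: KeyValueStore, conversationId: string): Draft | null {
  const raw = store.getItem(draftKey(conversationId));
  if (raw === null) {
    return null;
  }
  try {
    return JSON.parse(raw) as Draft;
  } catch {
    return null;
  }
}

// Remove the local copy once the server has confirmed the save.
function clearDraft(store: KeyValueStore, conversationId: string): void {
  store.removeItem(draftKey(conversationId));
}
```

In a browser, you would pass `window.localStorage` as the store, call `saveDraft` as the user types, and call `clearDraft` after the server acknowledges the reply or note.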

While that's all fine and dandy, what we deployed didn't work properly. The code that checks for an internet connection was looking for a file that did not exist in our production environment, so it timed out and created several issues that became visible to you.
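For illustration only, here is one way a connectivity check could be made more tolerant of this kind of mistake. The report does not describe how the real check was implemented, so everything in this sketch is an assumption: the `ProbeOutcome` and `Connectivity` types, the `classifyProbe` and `nextProbeDelayMs` functions, and the default delays are all hypothetical. It shows two ideas: a missing probe file is reported as a deployment problem rather than a lost connection, and repeated failures slow the checks down instead of retrying at full speed.

```typescript
// Result of one probe request. All names here are hypothetical.
type ProbeOutcome =
  | { kind: "response"; status: number } // the server answered with an HTTP status
  | { kind: "timeout" }                  // no answer within the deadline
  | { kind: "network-error" };           // the request could not be completed

type Connectivity = "online" | "offline" | "probe-misconfigured";

// Interpret a single probe result.
function classifyProbe(outcome: ProbeOutcome): Connectivity {
  switch (outcome.kind) {
    case "response":
      // Any HTTP response shows the server is reachable. A 404 means the
      // probe target is missing, which points at a deployment problem
      // rather than at the user's connection.
      return outcome.status === 404 ? "probe-misconfigured" : "online";
    case "timeout":
    case "network-error":
      return "offline";
  }
}

// Exponential backoff with a cap: wait longer after each consecutive
// failure so a broken check does not keep retrying at full speed.
function nextProbeDelayMs(
  consecutiveFailures: number,
  baseMs: number = 1000,
  maxMs: number = 60000
): number {
  const failures = Math.max(0, Math.floor(consecutiveFailures));
  return Math.min(maxMs, baseMs * Math.pow(2, failures));
}
```

With the assumed defaults, the delay starts at one second, doubles after each consecutive failure, and stops growing at one minute. Surfacing "probe-misconfigured" as its own state would also give a test or staging run something specific to fail on before a build ships.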

What we're doing about it

This wasn't a failure of any infrastructure or services we rely on. Instead, it was human error on our side. It likely would have been prevented with more diligent testing and review of the code that was going out. In this case, we did not catch the problem because of a subtle difference between our local and production environments.

We have egg on our face with this one, and we sincerely apologize for the sluggish experience you had this morning. We'll continue to refine our internal testing and deployment processes in an attempt to make this sort of error impossible moving forward.

Thanks for reading,
Nick from the Help Scout Team

Posted Feb 11, 2015 - 13:25 EST

Resolved
Everything is back to normal. We'll follow up later today with more information on what happened. We're so sorry!
Posted Feb 11, 2015 - 10:43 EST
Update
We're still seeing some slow page loads and connectivity issues. We're investigating and monitoring.
Posted Feb 11, 2015 - 10:33 EST
Monitoring
We're back to normal performance levels, but are still moving some things around. We're monitoring everything very closely for any further issues.
Posted Feb 11, 2015 - 09:26 EST
Identified
We think the issue is related to our load balancers not working properly, so we're putting in new ones to address the issue.
Posted Feb 11, 2015 - 09:12 EST
Update
We're still experiencing slow performance and some connection failures. We have not yet identified the problem, but we have a few people working on it.
Posted Feb 11, 2015 - 09:00 EST
Investigating
We're investigating reports of random blank pages in the web app.
Posted Feb 11, 2015 - 08:43 EST