Starting at 11:38 EST this morning, the Help Scout web application experienced sporadic connection failures, resulting in long page load times and, in some cases, pages that failed to load at all. Because the issue was intermittent, it added up to roughly 16 minutes of downtime.
The root cause was related to our centralized logging infrastructure. Centralized logging is a critical system we use to debug issues and monitor the overall health of Help Scout. It's designed to never impact customer-facing systems, but today a portion of it did.
This morning's issue was one we hadn't seen before. We could connect to the centralized logging cluster, but were unable to push data into it due to a server error. The Ops team rebuilt and re-synced the cluster to get things back on track.
As part of the recovery, we pushed a change so that this same type of error won't ever have a customer-facing impact again. We still have some investigating to do on our end into how the problem came about, but it won't impact your experience again.
We do everything we can to avoid these moments, and we apologize for letting you down today. We learned a lot and will keep working to provide you with a reliable experience moving forward.