Yesterday afternoon around 3pm EST we discovered an issue with one of our mail servers. Our internal monitoring reported the server as healthy and our outbound queues were clear, but the server was returning an error whenever it attempted to send email.
Every email routed to the affected mail server for processing failed to send and was backed up to another location. We have three main outgoing email servers, so roughly two-thirds of outgoing emails were sent without delay. The remaining third was backed up and, by 5pm EST, queued for delivery via our other servers.
No data was lost in this process, and all delayed emails were processed by 8pm EST. The affected emails included notifications to customers and replies to customers.
We can catch this much sooner in the future by adding monitoring to the database table that stores outbound email errors. That way we'll be notified immediately if this issue ever comes up again.
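As a rough illustration of what monitoring that table could look like, here is a minimal sketch in Python. It is not our production code: the table name `outbound_email_errors`, its columns, the server names, the five-minute window, and the threshold of 10 are all assumptions made up for the example, and SQLite stands in for whatever database is actually in use.

```python
import sqlite3

# Hypothetical schema: the real table and column names are not given in this post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS outbound_email_errors (
    id INTEGER PRIMARY KEY,
    server TEXT NOT NULL,
    error TEXT NOT NULL,
    created_at INTEGER NOT NULL  -- Unix epoch seconds
)
"""


def recent_error_counts(conn, now, window_seconds=300):
    """Return {server: error_count} for errors logged in the last window_seconds."""
    cutoff = now - window_seconds
    rows = conn.execute(
        "SELECT server, COUNT(*) FROM outbound_email_errors "
        "WHERE created_at >= ? GROUP BY server",
        (cutoff,),
    ).fetchall()
    return dict(rows)


def servers_to_alert(counts, threshold=10):
    """Return the sorted names of servers whose error count meets the threshold."""
    return sorted(server for server, n in counts.items() if n >= threshold)
```

A scheduled job could call `recent_error_counts` every minute or so and send a notification whenever `servers_to_alert` returns a non-empty list. Counting errors per server matters here because the failure affected only one of three servers, so an overall average could hide it.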