We have found the issue and are deploying changes to our servers.
We found that the database server was unable to keep up with a sudden influx of events at peak demand. We decided to upgrade the database to a more robust platform to prevent these errors and periods of poor performance.
Unfortunately, the upgrade and migration to the new database took longer than anticipated, and service was completely interrupted for all users for around 2 hours. The new database should scale with anticipated growth, and we have also put in place new reporting tools that will notify our development team of any issues going forward.