PowerChurch Online Check In Status
Posted: Fri Mar 08, 2013 4:37 pm
Over the past couple of weeks, the PowerChurch Online Check In service has gone down at just about the worst possible times. Two weeks ago, the service failed during the day on Sunday and didn't come back up for several hours. We took steps to closely monitor the situation and respond if it failed again, which it did last weekend, though this time it was down for minutes instead of hours. Unfortunately, that was also at a really bad time, so even that short window of downtime was unacceptable.
We have spent the past week reworking everything to find the source of the problem. We have narrowed it down to a couple of possibilities, but we're having a hard time reproducing the issue reliably enough to nail it down. In addition to the programming changes we're working on for the service, we've also spent a lot of time beefing up the server infrastructure to allow for more redundancy and fault tolerance on that side of things. We've been stress-testing this new setup and are pleased with its performance so far, so we're going to make it live and available for use this weekend. Since we're not 100% sure of the issue on the programming side (it seems to be in a third-party component, and we're working with their support team on it), we're going to monitor the process in real time (actual physical eyeballs looking at the running processes) and head off any problems before they result in a service-interrupting crash. With the new redundancy, even if a single server has to be taken offline for a few minutes, the service itself will continue to be available without interruption.
Obviously, this arrangement is not optimal as a long-term solution, so we are continuing to work on addressing the root cause. We are very sorry for the downtime you have experienced, and we're working to make it right.