On Sunday, December 8th, one of the UPS units in the server room at Edgemont Road in Charlottesville failed. Staff are working to get support from the UPS vendor so that services can be restored. The outage affects several web sites, including www.nrao.edu, science.nrao.edu, and almascience.nrao.edu. There is currently no ETA for restoration of services. More information will be posted as it becomes available.
Update as of 2019-12-08 16:20 EST: The problem has been identified by UVa Facilities Management (power infrastructure), and a fix is being sought. In the meantime, we have determined that the following are down:
- NAASC Lustre
- Most HPC cluster nodes
- Some DNS servers (cv3 notably)
- Many web sites that depend on the MySQL service, including Mattermost
Polaris (aka login.cv.nrao.edu) is functional again now that DNS has been pointed at New Mexico servers, provided that your login session there does not access NAASC Lustre.
Update as of 2019-12-08 22:04 EST: Service has been restored to most CV-based systems, including the “open” Atlassian applications, the public web site, science helpdesks, CV email, science.nrao.edu, info.nrao.edu, and others. CAUTION: the fix entailed bypassing the failed UPS, so much of the ER data center is currently not protected from power interruption. The risk of further service interruption remains elevated until a full repair can be made to the UPS, which has partially failed (one phase). The earliest ETR for this UPS is Tuesday, December 10th at close of business (ET).