Messages & Announcements

  • 2018-10-19:  Anvil, Crane and Tusker services restored
    Category:  General Announcement

    HCC's datacenter at PKI in Omaha suffered an unexpected power outage on the morning of Friday, Oct 19th, during a preventive maintenance window.

    This type of maintenance has been performed without issue many times in the past. It requires that the datacenter UPS (battery backup) be bypassed, meaning all equipment relies directly on city power. While the bypass was in place, the city power feed experienced a fault, which caused many servers to reboot unexpectedly and several pieces of networking equipment to fail.

    HCC staff have worked throughout the day to restore services and believe everything is now back online. All services hosted at PKI were affected, including:

    - ANVIL: Many VM hosts were rebooted, along with the instances running on those hosts. Please check your instances and contact hcc-support@unl.edu with your instance ID if you have any problems; an instance-status sketch follows this list.

    - CRANE / TUSKER: Running jobs were killed; users should check any /home and /work files that were open or being written at the time. Files being written during the power outage are likely lost or corrupted.

    - COMMON Filesystem: Users should check that their files exist and are accessible; a file-readability sketch also follows this list. Files being written during the power outage are likely lost or corrupted.
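
    Anvil is HCC's OpenStack-based cloud, so instance status can also be checked from the command line. Below is a minimal Python sketch using the openstacksdk library; it assumes you have a working clouds.yaml entry for Anvil (the cloud name "anvil" here is a hypothetical placeholder).

        # list_instances.py -- print the status of each of your Anvil instances.
        import openstack

        # "anvil" is an assumed name; it must match an entry in your clouds.yaml
        conn = openstack.connect(cloud="anvil")

        for server in conn.compute.servers():
            # Instances left in ERROR or SHUTOFF after the outage may need attention
            print(server.id, server.name, server.status)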

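    To check files after the outage, one approach is to walk a directory, flag files modified near the outage window, and verify that each can still be opened and read end to end. A minimal Python sketch follows; the path and outage time are hypothetical placeholders, so substitute your own.

        # check_files.py -- flag files modified near the outage and verify readability.
        import os
        import time

        ROOT = "/work/mygroup/myuser"   # hypothetical path; use your own directory
        # Approximate outage time (assumed): morning of Fri, Oct 19th 2018
        OUTAGE = time.mktime((2018, 10, 19, 8, 0, 0, 0, 0, -1))
        WINDOW = 6 * 3600               # flag files modified within +/- 6 hours

        for dirpath, _, filenames in os.walk(ROOT):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if abs(os.path.getmtime(path) - OUTAGE) < WINDOW:
                        with open(path, "rb") as f:
                            while f.read(1 << 20):  # read in 1 MiB chunks
                                pass
                        print("readable, inspect contents:", path)
                except OSError as err:
                    print("PROBLEM:", path, err)
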
    This is the first major power issue at this datacenter in a very long time, and we will investigate and take any possible actions to prevent it from happening again. At this time it appears to have been an unfortunate coincidence: the main power feed failed unexpectedly during the window when the battery backup was bypassed.

    Please contact hcc-support@unl.edu with any questions or issues resulting from this outage.

  • 2018-10-19:  Anvil, Crane and Tusker impacted by power outage at PKI data center
    Category:  General Announcement

    HCC staff are currently investigating the impacted nodes and working to bring the systems back online. A follow-up announcement will be sent once the systems return to a production state.

  • 2018-10-15:  Rolling updates on HCC cluster resources
    Category:  General Announcement

    Security updates are being applied to the Crane, Sandhills, and Tusker clusters. The rolling updates are applied to each worker node as soon as the oldest job on that node finishes, which can keep new jobs from starting on the node until the updates are in place. Pending jobs may therefore report lengthened wait times in the queue with a reason of "Priority," "Resources," or "ReqNodeNotAvail, UnavailableNodes," particularly if the time requested by a job exceeds the time remaining before the worker node drains. Shorter jobs will likely have more resources to run on over the next couple of days, as long as they can finish before the oldest job on a draining node completes; the sketch below makes the arithmetic concrete.
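
    In other words, a pending job can only start on a node scheduled to drain if its requested walltime fits within the time remaining before the node's oldest running job finishes. A small illustrative Python sketch (the numbers are made up, not actual drain times):

        # drain_fit.py -- will a job of a given walltime fit before a node drains?
        # Illustrative values only; real drain times depend on each node's oldest job.
        hours_until_drain = 30       # time left on the node's oldest running job
        requested_walltime = 48      # hours requested by your pending job

        if requested_walltime <= hours_until_drain:
            print("Job can start and finish before the node drains.")
        else:
            print("Job waits for updated nodes; expect ReqNodeNotAvail or Resources.")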

    All updates will be finished by 10PM Friday, October 19th.

  • 2018-10-02:  SANDHILLS will no longer accept new jobs after October 15th 2018, cluster is being retired
    Category:  General Announcement

    As of October 15th, Sandhills will no longer accept job submissions. Jobs submitted before that date will be allowed to run to completion as normal.

    The /home and /work filesystems for Sandhills, along with the login and transfer nodes, will remain available beyond October 15th to give users time to migrate their data to other locations, such as the shared /common filesystem or the Crane cluster (see the copy sketch below for one approach). It is *not* recommended to migrate data to Tusker at this time, as Tusker will be physically relocated in the near future and will be completely unavailable during the move.
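
    One way to migrate data is rsync run from a Sandhills login node; a minimal Python sketch wrapping it is below. The source and destination paths are hypothetical placeholders, so substitute your own group and user directories.

        # migrate_work.py -- copy a directory tree from /work to /common via rsync.
        import subprocess

        SRC = "/work/mygroup/myuser/"    # hypothetical; trailing slash copies contents
        DST = "/common/mygroup/myuser/"  # hypothetical destination

        # -a preserves permissions and timestamps; --checksum compares file contents
        # rather than size/mtime when deciding what to (re)transfer, at some CPU cost.
        subprocess.run(["rsync", "-av", "--checksum", SRC, DST], check=True)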

    We will keep the Sandhills filesystems, login, and transfer nodes available until the end of the 2018 calendar year, at which point they will be turned off, completing the retirement of Sandhills.

    For additional information on the Sandhills retirement and Tusker migration see the original announcement at https://newsroom.unl.edu/announce/holland/8444/48362.

  • 2018-09-19:  Crane: /work filesystem downtime resolved
    Category:  General Announcement

    The /work filesystem for Crane was restored as of 11:30am. A filesystem check was completed and found no errors.

    Running jobs that were accessing /work stalled until the filesystem was restored, which may have caused some jobs to exceed their time limits; one way to check is sketched below. There was no data loss from this outage.
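
    One way to check whether a job ran into its time limit during the stall is Slurm's sacct accounting tool; a minimal Python sketch wrapping it is below. The job ID is a hypothetical placeholder.

        # job_state.py -- show state, elapsed time, and time limit for one job.
        import subprocess

        JOBID = "1234567"   # hypothetical; substitute your own job ID

        # Jobs that stalled on /work and ran out of time will show State=TIMEOUT
        subprocess.run(
            ["sacct", "-j", JOBID, "--format=JobID,JobName,State,Elapsed,Timelimit"],
            check=True,
        )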