Messages & Announcements

  • 2016-09-29:  Final warning: files to be removed from /work on Tusker
    Category:  General Announcement

    This notice affects users with data on the /work filesystem of Tusker. Files not accessed for over 6 months will be removed this week.

    HCC now purges (i.e., removes) files from /work that have not been accessed in over 6 months. At any time, you may run

    hcc-purge

    to see which of your files (if any) are old enough to be removed at the next purge. Purges will be performed weekly on an ongoing basis, and scans for files to purge will run twice a week. Please see

    https://hcc-docs.unl.edu/display/HCCDOC/Handling+Data

    for further details.
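    As a rough illustration of the access-time rule described above, the Python sketch below lists files under a directory whose last access time is older than a cutoff. It is not hcc-purge and may not match HCC's exact purge criteria: the 182-day approximation of "6 months" and the function name find_stale_files are assumptions made for illustration, and filesystems such as Lustre may record access times lazily, so the times seen here can lag actual use. Running hcc-purge remains the authoritative way to check.

```python
import os
import time

# Assumption: "6 months" is approximated as 182 days; HCC's hcc-purge may use a different cutoff.
SIX_MONTHS_SECONDS = 182 * 24 * 60 * 60


def find_stale_files(root, max_age_seconds=SIX_MONTHS_SECONDS, now=None):
    """Hypothetical helper: return sorted paths under `root` whose access time is older than the cutoff."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_seconds
    stale = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reads metadata only; it does not update the file's access time.
                st = os.lstat(path)
            except FileNotFoundError:
                # The file disappeared between listing and stat; skip it.
                continue
            if st.st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

    Calling find_stale_files with the path to your /work directory returns the matching paths; passing now= evaluates the cutoff relative to a fixed point in time, which is useful for testing.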

    All traces of the originally purged files will be removed this week. Because this is a newly implemented policy, the initial purge simply moved files to a quarantine area as a safeguard, but that quarantine area was also on /work. The /work filesystem is over 90% full, and the quarantined 18 million files, totaling over 81 TB, will now be removed. None of these files had been accessed in over 6 months. Removing them should improve overall system performance and prevent /work from filling up completely. If you need a place to store data longer term, Attic space is now available for $60/TB/year (and is stored redundantly in both Lincoln and Omaha).

    Best regards,
    David Swanson


  • 2016-09-22:  Tusker: /work filesystem performance issues resolved
    Category:  General Announcement

    The Tusker /work filesystem is running normally. A reboot of the Lustre metadata server (MDS) was required.


    The issue on the MDS looked very similar to the performance degradation experienced on 2016-09-07. A different process was used to recover the MDS this time, and the logs indicate that no Lustre clients were evicted. The Slurm logs show that jobs which ended during the MDS recovery window either completed successfully or ran into job time limits.

    These logs are a promising indication that running jobs blocked while accessing /work and resumed when the MDS returned to service.

    We still advise checking the state of your jobs if you had jobs running on Tusker, as the degraded performance of /work may have caused some of them to run into time limits.

    Please contact us at hcc-support@unl.edu with any questions.

  • 2016-09-22:  Tusker: /work filesystem performance issues
    Category:  System Failure

    We are investigating performance issues with the Tusker /work filesystem. There may be disruption to running jobs as we work to correct the issue. We will follow up with details when the maintenance is complete.


    We are investigating performance issues with the Tusker /work Lustre filesystem. Around 5:00am on Thursday, performance of the Tusker /work filesystem dropped significantly.

    To correct the problem, we are restarting Lustre services. This may disrupt running processes which are reading or writing files on /work.

    We will follow up with more details when the maintenance is complete.

  • 2016-07-21:  Draft: file removal from /work on all HCC machines
    Category:  General Announcement

    This notice concerns a policy that affects all HCC machines and potentially all HCC users.
    SUMMARY:
    HCC is implementing a new automated file purge policy on the /work filesystem for all HCC machines. Starting August 1, 2016, we will remove any files on /work that have not been accessed for at least 6 months. This will not affect the /home filesystems or the Attic storage system.


    EXPLANATION:
    The /work filesystem exists on each HCC machine for working files. It is not designed, or intended, for long-term storage. The /work filesystem periodically fills to near capacity, which requires files to be deleted to keep the system as a whole available for ongoing use. To date, we have relied on a largely manual process of warning the user community and asking users to remove files voluntarily. This is no longer sufficient given the number of users and the number of accumulated files (e.g., Tusker is currently precariously close to going off-line because /work is nearly full). Going forward, this approach will be augmented with the automatic removal of all files that have not been accessed for over 6 months. Generating artificial file activity to circumvent this policy will be considered misuse of the system. Longer-term file storage is offered by HCC on Attic for an annual fee. This year, that fee has dropped from $100/TB/year to $60/TB/year.

  • 2016-09-09:  SANDHILLS available after power outage
    Category:  System Failure

    Partial power outage in SANDHILLS resolved


    A weather-related power outage around 3:30pm caused worker nodes in SANDHILLS to become unavailable. Power has been restored, and SANDHILLS is fully operational. Jobs running at the time were killed by the outage, but we do not believe any files were affected. Please send an email to hcc-support@unl.edu if you find any problems.
