Rivanna

Getting started on Rivanna

This guide is designed for researchers who are new to the UVA HPC System. Throughout this guide we use the placeholder mst3k to represent the user's login ID; each user should substitute their own login ID for mst3k. Specifications for the current HPC system can be found here.

Overview

Rivanna provides a high-performance computing environment for all user levels. A majority of Rivanna's nodes are Cray Cluster Solutions nodes connected by FDR (fourteen data rate) InfiniBand, but there are also two nodes with NVIDIA Kepler K20 GPUs, several nodes with QDR (quad data rate) InfiniBand, and a number of older nodes connected by gigabit Ethernet.

All nodes share a Lustre filesystem for temporary storage called /scratch, with up to 1.4PB of storage space for all users. Each user is assigned space in /scratch/$USER with a default quota of 10TB of storage.
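
Since /scratch is a Lustre filesystem, usage against the quota can usually be checked with the standard Lustre client tools. The command below is only a sketch assuming the lfs utility is on the path; Rivanna may also provide its own reporting commands.

lfs quota -h -u $USER /scratch    # human-readable usage and quota for the current user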

Accessing the System

Time on Rivanna is allocated as Service Units (SUs). One SU corresponds to one core-hour. Allocations are managed through MyGroups accounts. The group owner is the Principal Investigator (PI) of the allocation. Faculty, staff, and postdoctoral associates are eligible to be PIs. Students, both graduate and undergraduate, must be members of an allocation group sponsored by a PI. Each PI is ultimately responsible for managing the roster of users in the group, although PIs may delegate day-to-day management to one or more other members. When users are added or deleted, accounts are automatically created or purged at the next system update.

Trial allocations of 5,000 SUs are available on request.  Larger allocations may be requested through administrative grants or may be purchased through a PTAO.

Logging In

The system is accessed through ssh (Secure Shell) connections to the hostname rivanna.hpc.virginia.edu. Windows users must install an ssh client such as SecureCRT or PuTTY; we recommend MobaXterm. Mac OS X and Unix users may connect through a terminal using the command ssh mst3k@rivanna.hpc.virginia.edu. Users working from off Grounds must run the UVA Anywhere VPN client.

Users who wish to run X11 graphical applications may prefer the FastX remote desktop client.
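
Users who only need occasional X11 forwarding over ssh can enable it from the command line; this is a generic OpenSSH example and assumes an X server (for example XQuartz on the Mac) is installed locally:

ssh -Y mst3k@rivanna.hpc.virginia.edu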

Software Access

The Modules Environment

User-level software is installed in a shared directory, /share/apps. The modules system enables users to set up their environments to access specific software packages, or even specific versions of a package. The most commonly used commands include:

  • module avail (prints a list of all software packages available through a module)
  • module avail <package> (prints a list of all versions available for <package>)
  • module load <package> (loads the default version of <package>)
  • module load <package>/<version> (loads the specific <version> of <package>)
  • module unload <package> (removes <package> from the current environment)
  • module purge (removes all loaded modules from the environment)
  • module list (prints a list of modules loaded in the user’s current environment)

For more details about modules see the documentation.
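
A typical session might look like the following; the package names and version number shown are illustrative and may not match what is currently installed:

module avail R          # list the available versions of R
module load gcc         # load the default version of the gcc module
module load R/3.2.1     # load a specific (illustrative) version of R
module list             # show what is currently loaded
module purge            # return to a clean environment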

Software Requests

Software accessed through modules is available to all users. Users may install their own software to their home directories or to shared leased space, provided they are legally permitted to do so, either because the software is open source or because they have obtained their own license. User-installed software must not require root privileges to install or operate. User software may run daemons (services) provided that those services do not interfere with other users.
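
As an illustration, many open-source packages that use the common configure/make build system can be installed entirely within a home directory by setting an installation prefix. The package name below is hypothetical, and this is a generic sketch rather than a Rivanna-specific recipe:

# Build and install a hypothetical package under $HOME without root privileges
tar xzf mypackage-1.0.tar.gz
cd mypackage-1.0
./configure --prefix=$HOME/apps/mypackage
make
make install
# Make the installed binaries visible, e.g. by adding this line to ~/.bashrc
export PATH=$HOME/apps/mypackage/bin:$PATH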

Users may petition ARCS to install software into the common directories. Each request will be considered on an individual basis and may be granted if it is determined that the software will be of wide interest. In other cases ARCS may help users install software into their own space.

Running Jobs

Submitting Jobs to the Compute Nodes

Rivanna resources are managed by the SLURM workload manager. The login host rivanna.hpc.virginia.edu consists of multiple dedicated servers, but their use is restricted to editing, compiling, and running very short test processes. All other work must be submitted to SLURM to be scheduled onto a compute node.

SLURM divides the system into partitions, which provide different combinations of resource limits, including wallclock time, aggregate cores for all running jobs, and charging rates against the SU allocation. There is no default partition; users must specify one in each job script.

Please see the queues table for more information about the queues, including configured limits and other restrictions.

Users may run the command queues to determine which partitions are enabled for them.  This command will also show the limitations in effect on each queue.

Users may run the command allocations to view the allocation groups to which they belong and to check their balances.
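
For example, before submitting work a user might check both:

queues         # list the partitions enabled for this account and their limits
allocations    # list allocation group memberships and remaining SU balances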

High-Performance Queues

Jobs submitted to these partitions are charged against the group’s allocation.

  • parallel: jobs that can take advantage of the InfiniBand interconnect. 
  • request: like parallel but users may access all high-performance cores.  Limited to intervals following maintenance.
  • largemem: jobs that require more than one core’s worth of memory per core requested.
  • serial: single-core jobs that need higher-speed access to temporary storage.
  • development: short debugging runs
  • gpu: access to two Kepler-equipped nodes for testing general-purpose GPU (GPGPU) codes.

Economy Queue

This partition consists of older nodes with ethernet only.  Jobs submitted to the economy partition are charged at a reduced rate. 

Job Management

SLURM jobs are shell scripts consisting of a preamble of directives or pseudocomments that specify the resource requests and other information for the scheduler, followed by the commands required to load any required modules and run the user’s program. Directives begin with the “pseudocomment” #SBATCH followed by options. Most SLURM options have two forms; a shorter form consisting of a single letter preceded by a single hyphen and followed by a space, and a longer form preceded by a double hyphen and followed by an equal sign (=). In SLURM a “task” corresponds to a process; therefore threaded applications should request one task and specify the number of cpus (cores) per task.
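
For example, a threaded (e.g. OpenMP) code should request one task with several cores attached to it; the --cpus-per-task directive used here is a standard SLURM option that is not repeated in the summary below:

#SBATCH --nodes=1            # one node (long form of -N 1)
#SBATCH --ntasks=1           # a single task, i.e. one process
#SBATCH --cpus-per-task=8    # eight cores available to that task's threads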

Common SLURM Options:

Number of nodes requested:

#SBATCH -N <N>
#SBATCH --nodes=<N>

Number of tasks per node:

#SBATCH --ntasks-per-node=<n>

Total tasks (processes) distributed across nodes by the scheduler:

#SBATCH -n <n>
#SBATCH --ntasks=<n>

Number of tasks per core:

#SBATCH --ntasks-per-core=<n>

Wallclock time requested:

#SBATCH -t d-hh:mm:ss
#SBATCH --time=d-hh:mm:ss
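
The time string accepts several granularities; each directive below is an independent example of an acceptable format (only one -t/--time line should appear in a script):

#SBATCH -t 30             # 30 minutes
#SBATCH -t 12:00:00       # 12 hours
#SBATCH -t 3-00:00:00     # 3 days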

Memory request in megabytes per node (the default is 1000, i.e. 1GB):

#SBATCH --mem=<M>

Memory request in megabytes per core (may not be used with --mem):

#SBATCH --mem-per-cpu=<M>

Request partition <part>:

#SBATCH -p <part>
#SBATCH --partition=<part>

Specify the account to be charged for the job (this should be present even for economy jobs; the account name is the name of the MyGroups allocation group to be used for the specified run):

#SBATCH -A <account>
#SBATCH --account=<account>

Example Serial Job Script:

#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -t 12:00:00
#SBATCH -p serial
#SBATCH -A mygroup

# Run program
./myprog myoptions

Example Parallel Job Script:

#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH -t 12:00:00
#SBATCH -p parallel
#SBATCH -A mygroup

# Run parallel program over Infiniband using MVAPICH2

module load mvapich2/intel
mpirun -launcher slurm ./xhpl > xhpl_out
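
With a SLURM-aware MPI stack the process count does not need to be repeated on the command line; mpirun reads the node and task counts from the job allocation. Where the loaded MPI library supports direct SLURM launch, an equivalent invocation (a sketch, not verified for this particular build) would be:

srun ./xhpl > xhpl_out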

Submitting a Job and Checking Status

Once the job script has been prepared it is submitted with the sbatch command:

sbatch myscript.slurm

The scheduler returns the job ID, which is how the system references the job subsequently.

Submitted batch job 36598

To check the status of the job, the user may type

squeue -u mst3k

Status is indicated in the ST column: PD for pending, R for running, and CG for completing (exiting).

By default SLURM saves both standard output and standard error into a file called slurm-<jobid>.out.  This file is created in the submit directory and is appended during the run.
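
If separate or differently named files are preferred, the standard SLURM output directives may be added to the script preamble; the filenames below are illustrative, and %j expands to the job ID:

#SBATCH -o myjob-%j.out    # standard output (illustrative filename)
#SBATCH -e myjob-%j.err    # standard error (illustrative filename)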

Canceling a Job

Queued or running jobs may be canceled with

scancel <jobid>

Note that user-canceled jobs are charged for the time used when applicable.
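
scancel also accepts filters; for example, all of one's own queued and running jobs can be cancelled at once:

scancel -u mst3k    # cancel every job belonging to mst3k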

Usage policies

Allocations

Any eligible Principal Investigator may request a Trial allocation of 5,000 SUs. Under certain circumstances a supplemental allocation, known as a "Standard" allocation, of an additional 5,000 SUs may be granted. If the PI is an affiliate of the College of Arts and Sciences or the School of Engineering and Applied Science, the PI can file a short proposal to request a larger Administrative allocation. PIs affiliated with other units should submit allocation requests to the Data Sciences Institute. Time can also be purchased through external funding at a rate determined by the HPC Steering Committee. Trial, Standard, and Administrative allocation grants are for one year and must be renewed. Purchased time does not expire during the active interval of the grant.

PIs may request only one Trial allocation per year, but may extend that group with additional allocations received for their projects. PIs who must keep projects separated, such as to distinguish externally funded work from internally granted work, may have more than one allocation group.

If a group exhausts its allocation, all members of the group will have access only to the economy queue.  If an individual user exceeds the /scratch filesystem limitations, only that user will be blocked from submitting new jobs on any partition.

Frontends

Exceeding the limits on the frontend will result in the user’s process(es) being killed. Repeated violations will result in a warning; users who ignore warnings risk losing access privileges. 

Software Licenses

Excessive consumption of commercial software licenses, in either duration or number, that system and/or ARCS staff determine to be interfering with other users' fair use of the software will subject the violator's processes or jobs to termination without warning. Staff will attempt to issue a warning before terminating processes or jobs, but an inadequate response from the violator will not be grounds for permitting the processes or jobs to continue.

Inappropriate Usage

Any violation of the University’s security policies, or any behavior that is considered criminal in nature or a legal threat to the University, will result in the immediate termination of access privileges without warning.

Compute node specifications

Qty | Processor Family | Base Microarchitecture | Cores Per Node | RAM Per Node (GB) | Processor Speed (MHz) | Memory Speed | Internal Network (Gbps) | UVA Network (Gbps) | Queue Assignment(s)
4 | Intel Ivy Bridge-EP | Sandy Bridge | 20 | 64 | 2,500 | DDR3-1866 | 56 | 10 (direct) | none (interactive nodes)
240 | Intel Ivy Bridge-EP | Sandy Bridge | 20 | 128 | 2,500 | DDR3-1866 | 56 | 10 (routed) | serial and parallel
2 | Intel Sandy Bridge-EP | Sandy Bridge | 16 | 256 | 2,600 | DDR3-1600 | 40 | 10 (routed) | gpu
4 | Intel Haswell-EP | Sandy Bridge | 16 | 1,024 | 2,600 | DDR4-1866 | 56 | 10 (routed) | largemem
11 | Intel Sandy Bridge-EP | Sandy Bridge | 16 | 128 | 2,700 | DDR3-1600 | 1 | 1 (routed) | economy
4 | Intel Westmere-EP | Nehalem | 8 | 48 | 2,400 | DDR3-1066 | 1 | 1 (routed) | economy
22 | Intel Westmere-EP | Nehalem | 12 | 96 | 2,530 | DDR3-1333 | 1 | 1 (routed) | economy
7 | Intel Westmere-EP | Nehalem | 12 | 96 | 2,670 | DDR3-1333 | 1 | 1 (routed) | economy
58 | Intel Gainestown | Nehalem | 8 | 48 | 2,670 | DDR3-1333 | 1 | 1 (routed) | economy
11 | AMD Magny-Cours | Bulldozer | 16 | 16 | 2,000 | DDR3-1333 | 1 | 1 (routed) | economy
8 | Intel Phi | Knights Landing | 64 (256) | 208 | 1,300 | DDR4-1200 | 56 | 10 (routed) | development