Frequently Asked Questions

General Usage

How do I gain access to Rivanna?

Please read and follow these instructions.

How do I log on to Rivanna?

Use an SSH client from a campus-connected machine (connect to the VPN first if you are off Grounds) and connect to interactive.hpc.virginia.edu. You can also use FastX.

How do I reset my current password / obtain a new password?

Access to the HPC cluster requires a valid Eservices password. Your Netbadge password is not necessarily the same thing, so if you are unable to log in, you should first try resetting your Eservices password here. If the problem persists, contact ITS (which manages all Eservices accounts) through the group's online help desk.

Job Management

How do I submit jobs?

The command you want is probably sbatch; please read this.
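As a rough illustration only, a minimal batch script might look like the sketch below. The partition name parallel is taken from the sacct sample later in this FAQ; the account my_allocation, the resource values, and the file name hello.slurm are placeholders, not site defaults. The snippet writes the script to disk and submits it only where sbatch is installed.

```shell
#!/bin/bash
# Write a minimal Slurm batch script. All values are placeholders:
# adjust the queue, allocation, and limits to match your own account.
cat > hello.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
# Assumed queue name; check the output of the queues command.
#SBATCH --partition=parallel
# Hypothetical allocation name; check the output of the allocations command.
#SBATCH --account=my_allocation
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
# %j expands to the JobID.
#SBATCH --output=hello-%j.out

echo "Running on $(hostname)"
EOF

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello.slurm
else
    echo "sbatch not found; run this on a Rivanna login node"
fi
```

On success, sbatch prints the JobID assigned to the job, which is the number to quote when reporting a problem.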

What queues can I use?

After logging in, run the command queues to see which queues you have access to. The output depends on your account, so what you see may differ from what other users see.

How do I choose which queue to use?

Run the command queues. Based on the Time-Limit, Maximum Cores/Job, SU Rate, and Usable Accounts values, pick the queue that best suits the needs of your research. If you do not have a large allocation and do not need any storage, you may want to use the economy queue (note: economy queue nodes do not have access to /scratch).

How do I check the status of my jobs?

Run jobq:

When reporting a problem with a particular job, please include its JobID. You can also run jobq -l to relate particular jobs to specific submission scripts.

How do I check the efficiency of my completed jobs?

Run jobe:

If your rating is low, please contact us: we can help.

Why do my jobs get killed?

Usually this is because the job was inadvertently submitted to run in a location that the compute nodes can't access (for example, an economy queue job that uses your /scratch directory for job I/O). If your jobs die right away, this is usually why. Other common reasons are using too much memory or running past the job's time limit.

You can run sacct:

[aam2y@udc-ba36-27:/root] sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
159637       ompi_char+   parallel  hpc_admin         80  COMPLETED      0:0 
159637.batch      batch             hpc_admin          1  COMPLETED      0:0 
159637.0          orted             hpc_admin          3  COMPLETED      0:0 
159638       ompi_char+   parallel  hpc_admin        400    TIMEOUT      0:1 
159638.batch      batch             hpc_admin          1  CANCELLED     0:15 
159638.0          orted             hpc_admin         19  CANCELLED  255:126

If it's still not clear why your job was killed, please contact us and send us the output from sacct.
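To pick out the jobs that did not finish cleanly, the sacct listing can be filtered. The sketch below assumes the default sacct columns shown above, where State is the second-to-last field; unfinished_jobs is a hypothetical helper name, and the sample rows are copied from the listing above so the snippet can run anywhere.

```shell
#!/bin/bash
# Print "JobID State" for every sacct row whose State is not COMPLETED.
# State is read as the second-to-last column, so rows with a blank
# Partition column (the .batch steps) are still handled correctly.
unfinished_jobs() {
    awk 'NR > 2 && NF >= 2 && $(NF-1) != "COMPLETED" { print $1, $(NF-1) }'
}

# Sample rows copied from the sacct output in this FAQ.
sample='       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
159637       ompi_char+   parallel  hpc_admin         80  COMPLETED      0:0
159637.batch      batch             hpc_admin          1  COMPLETED      0:0
159638       ompi_char+   parallel  hpc_admin        400    TIMEOUT      0:1
159638.batch      batch             hpc_admin          1  CANCELLED     0:15'

printf '%s\n' "$sample" | unfinished_jobs
```

On the cluster, the same filter could be applied as sacct | unfinished_jobs. For a single job, sacct -j JobID --format=JobID,State,ExitCode,Elapsed,Timelimit,MaxRSS shows the run time and peak memory next to the requested time limit, which helps distinguish a timeout from a memory problem.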

Why can't I submit jobs anymore?

You must not be overallocated with your /scratch usage and you must have some remaining service units in order to submit jobs. Please check the output of sfsq and/or allocations to determine what the problem is.

Storage Management

How do I get leased storage?

You can purchase it here. Rivanna probably already knows how to mount it, but if you encounter any problems accessing your leased storage, please contact us.

How do I check my /scratch usage on Rivanna?

Run sfsq:

If you have used too much space, created too many files, or have "old" files, you may be regarded as "overallocated". If you are overallocated, you won't be able to submit any new jobs until you clean up your /scratch folder.

How do I check my leased-or-home storage usage on Rivanna?

To check your home space, run quota -s:

bash-4.1$ quota -s
Disk quotas for user jm9yq (uid 650224): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
10.243.122.179:/home
                   395M       0   4096M               0       0       0 

To check your leased space, change directory into your leased space and then run df -h /nv/volX, where volX is your leased storage:

bash-4.1$ cd /nv/vol89
bash-4.1$ df -h /nv/vol89
Filesystem            Size  Used Avail Use% Mounted on
nas19-s.itc.virginia.edu:/export/vol89
                      247G  225G   22G  92% /nv/vol89
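If you want to check usage from a script, the percentage can be extracted from df. This is only a sketch: usage_percent is a hypothetical helper, and the 90% threshold and the default path are arbitrary choices. It uses df -P, the POSIX output format, which keeps each filesystem on a single line instead of wrapping long names as in the listing above.

```shell
#!/bin/bash
# Print the Use% of the filesystem holding the given path, as a bare integer.
# df -P keeps each filesystem on one line, so the fifth field is always Use%.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical defaults: check the current directory, warn at 90% full.
path="${1:-.}"
threshold=90

pct=$(usage_percent "$path")
if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: $path is ${pct}% full"
else
    echo "$path is ${pct}% full"
fi
```

Saved as a script, it could be pointed at leased storage by passing the path as the first argument, for example /nv/vol89.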

Allocations

How do I check my allocation status on Rivanna?

Run allocations:

In all cases you will need to use an account with remaining service units in order to submit jobs.

How do I check the general usage of my allocations?

Run allocations -v:

How do I check historic usage of my allocations?

Run allocations with the -h option.

How do I add or remove people from my allocations?

You must use the MyGroups interface to do this, and you must have administrative access to the group.

Applications

How do I use research software that's already installed?

Please read this.

Does ARCS install research software?

ARCS will install software onto Rivanna if it is of wide applicability to the user community. Software used by one group should be installed by the group members, ideally onto leased storage for the group. We can provide limited assistance for individual installations. We also encourage users to write their own modules for individually installed software, although this is not required.

For help installing research software on your PC, please contact Research Software Support at res-consult@virginia.edu.

Is there any other way to install research software that I need?

We have given all LSPs access to an area on the Lustre filesystem that they can use for adding software that is not in /share/apps and that we have not yet had the opportunity to provide. Please contact your LSP; if they're not sure what you're talking about, please ask them to contact us for more information.

How do I get OpenMPI to work?

If you're not using the economy queue, add these options to your mpiexec or mpirun command: -mca btl sm,openib,self

If you're using the economy queue, add these options to your mpiexec or mpirun command: --mca btl sm,tcp,self --mca btl_tcp_if_include eth1
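A job script that may run in either kind of queue could choose the options automatically. The sketch below is one possible approach, not an official recipe: mpi_btl_flags is a hypothetical helper, ./my_program is a placeholder executable, and it relies on Slurm setting SLURM_JOB_PARTITION inside a running job. The mpirun command is printed rather than executed.

```shell
#!/bin/bash
# Echo the Open MPI transport options this FAQ recommends for a given queue.
mpi_btl_flags() {
    if [ "$1" = "economy" ]; then
        echo "--mca btl sm,tcp,self --mca btl_tcp_if_include eth1"
    else
        echo "-mca btl sm,openib,self"
    fi
}

# Inside a job, Slurm sets SLURM_JOB_PARTITION; fall back to "parallel" for this demo.
queue="${SLURM_JOB_PARTITION:-parallel}"

# ./my_program is a hypothetical executable; the command is printed, not run.
echo "mpirun $(mpi_btl_flags "$queue") ./my_program"
```

In a real batch script, the last line would drop the echo and call mpirun directly, leaving the command substitution unquoted so that the options are passed as separate arguments.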