- How do I gain access to Rivanna?
- How do I log on to Rivanna?
- How do I reset my current password / obtain a new password?
- How do I check my allocation status on Rivanna?
- How do I check the general usage of my allocations?
- How do I check historic usage of my allocations?
- How do I add or remove people from my allocations?
- How do I use research software that's already installed?
- Does ARCS install research software?
- Is there any other way to install research software that I need?
- How do I get OpenMPI to work?
- How do I submit jobs?
- What queues can I use?
- How do I choose which queue to use?
- How do I check the status of my jobs?
- Why is my job not starting?
- Why do my jobs get killed?
- Why can't I submit jobs anymore?
- How do I check the efficiency of my completed jobs?
- How do I get leased storage?
- How do I check my /scratch usage on Rivanna?
- How do I check how much leased or home storage I am using on Rivanna?
Please read and follow these instructions.
Access to the HPC cluster requires a valid Eservices password. Your Netbadge password is not necessarily the same thing, so if you are unable to log in, you should first try resetting your Eservices password here. If the problem persists, contact ITS (which manages all Eservices accounts) through the group's online help desk.
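Once your account is active, you connect to the cluster over SSH. A minimal sketch, assuming the standard login hostname rivanna.hpc.virginia.edu and using mst3k as a placeholder computing ID (substitute your own):

# Hostname and user ID below are illustrative assumptions, not taken from this page
ssh mst3k@rivanna.hpc.virginia.edu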
In all cases you will need to use an account with remaining service units in order to submit jobs.
Run allocations -v.
Run allocations -h.
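For example, from a Rivanna login node (a sketch; the flags are the ones given above, and the comments reflect the two questions they answer):

# Current usage of your allocations (verbose listing)
allocations -v

# Historic usage of your allocations
allocations -h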
You must use the MyGroups interface to do this, and you must have administrative access to the group.
Please read this.
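Installed research software on Rivanna is provided through environment modules (see the note about modules below). As a sketch (the module name gcc is only an example, not a statement of what is installed):

# List the software modules available on the cluster
module avail

# Load a package into your environment
module load gcc

# Show what is currently loaded
module list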
ARCS will install software onto Rivanna if it is of wide applicability to the user community. Software used by one group should be installed by the group members, ideally onto leased storage for the group. We can provide limited assistance for individual installations. We also encourage users to write their own modules for individually-installed software, although this is not required.
For help installing research software on your PC, please contact Research Software Support at firstname.lastname@example.org.
We have given all LSPs access to an area on the Lustre filesystem where they can install software that is not already in /share/apps and that we have not yet had the opportunity to provide for you. Please contact your LSP; if they are not sure what you are talking about, ask them to contact us for more information.
If you're not using the economy queue, add these options to your mpiexec or mpirun command: --mca btl sm,openib,self
If you're using the economy queue, add these options instead: --mca btl sm,tcp,self --mca btl_tcp_if_include eth1
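For example, inside a job script (a sketch; ./my_mpi_program is a placeholder for your own executable):

# Standard queues (InfiniBand interconnect)
mpirun --mca btl sm,openib,self ./my_mpi_program

# Economy queue (TCP over the eth1 Ethernet interface)
mpirun --mca btl sm,tcp,self --mca btl_tcp_if_include eth1 ./my_mpi_program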
The command you want is probably sbatch; please read this.
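As a minimal sketch, a submission script might look like the following; the partition, account, and program names are placeholders that you should adjust for your own allocation:

#!/bin/bash
#SBATCH --job-name=myjob          # a name for the job
#SBATCH --partition=economy       # queue to run in (see the queues command below)
#SBATCH --account=my_allocation   # allocation to charge
#SBATCH --ntasks=1                # number of tasks
#SBATCH --time=01:00:00           # wall-clock time limit

./my_program                      # your executable or commands

Save it as, say, myjob.slurm and submit it with sbatch myjob.slurm; note the JobID that sbatch prints.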
After logging in, run the command queues to see which queues you have access to. Note: the list depends on your account, so what you see may differ from what other users see.
Run the command queues and, based on the Time-Limit, Maximum Cores/Job, SU Rate, and Usable Accounts values, pick the queue that best suits the needs of your research. If you do not have a large allocation and do not need /scratch storage, you may want to use the economy queue (note: economy queue nodes do not have access to /scratch).
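For example (a sketch; economy is just one of the queues you may see listed):

# Show the queues available to you, with their limits and SU rates
queues

# Then request your chosen queue in the submission script, e.g.
#SBATCH --partition=economy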
When reporting a problem with a particular job, please include its JobID. You can also run jobq -l to relate particular jobs to their submission scripts.
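To see the current status of your jobs, the standard SLURM command is squeue (a sketch; mst3k is a placeholder computing ID):

# List your pending and running jobs, with their JobIDs and states
squeue -u mst3k

# Relate jobs to the scripts that submitted them (site utility mentioned above)
jobq -l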
Several things can cause jobs to wait in the queue. If you request a resource combination we do not have, such as 20 cores on an economy node, the queueing system will not recognize that the request can never be satisfied and will leave the job pending (PD) indefinitely. You may also have run a large number of jobs in the recent past, in which case the "fair share" algorithm gives other users higher priority. Finally, the queue you requested may simply be very busy.
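You can ask SLURM why a particular job is still pending (a sketch; replace the JobID and user ID with your own):

# The Reason field shows why the job has not started (e.g. Priority, Resources)
scontrol show job 159638

# Show only your pending jobs
squeue -u mst3k -t PD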
Why do my jobs get killed?
Usually this is because you inadvertently submitted the job to run from a location that the compute nodes cannot access or that is temporarily unavailable; if your jobs exit immediately, this is usually why. Other common reasons are using too much memory or running past the job's time limit.
You can run sacct:
[aam2y@udc-ba36-27:/root] sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
159637       ompi_char+   parallel  hpc_admin         80  COMPLETED      0:0
159637.batch      batch             hpc_admin          1  COMPLETED      0:0
159637.0          orted             hpc_admin          3  COMPLETED      0:0
159638       ompi_char+   parallel  hpc_admin        400    TIMEOUT      0:1
159638.batch      batch             hpc_admin          1  CANCELLED     0:15
159638.0          orted             hpc_admin         19  CANCELLED  255:126
If it's still not clear why your job was killed, please contact us and send us the output from sacct.
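To look at a single job in more detail, sacct accepts a JobID and a list of output fields, for example (using the JobID from the output above):

# Show state, exit code, elapsed time, and peak memory for one job
sacct -j 159638 --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS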
You must not be overallocated on /scratch, and you must have some remaining service units, in order to submit jobs. Please check the output of sfsq and/or allocations to determine what the problem is.
If your efficiency rating is low, please contact us: we can help.
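On many SLURM clusters the seff utility reports the CPU and memory efficiency of a completed job; this is a sketch that assumes seff is available on Rivanna, using the JobID from the sacct example above:

# Report CPU and memory efficiency for a finished job
seff 159638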
You can lease Enterprise storage from here. You can also lease value storage by contacting us.
If you have used too much space, created too many files, or have "old" files, you may be regarded as "overallocated". Please note that if you are overallocated, you won't be able to submit any new jobs until you clean up your /scratch folder.
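The sfsq command mentioned above summarizes your /scratch status. To see what is taking up the space, a few generic commands can also help (a sketch; the path assumes the conventional per-user directory /scratch/mst3k, where mst3k is your computing ID):

# Total size of your scratch directory
du -sh /scratch/mst3k

# Number of files you have under /scratch
find /scratch/mst3k -type f | wc -l

# Files not modified in the last 90 days (candidates for cleanup)
find /scratch/mst3k -type f -mtime +90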
To check your home space, run quota -s:
bash-4.1$ quota -s
Disk quotas for user jm9yq (uid 650224):
     Filesystem   blocks   quota   limit   grace   files   quota   limit   grace
10.243.122.179:/home
                   395M       0   4096M              0       0       0
To check your leased space, change directory into your leased space and then run df -h /nv/volX, where volX is your leased storage volume:
bash-4.1$ cd /nv/vol89
bash-4.1$ df -h /nv/vol89
Filesystem                               Size  Used  Avail  Use%  Mounted on
nas19-s.itc.virginia.edu:/export/vol89   247G  225G    22G   92%  /nv/vol89