Please read and follow these instructions.
Use an SSH client from a campus-connected machine (VPN in first, if you are off Grounds), and connect to interactive.hpc.virginia.edu. You can also use FastX.
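For example, from a terminal on a campus-connected machine (mst3k below is a placeholder; substitute your own UVA computing ID):

```shell
# Connect to the interactive login host; replace "mst3k" with your computing ID
ssh mst3k@interactive.hpc.virginia.edu

# For graphical applications, enable X11 forwarding
ssh -Y mst3k@interactive.hpc.virginia.edu
```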
Access to the HPC cluster requires a valid Eservices password. Your Netbadge password is not necessarily the same thing, so if you are unable to log in, you should first try resetting your Eservices password here. If the problem persists, contact ITS (which manages all Eservices accounts) through the group's online help desk.
The command you want is probably sbatch; please read this.
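A minimal submission script might look like the sketch below; the job name, partition, time limit, account, and program are all placeholders that you should replace with values appropriate to your own allocation:

```shell
#!/bin/bash
#SBATCH --job-name=myjob        # placeholder job name
#SBATCH --partition=economy     # pick a queue from the `queues` output
#SBATCH --ntasks=1
#SBATCH --time=01:00:00         # wall-clock limit; must fit within the queue's Time-Limit
#SBATCH --account=mygroup       # placeholder; use an account with remaining service units

./my_program                    # placeholder for your actual executable
```

Submit it with sbatch myjob.slurm and note the JobID that sbatch prints; you will need it if you ever have to report a problem with the job.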
After logging in, run the command queues to see what queues you might have access to, for example:
Note: you may not see the same as the above when you run the command.
After running the queues command, pick the queue that best suits the needs of your research, based on the Time-Limit, Maximum Cores/Job, SU Rate, and Usable Accounts values. If you do not have a large allocation and do not need any /scratch storage, you may want to use the economy queue (note: economy queue nodes do not have access to /scratch).
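Once you have chosen a queue, you select it at submission time. For example, assuming a script named job.slurm:

```shell
# Submit to the economy queue from the command line...
sbatch -p economy job.slurm

# ...or, equivalently, set it inside the script itself:
#SBATCH -p economy
```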
If reporting a problem to us about a particular job, please include the JobID of the job in question. You can also run jobq -l to relate particular jobs to specific submission scripts:
If your rating is low, please contact us: we can help.
Usually this is because you inadvertently submitted the job to run in a location that the compute nodes cannot access (for example, asking an economy queue job to use your /scratch directory for job I/O); if your jobs die right away, this is usually why. Other common reasons are using too much memory or running past the job's time limit.
You can run sacct:
[aam2y@udc-ba36-27:/root] sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
159637       ompi_char+   parallel  hpc_admin         80  COMPLETED      0:0
159637.batch      batch             hpc_admin          1  COMPLETED      0:0
159637.0          orted             hpc_admin          3  COMPLETED      0:0
159638       ompi_char+   parallel  hpc_admin        400    TIMEOUT      0:1
159638.batch      batch             hpc_admin          1  CANCELLED     0:15
159638.0          orted             hpc_admin         19  CANCELLED  255:126
... if it's still not clear why your job was killed, please contact us and send us the output from sacct.
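To pull up the record for a single job, you can pass its JobID to sacct; the JobID below is just the one from the example output above, and the field list is one reasonable choice:

```shell
# Show the state, exit code, peak memory, and runtime for one job and its steps
sacct -j 159638 --format=JobID,JobName,Partition,State,ExitCode,MaxRSS,Elapsed
```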
You must not be overallocated on your /scratch usage, and you must have some remaining service units, in order to submit jobs. Please check the output of sfsq and/or allocations to determine which is the problem.
You can buy some here. Rivanna probably already knows how to mount it, but if you encounter any problems accessing your leased storage, please contact us.
If you have used too much space, created too many files, or have "old" files, you may be regarded as overallocated. Please note that while you are overallocated, you will not be able to submit any new jobs until you clean up your /scratch folder.
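To see what is taking up the space, commands along these lines can help (the paths are illustrative; $USER expands to your own ID):

```shell
# Summarize the total usage of your scratch directory
du -sh /scratch/$USER

# List files not modified in the last 90 days -- candidates for cleanup
find /scratch/$USER -type f -mtime +90

# Count how many files you have (some limits are on file counts, not just bytes)
find /scratch/$USER -type f | wc -l
```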
To check your home space, run quota -s:
bash-4.1$ quota -s
Disk quotas for user jm9yq (uid 650224):
     Filesystem   blocks   quota   limit   grace   files   quota   limit   grace
10.243.122.179:/home
                   395M        0   4096M               0       0       0
To check your leased space, change directory into your leased space and then run df -h /nv/volX, where volX is your leased storage:
bash-4.1$ cd /nv/vol89
bash-4.1$ df -h /nv/vol89
Filesystem                              Size  Used Avail Use% Mounted on
nas19-s.itc.virginia.edu:/export/vol89  247G  225G   22G  92% /nv/vol89
In all cases you will need to use an account with remaining service units in order to submit jobs.
Run allocations -v:
Run allocations with the -h flag.
You must use the MyGroups interface to do this, and you must have administrative access to the group.
Please read this.
ARCS will install software onto Rivanna if it is of wide applicability to the user community. Software used by one group should be installed by the group members, ideally onto leased storage for the group. We can provide limited assistance for individual installations. We also encourage users to write their own modules for individually-installed software, although this is not required.
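For a typical autotools-based package, an individual installation into leased storage might look like the sketch below; the volume path, package name, and version are all placeholders:

```shell
# Build and install a package under the group's leased storage (paths are placeholders)
tar xzf mypackage-1.0.tar.gz
cd mypackage-1.0
./configure --prefix=/nv/volXX/software/mypackage/1.0
make && make install

# Then put the install's bin directory on your PATH (or wrap this step in a module)
export PATH=/nv/volXX/software/mypackage/1.0/bin:$PATH
```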
For help installing research software on your PC, please contact Research Software Support at email@example.com.
We have given all LSPs access to an area on the Lustre filesystem which they can use for adding software that we don't have in /share/apps and have not yet had the opportunity to provide for you. Please contact your LSP; if they're not sure what you're talking about, please ask them to contact us for more information.
If you're not using the economy queue, include these options in your mpiexec or mpirun command: --mca btl sm,openib,self
If you're using the economy queue, include these options in your mpiexec or mpirun command: --mca btl sm,tcp,self --mca btl_tcp_if_include eth1
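Putting that together, a launch line for a hypothetical executable my_mpi_app might look like the following; the process count of 16 is just illustrative:

```shell
# Non-economy queues: shared-memory + InfiniBand + loopback transports
mpirun --mca btl sm,openib,self -np 16 ./my_mpi_app

# Economy queue: TCP over eth1 instead of InfiniBand
mpirun --mca btl sm,tcp,self --mca btl_tcp_if_include eth1 -np 16 ./my_mpi_app
```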