Ursa User Guide
Ursa System Overview
Ursa is located at the NOAA Environmental Security Computing Center (NESCC), in Fairmont, West Virginia.
Getting Started with Ursa
After logging into your NOAA Google account, you can view a slide presentation introducing Ursa, as well as a recorded version of the presentation.
Ursa System Configuration
|  | Compute System | GPU System |
|---|---|---|
| CPU Type | AMD Genoa 9654: 96 cores/12 channels to memory | AMD Genoa 9654: 96 cores/12 channels to memory |
| CPU Speed | 2.4 GHz | 2.4 GHz |
| Compute Nodes | 576 | 58 |
| Cores/Node | 192 | 192 |
| Total Cores | 110,592 | 11,136 |
| Memory/Core (GB) | 2 | 2 |
| OS | Rocky 9 | Rocky 9 |
| CPU Peak Performance | 4.25 PFlops | 0.43 PFlops |
| Interconnect | NDR-200 IB | NDR-200 IB |
| Total Disk Capacity | >100 PB (shared with Hera) | >100 PB (shared with Hera) |
| Total Avg Disk Performance | >1000 GB/s | >1000 GB/s |
| GPU Type | N/A | NVIDIA H100-NVL |
| GPUs/Node | N/A | 2 |
| Memory/GPU (GB) | N/A | 94 |
| Total GPU Peak Performance | N/A | 3.48 PFlops |
Ursa Partitions
| Partition | QOS Allowed | Billing TRES Factor | Description |
|---|---|---|---|
| u1-compute | batch, windfall, debug, urgent, long | 100 | General compute resource. Default if no partition is specified. |
| u1-h100 | gpu, gpuwf | 100 | For jobs that require nodes with the NVIDIA H100 GPUs. |
| u1-gh | gpuwf | 100 | For jobs that require nodes with the NVIDIA Grace Hopper processors. |
| u1-mi300x | gpuwf | 100 | For jobs that require nodes with the AMD MI300X GPUs. |
| u1-service | batch, windfall | 100 | Serial jobs (max 64 cores and/or 250 GB of memory per user) with a 24-hour wall time limit. Jobs run on service nodes that have external network connectivity, which makes this partition useful for data transfers or for access to external resources such as databases. If your workflow pushes or pulls data to/from the HSMS (HPSS), run that step here. This partition is also a good choice for compiling and building your applications and libraries, rather than doing so on the login nodes. |
See the Quality of Service (QOS) table for more information.
Ursa Node Sharing
Jobs requesting fewer than 192 cores, or the equivalent amount of memory, will share the node with other jobs.
With the Ursa u1-compute partition:
If you request 1-191 cores for your job, you will be allocated and charged for the greater of the number of cores requested or the amount of memory requested in GB divided by 2.
If you request 192 or more cores, you will be given and charged for whole nodes, in multiples of 192 cores (for example, a request for 193 cores is charged as 384 cores).
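As an illustrative sketch (the project name, script name, and resource sizes are placeholders), the request below asks for 24 cores and 96 GB of memory on u1-compute; since 96 GB / 2 GB-per-core = 48 is greater than 24, the job would be billed for 48 cores:

```bash
# Hypothetical example: billed for max(24 cores, 96 GB / 2 GB-per-core) = 48 cores.
sbatch -A myproject -p u1-compute -q batch --ntasks=24 --mem=96G -t 04:00:00 my_job.sh
```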
Ursa Front Ends and Service Partition
Ursa has 15 outward-facing nodes.
- 4 nodes will be (front-end) login/cron nodes for interactive use:
  - ufe01-ufe04, a total of 768 cores for interactive use. See the Login (Front End) Node Usage Policy for important information about using login nodes.
- 10 nodes will comprise the service partition:
  - 3,840 cores total.
  - Available via Slurm.
  - Target for compilation and data transfer jobs (see the example below).
  - Target for scrontab jobs (scrontab is preferred for recurring jobs).
- 1 node is available for ecFlow:
  - uecflow01
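For example, a compilation or data-transfer job could be submitted to the service partition roughly as follows; the project name, resource sizes, and script name below are placeholders:

```bash
# Sketch: run a data transfer or build step on the u1-service partition.
# "myproject" and "transfer_and_archive.sh" are placeholders; adjust cores, memory,
# and walltime (max 24 hours) to your needs.
sbatch -A myproject -p u1-service -q batch --ntasks=1 --mem=16G -t 08:00:00 transfer_and_archive.sh
```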
Using GPU Resources on Ursa
Ursa has one production GPU partition (u1-h100) and two
exploratory GPU
partitions (u1-gh and u1-mi300x). See the table above for
details.
Partition u1-h100: There are 58 nodes, each with 192 AMD CPU cores and 2 NVIDIA H100 GPUs with 94 GB of memory per GPU. This partition is accessible from the gpu and gpuwf QOSs. For billing and accounting, one H100 GPU-hour will count as 192/2 = 96 CPU core-hours.
In order to have priority access to the H100 GPU resources you will need a GPU specific project allocation. Please contact your PI or Portfolio Manager to get a GPU specific allocation.
All projects with a CPU project allocation on Ursa have windfall access to the H100 GPU resources, and conversely all users with GPU specific project allocations have windfall access to the non-GPU resources.
Using H100 GPU Resources With a GPU allocation
If you have a H100 GPU specific project allocation on Ursa, you can access
the H100 GPUs by submitting to the u1-h100 partition and gpu QOS as
shown in the example below where 3 H100 GPUs are being requested:
sbatch -A mygpu_project -p u1-h100 -q gpu --gpus=h100:3 my_ml.job
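Equivalently, the same request can be expressed with #SBATCH directives inside the job script itself; this is a minimal sketch, and the project name, walltime, and application line are placeholders:

```bash
#!/bin/bash
#SBATCH -A mygpu_project        # GPU-specific project allocation (placeholder)
#SBATCH -p u1-h100              # H100 GPU partition
#SBATCH -q gpu                  # gpu QOS (requires a GPU-specific allocation)
#SBATCH --gpus=h100:3           # request 3 H100 GPUs
#SBATCH -t 02:00:00             # example walltime

srun python my_ml_training.py   # placeholder application
```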
Using H100 GPU Resources Without a GPU allocation
Users that do not have H100 GPU specific project allocations on Ursa
can access the H100 GPU resources at windfall priority. This means
that users can submit jobs to the system, but they will only run
when the resources are not being used by projects that do have a H100
GPU specific project allocation. This is helpful for users who are
interested in exploring the GPU resources for their applications.
To use the system in this mode please submit the jobs to the u1-h100
partition and gpuwf QOS as shown in the example below where 2 H100
GPUs are being requested:
sbatch -A mycpu_project -p u1-h100 -q gpuwf --gpus=h100:2 my_ml.job
Using the Exploratory GPU Resources
In addition to the NVIDIA H100 GPU system (Partition: u1-h100), two new
small GPU exploratory systems (partitions) with the newer GPU types are
available for experimentation. These systems are connected to the Ursa
IB network and have access to the Ursa file systems. There are no
allocations or fairshare priority for these partitions; therefore all
projects with access to Ursa have equal access (first come first served)
to these partitions via the gpuwf QOS.
To access these nodes, log in to Ursa and submit an interactive
batch job requesting these GPU resources. Once you have an interactive
shell, you can compile and run your applications on those nodes.
Please keep in mind that the CPUs on the exploratory GPU resources
are different from the CPUs in the production system.
Vendor-provided software is available by loading the appropriate modules. Please run the module spider command to see the list of available modules.
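For example, you might look up and load the vendor stacks roughly as follows; the exact module names and versions installed on Ursa are assumptions, so confirm them with module spider first:

```bash
module spider nvhpc     # NVIDIA HPC SDK for the GH200 nodes (name/version may differ)
module spider rocm      # AMD ROCm for the MI300X nodes (name/version may differ)
module load nvhpc       # hypothetical: load a version reported by module spider
```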
Description of the two exploratory systems:
- Partition u1-gh: There are 8 Grace Hopper nodes, each with one NVIDIA GH200 Grace Hopper Superchip and NVIDIA software. The CPU part of this superchip is an ARM processor with 72 CPU cores and approximately 213 GB of RAM. Click the NVIDIA GH200 link for more detailed information. For billing and accounting, one Grace Hopper GPU-hour will count as 72 CPU core-hours.
- Partition u1-mi300x: There are 3 nodes, each with 96 Intel CPU cores and 8 AMD MI300X APUs, each with 192 GB of RAM, running AMD ROCm software. Click the AMD MI300X link for more detailed information. For billing and accounting, one MI300X GPU-hour will count as 96/8 = 12 CPU core-hours.
Run one of the following commands to get interactive access to these nodes:
salloc -A mygpu_project -t 480 -p u1-gh -q gpuwf --gpus=gh200:1
salloc -A mygpu_project -t 480 -p u1-mi300x -q gpuwf --gpus=mi300x:2
In the examples above, the first example requests one node with one GH200 GPU and the second example requests one node with two MI300X GPUs.
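Once the interactive shell starts on the allocated node, you can verify that the GPUs are visible using the vendor utilities, assuming they are on your PATH (possibly after loading the modules mentioned above):

```bash
nvidia-smi    # on the u1-gh (NVIDIA GH200) nodes
rocm-smi      # on the u1-mi300x (AMD MI300X) nodes
```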
Ursa Software Stack
Ursa's OS is Rocky 9.4, similar to the MSU systems (Rocky 9.1), whereas Hera and Jet run Rocky 8.
The module layout is more akin to what you see on the MSU systems; modules are installed using Spack.
Please run the module spider command to see all the available modules!
Compilers: Intel's oneAPI, NVIDIA's nvhpc, and AMD's AOCC compilers are available.
MPIs: Intel MPI, NVIDIA's HPC-X MPI, and OpenMPI implementations are available.
We have seen much better performance and stability with HPC-X in our testing of communication-intensive benchmarks, as it is optimized to take advantage of the NDR-200 IB network more effectively.
An Intel stack is in place. Other stacks will be considered if requested.
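As a sketch of a typical build with the Intel stack (the module names below are assumptions; verify them with module spider), you could do something like:

```bash
module load intel-oneapi-compilers   # assumed module name; check "module spider oneapi"
module load intel-oneapi-mpi         # assumed module name; check "module spider mpi"
mpiicx -O2 -o hello_mpi hello_mpi.c  # Intel MPI wrapper for the oneAPI C compiler (icx)
```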
Ursa HPFS File Systems
Ursa has the following three High Performance File Systems (HPFS) available:
/scratch[3,4,5].
Note
While the /scratch[3,4] file systems are shared between Ursa
and Hera, /scratch5 is only available on Ursa.
Caution
Please note that the HPFS scratch file systems are NOT backed up!
The /scratch[3,4] file systems are Lustre file systems with project
based disk space quotas for routine work.
The /scratch5 file system is a new VAST file system, which offers
different technology from the /scratch[3,4] Lustre file systems.
This is an all-flash filesystem designed to perform well for a variety
of workloads and files of varying size.
The VAST file system is significantly more expensive per PB than
the Lustre file systems, and we currently do not know what
the performance implications of this file system are as opposed
to the Lustre file system on your applications.
Therefore, currently only two projects, rstprod and public, have project-based quotas on the VAST file system. However, all other Ursa projects and users may explore the suitability of this new file system for their applications via the purged directory, /scratch5/purged.
Warning
This directory will be purged of all files that have not
been accessed in the past 90 days. Depending on usage we
will adjust the purge schedule as needed, preceded by a user
notification. Users under the /purged directory have a quota
of 250 TB.
If you want to use /scratch5, create and use a single
sub-directory under the /purged directory
(/scratch5/purged/$USER), with the directory name the
same as the First.Last of your NOAA email.
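For example, assuming your NOAA email is First.Last@noaa.gov (a placeholder), you would create and use:

```bash
mkdir -p /scratch5/purged/First.Last    # replace First.Last with your own name
```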
Since /scratch5 is a new and different technology from our
previous Lustre file systems, the RDHPCS program would appreciate
insight into your experiences with it. In particular, we would
like to know how performance has been affected (both positively
or negatively) and how stable and consistent the file system
is for your applications. Please send this information via
an RDHPCS Ursa help ticket with the subject of /scratch5
performance results.
If the performance of your application suite is significantly
improved using the VAST File system vs the Lustre file systems
and you would like your project to have non-purged quota project
space on /scratch5, please have your Portfolio Manager
submit a request via a RDHPCS Ursa Help ticket. Include the amount
of disk space you require and a detailed justification, including
a performance comparison for your application suite between
/scratch[3,4] and /scratch5.
Usage/Quota information for /scratch[3,4] file systems
The new file systems /scratch[3,4] on Ursa and Hera have a performance-improving feature called "Hot Pools". With Hot Pools, the first 1 GB of each file is written
to the fast SSD (hot) tier, by default. After some time, usually
10 to 15 minutes, the file is mirrored to the slower HDD (cold) tier
and will be double counted as usage toward your quota. As long as the
file is actively used, it will stay on both tiers (hot and cold). Unused
files are removed from the hot tier and reside only on the cold tier.
As a result the reported usage for the first 1 GB of active files may be doubled.
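To see how this affects your reported usage, you can check your quota on the Lustre file systems; the sketch below assumes quotas are tracked per group under your project name (the exact quota type used on Ursa may differ):

```bash
# Assumed sketch: show usage and quota for your project's group on /scratch3.
lfs quota -hg myproject /scratch3
```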
Cron and Scrontab Services
On Ursa both cron and scrontab services are available.
We strongly recommend using scrontab instead of cron
whenever possible. For information on how to use scrontab
please see scrontab.
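As a minimal sketch (the project, partition, schedule, and script path are placeholders), a scrontab entry edited via scrontab -e might look like:

```bash
# Run a small recurring job every day at 02:30 on the service partition.
#SCRON -A myproject -p u1-service -q batch -t 00:30:00
30 2 * * * $HOME/bin/daily_cleanup.sh
```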
Getting Help
For any Ursa or Hera issue, open a help request.