Jet User Guide
Currently Jet consists of six compute partitions, plus four bigmem nodes, totaling 57,744 cores and a theoretical peak of 1.884 PFLOPS.
| Partition | sJet | vJet | xJet | bigmem | kJet |
|---|---|---|---|---|---|
| Installation Year | 2012 | 2014 | 2015 and 2016 | 2015 | 2018 |
| CPU Type | Intel SandyBridge | Intel IvyBridge | Intel Haswell | Intel Haswell | Intel Skylake |
| CPU Model Number | X5-2670 | E5-2650v2 | E5-2670v3 | E5-2670v3 | 6148 |
| CPU Speed (GHz) | 2.6 | 2.6 | 2.3 | 2.3 | 2.4 |
| Total Nodes | 330 | 288 | 812 | 4 | 404 |
| Cores/Node | 16 | 16 | 24 | 24 | 40 |
| Total Cores | 5,120 | 4,608 | 19,488 | 96 | 14,400 |
| Memory/Node (GB) | 32 | 64 | 64 | 256 | 96 |
| Memory/Core (GB) | 2.0 | 4.0 | 2.66 | 10.6 | 2.4 |
| Available Memory/Node (GB) | 29 | 61 | 61 | 253 | 93 |
| Interconnect | QDR InfiniBand | FDR InfiniBand | FDR InfiniBand | FDR InfiniBand | EDR InfiniBand |
| Relative Perf/Core (relative to legacy tJet/uJet) | 1.44 | 1.65 | 1.5 | 1.5 | 1.68 |
| Peak FLOPS/Node (GFLOPS) | 332.8 | 332.8 | 883 | 883 | 2,048 |
| Total FLOPS/Partition (TFLOPS) | 113.2 | 93.6 | 717.4 | 3.5 | 827 |
Note
Jet’s Front Ends (service partition) have the same architecture as the xJet compute nodes.
Total FLOPS is a theoretical peak and does not represent actual performance.
Relative performance is based on the SPEC CPU 2017 benchmark (specifically SPECrate 2017 Floating Point), normalized to the slowest core in production.
Available Memory/Node is the total memory available to applications. The difference between this value and the total memory per node is due to OS overhead and other system buffers.
System Features:
Total of 55,984 cores of 64-bit Intel CPUs,
Capability of 1,795 trillion floating point operations per second (1.79 petaflops),
Total scratch disk capacity of 6.6 Petabytes
| Name | Type | Size |
|---|---|---|
| lfs4 | HPE Lustre | 4500 TB |
| lfs5 | DDN Lustre | 7900 TB |
For decades, NOAA weather research has relied on High Performance Computing to further its mission of developing leading-edge weather observation and prediction capabilities. This has been accomplished both through the development of leading-edge software and through the adoption of cutting-edge hardware technologies that push the envelope of what is computationally feasible.
The Intel Paragon was an early parallel system, delivered in 1991, that was used for the development of a parallel RUC model. Researchers at GSL also developed the Scalable Modeling System (SMS) to assist in the parallelization of codes. To further the development of parallel programming standards, GSL staff members participated in the development of the MPI-1 and MPI-2 standards, which provide a common basis for the parallel computational methods used today.
In 2000, GSL took delivery of an HPC system relying on a relatively new concept, clustering. Very similar to a Beowulf cluster, the system used off-the-shelf servers with Myrinet, a high-speed, low-latency network interconnect. This system provided substantially more performance than the traditional architectures available at the time, and in a much more cost-effective manner.
Today, Jet refers to any of the clustered systems that have passed through GSL since 2000; they are used to support NOAA Research and Development High Performance Computing (RDHPC) requirements for GSL and other NOAA offices, including the Hurricane Forecast Improvement Project (HFIP) since 2009.
Jet Partitions
The following Jet partitions and their billable TRES factors are defined:
| Partition | QOS Allowed | Billable TRES per Core Performance Factor | Description |
|---|---|---|---|
| sjet | batch, windfall, debug, urgent, novel | 145 | General compute resource - Intel SandyBridge |
| vjet | batch, windfall, debug, urgent, novel | 165 | General compute resource - Intel IvyBridge |
| xjet | batch, windfall, debug, urgent, novel | 150 | General compute resource - Intel Haswell |
| kjet | batch, windfall, debug, urgent, novel | 165 | General compute resource - Intel Skylake |
| bigmem | batch, windfall, debug, urgent | 150 | Large memory jobs; 4 nodes, each with 24 cores and 256 GB of memory - Intel Haswell |
| novel | novel | 165 | Partition for running novel or experimental jobs where nearly the full system is required. If you need to use the novel QOS, please submit a ticket to the help system and tell us what you want to do. We will normally have to arrange a time for the job to go through, and we would like to plan the process with you. Note that if you use the novel partition you must also specify the novel QOS. |
| service | batch, windfall, debug, urgent | 0 | Serial jobs (max 1 core) with a 24-hour limit. Jobs run on front end (login) nodes that have external network connectivity. Useful for data transfers or access to external resources such as databases. If you have a workflow that pushes or pulls data to/from the HSMS (HPSS), it should be run here. See the Login (Front End) Node Usage Policy for important information about using login nodes. |
To see a list of the available partitions use the command:
$ sinfo -O partition
sjet
vjet
xjet
kjet
bigmem
service
Selecting general compute resources on Jet: Unless you have a real-time reservation (see below), and to ensure that all partitions are used as efficiently as possible, we recommend using the default, which makes all general compute resource partitions available to your job. This gives the batch scheduler the flexibility to place your job on the first available resource. To do this, you must choose compilation options that create executables that can run on any partition, which is covered in the Recommended Intel Compiler Options for Optimization section of the Jet User Guide.
On Jet the processor architecture, cores per node, and memory per core vary from partition to partition, so your execution time may vary slightly; it is therefore important to understand the architectural differences and how your code will run and perform on each partition.
To select all Jet general compute resource partitions (the default), so that your job runs on the first available partition, simply do not specify a partition, as in the sketch below.
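As an illustration only (the account name, QOS, core count, and executable below are placeholders, not site defaults), a batch script that leaves out the partition directive might look like:
#!/bin/bash
#SBATCH --account=myproject        # placeholder account/project name
#SBATCH --qos=batch
#SBATCH --ntasks=48
#SBATCH --time=01:00:00
# No --partition line, so the scheduler can place the job on any general compute partition.
module load intel mvapich2
mpiexec -np 48 ./my_model.exe      # hypothetical executable built to run on all partitions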
GPU Clusters
GSL continues to research potentially disruptive, next-generation HPC technologies. Graphics Processing Units (GPUs) are traditionally used for graphics and video gaming, but their design is applicable to numerical modeling as well. Because their architecture is fundamentally different from that of traditional CPUs, existing software usually does not run on them without modification.
At GSL, we have been using GPU clusters since 2009 and are developing new tools and techniques that will allow these systems to be used in the future by scientists to solve tomorrow’s weather and hurricane prediction challenges.
About Modules
Modules is a tool that is used to manage software when multiple versions are installed. For packages that are not provided with the OS (compilers, debuggers, MPI stacks, etc.), we install them so that new versions do not overwrite old versions. By default, no modules are loaded, so you must load any modules that you wish to use. To see what modules are available, run:
# module avail
At a minimum you will want to load a compiler and an MPI stack:
$ module load intel
$ module load mvapich2
Note
Since you have to do this explicitly (for now), you also have to do it in your job scripts. Alternatively, you can put the module load commands in your .profile to make them permanent, as in the sketch below.
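For example, to make the loads permanent you might append lines like these to your ~/.profile (a sketch; substitute the module versions you actually need):
# load a default compiler and MPI stack at login (hypothetical choice of modules)
module load intel
module load mvapich2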
Modules on Jet
The way to find the latest modules on Jet is to run module avail:
$ module avail
to see the list of available modules for the compiler and the MPI modules currently loaded.
--------------------------------- /apps/lmod/lmod/modulefiles/Core ---------------------------------
lmod/7.7.18 settarg/7.7.18
------------------------------------ /apps/modules/modulefiles -------------------------------------
advisor/2019 g2clib/1.4.0 intel/19.0.4.243 rocoto/1.3.1
antlr/2.7.7 gempak/7.4.2 intelpython/3.6.8 szip/2.1
antlr/4.2 (D) grads/2.0.2 matlab/R2017b udunits/2.1.24
cairo/1.14.2 hpss/hpss nag-fortran/6.2 vtune/2019
cnvgrib/1.4.0 idl/8.7 nccmp/1.8.2 wgrib/1.8.1.0b
contrib imagemagick/7.0.8-53 ncview/2.1.3 xxdiff/3.2.Z1
ferret/6.93 inspector/2019 performance-reports/19.1.1
forge/19.1 intel/18.0.5.274 (D) pgi/19.4
Where:
D: Default Module
Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
In the above, each module name represents a different package. In cases where there are multiple versions of a package, one will be set as a default. For example, for the intel compiler there are multiple choices:
intel/11.1.080 intel/12-12.1.4(default) intel/12-12.1.5
So if you run:
# module load intel
The default version will be loaded, in this case 12-12.1.4. If you want to load a specific version, you can, although we highly recommend using the system defaults unless something is not working or you need a different feature. To load a specific version, specify the version number:
# module load intel/11.1.080
# module list
Currently Loaded Modulefiles:
  1) intel/11.1.080
If you already have a particular module loaded and you want to switch to a different version of the same module, you can either do:
# module unload intel
# module load intel/11.1.080
or
# module switch intel intel/11.1.080
Warning
When unloading modules, only unload those that you have loaded yourself. Dependent modules are handled automatically by the master modules.
Note
The module setup is a work in progress; we will keep improving how modules are used and making it clearer which modules you need to load.
Using Math Libraries
The Intel Math Kernel Library (MKL) provides a wide variety of optimized math routines, including “BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, vector math, and more.” See the Intel MKL product documentation for details.
The examples below should cover the needs of most users on our system.
Location of MKL on Jet
MKL is specific to the version of the Intel compiler used. After loading the compiler version you require, the variable $MKLROOT will be defined; it specifies the path to the MKL library. Use this variable rather than a hard-coded path.
Basic Linking with BLAS and LAPACK
To link with the mathematical libraries such as BLAS, LAPACK, and the FFT routines, it is best to just add the following option to your link line:
-mkl=sequential
Note
Note that there is no lowercase “l” in front of mkl; this is a compiler option, not a library. This option will pull in all of the libraries you need. The sequential option is important because, by default, Intel MKL uses threaded (OpenMP-like) versions of the library, which you rarely want in MPI applications. Even if you are using an OpenMP/MPI hybrid, only consider removing the sequential option if you want the math routines themselves to be threaded, not just the rest of the code. (For example, GFS uses OpenMP but relies on sequential math routines, so you would use sequential for that code.)
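As a minimal sketch (the source file and program names are hypothetical), compiling and linking a serial code against MKL's BLAS/LAPACK with the Intel compiler might look like:
$ module load intel
$ ifort mysolver.f90 -o mysolver -mkl=sequential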
Linking with FFT, and the FFTW interface
Intel provides highly optimized FFT routines within MKL; they are documented in the Intel MKL documentation. While Intel has its own interface (DFTI), we recommend that you use the FFTW interface. FFTW is an open-source, highly optimized FFT library that supports many different platforms. The FFTW (specifically FFTW3) interface is supported on Intel, AMD, and IBM Power architectures; IBM even supports the FFTW interface through ESSL, so using the FFTW3 interface allows codes to be portable across the NOAA architectures.
The FFTW documentation (http://www.fftw.org) is the best reference for the interface. For Fortran, you need to include the header file fftw3.f in your source before using the functions. Add the following statement:
include 'fftw3.f'
in the appropriate place in your source code. When compiling, add:
-I$(MKLROOT)/include/fftw
to your CFLAGS and/or FFLAGS. When linking, use the steps described above.
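For example (file names are hypothetical), a Fortran build that uses the FFTW3 interface from MKL might look like:
$ ifort -c fft_code.f90 -I$MKLROOT/include/fftw
$ ifort fft_code.o -o fft_code -mkl=sequential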
Linking with Scalapack
Linking with Scalapack is more complicated because it uses MPI. You have to specify which version of the MPI library you are using when linking with Scalapack. Examples are:
Linking with Scalapack and mvapich
LDFLAGS=-L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
Linking with Scalapack and OpenMPI
LDFLAGS=-L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
In the examples above, the variable $(MKLROOT) is used. Use this variable rather than an explicit path to the Intel compiler's MKL installation. A complete link line is sketched below.
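A sketch of a complete Mvapich2 link step with the Intel compiler, using the library list above (object and executable names are hypothetical):
$ mpif90 solver.o -o solver -L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core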
Linking math libraries with Portland Group
For the PGI compiler, all you need to do is specify the library name.
For blas:
-lblas
For lapack:
-llapack
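For example (the file names are hypothetical), with the PGI compilers:
$ pgf90 mycode.f90 -o mycode -lblas -llapack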
Options for Editing on Jet
To use any of these editors, type its name at the command line:
| Editor | Description |
|---|---|
| vi | http://www.linuxlookup.com/howto/using_vi_text_editor - The old-school standard editor. It is a text-based editor (although X Window versions do exist). |
| emacs | https://www.gnu.org/software/emacs/ - An extensible, customizable free/libre text editor. |
| nedit | http://www.nedit.org/ - The editor most like what you would find in Windows. |
| nano | Similar to nedit, easier to learn than vi, and does not require X11. |
| vimdiff | Extremely useful for visualizing the differences between source code files. It opens the files in vi windows side by side and highlights any differences between them, and you can edit the differences directly. Very useful for code development. |
| gvimdiff | X11 version of vimdiff with mouse support. |
Starting a Parallel Application
Supported MPI Stacks
We currently support two MPI stacks on Jet, Mvapich2 and OpenMPI. We consider Mvapich2 our primary MPI stack; OpenMPI is provided for software development and regression testing. In our experience, Mvapich2 provides better performance without requiring tuning. We do not have the depth of staff to fully support multiple stacks, but we will try our best. If you feel you need to use OpenMPI as your production stack, please send us a note through a help request and explain why, so we can better understand your requirements.
Load MPI Stacks Via Modules
The MPI libraries are compiler specific, so a compiler must be loaded before the MPI stacks become visible.
$ module load intel
$ module avail
...
------------------------- /apps/Modules/default/modulefamilies/intel -------------------------
hdf4/4.2.7(default) mvapich2/1.6 netcdf/3.6.3(default) netcdf4/4.2.1.1(default)
hdf5/1.8.9(default) mvapich2/1.8(default) netcdf4/4.2 openmpi/1.6.3(default)
You can now see that mvapich2 and openmpi are available to be loaded. You can load a module with the command:
# module load mvapich2
Warning
Please use the default version of the MPI stack you require unless you are tracking down bugs or the Jet admin staff requests otherwise.
Launching Jobs
On Jet, please use mpiexec. This is a wrapper script that sets up your run environment to match your batch job and uses process affinity (which provides better performance).
mpiexec -np $NUM_OF_RANKS
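Inside a batch job you will usually want the rank count to follow your resource request rather than a hard-coded number. A sketch (it assumes the job was submitted with --ntasks, so Slurm sets $SLURM_NTASKS, and the executable name is hypothetical):
mpiexec -np $SLURM_NTASKS ./my_model.exe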
Launching MPMD jobs
MPMD (multi-program, multi-data) launches are typically used for coupled MPI jobs, for example a coupled ocean and atmosphere model. Colons separate the requirements of each executable. For example:
mpiexec -np 36 ./ocean.exe : -np 24 ./atm.exe
Of the 60 MPI ranks, the first 36 will be ocean.exe processes and the last 24 will be atm.exe processes.
MPI Library Specific Options
The MPI standard does not define a common set of launcher options, so a single mpiexec command line is not guaranteed to work across different MPI libraries. Below are the important differences between the ones that we support.
Mvapich2 Specific Options
Passing Environment Variables
There are two methods to pass variables to MPI processes, global (-genv) and local (-env). The global ones are applied to every executable. The local ones are only applied to the executable specified. The two methods are the same if the job launch is not MPMD. If you need to pass different variables with different values to different MPMD executables, use the local version. When using the global versions you should put them before the -np specification as that defines where the local parameters start.
To pass a variable with its value:
-genv VARNAME=VAL
To pass multiple variables with values, list them all out:
-genv VARNAME1=VAL1 -genv VARNAME2=VAL2
If the variables are already defined, then you can just pass the list on the mpiexec line:
-genvlist VARNAME1,VARNAME2
If you want to pass the entire environment, you can do:
-genvall
Note
This may have unintended consequences and may not work depending on how large your environment is. We recommend explicitly passing only what the MPI processes need.
If you need to pass different variables to different processes in an MPMD configuration, an example of the syntax would be:
mpiexec -np 4 -env OMP_NUM_THREADS=2 ./ocean.exe : -np 8 -env OMP_NUM_THREADS=3 ./atm.exe
OpenMPI Specific Options
Passing Environment Variables
The option -x is used to pass variables. To pass a variable with its value:
-x VARNAME=VAL
To pass the contents of an existing variable:
-x VARNAME
To pass multiple variables, repeat the option:
-x VARNAME1 -x VARNAME2=VAL2 -x VARNAME3
Compared to Mvapich2, these are all local definitions; there is no way to pass a variable to all processes of an MPMD application with a single use of -x, as the sketch below illustrates.
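For example, an MPMD launch that sets a different thread count for each executable repeats -x in each block (a sketch with hypothetical executables):
mpiexec -np 36 -x OMP_NUM_THREADS=2 ./ocean.exe : -np 24 -x OMP_NUM_THREADS=3 ./atm.exe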
Policies and Best Practices
Project Data Management
This includes the High Performance File System (HPFS, scratch), the HFS (Home File System), and the HPSS HSMS (tape).
Login (Front End) Node Usage Policy
Login (Front_End) Node Usage Policy in RDHPCS CommonDocs
Cron Usage Policy
Cron Usage Policy in RDHPCS CommonDocs
Maximum Job Length Policy
See the section for the maximum job length per partition and QOS. If you require jobs that run longer than this, you are expected to use checkpoint/restart to save the state of your model; you can then resubmit the job and have it pick up where it left off. This policy has been developed over a decade of different job patterns as a balance between user needs, fairness within the system, and reducing the risk of losing too many CPU hours to failed jobs or system interruptions.
/tmp Usage Policy
Every node in the Jet system has a /tmp directory. On most other Unix/Linux systems, users use this space for temporary files. That generally works when the size of /tmp is comparable to the working space (such as /home) on a traditional workstation.
However, Jet is not a workstation. The size of /tmp on Jet is much smaller than the working space of the project directories. In many cases, a typical file written in a project directory could be as large as the entire /tmp space. On the compute nodes, the problem is worse. The compute nodes have no disk, and the size of /tmp is on the order of 1 GB.
For these reasons:
Users should refrain from using /tmp. The /tmp directory is for system tools and processes.
All users have project space; use that space for manipulating temporary files.
The /tmp filesystem can be faster for accessing small files, so there are valid reasons to use /tmp for your processing. Only consider using /tmp if:
The size of your files is less than a few MB
Your files will not be needed after the process is done running
Please clean up your temporary files after you are done using them.
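A hedged example of disciplined /tmp use inside a job script (the directory naming convention and the post-processing command are hypothetical):
TMPDIR=$(mktemp -d /tmp/${USER}_XXXXXX)    # small, job-private scratch area
./postprocess --workdir $TMPDIR            # hypothetical tool that writes a few small files
rm -rf $TMPDIR                             # always remove the directory when finished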
Software Support Policy
Our goal is to enable science on any RDHPCS system. This often includes installing additional software to improve the utility and usefulness of the system.
Systems Administrator Managed Software
The HPCS support staff is not an unlimited resource, and since every additional software package installed increases our effort level, we have to evaluate each request. The systems administrators will take on the responsibility of maintaining packages based on the usefulness of the tool to the user community, the complexity of installation and maintenance, and other factors.
If the package is a part of the current OS base (Redhat), these requests will normally be honored.
One notable exception is 32-bit applications. 32-bit support requires a large increase in the number of installed packages, which makes the system images harder to maintain and secure. We expect all applications to work in 64-bit mode.
If the package is available from the EPEL repository, it is likely that we can install it unless it causes additional complexities. However, if EPEL stops supporting it, we may stop supporting it as well.
If the software is not a part of the Redhat or EPEL repositories, we can still consider it. Each request will be considered on a case by case basis based on the value to the community.
Single-user Managed Software
Users are always free to install software packages and maintain them in their home or project directories.
“Contributor” Managed Software
We have one other method to support software on the system. As we cannot be experts on every package, we rely on the community to help provide as much value from the system as possible. To enable this, we have a user-contributed software section. The contributing user is given access to a system-level directory in which they can install software, and we make the minimal changes necessary to allow access to the installed tool. Any questions from the help system that we cannot answer will be forwarded to the package maintainer.
If you wish to contribute a package to the system, please start a system help ticket: Getting Help
System Software
How Software is Organized Through Modules
Many software packages have compiler dependencies, and some also have MPI stack dependencies. To ensure that the correct packages are loaded, the module installation has been designed so that only valid packages are presented to you. For example, there are multiple versions of netcdf3, one for each compiler family we have. So when you run module avail:
# module avail
------------------------------ /apps/Modules/3.2.9/modulefiles ------------------------------
bbcp/12.01.30.01.0(default) hpss module-cvs null udunits/1.12.11
cnvgrib/1.2.3(default) intel/11.1.080 module-info pgi/12.5-0(default) udunits/2.1.24(default)
cuda/4.2.9(default) intel/12.1.4(default) modules rocoto/1.0.1(default) use.own
dot intel/12.1.5 ncl/6.0.0 szip/2.1 wgrib/1.8.1.0b(default)
grads/2.0.1(default) lahey/8.10b(default) nco/4.1.0 totalview/8.9.2-2(default) wgrib2/0.1.9.6a(default)
There is no option for netcdf3. However, after loading a compiler, you have access to the packages that depend on that compiler.
# module load intel
# module avail
---------------------------- /apps/Modules/default/modulefamilies/intel -------------------------------------------
hdf4/4.2.7(default) hdf5/1.8.9(default) mvapich2/1.6 mvapich2/1.8(default) netcdf/3.6.3(default) netcdf4/4.2 openmpi/1.6
The same method exists for packages that are dependent on both a compiler and MPI stack. If you wanted to use parallel hdf5 or parallel netcdf4, you would have to first specify the MPI stack you wanted to use.
[ctierney@fe8 ~]$ module avail
-------------------------------------- /apps/Modules/default/modulefamilies/intel-mvapich2/1.8 ----------------------
hdf5parallel/1.8.9(default) netcdf4-hdf5parallel/4.2(default)
Environment Variables
For all packages on the system, environment variables have been set to ensure consistency in their use. The following variables are defined for use with the different packages on the system:
$NETCDF - Base directory of NetCDF3
$NETCDF4 - Base directory of NetCDF4
$NCO - Base directory of NCO
$HDF4 - Base directory of HDF4
$HDF5 - Base directory of HDF5
$UDUNITS - Base directory of Udunits
$SZIP - Base directory of szip
$NCARG_ROOT - Base directory of NCAR Graphics and NCL
$GEMPAK - Base directory of GEMPAK
$GEMLIB - Location of GEMPAK libraries
$CUDA - Base directory of Cuda
$GADDIR - Location of Grads libraries
When specifying the location of these libraries at compile time, use the variable name. For example:
icc mycode.c -o mycode -I$NETCDF/include -L$NETCDF/lib -lnetcdf
User supported modules
Users who require access to packages not currently supported by the HPC staff are welcome to submit requests through the help system to install and support unique modules. To access these user-supported modules, you must first update the module path to include /contrib/modulefiles by executing the following commands:
$ module use /contrib/modulefiles
$ module avail
. . .
----------------------------- /contrib/modulefiles -----------------------------
anaconda/2.0.1 papi/5.3.2(default)
ferret/v6.9(default) sbt/0.13.7(default)
gptl/5.3.2-mpi scala/2.11.5(default)
gptl/5.3.2-mpi-papi(default) tau/2.22-p1-intel(default)
gptl/5.3.2-nompi tau/2.23-intel
papi/4.4.0 tau/2.23.1-intel
papi/5.0.1 test/1.0
papi/5.3.0 tm/1.1
Using OpenMP and Hybrid OpenMP/MPI on Jet
OpenMP is a programming extension that supports shared-memory parallel computing in Fortran and C. It is relatively easy to parallelize code using OpenMP; however, parallelization is restricted to a single node. As with any programming model, there are tricks to writing efficient code.
We support OpenMP on Jet; however, it is infrequently used and we have not worked through all the issues. If you want to use OpenMP, please submit a help request and let us know, so we can keep track of the users interested in using it.
Compiling codes with OpenMP
For Intel, add the option -openmp. For the Portland Group compiler, add the option -mp. Example compile lines are sketched below.
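For example (the source file name is hypothetical), using the flags above:
$ ifort -openmp my_omp_code.f90 -o my_omp_code
$ pgf90 -mp my_omp_code.f90 -o my_omp_code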
Specifying the Number of Threads to use
Depending on the compiler used, the default number of threads differs: Intel will use all the cores available, while PGI defaults to 1. It is best to always set what you want explicitly, using the OMP_NUM_THREADS environment variable. For example, in csh:
setenv OMP_NUM_THREADS 4
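The equivalent in a bash shell or batch script is:
export OMP_NUM_THREADS=4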
The number you want would generally be the total number of cores available on a node. See the System Information page for how many cores there are on each system.
Programming Tips for OpenMP
Do not use implicit (whole-array) assignment when initializing arrays in Fortran. Because memory pages are not physically allocated until they are first touched, an implicit assignment gives the runtime no information about which thread will use which part of the array. Your program then loses memory locality and cannot place data in the “closest” memory, which leads to performance and scalability issues.
So, don’t do this:
A=0.
Do this:
!$OMP PARALLEL DO SHARED(A)
do j=1,n
  do i=1,m
    A(i,j)=0.
  enddo
enddo
This is not a Jet-specific issue; it affects all architectures. Structuring your code as shown above will make it more portable and perform better.
Using MPI calls from OpenMP critical sections
When combining MPI and OpenMP, you do not normally need to worry about how threading is managed in MPI unless MPI calls are made from within OpenMP sections. In that case you must disable processor affinity, by passing the variable MV2_ENABLE_AFFINITY=0 to your application at run time. For example:
mpiexec -genv MV2_ENABLE_AFFINITY=0 ......
See the mvapich2 documentation for more information.