NOAA RDHPCS User Documentation
Workflow Software
A few scientific workflow tools are available on the RDHPCS systems.
Cron and Slurm crontab
Cron
Viewing currently running crontab processes
Slurm Crontab
Flexible Modeling System Runtime Environment
Rocoto
Rocoto on RDHPCS Systems
Rocoto Documentation
Rocoto Help
Rocoto Best Practices