RDHPCS Maintenance Calendar
Accounts and Projects
Applying for a user account
Request access to RDHPCS projects
Requesting access to MSU-HPC systems
General Access Requirements
Account Activity Requirements: Suspension, Deactivation, Reactivation
Deactivated Accounts
Request a New Allocation or Project
RSA Tokens
Common Access Card (CAC)
RSA Software Token User Instructions
Step 1: Submit the RSA token request form and receive response email
Step 2: Install the application
Step 3: Configure the application
Step 3a: Copy the activation URL
Step 3b: Enter the activation URL
Step 3c: Enter the Activation Code
Step 4: Set the RSA PIN
Step 5: Success!
Role Accounts
Accessing a Role Account
X Applications With Role Accounts
Using CRON
Connecting
Connecting for the first time
Secure Shell (SSH) Access
Bastion Hostnames
Common Access Card (CAC) SSH Login
RSA SSH Login
Selecting a Node
X11 Graphics
SSH Port Tunnels
Web based ParallelWorks Access
Getting Help
Submitting a Good Help Request
Use a Good Subject
Provide Detailed Description of the Problem
Provide Job Information
Describe How to Reproduce the Problem
Only Report One Problem Per Help Ticket
Follow up With Additional Information or Questions
Required Information for Specific Types of Help
Basic Ticket Information
File System Problems
Compilation Problems
Job Submission Problems
Job Completion Problems
Providing a Reproducer
Reporting Data Transfer Issues
Managing Help Tickets
Help Ticket System User Portal
Login
Reply to a Ticket
Search for a Ticket
Create a New Ticket
Systems
General Information
Locations and Systems of the RDHPCS
Bastion Hostnames
Gaea User Guide
System Overview
GAEA Quickstart
Compiling
Running
Staging/Combining
Transferring Data to/from Gaea
Allocation
Running a Simple Job Script
System Architecture
Node Types
Clusters
What is C5?
Job Submission
Queues
Job Monitoring
Terminology
Environment
Do’s and Don’ts
File Systems
Summary of Storage Areas
HOME
Allocations and Quotas
Modules
LMOD
LMOD Search Commands
Adding Additional Module Paths
Module Commands
Compilers
Available Compilers
Compilers on C5
Cray Compiler Wrappers
Compiling and Node Types
Controlling the Programming Environment
Compiling Threaded Codes
Hardware
c5 partition
es partition
Queue Policy
Debug & Batch Queues
Priority Queues
Queues per Partition
Scheduler/Priority Specifics
Slurm Queueing System
Useful Commands
Running your models
Monitoring your jobs: Shell Setup
Fair Share Reporting
Data Transfers
Available Tools
f5 <-> f5
Gaea <-> GFDL
Gaea <-> Remote NOAA Site
Gaea <-> External
Gaea <-> Fairmont HPSS
External (Untrusted) Data Transfers
GCP
User Guide
Smartsites
CAC bastions refusing login attempts without asking for PIN
Shell hang on login
GPFS (F5) Performance
RDHPCS Cloud Computing
Parallel Works User Guide
NOAA’s Parallel Works Portal
Workflow
Getting Help
Training Videos
Beginner’s Guide to NOAA’s HPC Cloud
Parallel Works
Cloud Success Stories
Office Hours
Monthly Utilization Reports
FY2024 Usage
Frequently Asked Questions
General Issues
How do I close a Cloud project?
How do I request a project allocation or an allocation increase?
Storage functionalities
Parallel Works
1. Clusters and snapshots
7. Configuration Questions
7. Slurm
8. Errors
9. Miscellaneous
Jet User Guide
GPU Clusters
About Modules
Using Math Libraries
Options for Editing on Jet
Starting a Parallel Application
Policies and Best Practices
System Software
Using OpenMP and Hybrid OpenMP/MPI on Jet
Hera User Guide
About NESCC
System Overview
System Configuration
Lustre File System Usage
Lustre Volume and File Count
Lustre
Hera Lustre Configuration
File Operations
Types of file I/O
File Striping
Userspace Commands
Applications and Libraries
Using Anaconda Python on Hera
MATLAB
Using IDL on Hera
Using ImageMagick on Hera
Using R on Hera
Libraries
Using Modules
Using MPI
Loading the MPI module
Using PGI and mvapich2
Tuning MPI (TBD)
Profiling an MPI application with Intel MPI
Debugging Codes
Debugging Intel MPI Applications
Application Debuggers
Invoking DDT on Hera with Intel IMPI
Profiling Codes
Linaro Forge
TAU
Managing Contrib Projects
Fine Grain Architecture (FGA) System
System Information
Getting an allocation for FGA resources
Using FGA resources without an allocation
User Environment
Compiling and Running Codes on the FGA
Compiling and Running Codes Using CUDA
Compiling and Running Codes Using Intel MPI
Compiling and Building Codes Using mvapich2-gdr Library
Compiling and Building Codes Using OpenMPI
Compiling codes with OpenACC directives on Hera
Compiling MPI codes with OpenACC directives on Hera
Submitting Batch Jobs to the FGA System
Hints on Rank Placement/Performance Tuning
Rank placement when using mvapich2
Using Nvidia Multi-Process Service
Compiling and Building Codes With The Cray Programming Environment
Some helpful web resources
Getting Help
Niagara User Guide
System Overview
Data Transfer
Per User Data Management on Niagara
Lustre File System Usage
Components
Configuration
File Operations
MSU-HPC User Guide
Introduction
MSU’s Official HPC Documentation
General Information
Logging In
Running Jobs on MSU-HPC Systems
Submitting a Job
Specifying a Partition
Monitoring Jobs
Getting Information about your Projects
MSU-HPC System Configuration
File Systems
Orion Compute System
Hercules Compute System
Account Management
Overview
MSU Account Management Policies
Managing Project and Role Account Members
NOAA Portfolio, Project, and User Management on MSU-HPC
Getting An Account
Account Renewal
Managing Portfolios, Projects and Allocation
Role Accounts
Help, Policies, Best Practices, Issues
MSU-HPC Help Requests
Policies and Best Practices
Protecting Restricted Data
MSU FAQ
HSMS HPSS User Guide
NESCC HPSS
Gaining Access to use HPSS
New HPSS User Requests
Adding New Projects to HPSS
NESCC HPSS Data Structure
Data Retention
Expired Data Deletion Process
PPAN User Guide
About Archrpt
Report Option [-r]
Summary Option [-s]
Group Quotas
User Quotas
Enforcing Quotas
Data Storage and Transfers
Summary of Storage Areas
Notes on User-Centric Data Storage
User Home Directories (NFS)
User Archive Directories (PAN Only)
Notes on Project-Centric Data Storage
Project Home Directories (NFS)
Project Work Areas
Project Archive Directories
NESCC HPSS
Gaining Access to use HPSS
New HPSS User Requests
Adding New Projects to HPSS
NESCC HPSS Data Structure
Portfolios Using HPSS
Data Retention
Expired Data Deletion Process
File Size Guidelines
Data Recovery Policy
Getting Started
HTAR
HSI
File Expiration Commands
Sample HPSS Batch Job
HPSS Help
GFDL Archive
Gaining Access to use the GFDL Archive
GFDL Archive Data Structure
Data Retention
Data Recovery Policy
Getting Started
Allocation and Quota
Finding Files
GFDL Archive Help
Data Transfer Overview
Data Transfer Methods
Globus
Data Transfer Nodes (DTNs)
Untrusted Data Transfer Nodes (UDTNs)
Port Tunnelling
Requests for Firewall Exceptions
Firewall Exception Terms
Transferring Data
Globus Connect
Trusted Data Transfer Nodes (DTN)
Untrusted Data Transfer Nodes (UDTN)
Transfer and Syntax Examples
Transfer a file on Hera to a destination on Jet
Globus transfer from an external endpoint to the GFDL untrusted endpoint
Firewall Modification Requests for DTNs
Example
Unattended Data Transfers or Password-less Transfers to/from RDHPCS Systems
Using a Pre-Established SSH Port Tunnel
SSH Port Tunnel from Linux-like systems
SSH Port Tunnel For PuTTY Windows Systems
SSH Port Tunnel For Tectia Windows Systems
WinSCP
Example
Globus Online Data Transfer
Overview
Example
RDHPCS Globus Collection Summary
NOAA RDHPCS Globus Endpoint Types
NOAA RDHPCS UDTNs (Globus Untrusted Endpoint)
NOAA RDHPCS Object Stores in the Cloud
Globus Command Line Interface (CLI)
Transferring Data to and from Your Computer
Globus Example
What you need to have on hand
What you need to do
Using Globus Online Data Transfer
NOAA RDHPCS Globus Endpoint Types
NOAA RDHPCS UDTNs (Globus Untrusted Endpoint)
NOAA RDHPCS Object Stores in the Cloud
Globus Command Line Interface (CLI)
Transferring Data to and from Your Computer
GFDL Data Services
GFDL Data Digital Object Identifier (DOI) Policy
Policies and Best Practices
System Usage
Login Node Usage
Cron Usage
Cron Job Frequency
Allocations
Request an Increase in Allocations
Adding a Project to an Allocation
Cloud Computing Allocations
Quotas
Requesting Additional Storage for a Project
File System Usage Practices and Policies
High Performance File System (HPFS - Scratch)
General Parallel File System (GPFS)
/data_untrusted
HFS
Filesystem Backup and Data Retention
Recover Recently Deleted Files from /home
HPSS (Data Retention)
Expired Data Deletion Process
Data Recovery Policy
Data Disposition
HPFS (Scratch) Data
Niagara Per User Data
Home File System (HFS) Data
Protecting Restricted Data
Managing Packages in /contrib
Overview of contrib Packages
Responsibilities of a contrib Package Maintainer
contrib Packages Guidelines
contrib Package Maintainer Requests
Managing a contrib Package
Maintaining “Metadata” for contrib Packages
contrib Package Directory Naming Conventions
Queue Policy
Overview
Specifying a Quality of Service (QOS)
Changing QOS’s
Jet and Hera
Gaea
General Recommendations
Priorities Between QOS
Debug & Batch QOS
Software
Modules
View Active Modules
Find Modules
Load Modules
Adding Additional Module Paths
Modules with sh, bash, and ksh scripts
Why doesn’t the module command work in shell scripts?
Command Summary
Python on RDHPCS Systems
Overview
Python Guides
Conda Basics
Installing Miniconda
Jupyter on RDHPCS Systems
Module Usage
Base Environment
Custom Environments
How to Run
RDHPCS Compute Nodes
Best Practices
Additional Resources
Rocoto
Jet, Hera, Gaea Rocoto User Requirements
Rocoto GitHub Documentation
Rocoto Best Practices
Rocoto Help
Debugging with Forge DDT
DDT remote connection
Debugging an MPI process
First time configuration
Submit a debug job
X2Go Remote Desktop
Requirements
Configure X2Go
Launch X2Go Session
X2Go Tips
Troubleshooting X2Go
X2Go Help Desk Requests
Tectia
Tectia Initial Setup procedure
Installing and Configuring Tectia
Install the Tectia Client
Configure the Tectia Client
Port Tunnelling
Set Up Port Tunnelling
Testing Port Tunnels
Slurm
Overview
Running a Job
Batch Scripts
Interactive Jobs
Common sbatch Options
Slurm Environment Variables
State Codes
Job Reason Codes
Job Dependencies
Srun
Monitoring Jobs
Show Pending and Running Jobs
Show Completed Jobs
Getting Details About a Job
Priority and Fairshare
Understanding Slurm Fairshare
Fairshare Priority Factor
Fairshare Definitions
Fairshare Reporting
Priority Reporting
Getting Information About Your Projects
sfairshare
saccount_params
Generating Reports
Sreport
Shpcrpt
References
Gaea Batch Job Overview
Compiling
Running
Staging/Combining
Transferring Data to/from Gaea
Allocation
Running a Simple Job
Running the Script
Once the job is submitted
Once the job is finished
FAQ
Frequently Asked Questions
Accounts
How Do I Get an RDHPCS Account?
Need help, my RSA Token is locked.
I forgot my passphrase, how do I reset it?
How do I use X11 applications with a shared user account (role account)?
Jobs
My job hasn’t started and I have been waiting a long time. What is wrong?
My job hasn’t started and it is in a reservation, what is wrong?
All my multi-node MPI jobs are timing out, even simple jobs! What is wrong?
My multi-node jobs fail on mpirun/mpiexec.
What is the meaning of the exit code?
User
How do I change my default login shell?
How can I recover recently deleted files from /home?
Why am I not able to ssh between nodes? It is asking me for a password!
How can I recover files that I accidentally deleted from my project space?
How to transfer small files to/from an RDHPCS system?
I can no longer transfer files via the port tunnel, please help!
Python
Can you please install the xyz python package(s)?
Why are my jobs failing intermittently?
Why am I getting these errors? I am using hpc-stack for NCEPLIBS
I am using spack-stack and getting some errors
When is my .bashrc executed? When would it be ignored?
Where can I find “Operational Data” from WCOSS2 on Hera?
My jobs using NCL are no longer working
Compile WRF on Hera/Jet with Rocky OS
How do I enable x11 forwarding using PowerShell on a Windows system?
Recent User-Facing Changes
Apr 29, 2024: The new LFS5 filesystem on Jet
Apr 25, 2024: Rocoto updated to version rocoto/1.3.7 on Hera/Jet/Niagara
Apr 9, 2024: The aging uJet and tJet clusters are being turned off
Apr 2, 2024: Migration to Rocky8 in phases (Complete)
Mar 19, 2024: Migration to Rocky8 in phases
Current migration status
Feb 20, 2024: Migration to Rocky8 in phases
Jan 17, 2024: Rocoto updated to version 1.3.6
RDHPCS Office Hours
6 September 2024
20 June 2024
4 June 2024
10 May 2024
29 March 2024
15 March 2024
1 March 2024
New User Office Hour 28 Feb 2024
4 Jan 2024
15 December 2023
30 Nov 2023
Contributing to these docs
Overview
Process
Submitting suggestions
Contributing changes
GitHub Guidelines
Workflow for contributions to the documentation repository
Resources
Contributing via the CLI
Setup authoring environment
Edit the docs
Resources
NOAA RDHPCS User Documentation
Software
Modules
View Active Modules
Find Modules
Load Modules
Adding Additional Module Paths
Modules with sh, bash, and ksh scripts
Python on RDHPCS Systems
Overview
Python Guides
Module Usage
How to Run
Best Practices
Additional Resources
Rocoto
Jet, Hera, Gaea Rocoto User Requirements
Rocoto GitHub Documentation
Rocoto Best Practices
Rocoto Help
Debugging with Forge DDT
DDT remote connection
Debugging an MPI process
X2Go Remote Desktop
Requirements
Configure X2Go
Launch X2Go Session
X2Go Tips
Troubleshooting X2Go
X2Go Help Desk Requests