Python on RDHPCS Systems
Overview
In high-performance computing (HPC), Python is an essential tool for analyzing scientific data. Many users need specific versions of Python or specialized scientific packages for their analyses, and these often come with a range of dependencies. Managing different Python installations can be problematic, particularly in the complex environment of HPC systems. Virtual environments are a crucial solution, effectively isolating package installations into distinct directories.
While Python includes a native virtual environment feature called venv, Conda is a powerful package and environment manager. Conda lets users install binary software packages along with their required libraries and create isolated Python environments, avoiding conflicting dependencies and interference from other Python installations. Conda is fully supported on all RDHPCS systems.
Caution
The RDHPCS does not have a license for the Anaconda Python distribution. Because the NOAA RDHPCS systems do not fall within the 200-employee limit defined in the Anaconda Terms of Service, use of Anaconda, which includes Miniconda, is prohibited on RDHPCS systems.
For more information, please refer to the Anaconda Terms of Service and Anaconda’s blog posting Update on Anaconda’s Terms of Service for Academia and Research.
Note
The only conda channel approved for use on the NOAA RDHPCS systems is conda-forge. The conda-forge installer, Miniforge, includes the conda package manager and will use the conda-forge channel.
If you want to leverage Python with Jupyter, we direct you to our Jupyter on RDHPCS Systems page for comprehensive guidance.
Note
The RDHPCS is diligently working to implement a unified Python/Conda configuration and policies across all NOAA-managed RDHPCS systems (Hera, Jet, Niagara, Pan). Rest assured, this documentation will be updated as these configurations and policies are implemented.
Python Guides
Explore our guides designed to empower you in using Python and Conda on RDHPCS systems:
- Conda Basics Guide:
Master the essential workflow and commands of Conda to enhance your productivity.
- Installing Miniforge Guide:
Get step-by-step instructions for installing Miniforge on RDHPCS systems.
- Jupyter on RDHPCS Systems:
Access detailed directions for installing and utilizing JupyterLab on RDHPCS systems.
Note
If you’re new to Conda, don’t miss our Conda Basics Guide. It’s the perfect starting point, providing you with a handy quick-reference list of commands to accelerate your learning.
Module Usage
Python
To start using Python, load the python module.
$ module use /usw/conda/modulefiles
$ module load python
Run the module avail python command to see the available versions of Python.
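If you need to pin a particular interpreter rather than the default, load a version-qualified module. The version string below is a placeholder; substitute one reported by module avail python on your system.
$ module load python/<version>   # replace <version> with a version listed by "module avail python"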
Conda
Some RDHPCS systems have Conda installed for all users. To start using Conda on these systems, add the conda module file path to your module search path, and load the module.
$ module use /usw/conda/modulefiles
$ module load miniforge
Conda Environments
Some RDHPCS systems offer Conda for all users. The maintainers have created several environments besides the base one. If those don’t work for you, create your own custom environment.
Base Environment
At the heart of every Conda installation is the base environment, which comes equipped with the Conda package manager and a selection of additional packages.
Loading the conda module will activate the base environment. This option is ideal for users who don’t require custom environments or who simply need a Python interpreter.
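After loading the module, a quick check can confirm that the base environment is active and which interpreter it provides (these are standard conda and shell commands; the output will vary by system):
$ conda env list    # the active environment is marked with an asterisk
$ which python      # should resolve to the Conda installation's interpreter
$ python --version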
To explore the full range of packages included in the base environment, use the command conda list.
$ conda list
# packages in environment at ...:
#
# Name Version Build Channel
_ipyw_jlab_nb_ext_conf 0.1.0 py38_0
_libgcc_mutex 0.1 main
alabaster 0.7.12 py_0
anaconda 2020.07 py38_0
anaconda-client 1.7.2 py38_0
anaconda-project 0.8.4 py_0
asn1crypto 1.3.0 py38_0
astroid 2.4.2 py38_0
astropy 4.0.1.post1 py38h7b6447c_1
.
.
.
Warning
It is not recommended to try to install new packages into the base environment. Instead, you can either clone the base environment for yourself and install packages into the clone, or create a brand new (empty) environment and install packages into it. An example for cloning the base environment is provided in Best Practices below, while creating new environments is covered directly below in Custom Environments.
Custom Environments
After loading the Python or Conda module, you can create custom environments tailored to your specific requirements. This is particularly beneficial if you need a specific version of Python or packages. This can be accomplished using either conda or Python’s built-in venv functionality.
Note
The Conda Basics Guide provides a list of conda commands. Python’s Official Documentation provides detailed instructions on using venv.
To create and activate an environment with either tool:
Using conda:
#1. Create the "my_env" environment with Python version X.Y
$ conda create --name my_env python=X.Y
#2. Activate "my_env"
$ conda activate my_env
#3. Install additional packages in the "my_env" environment
$ conda install <package_name> [<package_name> ...]
Using venv:
#1. Create the virtual environment in the desired path
$ python -m venv /path/to/my_env
#2. Activate the virtual environment
$ source /path/to/my_env/bin/activate
#3. Install additional packages
$ pip install <package_name> [<package_name> ...]
Following these procedures enables efficient management of package dependencies and Python versions tailored to project needs.
To ensure optimal performance and collaboration on your project, we highly recommend creating new environments in the “Project Home” directory (refer to the file system summary). This location is not subject to purges, supports collaboration within your project, and works well with the compute nodes. Please also include the hostname in your environment names; virtual environments built on one system may not work correctly on others.
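For illustration, the commands below sketch creating and activating an environment under the Project Home with the hostname recorded in its name; the path placeholders, environment name, and package are examples only:
# Hypothetical environment name recording the system ("hera") it was built on
$ conda create -p <project_home>/<project_id>/<user_id>/envs/analysis_hera python=X.Y
$ conda activate <project_home>/<project_id>/<user_id>/envs/analysis_hera
$ conda install <package_name>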
Moreover, always remember to deactivate your current environment before switching to a new one. Deactivate with the command that matches how the environment was created:
# For conda environments
$ conda deactivate
# For venv environments
$ deactivate
How to Run
Warning
Remember, at larger scales both your performance and your fellow users’ performance will suffer if you do not run on the compute nodes. It is always highly recommended to run on the compute nodes (through the use of a batch job or interactive batch job).
The OS-provided Python is no longer accessible as python (including variations like /usr/bin/python or /usr/bin/env python); rather, you must specify it as python2 or python3. If you are using Python from one of the module files rather than the version in /usr/bin, this change should not affect how you invoke Python in your scripts, although we encourage specifying python2 or python3 as a best practice.
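For example, you can check which interpreter a script requests and invoke it with an explicit major version (the script name is illustrative):
$ head -1 script.py   # a hypothetical script whose shebang names the major version
#!/usr/bin/env python3
$ python3 script.py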
RDHPCS Compute Nodes
Before jumping into batch scripts, remember to check out the Module Usage section first, which details the differences between Python modules and environments on our different systems.
Batch Script
On most RDHPCS systems, you are already on a compute node once you are in a batch job. Therefore, you only need to use srun if you plan to run parallel-enabled Python; you do not need srun if you are running a serial application.
Additionally, $PATH issues are known to occur if you do not submit from a fresh login shell, which can result in the wrong environment being detected. To avoid this, you must use the --export=NONE flag, which ensures that no previously set environment variables are passed into the batch job:
$ sbatch --export=NONE submit.sl
This means you will have to load your modules and activate your environment inside the batch script. An example batch script is provided below:
#!/bin/bash
#SBATCH -A <PROJECT_ID>
#SBATCH -J python
#SBATCH -N 1
#SBATCH -p batch
#SBATCH -t 0:05:00
cd $SLURM_SUBMIT_DIR
date
module load python
conda activate my_env
python3 script.py
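If your application is parallel-enabled, the same pattern applies with srun added to launch multiple tasks. The script below is a sketch: it assumes your environment provides an MPI-enabled Python stack (for example, mpi4py), and the task count and script name are illustrative.
#!/bin/bash
#SBATCH -A <PROJECT_ID>
#SBATCH -J python-parallel
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -p batch
#SBATCH -t 0:05:00
cd $SLURM_SUBMIT_DIR
date
module load python
conda activate my_env
# Launch 4 tasks of a parallel-enabled script (assumes mpi4py or similar is installed)
srun -n 4 python3 parallel_script.py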
Interactive Job
To use Python in an interactive session on RDHPCS systems:
$ salloc -A <PROJECT_ID> -N 1 -t 0:05:00
$ module load python
$ conda activate my_env
$ python3 script.py
When in an interactive job, if you want to use an interactive Python prompt and srun at the same time, use the --pty flag (useful when running with multiple tasks):
$ srun --pty python3
Best Practices
Cloning the base environment using conda:
It is not recommended to try to install new packages into the base environment. Instead, you can clone the base environment for yourself and install packages into the clone. To clone an environment, you must use the --clone <env_to_clone> flag when creating a new conda environment. An example for cloning the base environment into a specific directory called envs/ in your “Project Home” is provided below:
$ conda create -p <project_home>/<project_id>/<user_id>/envs/baseClone --clone base
$ conda activate <project_home>/<project_id>/<user_id>/envs/baseClone
Cloning the “base environment” using venv:
$ python3 -m venv /path/to/new_env --system-site-packages
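After creation, the cloned environment is used like any other venv; the path below is the one chosen above:
$ source /path/to/new_env/bin/activate
$ pip list                    # shows the inherited system site packages plus anything you add
$ pip install <package_name>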
Environment locations (storage):
It is highly recommended to create new environments in the “Project Home” directory (see file system summary). This space avoids purges, allows for potential collaboration within your project, and works better with the compute nodes.
Adding known conda environment locations:
For a conda environment to be callable by a “name”, it must be installed in one of the envs_dirs directories. The list of known directories can be seen by executing:
$ conda config --show envs_dirs
On RDHPCS systems, the default location is your $HOME directory. If you plan to frequently create environments in a location other than the default, you can add directories to the envs_dirs list. For example, to track conda environments in a subdirectory in the Project Home you would execute:
$ conda config --append envs_dirs <project_home>/<project_id>/<user_id>/envs
This will create a .condarc file in your $HOME directory if you do not have one already, which will now contain the new envs_dirs location. You can then use the --name env_name flag with conda commands for environments stored in that directory, instead of the -p <project_home>/<project_id>/<user_id>/envs/env_name flag with the full path to the environment. For example, you can run conda activate my_env instead of conda activate <project_home>/<project_id>/<user_id>/envs/my_env.
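For reference, after appending, the resulting .condarc in your $HOME might look roughly like this (conda stores its configuration as YAML; the path is the placeholder used above):
$ cat ~/.condarc
envs_dirs:
  - <project_home>/<project_id>/<user_id>/envs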
Make note of and clean your pip cache location:
To avoid quota issues, it is highly recommended to occasionally clean your pip cache location. To find where your cache location is, use:
$ pip cache info
To clean your cache, use:
$ pip cache purge
Clean your conda cache:
To avoid quota issues, it is highly recommended to occasionally clean your conda cache location (in your .conda directory). To do so, run:
$ conda clean -a
Deactivate your environments before running batch jobs:
To avoid $PATH issues, it is highly recommended to submit batch jobs or enter interactive jobs without an already activated environment, i.e., deactivate your environment first. Alternatively, you can submit your jobs from a fresh login shell.
Unbuffered output:
To get unbuffered output when running Python jobs or scripts on our systems, it is recommended to use the -u flag. For example:
$ python3 -u script.py
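Alternatively, setting the standard PYTHONUNBUFFERED environment variable in a batch script makes every Python invocation in that job unbuffered; the lines below are a sketch:
# In a batch script, before any Python commands
export PYTHONUNBUFFERED=1
python3 script.py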