Math Slurm Docs (In Devel mode)

Requirements

  • Confirm with your PI that you have access to Slurm and that you will be added to a PI-YEAR account
  • Confirm that you can ssh to stat.math.mcgill.ca or jump.math.mcgill.ca with your McGill account
  • Confirm that you have added an SSH key to your account on the systems

Partitions

  • gpu_h100_pro        : 2 x 96GB GPUs (Nvidia H100)
  • gpu_debug_nvidia : 8 x 11GB GPUs (4 x NVIDIA GeForce GTX 1080 Ti, 4 x NVIDIA GeForce RTX 2080 Ti)
  • gpu_debug_tesla   : 2 x 16GB GPUs (Tesla P100-PCIE-16GB)

Limits

Use the QOS whose name matches the partition you are submitting to. Each QOS has the following limits:

  • gpu_h100_pro        : 1 GPU, 24 CPUs, 128 GB of RAM, jobs up to 24 hours
  • gpu_debug_nvidia : 1 GPU, 2 CPUs, 16 GB of RAM, jobs up to 1 week
  • gpu_debug_tesla   : 1 GPU, 2 CPUs, 32 GB of RAM, jobs up to 48 hours
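
Before submitting, it can help to check a request against the limits listed above. The following is a minimal sketch, not a tool provided on the cluster: the function name check_request and its argument order are assumptions made for illustration, and the numbers are copied from the list above (1 week taken as 168 hours).

```shell
#!/bin/bash
# Hypothetical helper: compare a request against the per-QOS limits listed above.
# Usage: check_request QOS GPUS CPUS MEM_GB HOURS
check_request() {
    local qos="$1" gpus="$2" cpus="$3" mem_gb="$4" hours="$5"
    local max_gpus max_cpus max_mem max_hours
    case "$qos" in
        gpu_h100_pro)     max_gpus=1; max_cpus=24; max_mem=128; max_hours=24 ;;
        gpu_debug_nvidia) max_gpus=1; max_cpus=2;  max_mem=16;  max_hours=168 ;;  # 1 week
        gpu_debug_tesla)  max_gpus=1; max_cpus=2;  max_mem=32;  max_hours=48 ;;
        *) echo "unknown QOS: $qos" >&2; return 2 ;;
    esac
    if [ "$gpus" -gt "$max_gpus" ] || [ "$cpus" -gt "$max_cpus" ] ||
       [ "$mem_gb" -gt "$max_mem" ] || [ "$hours" -gt "$max_hours" ]; then
        echo "request exceeds $qos limits" >&2
        return 1
    fi
    echo "ok: request fits within $qos limits"
}
```

For example, `check_request gpu_debug_tesla 1 2 32 48` succeeds, while asking for 4 CPUs on gpu_debug_nvidia returns a non-zero status.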

Usage

  • Use the submit nodes (jump and stat) to submit and cancel your jobs
  • The modules available on the submit nodes differ from those on the GPU nodes
  • Use the GPU nodes to build your Python modules or load the existing minicom modules

Easy HowTo

The following shows how to connect to stat.math.mcgill.ca, load the Slurm module, and run a quick batch file on the Tesla GPUs:

  • ssh mcgill-username@stat.math.mcgill.ca - Connect to stat.math.mcgill.ca
  • module load slurm - Load Slurm module
  • Confirm that you know your account name (PI-YEAR)
  • srun -p gpu_debug_tesla -q gpu_debug_tesla -A PI-YEAR --mem=1GB -t 1:00:00 --ntasks=1 --gpus=1 batch.sh - Run your code/commands
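
The srun line above assumes a file named batch.sh already exists; its contents are not shown here. As a rough sketch of what a quick test payload might look like, the script below reports the node it ran on and the GPUs it can see. The function name report_job is an assumption for illustration, and the script falls back gracefully when nvidia-smi is not installed.

```shell
#!/bin/bash
# batch.sh: hypothetical example payload, not part of the original instructions.
# Prints where the job landed and which GPUs are visible to it.
report_job() {
    echo "host: $(uname -n)"
    # SLURM_JOB_ID is set by Slurm inside a job; use a fallback elsewhere.
    echo "job: ${SLURM_JOB_ID:-not-in-slurm}"
    if command -v nvidia-smi >/dev/null 2>&1; then
        # List visible GPUs, one per line.
        nvidia-smi -L || echo "gpus: nvidia-smi failed"
    else
        echo "gpus: nvidia-smi not found on this node"
    fi
}

report_job
```

The file needs to be executable (chmod +x batch.sh) before srun can run it.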

Using VScode Remotely with Slurm

  • Confirm you have met the above requirements

Setup your Math account

  • ssh to stat.math.mcgill.ca or jump.math.mcgill.ca depending on whether you are on campus/VPN or not.
  • run the command: vscode-remote-setup

Sample output

  • Create your own VScode job script, e.g. myvscode.sh (you can copy the contents below and customize them)
    #!/bin/bash
    #
    #SBATCH -p gpu_debug_nvidia # partition use only the debug queues for VScode
    #SBATCH -c 2 # number of cores
    #SBATCH --mem=4G
    #SBATCH --gpus=1 # Make sure that number is within what is allowed by the QOS
    #SBATCH --propagate=NONE # IMPORTANT for long jobs
    #SBATCH -t 0-10:00 # time (D-HH:MM)
    #SBATCH -o slurm.%N.%j.out # STDOUT
    #SBATCH -e slurm.%N.%j.err # STDERR
    #SBATCH --qos=gpu_debug_nvidia # this should match the partition above with -p
    #SBATCH --account=PI-YEAR # Ask your TA/PI for which account to use
    #SBATCH --signal=B:TERM@60
    ### store relevant environment variables to a file in the home folder
    env | awk -F= '$1~/^(SLURM|CUDA|NVIDIA_)/{print "export "$0}' > ~/.slurm-envvar.bash
    
    module load dropbear # Necessary module to access the Slurm node
    
    cleanup() {
        echo "Caught signal - removing SLURM env file"
        rm -f ~/.slurm-envvar.bash
    }
    trap 'cleanup' SIGTERM
    
    ### start the dropbear SSH server
    # Make sure you change PORT_CHANGE_ME to the port given when you ran vscode-remote-setup
    dropbear \
        -r ~/.dropbear/server-key -F -E -w -s -p PORT_CHANGE_ME \
        -P ~/.dropbear/var/run/dropbear.pid
    

Make sure you have updated the following in the above file:

  • PI-YEAR
  • PORT_CHANGE_ME (use value from vscode-remote-setup)
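
A forgotten placeholder is easy to miss, so a quick check before submitting may be worthwhile. This is a minimal sketch: check_placeholders is a hypothetical helper name, and it deliberately ignores plain comment lines (those starting with "# ") so that the reminder comment in the template does not trigger a false alarm, while #SBATCH lines are still checked.

```shell
#!/bin/bash
# Hypothetical helper: fail if template placeholders are still present.
# Usage: check_placeholders FILE
check_placeholders() {
    local file="$1"
    # Find placeholder lines (with line numbers), then drop plain "# " comments.
    if grep -nE 'PI-YEAR|PORT_CHANGE_ME' "$file" |
       grep -vE '^[0-9]+:[[:space:]]*# '; then
        echo "edit the lines above before submitting $file" >&2
        return 1
    fi
    echo "$file: no placeholders left"
}
```

Running `check_placeholders myvscode.sh` prints the offending lines and returns a non-zero status until both values have been replaced.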

Now you are ready to submit your vscode job to enable remote access.

  • module load slurm - Load Slurm module
  • submit your job

    sbatch myvscode.sh
    

  • Take note of the job ID printed by sbatch and run the following to get the NODE_NAME to use with VScode

    squeue --job Your_Job_ID
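
The two steps above can also be chained so the job ID does not have to be copied by hand. The sketch below assumes sbatch prints its usual "Submitted batch job <id>" line; job_id_from_sbatch is a hypothetical helper name. The Slurm commands themselves are left as comments because they only work on the submit nodes.

```shell
#!/bin/bash
# Hypothetical helper: read sbatch output on stdin and print only the job ID.
job_id_from_sbatch() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

# On a submit node, after "module load slurm":
#   job_id=$(sbatch myvscode.sh | job_id_from_sbatch)
#   squeue -j "$job_id" -h -o '%T %N'   # job state and node list, no header
```

The node list stays empty while the job is still pending, so rerun squeue until the state shows RUNNING.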
    

Setup your VScode machine

From the previous step you will need to know:

  • McGill_USERNAME
  • PORT_CHANGE_ME
  • NODE_NAME

Add the following to ~/.ssh/config on your client/home machine, replacing the placeholders with the values listed above:

Host jump
    User McGill_USERNAME
    HostName jump.math.mcgill.ca

Host math-slurm
    HostName NODE_NAME
    ProxyJump jump
    User McGill_USERNAME
    Port PORT_CHANGE_ME
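
Because NODE_NAME can change with every job, it may be convenient to generate these entries rather than edit them by hand. The following is a sketch only: print_ssh_config is a hypothetical helper, and the example values in the usage note are made up.

```shell
#!/bin/bash
# Hypothetical helper: print the two Host entries for the given values.
# Usage: print_ssh_config McGill_USERNAME NODE_NAME PORT
print_ssh_config() {
    local user="$1" node="$2" port="$3"
    cat <<EOF
Host jump
    User $user
    HostName jump.math.mcgill.ca

Host math-slurm
    HostName $node
    ProxyJump jump
    User $user
    Port $port
EOF
}
```

For instance, `print_ssh_config jdoe node01 2222` prints a config for the made-up user jdoe. Review the output before copying it into ~/.ssh/config, and replace any older math-slurm entry instead of appending a second one.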

You can now use VScode with the Remote-SSH Extension and select math-slurm as the remote host

  • Note that you might need to enter your SSH passphrase twice if you have not added your SSH key to your agent
  • Note that you will have to accept the SSH host key the first time you connect after running vscode-remote-setup

If you run into problems, please send an email to science.it@mcgill.ca with the subject [SLURM].