COMP Slurm Docs

Requirements

  • Confirm with your Professor or TA that your course has access to Slurm
  • Confirm that you have an active Computer Science account
  • Confirm that you can ssh to mimi.cs.mcgill.ca with SSH keys from your client machine
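
If you still need to set up key-based access, one common approach (a sketch assuming an OpenSSH client; cs-username is a placeholder for your CS account name) is:

    # Generate a key pair on your client machine if you do not already have one
    ssh-keygen -t ed25519

    # Copy the public key to mimi
    ssh-copy-id cs-username@mimi.cs.mcgill.ca

    # Confirm that key-based login now works
    ssh cs-username@mimi.cs.mcgill.ca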

Resources

  • gpu-teach-01 : 10 x NVIDIA RTX A2000 12GB
  • gpu-teach-02 : 10 x NVIDIA RTX A2000 12GB
  • gpu-teach-03 :  4 x NVIDIA RTX 5000 32GB
  • gpu-grad-01  : 10 x NVIDIA RTX A5000 24GB
  • gpu-grad-02  :  8 x NVIDIA RTX A5000 24GB
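
To confirm that a job can actually see one of these GPUs, a quick interactive test (a sketch; it assumes your account is allowed to request a GPU on the all partition, as in the examples further down) is:

    module load slurm
    # nvidia-smi lists the GPU(s) allocated to the job
    srun -p all --gpus=1 --mem=2G -t 0:10:00 nvidia-smi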

Limits

(normal QOS, your account default may vary)

  • Jobs per user = 2
  • Maximum run time = 4 hours
  • Maximum CPU cores = 16
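
The limits attached to a QOS can be inspected from mimi once the Slurm module is loaded (a sketch; replace normal with the QOS name your TA/PI gave you):

    module load slurm

    # Show the limits configured for a QOS
    sacctmgr show qos normal format=Name,MaxWall,MaxJobsPU,MaxTRESPU

    # Show which accounts and QOS your user is associated with
    sacctmgr show assoc user=$USER format=Account,User,QOS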

Easy HowTo

  • ssh cs-username@mimi.cs.mcgill.ca - Connect to a mimi node
  • module load slurm - Load Slurm module
  • srun -p all --mem=1GB -t 1:00:00 --ntasks=1 batch.sh - Run your code/commands

Sample srun
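
A minimal interactive run, using hostname as a stand-in for your own program, might look like this (the command simply prints the name of the compute node that ran the task):

    module load slurm
    srun -p all --mem=1GB -t 0:10:00 --ntasks=1 hostname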

Info

Check the state of the Slurm cluster using: sinfo
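
For example, two standard invocations give a quick overview:

    # Summary of partitions and node states
    sinfo

    # Per-node view with CPU, memory and state details
    sinfo -N -l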

Advanced Usage

  • ssh to mimi.cs.mcgill.ca
  • Create your own batch file, e.g. myfile.sh
    #!/bin/bash
    #
    #SBATCH -p all # partition (queue)
    #SBATCH -c 4 # number of cores
    #SBATCH --mem=4G
    #SBATCH --propagate=NONE # IMPORTANT for long jobs
    #SBATCH -t 0-2:00 # time (D-HH:MM)
    #SBATCH -o slurm.%N.%j.out # STDOUT
    #SBATCH -e slurm.%N.%j.err # STDERR
    #SBATCH --qos=QOS_FOR_COURSE_OR_PI # Ask your TA/PI which QOS to use
    #SBATCH --account=SEMESTER-COURSE # Ask your TA/PI for which account to use
    module load miniconda/miniconda-fall2024 # Load necessary modules
    #add your python runs, etc...
    
  • Check the supported Python modules on the Slurm nodes (see modules.md)
  • module load slurm - Load Slurm module
  • submit your job

    sbatch myfile.sh
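
Once the job is submitted, the usual Slurm commands can be used to follow it (a short sketch; NODE_NAME and JOB_ID stand in for the values from your own run):

    # List your queued and running jobs
    squeue -u $USER

    # Read the output file named by the -o/-e pattern above (slurm.%N.%j.out)
    cat slurm.NODE_NAME.JOB_ID.out

    # Cancel a job that is no longer needed
    scancel JOB_ID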
    

Using VScode Remotely with Slurm

  • Confirm you have met the above requirements

Setup mimi account

  • ssh to mimi.cs.mcgill.ca
  • On mimi run the command: vscode-remote-setup

Sample output

  • Create your own vscode batch file, e.g. myvscode.sh (you can copy the contents below and customize it)
    #!/bin/bash
    #
    #SBATCH -p all # partition (queue)
    #SBATCH -c 4 # number of cores
    #SBATCH --mem=4G
    #SBATCH --gpus=1 # Make sure that number is within what is allowed by the QOS
    #SBATCH --propagate=NONE # IMPORTANT for long jobs
    #SBATCH -t 0-2:00 # time (D-HH:MM)
    #SBATCH -o slurm.%N.%j.out # STDOUT
    #SBATCH -e slurm.%N.%j.err # STDERR
    #SBATCH --qos=QOS_FOR_COURSE_OR_PI # Ask your TA/PI which QOS to use
    #SBATCH --account=SEMESTER-COURSE # Ask your TA/PI for which account to use
    #SBATCH --signal=B:TERM@60 # send SIGTERM to the batch script 60 seconds before the time limit
    ### store relevant environment variables to a file in the home folder
    env | awk -F= '$1~/^(SLURM|CUDA|NVIDIA_)/{print "export "$0}' > ~/.slurm-envvar.bash
    
    module load dropbear # Necessary module to access the Slurm node
    
    cleanup() {
        echo "Caught signal - removing SLURM env file"
        rm -f ~/.slurm-envvar.bash
    }
    trap 'cleanup' SIGTERM
    
    ### start the dropbear SSH server
    # Make sure you change PORT_CHANGE_ME to the port given when you ran vscode-remote-setup
    dropbear \
        -r ~/.dropbear/server-key -F -E -w -s -p PORT_CHANGE_ME \
        -P ~/.dropbear/var/run/dropbear.pid
    

Make sure you have updated the following in the above file:

  • QOS_FOR_COURSE_OR_PI
  • SEMESTER-COURSE
  • PORT_CHANGE_ME (use value from vscode-remote-setup)

Now you are ready to submit your vscode job to enable remote access.

  • module load slurm - Load Slurm module
  • submit your job

    sbatch myvscode.sh
    

  • Take note of the job ID reported by sbatch and run the following to get the NODE_NAME to access with VScode

    squeue --job Your_Job_ID
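
The node name appears in the NODELIST column of the squeue output; for example (values here are purely illustrative):

    JOBID  PARTITION  NAME         USER         ST  TIME  NODES  NODELIST(REASON)
    12345  all        myvscode.sh  cs-username   R  1:02      1  gpu-teach-01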
    

Setup your VScode machine

From the previous step you will need to know:

  • CS_USERNAME
  • PORT_CHANGE_ME
  • NODE_NAME

Add the following to your ~/.ssh/config on your client/home machine, replacing CS_USERNAME, NODE_NAME and PORT_CHANGE_ME with your own values

Host mimi
    User CS_USERNAME
    Hostname mimi.cs.mcgill.ca

Host cs-slurm
    HostName NODE_NAME
    ProxyJump mimi
    User CS_USERNAME
    Port PORT_CHANGE_ME
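
Before launching VScode, you can optionally confirm the jump configuration from a terminal (this assumes the myvscode.sh job from the previous step is still running):

    # Should hop through mimi and land on NODE_NAME
    ssh cs-slurm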

You can now use VScode with the Remote-SSH Extension and select cs-slurm as the remote host

  • Note: you might need to enter your SSH passphrase twice if you have not added your SSH key to your agent
  • Note: you will have to accept the SSH host key the first time you connect after running vscode-remote-setup

If you run into problems please send an email to science.it@mcgill.ca with the subject [SLURM]