# COMP Slurm Docs
## Requirements
- Confirm with your Professor or TA that your course has access to Slurm
- Confirm that you have an active Computer Science account
- Confirm that you can ssh to mimi.cs.mcgill.ca with SSH keys from your client machine
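If you still need to set up key-based SSH, a typical sequence from your client machine looks like the following (a sketch; the key type, key path, and availability of `ssh-copy-id` are assumptions, so adapt to your setup):

```sh
# Generate a key pair on your client machine (skip if you already have one)
ssh-keygen -t ed25519

# Copy the public key to mimi (or append it to ~/.ssh/authorized_keys on mimi manually)
ssh-copy-id cs-username@mimi.cs.mcgill.ca

# Confirm key-based login works (a key passphrase prompt is fine)
ssh cs-username@mimi.cs.mcgill.ca
```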
## Resources
- gpu-teach-01 : 10 x NVIDIA RTX A2000 12GB
- gpu-teach-02 : 10 x NVIDIA RTX A2000 12GB
- gpu-teach-03 : 4 x NVIDIA RTX 5000 32GB
- gpu-grad-01 : 10 x NVIDIA RTX A5000 24GB
- gpu-grad-02 : 8 x NVIDIA RTX A5000 24GB
## Limits
(normal QOS, your account default may vary)
- Jobs per user = 2
- Maximum run time = 4 hours
- Maximum CPU cores = 16
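To check which account and QOS your user is attached to, and what limits a given QOS enforces, you can query Slurm's accounting database from mimi; a sketch, assuming the default QOS is literally named `normal`:

```sh
# Show the accounts and QOS associated with your user
sacctmgr show assoc user=$USER format=account,user,qos

# Show the limits attached to a QOS (job count, wall time, TRES such as CPUs/GPUs)
sacctmgr show qos normal format=name,maxjobspu,maxwall,maxtrespu
```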
## Easy HowTo
- `ssh cs-username@mimi.cs.mcgill.ca` - Connect to a mimi node
- `module load slurm` - Load the Slurm module
- `srun -p all --mem=1GB -t 1:00:00 --ntasks=1 batch.sh` - Run your code/commands (a minimal `batch.sh` sketch follows below)
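The `batch.sh` above stands in for whatever you want to run. A minimal, hypothetical example (replace the commands with your own workload, and make the file executable with `chmod +x batch.sh`):

```sh
#!/bin/bash
# batch.sh - hypothetical example payload for the srun command above
echo "Running on $(hostname)"
python3 --version
```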
## Info
Check the state of the Slurm cluster using `sinfo`.
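For example, after `module load slurm` on mimi you can inspect nodes and jobs (output depends on the current cluster state):

```sh
# Partition and node availability at a glance
sinfo

# Per-node view including GPUs (Gres) and node state
sinfo -N -o "%N %P %G %T"

# Your own pending and running jobs
squeue -u $USER
```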
## Advanced Usage
- ssh to mimi.cs.mcgill.ca
- Create your own batch file, e.g. `myfile.sh`:
```sh
#!/bin/bash
#
#SBATCH -p all                       # partition (queue)
#SBATCH -c 4                         # number of cores
#SBATCH --mem=4G
#SBATCH --propagate=NONE             # IMPORTANT for long jobs
#SBATCH -t 0-2:00                    # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out           # STDOUT
#SBATCH -e slurm.%N.%j.err           # STDERR
#SBATCH --qos=QOS_FOR_COURSE_OR_PI   # Ask your TA/PI which QOS to use
#SBATCH --account=SEMESTER-COURSE    # Ask your TA/PI which account to use

module load miniconda/miniconda-fall2024   # Load necessary modules

# add your python runs, etc...
```

Check the [supported python Modules on the slurm nodes](modules.md).

- `module load slurm` - Load the Slurm module
- Submit your job:

```sh
sbatch myfile.sh
```
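After submitting, you can monitor the job and read its output; for example (the job ID and node name below are placeholders, and `sacct` assumes job accounting is enabled):

```sh
# Note the job ID printed by sbatch, then list your jobs
squeue -u $USER

# Accounting summary for a running or finished job
sacct -j <JOBID>

# STDOUT/STDERR land in the files named by the -o / -e directives above
cat slurm.<NODE>.<JOBID>.out
```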
## Using VScode Remotely with Slurm
- Confirm you have met the above requirements
### Setup mimi account
- ssh to mimi.cs.mcgill.ca
- On mimi run the command: vscode-remote-setup
- Create your own vscode batch file, e.g. `myvscode.sh` (you can copy the contents below and customize):
```sh
#!/bin/bash
#
#SBATCH -p all                       # partition (queue)
#SBATCH -c 4                         # number of cores
#SBATCH --mem=4G
#SBATCH --gpus=1                     # Make sure that number is within what is allowed by the QOS
#SBATCH --propagate=NONE             # IMPORTANT for long jobs
#SBATCH -t 0-2:00                    # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out           # STDOUT
#SBATCH -e slurm.%N.%j.err           # STDERR
#SBATCH --qos=QOS_FOR_COURSE_OR_PI   # Ask your TA/PI which QOS to use
#SBATCH --account=SEMESTER-COURSE    # Ask your TA/PI which account to use
#SBATCH --signal=B:TERM@60

### store relevant environment variables to a file in the home folder
env | awk -F= '$1~/^(SLURM|CUDA|NVIDIA_)/{print "export "$0}' > ~/.slurm-envvar.bash

module load dropbear   # Necessary module to access the Slurm node

cleanup() {
    echo "Caught signal - removing SLURM env file"
    rm -f ~/.slurm-envvar.bash
}
trap 'cleanup' SIGTERM

### start the dropbear SSH server
# Make sure you change PORT_CHANGE_ME to the port given when you ran vscode-remote-setup
dropbear \
    -r ~/.dropbear/server-key -F -E -w -s -p PORT_CHANGE_ME \
    -P ~/.dropbear/var/run/dropbear.pid
```
Make sure you update the following in the above file:
- QOS_FOR_COURSE_OR_PI
- SEMESTER-COURSE
- PORT_CHANGE_ME (use value from vscode-remote-setup)
Now you are ready to submit your vscode job to enable remote access.
- `module load slurm` - Load the Slurm module
- Submit your job with `sbatch myvscode.sh`
- Take note of the job ID reported by `sbatch` and use it to find the NODE_NAME you will access with VScode (see the example below)
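The exact command is not reproduced here, but one way to find the node your job landed on is to query `squeue` (the job ID is a placeholder):

```sh
# Replace <JOBID> with the ID printed by sbatch; %N prints the allocated node(s)
squeue -j <JOBID> -o "%N"

# Or list all of your jobs with their IDs, states, and nodes
squeue -u $USER -o "%i %j %T %N"
```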
### Setup your VScode machine
From the previous step you will need to know:
- CS_USERNAME
- PORT_CHANGE_ME
- NODE_NAME
Add the following to your `~/.ssh/config` on your client/home machine, substituting the values above:
```
Host mimi
    User CS_USERNAME
    Hostname mimi.cs.mcgill.ca

Host cs-slurm
    HostName NODE_NAME
    ProxyJump mimi
    User CS_USERNAME
    Port PORT_CHANGE_ME
```
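Before opening VScode, you can sanity-check the connection from a terminal on your client machine (assuming your vscode job is still running and the config above is saved):

```sh
# Should print the name of the allocated Slurm node, reached via the mimi jump host
ssh cs-slurm hostname
```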
You can now use VScode with the Remote-SSH extension and select `cs-slurm` as the remote host.
- Note: you might need to enter your SSH passphrase twice if you have not added your SSH key to your agent (see the sketch after these notes)
- Note: you will have to accept the SSH host key the first time you connect after running vscode-remote-setup
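To avoid repeated passphrase prompts, you can add your key to an SSH agent on your client machine first (a sketch; the key path is an assumption):

```sh
# Start an agent for the current shell if one is not already running
eval "$(ssh-agent -s)"

# Add your private key; adjust the path to the key you use for mimi
ssh-add ~/.ssh/id_ed25519
```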
If you run into problems, please send an email to science.it@mcgill.ca with the subject [SLURM].