Batch Jobs
Jobs are not run directly from the command line. Instead, the user creates a job script that specifies the requested resources, the libraries to load, and the application to be run.
...
Before submitting a job script, make sure your request is within the resource limits. Check the limitations for KUIS AI users.
Job Scripts:
You can find sample job scripts in the /kuacc/jobscripts folder. Copy one of these scripts into your home directory (/kuacc/users/username/) and modify it according to your needs.
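For example (the filename sample.sh below is illustrative; list the folder first to see which scripts are actually provided):

```bash
ls /kuacc/jobscripts                                 # see which sample scripts are provided
cp /kuacc/jobscripts/sample.sh /kuacc/users/$USER/   # copy one into your home directory
```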
This is an example job script for the KUACC HPC cluster. Note that a job script must start with #!/bin/bash.
```bash
#!/bin/bash
#SBATCH --job-name=Test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=short
#SBATCH --qos=users
#SBATCH --account=users
#SBATCH --gres=gpu:tesla_t4:1
#SBATCH --time=1:0:0
#SBATCH --output=test-%j.out
#SBATCH --mail-type=ALL
#SBATCH --mail-user=foo@bar.com

module load python/3.6.1
module load cuda/11.4
module load cudnn/8.2.2/cuda-11.4

python code.py
```
A job script can be divided into three sections:
Requesting resources
Loading library and application modules
Running your codes
Requesting Resources:
This section is where resources are requested and Slurm parameters are configured. Each directive line starts with #SBATCH, and one flag is used per request.
```bash
#SBATCH <flag>
#SBATCH --job-name=Test           # Set a job name
#SBATCH --nodes=1                 # Ask for only one node
#SBATCH --ntasks-per-node=1       # Ask for one core on each node
#SBATCH --partition=short         # Run on the short queue (max 2 hours)
#SBATCH --qos=users               # Run with the users QOS (rules and limits)
#SBATCH --account=users           # Run under the users account (group of nodes)
#SBATCH --gres=gpu:tesla_t4:1     # Ask for one tesla_t4 GPU
#SBATCH --time=1:0:0              # Reserve a one-hour time limit
#SBATCH --output=test-%j.out      # Set an output file name
#SBATCH --mail-type=ALL           # Which events trigger email (BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=foo@bar.com   # Where to send emails
```
...
Note that the KUACC HPC partitions are listed below. You can see the active partitions with the sinfo command.
| Name | MaxTimeLimit | Nodes | MaxJobs | MaxSubmitJob |
|---|---|---|---|---|
| short | 2 hours | 50 nodes | 50 | 300 |
| mid | 1 day | 45 nodes | 35 | 200 |
| long | 7 days | 5 nodes | 25 | 100 |
| longer | 30 days | 3 nodes | 5 | 50 |
| ai | 7 days | 16 nodes | 8 | 100 |
| ilac | Infinite | 12 nodes | Infinite | Infinite |
| cosmos | Infinite | 8 nodes | Infinite | Infinite |
| biyofiz | Infinite | 4 nodes | Infinite | Infinite |
| cosbi | Infinite | 1 node | Infinite | Infinite |
| kutem | Infinite | 1 node | Infinite | Infinite |
| iui | Infinite | 1 node | Infinite | Infinite |
| hamsi | Infinite | 1 node | Infinite | Infinite |
| lufer | Infinite | 1 node | Infinite | Infinite |
| shallowai | 7 days | 16 nodes | 8 | 100 |
| biyofiz_gpu | Infinite | 4 nodes | Infinite | Infinite |
| kutem_gpu | Infinite | 1 node | Infinite | Infinite |
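You can check the current partition list and limits yourself with sinfo; for example:

```bash
sinfo              # list active partitions, their time limits and node states
sinfo -p short     # show only the short partition
```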
Note that the following flags can be used in your job scripts.
| Resource | Flag Syntax | Description | Notes |
|---|---|---|---|
| partition | --partition=short | Partition is a queue for jobs. | default on kuacc is short |
| qos | --qos=users | QOS is the quality of service value (limits or priority boost). | default on kuacc is users |
| time | --time=01:00:00 | Time limit for the job (here, 1 hour). | default is 2 hours |
| nodes | --nodes=1 | Number of compute nodes for the job. | default is 1 |
| cpus/cores | --ntasks-per-node=4 | Number of cores on each compute node. | default is 1 |
| resource feature | --gres=gpu:1 | Request use of GPUs on compute nodes. | default is no feature |
| memory | --mem=4096 | Memory limit per compute node for the job. Do not use with the mem-per-cpu flag. | default limit is 4096 MB per core |
| memory | --mem-per-cpu=14000 | Per-core memory limit. Do not use with the mem flag. | default limit is 4096 MB per core |
| account | --account=users | Users may belong to groups or accounts. | default is the user's primary group |
| job name | --job-name="hello_test" | Name of the job. | default is the JobID |
| constraint | --constraint=gpu | Restrict the job to kuacc nodes with a given feature. | see the AVAIL_FEATURES column in sinfo output |
| output file | --output=test.out | Name of the file for stdout. | default is slurm-<JobID>.out |
| email address | --mail-user=username@ku.edu.tr | User's email address. | required |
| email notification | --mail-type=ALL or --mail-type=END | When email is sent to the user. | omit for no email |
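As a sketch of how several of these flags combine in one script (the values and the gpu feature name are illustrative, not a recommended configuration):

```bash
#!/bin/bash
#SBATCH --job-name="hello_test"
#SBATCH --partition=short
#SBATCH --qos=users
#SBATCH --account=users
#SBATCH --ntasks-per-node=4       # 4 cores on one node
#SBATCH --mem=16000               # 16000 MB for the whole node; do not combine with --mem-per-cpu
#SBATCH --constraint=gpu          # only run on nodes with the gpu feature
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --output=test.out
#SBATCH --mail-user=username@ku.edu.tr
#SBATCH --mail-type=END

echo "Running on $(hostname)"
```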
Note on the --mem and --mem-per-cpu flags:
...
```bash
#SBATCH --ntasks=5
#SBATCH --mem-per-cpu=20000
```
In total, 5 × 20000 = 100000 MB is reserved. For GB requests, append G instead (e.g. --mem-per-cpu=20G).
mem: the total memory requested per node. If you request more than one node (N), N × mem is reserved. Default units are megabytes.
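For example, a minimal sketch of the per-node behaviour of --mem (values are illustrative):

```bash
#SBATCH --nodes=2     # N = 2 nodes
#SBATCH --mem=10000   # 10000 MB reserved on each node: 2 x 10000 = 20000 MB in total
```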
Loading library and application modules:
Users need to load the application and library modules required by their code, as in the sample job script:
```bash
module load python/3.6.1
module load cuda/11.4
module load cudnn/8.2.2/cuda-11.4
```
For more information see the installing software modules page.
Running Code:
In this section of the job script, users run their code.
...
```bash
sbatch jobscript.sh
```
| Command | Description | Example |
|---|---|---|
| sbatch [script] | Submit a batch job | sbatch jobscript.sh |
| scancel [job_id] | Kill a running job or cancel a queued one | scancel 123456 |
| squeue | List running or pending jobs | squeue |
| squeue -u [userid] | List running or pending jobs for a given user | squeue -u username |
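Putting these together, a typical session might look like the following (the job ID 123456 is illustrative):

```bash
sbatch jobscript.sh    # prints: Submitted batch job 123456
squeue -u $USER        # check the job state: PD = pending, R = running
scancel 123456         # cancel the job if needed
```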