Batch Jobs
Jobs are not run directly from the command line. Instead, the user creates a job script that specifies the requested resources, the libraries to load, and the application to be run.
...
Before submitting a job script, make sure your request is within the resource limits. Check the limitations for KUIS AI users.
Job Scripts:
You can find sample job scripts in the /kuacc/jobscripts folder. Copy one of these scripts into your home directory (/kuacc/users/username/) and modify it according to your needs.
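For example (the filename sample.sh below is illustrative; list the folder first to see which scripts are actually provided):

```bash
ls /kuacc/jobscripts                                 # see which sample scripts are provided
cp /kuacc/jobscripts/sample.sh /kuacc/users/$USER/   # copy one into your home directory
```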
This is an example job script for the KUACC HPC cluster. Note that a job script must start with #!/bin/bash.
```bash
#!/bin/bash
#SBATCH --job-name=Test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=short
#SBATCH --qos=users
#SBATCH --account=users
#SBATCH --gres=gpu:tesla_t4:1
#SBATCH --time=1:0:0
#SBATCH --output=test-%j.out
#SBATCH --mail-type=ALL
#SBATCH --mail-user=foo@bar.com

module load python/3.6.1
module load cuda/11.4
module load cudnn/8.2.2/cuda-11.4

python code.py
```
A job script can be divided into three sections:
Requesting resources
Loading library and application modules
Running your codes
Requesting Resources:
This section is where resources are requested and Slurm parameters are configured. Each directive line starts with #SBATCH, and one flag is used per request.
```bash
#SBATCH <flag>
#SBATCH --job-name=Test           # Set a job name
#SBATCH --nodes=1                 # Ask for only one node
#SBATCH --ntasks-per-node=1       # Ask for one core on each node
#SBATCH --partition=short         # Run on the short queue (max 2 hours)
#SBATCH --qos=users               # Run with the users QOS (rules and limits)
#SBATCH --account=users           # Run under the users account (group of nodes)
#SBATCH --gres=gpu:tesla_t4:1     # Ask for one tesla_t4 GPU
#SBATCH --time=1:0:0              # Reserve a one-hour time limit
#SBATCH --output=test-%j.out      # Set an output file name
#SBATCH --mail-type=ALL           # Which events trigger email (BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=foo@bar.com   # Where to send emails
```
...
Note that the KUACC HPC partitions are listed below. You can see the active partitions with the sinfo command.
| Name | MaxTimeLimit | Nodes | MaxJobs | MaxSubmitJob |
|---|---|---|---|---|
| short | 2 hours | 50 nodes | 50 | 300 |
| mid | 1 day | 45 nodes | 35 | 200 |
| long | 7 days | 5 nodes | 25 | 100 |
| longer | 30 days | 3 nodes | 5 | 50 |
| ai | 7 days | 16 nodes | 8 | 100 |
| ilac | Infinite | 12 nodes | Infinite | Infinite |
| cosmos | Infinite | 8 nodes | Infinite | Infinite |
| biyofiz | Infinite | 4 nodes | Infinite | Infinite |
| cosbi | Infinite | 1 node | Infinite | Infinite |
| kutem | Infinite | 1 node | Infinite | Infinite |
| iui | Infinite | 1 node | Infinite | Infinite |
| hamsi | Infinite | 1 node | Infinite | Infinite |
| lufer | Infinite | 1 node | Infinite | Infinite |
| shallowai | 7 days | 16 nodes | 8 | 100 |
| biyofiz_gpu | Infinite | 4 nodes | Infinite | Infinite |
| kutem_gpu | Infinite | 1 node | Infinite | Infinite |
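You can check the current partition list and limits yourself with sinfo; for example:

```bash
sinfo              # list active partitions, their time limits and node states
sinfo -p short     # show only the short partition
```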
Note that the following flags can be used in your job scripts.
| Resource | Flag Syntax | Description | Notes |
|---|---|---|---|
| partition | --partition=short | Partition is a queue for jobs. | default on kuacc is short |
| qos | --qos=users | QOS is the quality of service value (limits or priority boost). | default on kuacc is users |
| time | --time=01:00:00 | Time limit for the job (here, 1 hour). | default is 2 hours |
| nodes | --nodes=1 | Number of compute nodes for the job. | default is 1 |
| cpus/cores | --ntasks-per-node=4 | Number of cores on each compute node. | default is 1 |
| resource feature | --gres=gpu:1 | Request use of GPUs on compute nodes. | default is no feature |
| memory | --mem=4096 | Memory limit per compute node for the job. Do not use with the mem-per-cpu flag. | default limit is 4096 MB per core |
| memory | --mem-per-cpu=14000 | Per-core memory limit. Do not use with the mem flag. | default limit is 4096 MB per core |
| account | --account=users | Users may belong to groups or accounts. | default is the user's primary group |
| job name | --job-name="hello_test" | Name of the job. | default is the JobID |
| constraint | --constraint=gpu | Restrict the job to kuacc nodes with a given feature. | see the AVAIL_FEATURES column in sinfo output |
| output file | --output=test.out | Name of the file for stdout. | default is slurm-<JobID>.out |
| email address | --mail-user=username@ku.edu.tr | User's email address. | required |
| email notification | --mail-type=ALL or --mail-type=END | When email is sent to the user. | omit for no email |
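As a sketch of how several of these flags combine in one script (the values and the gpu feature name are illustrative, not a recommended configuration):

```bash
#!/bin/bash
#SBATCH --job-name="hello_test"
#SBATCH --partition=short
#SBATCH --qos=users
#SBATCH --account=users
#SBATCH --ntasks-per-node=4       # 4 cores on one node
#SBATCH --mem=16000               # 16000 MB for the whole node; do not combine with --mem-per-cpu
#SBATCH --constraint=gpu          # only run on nodes with the gpu feature
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --output=test.out
#SBATCH --mail-user=username@ku.edu.tr
#SBATCH --mail-type=END

echo "Running on $(hostname)"
```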
Note on the --mem and --mem-per-cpu flags:
...
```bash
#SBATCH --ntasks=5
#SBATCH --mem-per-cpu=20000
```
In total, 5 × 20000 = 100000 MB is reserved. For GB requests, append G instead (e.g. --mem-per-cpu=20G).
mem: the total memory requested per node. If you request more than one node (N), N × mem is reserved. Default units are megabytes.
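For example, a minimal sketch of the per-node behaviour of --mem (values are illustrative):

```bash
#SBATCH --nodes=2     # N = 2 nodes
#SBATCH --mem=10000   # 10000 MB reserved on each node: 2 x 10000 = 20000 MB in total
```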
Loading library and application modules:
Users need to load the application and library modules required by their code, as in the sample job script:
```bash
module load python/3.6.1
module load cuda/11.4
module load cudnn/8.2.2/cuda-11.4
```
For more information see the installing software modules page.
Running Code:
In this section of the job script, users run their code.
...
```bash
sbatch jobscript.sh
```
| Command | Description | Example |
|---|---|---|
| sbatch [script] | Submit a batch job | sbatch jobscript.sh |
| scancel [job_id] | Kill a running job or cancel a queued one | scancel 123456 |
| squeue | List running or pending jobs | squeue |
| squeue -u [userid] | List running or pending jobs for a given user | squeue -u username |
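Putting these together, a typical session might look like the following (the job ID 123456 is illustrative):

```bash
sbatch jobscript.sh    # prints: Submitted batch job 123456
squeue -u $USER        # check the job state: PD = pending, R = running
scancel 123456         # cancel the job if needed
```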