Nandadevi Job Scheduler (Slurm)

         The Nandadevi cluster has been partitioned into 8 partitions/queues using the latest open-source version of the SLURM job scheduler. The details are given below.

SlNo  Queue Name   Configuration                                                                         Purpose
1     nandaq       20 nodes, 28 cores per node, 560 cores total (23 TF)                                  Parallel jobs using MPI
2     nandaknlq    4 nodes with Intel Xeon Phi 7230, 64 cores and 64 GB RAM per node, 256 cores (10 TF)  OpenMP, manycore and parallel jobs
3     nandasq      34 nodes, 16 cores per node, 544 cores, 64 GB memory per node (14 TF)                 Multiple serial jobs and OpenMP applications
4     nandagpuq    8 nodes with 16 Nvidia K20 GPUs                                                       CUDA and OpenACC applications
5     nandaitraq   2 nodes, 40 cores                                                                     Used by the ITRA project members
6     nandaphenoq  2 nodes, 40 cores                                                                     Used by the PHENO project members
7     nandaifcq    3 nodes with 10 Nvidia K80 GPUs                                                       Used by the IFC project members
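
To check which of these partitions are currently available, along with their node counts and states, the standard Slurm sinfo command can be used (a generic sketch; the partition name is an example taken from the table above):

```shell
# List all partitions with their state, time limit and node counts.
sinfo

# Show only one partition (replace nandaq with any queue from the table).
sinfo -p nandaq

# Summarised view: one line per partition.
sinfo -s
```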

New sample Slurm file for submitting jobs to any one of the above queues. Download for MPI jobs, OMP jobs, Serial jobs, Check-Pointing.

Make the necessary changes in these sample files before using them.

Updated Slurm jobscript (Feb 2021).

#!/bin/bash
#SBATCH --job-name=<Your_jobname>
## Available partitions: nandaq, nandaknlq, nandasq, nandagpuq, nandaitraq, nandaphenoq, nandaifcq
#SBATCH --partition=nandaq
#SBATCH --output=Job.%j.out
#SBATCH --error=Job.%j.err
#SBATCH --export=all
#SBATCH --mail-user=<username>
#SBATCH --mail-type=ALL
#SBATCH -D </Working_dir_path usually /lustre/username/...>

# Load (or unload) the modules your job needs.
module load module_name
module unload module_name

# Write the list of allocated nodes to a machine file (choose any filename).
MACHINE_FILE=machinefile.$SLURM_JOBID
scontrol show hostname $SLURM_JOB_NODELIST > $MACHINE_FILE

srun <your executable> >& out_$SLURM_JOBID

## MPI case
### Only one executable allowed
#mpirun -np <npvalue> <your/executable/with/path> >& out_$SLURM_JOBID
Note: Please give the full path of the executable file if it is in a different location.
Ex: srun /home/username/proj1/a.out >& out_$SLURM_JOBID
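
Putting the directives above together, a minimal MPI job script might look like the following (a sketch only; the job name, task count and executable path are example values, not site requirements):

```shell
#!/bin/bash
#SBATCH --job-name=mpi_test        # example job name
#SBATCH --partition=nandaq         # MPI partition from the table above
#SBATCH --ntasks=56                # e.g. 2 nodes x 28 cores per node
#SBATCH --output=Job.%j.out
#SBATCH --error=Job.%j.err

module load module_name            # replace with the module(s) your code needs

# srun launches one MPI task per requested core.
srun /home/username/proj1/a.out >& out_$SLURM_JOBID
```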

Sample Slurm file for submitting jobs to any one of the above queues. Download for MPI jobs, OMP jobs, Serial jobs, Check-Pointing.


#SBATCH --ntasks-per-node=28
#SBATCH -J <testrun>
# Available partition names/queue names are: 1) nandaq, 2) nandaknlq, 3) nandasq, 4) nandagpuq, 5) nandaphiq
#SBATCH -p <partitionname>
#SBATCH --export=all
#SBATCH --mail-user=<username>
#SBATCH --mail-type=ALL


module load intel/2017
## OpenMP case
## Make sure the --ntasks-per-node value and the OMP_NUM_THREADS value are the same,
## or derive the thread count from the SLURM_NTASKS environment variable.
### Only one executable allowed
<your executable> >& out_$SLURM_JOBID

##MPI Case
### Only one executable allowed
#mpirun -np <npvalue> <your/executable/with/path>  >& out_$SLURM_JOBID
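
For the OpenMP case, one way to keep the thread count consistent with the requested tasks is to derive OMP_NUM_THREADS from Slurm's environment (a sketch: SLURM_NTASKS is set by Slurm inside a job, and the fallback value 1 is an assumption for runs outside a job):

```shell
#!/bin/sh
# Use the task count Slurm exported, or fall back to 1 outside a job.
OMP_NUM_THREADS=${SLURM_NTASKS:-1}
export OMP_NUM_THREADS
echo "Running with $OMP_NUM_THREADS OpenMP thread(s)"
```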


Job Submission

      Create a Slurm script using the above samples with suitable modifications. Submit the job using the following command:

              sbatch <jobscript>

Once the job is accepted by Slurm, it will return a JOBID that can be used to check the status of the job or to delete it.
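
If the job ID is needed later in a shell script (for example to query or cancel the job), sbatch's --parsable option prints just the ID (a sketch; job.sh is a placeholder script name):

```shell
# --parsable makes sbatch print only the job ID, nothing else.
jobid=$(sbatch --parsable job.sh)
echo "Submitted job $jobid"

# The captured ID can then be fed to squeue or scancel:
squeue -j "$jobid"
```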


Job Status

The following command will display the status of a job:

            squeue -j <jobid>

Running squeue without arguments displays the status of all jobs. The output includes JOBID, PARTITION, NAME, USER, ST, TIME, NODES and NODELIST(REASON).

The short code under the column 'ST' gives the state of the job in the queue. The most common codes are given below.

 PD - job is pending, waiting for resources.

 R - job is running.

 CG - job is completing after having run.

 CD - job has completed.

 CA - job was cancelled.

 F - job failed.

 S - job is suspended.
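
squeue can also filter and format its output; for example, to see only your own jobs or a custom set of columns (standard squeue options, shown here as examples):

```shell
# Only your own jobs.
squeue -u $USER

# Custom columns: job ID, partition, name, state and elapsed time.
squeue -o "%.10i %.9P %.20j %.2t %.10M"
```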


Job Deletion

The following command will delete a job from the queue

              scancel <jobid>
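
Besides cancelling a single job by ID, scancel can also select jobs by user or by name (standard scancel options; myjob is a placeholder job name):

```shell
# Cancel all of your own jobs.
scancel -u $USER

# Cancel jobs by name.
scancel --name=myjob
```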

Slurm also ships PBS/Torque compatibility wrappers, so PBS commands such as qstat, qsub and qdel may also work.
