The Nandadevi cluster has been partitioned into the following queues using the latest open-source version of the SLURM job scheduler. The details are given below.
SlNo | Queue Name | Configuration | Purpose |
1 | nandaq | 20 nodes, 28 cores per node, 560 cores total (23 TF) | For parallel jobs using MPI |
2 | nandaknlq | 4 nodes with Intel Xeon Phi 7230 processors, 64 cores and 64 GB RAM per node, 256 cores total (10 TF) | For OpenMP, many-core and parallel jobs |
3 | nandasq | 34 nodes, 16 cores per node, 544 cores with 64 GB memory (14 TF) | For multiple serial jobs and OpenMP applications |
4 | nandagpuq | 8 nodes with 16 Nvidia K20 GPUs | For CUDA and OpenACC applications |
5 | nandaitraq | 2 nodes, 40 cores | Used by the ITRA project members |
6 | nandaphenoq | 2 nodes, 40 cores | Used by the PHENO project members |
7 | nandaifcq | 3 nodes with 10 Nvidia K80 GPUs | Used by the IFC project members |
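A job is directed to one of these queues with the -p/--partition option of sbatch. A minimal sketch (the script name myjob.sh is only a placeholder):
sbatch -p nandasq myjob.sh      # submit a serial/OpenMP job to nandasq
sbatch -p nandagpuq myjob.sh    # submit a CUDA/OpenACC job to nandagpuq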
New sample SLURM files for submitting jobs to any one of the above queues. Download for MPI jobs, OpenMP jobs, serial jobs and check-pointing.
Make the necessary changes in these sample files and use them.
Updated SLURM jobscript (Feb 2021): launch the executable through srun.
Ex: srun /home/username/proj1/a.out >& out_$SLURM_JOBID
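A minimal sketch of such an updated jobscript is given below; the partition, node and task counts, module and executable path are placeholders taken from the examples on this page and should be adapted to your own job.
#!/bin/bash
#SBATCH -J testrun
#SBATCH -p nandaq                # choose a partition from the table above
#SBATCH -N 1                     # number of nodes
#SBATCH --ntasks-per-node=28     # MPI ranks per node
#SBATCH --mail-user=<username>@imsc.res.in
#SBATCH --mail-type=ALL
cd $SLURM_SUBMIT_DIR
module load intel/2017           # load the toolchain the code was built with
# srun launches the executable on the allocated tasks
srun /home/username/proj1/a.out >& out_$SLURM_JOBID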
OLD JOBSCRIPT
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=28
#SBATCH -J <testrun>
# Available Partition/Queue names are: 1) nandaq, 2) nandaknlq, 3) nandasq, 4) nandagpuq, 5) nandaphiq
#SBATCH -p <partitionname>
#SBATCH --export=ALL
#SBATCH --mail-user=<username>@imsc.res.in
#SBATCH --mail-type=ALL
cd $SLURM_SUBMIT_DIR
echo $SLURM_JOB_NODELIST > hostfile_$SLURM_JOBID
module load intel/2017
##OpenMP Case
## Make sure the --ntasks-per-node value and the OMP_NUM_THREADS value are the same,
## or simply reuse the SLURM_NTASKS environment variable as below
### Only one executable allowed
export OMP_NUM_THREADS=$SLURM_NTASKS
<your executable> >& out_$SLURM_JOBID
##MPI Case
### Only one executable allowed
#mpirun -np <npvalue> <your/executable/with/path> >& out_$SLURM_JOBID
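For the MPI case, the process count can also be taken directly from SLURM instead of being hard-coded; a sketch with a placeholder executable path:
#mpirun -np $SLURM_NTASKS /home/username/proj1/a.out >& out_$SLURM_JOBID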
Job Submission
Create a SLURM script using the above sample with suitable modifications. Submit the job using the following command:
sbatch slurm_script.sh
Once the job is accepted by SLURM, it will return a JOBID that can be used for finding the status of the job or deleting the job.
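Options given in the script can also be overridden on the command line at submission time; a sketch (the partition and node count here are only examples):
sbatch -p nandaq -N 2 slurm_script.sh
On success sbatch prints the job ID, e.g. Submitted batch job <jobid>.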
Job Status
The following command will display the status of a particular job:
squeue -j <jobid>
To display the status of all jobs, use squeue -a. The output shows the Job ID, Partition, Job Name, User, State (ST), Elapsed Time, Number of Nodes and the Node List (or the reason the job is still waiting).
The code under the column 'ST' gives the state of the job in the queue. The most common codes are given below:
PD - job is pending, waiting for resources or held.
R - job is running.
CG - job is completing after having run.
CD - job has completed.
S - job is suspended.
CA - job has been cancelled.
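A few other useful status commands; a sketch with placeholder username and job ID:
squeue -u <username>          # all jobs belonging to a user
squeue -p nandaq              # all jobs in a particular partition
scontrol show job <jobid>     # full details of a single job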
Job Deletion
The following command will delete a job from the queue
scancel <jobid>
PBS-style commands such as qstat, qsub and qdel also work.
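Other common forms of scancel; a sketch with placeholder values:
scancel -u <username>                   # cancel all of your own jobs
scancel --name=<jobname>                # cancel jobs by job name
scancel --state=PENDING -u <username>   # cancel only your pending jobs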