Warning

This command is not yet implemented on MonARCH. Please check the message of the day (MOTD) displayed at login to see which system commands are currently implemented on MonARCH.

Checking job status

There are two methods to check your job status.

Method 1: show_job

We provide a show_job script. It groups, filters, and sorts job information and adds statistics, giving clean, user-friendly output.

show_job 3000558
-----------------------------------------------------------------------------------
JobID                       3000558
USERID                      smichnow
USER Name                   Simon Michnowicz (Monash University)
Email
 -----------------------------------------------------------------------------------
Job Name                    testV2feature
Project                     general
Partition                   comp
QoS                         normal
Job State                   PENDING
Why cant Run                Resources
Running Time                00:00:00
Total Time                  00:05:00
Submit Host                 monarch-dtn
Submit Time                 2018-06-19T14:29:36
-----------------------------------------------------------------------------------
Job Resource                Node=1
                          NumCPUs=16
                          CPUsPerTask=1
                          CPUsPerNode=1
                          MemoryPerNode=1000M
                          Constraint=Xeon-E5-2680-v3
 ----------------------------------------------------------------------------------
Job Working Dir:
/home/smichnow/slurm
Job Command File/Script:
/home/smichnow/slurm/testMonV2-testFeature.hc.sh
Job Output File:
/home/smichnow/slurm/hc-3000558
Job Error File:
/home/smichnow/slurm/hc-3000558
-----------------------------------------------------------------------------------

Hint

To check the status of a single job use show_job [JOBID].

Method 2: Slurm commands

To display all of your running/pending jobs use squeue -u `whoami`.

Hint

whoami returns your MonARCH username, and is a handy shortcut.

$ squeue -u `whoami`
         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

To view the status of a single job, use:

$ scontrol show job [JOBID]
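
When only the state and the reason matter, the scontrol output can be filtered. The following is a minimal sketch, assuming the output contains the usual JobState=... and Reason=... fields; job_state_reason is a hypothetical helper name, not a command installed on MonARCH.

```shell
# Hypothetical helper (not part of Slurm or MonARCH): reads
# `scontrol show job` output on stdin and prints the JobState and
# Reason values separated by a space.
job_state_reason() {
    awk '{
        # scontrol prints several KEY=VALUE pairs per line
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^JobState=/) state  = substr($i, 10)
            if ($i ~ /^Reason=/)   reason = substr($i, 8)
        }
    }
    END { print state, reason }'
}

# Usage (replace 3000558 with your own job ID):
#   scontrol show job 3000558 | job_state_reason
```

The helper reads from standard input, so it can also be tried on saved scontrol output.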

squeue Status Codes and Reasons

The squeue command reports an active job’s status using state and reason codes. Job state codes describe a job’s current state in the queue (e.g. pending, completed). Job reason codes describe why the job is in its current state.

The following tables outline a variety of job state and reason codes you may encounter when using squeue to check on your jobs.

squeue status codes

Status       Code   Explanation
COMPLETED    CD     The job has completed successfully.
COMPLETING   CG     The job is finishing but some processes are still active.
FAILED       F      The job terminated with a non-zero exit code or another failure condition.
PENDING      PD     The job is waiting for resource allocation. It will eventually run.
PREEMPTED    PR     The job was terminated because of preemption by another job.
RUNNING      R      The job is currently allocated to a node and is running.
SUSPENDED    S      A running job has been stopped with its cores released to other jobs.
STOPPED      ST     A running job has been stopped with its cores retained.
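
The ST column of squeue shows only the compact code. As a small illustration, the sketch below expands the codes listed above into their full names; state_name is a hypothetical helper, and it covers only the states in this table.

```shell
# Hypothetical helper: expand a compact squeue state code (the ST
# column) into the full state name. Only the codes listed in this
# document are covered; anything else is reported as UNKNOWN.
state_name() {
    case "$1" in
        CD) echo COMPLETED ;;
        CG) echo COMPLETING ;;
        F)  echo FAILED ;;
        PD) echo PENDING ;;
        PR) echo PREEMPTED ;;
        R)  echo RUNNING ;;
        S)  echo SUSPENDED ;;
        ST) echo STOPPED ;;
        *)  echo "UNKNOWN($1)" ;;
    esac
}

# Usage:
#   state_name PD
```

squeue can also print the full state name itself through the %T format field, for example squeue -u `whoami` -o "%i %T".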

Job Reason Codes

Reason Code             Explanation
Priority                One or more higher-priority jobs are in the queue. Your job will eventually run.
Dependency              This job is waiting for a dependent job to complete and will run afterwards.
Resources               The job is waiting for resources to become available and will eventually run.
InvalidAccount          The job’s account is invalid. Cancel the job and resubmit with the correct account.
InvalidQOS              The job’s QoS is invalid. Cancel the job and resubmit with the correct QoS.
QOSGrpCpuLimit          All CPUs assigned to your job’s specified QoS are in use; the job will run eventually.
QOSGrpMaxJobsLimit      The maximum number of jobs for your job’s QoS has been reached; the job will run eventually.
QOSGrpNodeLimit         All nodes assigned to your job’s specified QoS are in use; the job will run eventually.
PartitionCpuLimit       All CPUs assigned to your job’s specified partition are in use; the job will run eventually.
PartitionMaxJobsLimit   The maximum number of jobs for your job’s partition has been reached; the job will run eventually.
PartitionNodeLimit      All nodes assigned to your job’s specified partition are in use; the job will run eventually.
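
To see at a glance why your pending jobs are waiting, the reasons can be tallied. This is a rough sketch, assuming each input line has the form "JOBID REASON" and that the reason contains no spaces; count_reasons is a hypothetical helper name.

```shell
# Hypothetical helper: reads "JOBID REASON" lines on stdin and prints
# how many jobs are waiting for each reason, sorted by reason name.
count_reasons() {
    awk '{ n[$2]++ } END { for (r in n) print n[r], r }' | sort -k2
}

# Usage: list your pending jobs without a header line, then summarise.
# -t selects job states, -h drops the header, %i is the job ID and
# %r is the reason code.
#   squeue -u "$USER" -t PENDING -h -o "%i %r" | count_reasons
```

Each output line shows a count followed by the reason code, which can then be looked up in the table of reason codes.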