Checking the status of MonARCH

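On MonARCH, users can check the status of all nodes with the show_cluster command. It lists every node with its type, partition, free CPU cores, free memory (GB), free GPUs and status, followed by a summary for the whole cluster. The output should look similar to: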

show_cluster
NODE              TYPE         PARTITION       CPU  Mem (GB)       GPU         STATUS
                                            (Free)    (Free)    (Free)
gp00              P100              comp        26       181         0        Running
gp01              P100              comp        27       209         1        Running
gp02              P100              comp        28       236         2           Idle
gp03              P100              comp        28       236         2           Idle
gp04              P100              comp        28       236         2           Idle
gp05              P100              comp        28       236         2           Idle
hc00               CPU              comp        24        98         0           Idle
hs00               CPU              comp        16        98         0           Idle
hs01               CPU              comp        16        98         0           Idle
hs02               CPU              comp        16        98         0           Idle
hs03               CPU              comp        16        98         0           Idle
hs04               CPU              comp        16        98         0           Idle
hs05               CPU              comp        16        98         0           Idle
mi00               CPU              comp        36       155         0           Idle
mi01               CPU              comp        36       155         0           Idle
mi02               CPU              comp        36       155         0           Idle
mi03               CPU              comp        36       155         0           Idle
mi04               CPU              comp        36       155         0           Idle
mi05               CPU              comp        36       155         0           Idle
mi06               CPU              comp        36       155         0           Idle
mi07               CPU              comp        36       155         0           Idle
mi08               CPU              comp        36       155         0           Idle
mi09               CPU              comp        36       155         0           Idle
mi10               CPU              comp        36       155         0           Idle
mi11               CPU              comp        36       155         0           Idle

                                          Summary:
 +------------+-------------+------------+------------+------------+-------------+-------------+
 |            | Cores       | Nodes      | K1 GPUs    | K80 GPUs   | P100 GPUs   | V100 GPUs   |
 |------------+-------------+------------+------------+------------+-------------+-------------|
 | Available  | 717  (100%) | 23   (92%) | 0    ( 0%) | 0    ( 0%) | 9    (75%)  | 0    ( 0%)  |
 | In Use     | 3    ( 0%)  | 2    ( 8%) | 0    ( 0%) | 0    ( 0%) | 3    (25%)  | 0    ( 0%)  |
 | Down       | 0    ( 0%)  | 0    ( 0%) | 0    ( 0%) | 0    ( 0%) | 0    ( 0%)  | 0    ( 0%)  |
 | Reserved   | 0    ( 0%)  | 0    ( 0%) | 0    ( 0%) | 0    ( 0%) | 0    ( 0%)  | 0    ( 0%)  |
 | ---------- | ----------  | ---------- | ---------- | ---------- | ----------  | ----------  |
 | Total      | 720         | 25         | 0          | 0          | 12          | 0           |
 +------------+-------------+------------+------------+------------+-------------+-------------+
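In the example above, the Available row of the summary matches the per-node columns. The free CPU counts add up to 717 cores and the free GPU counts add up to 9 P100 GPUs. The 23 Idle nodes and 2 Running nodes account for all 25 nodes. The Python sketch below recomputes these totals from the per-node rows. It assumes the whitespace-separated, seven-column layout shown above: node, type, partition, free CPUs, free memory in GB, free GPUs, status. `parse_nodes`, `summarise` and `Node` are hypothetical helpers and are not part of MonARCH's tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    node_type: str
    partition: str
    free_cpus: int
    free_mem_gb: int
    free_gpus: int
    status: str


def parse_nodes(text: str) -> list[Node]:
    """Parse the per-node rows of show_cluster output.

    Assumes each node row has exactly seven whitespace-separated fields with
    integer CPU, memory and GPU counts. Header, summary and blank lines do not
    match that shape and are skipped.
    """
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or not all(f.isdigit() for f in fields[3:6]):
            continue
        name, node_type, partition, cpus, mem, gpus, status = fields
        nodes.append(Node(name, node_type, partition, int(cpus), int(mem), int(gpus), status))
    return nodes


def summarise(nodes: list[Node]) -> dict:
    """Total the free CPUs and GPUs and count nodes by STATUS."""
    return {
        "free_cpus": sum(n.free_cpus for n in nodes),
        "free_gpus": sum(n.free_gpus for n in nodes),
        "status_counts": Counter(n.status for n in nodes),
    }


# A few node rows copied from the example output above.
SAMPLE = """\
gp00              P100              comp        26       181         0        Running
gp01              P100              comp        27       209         1        Running
gp02              P100              comp        28       236         2           Idle
hc00               CPU              comp        24        98         0           Idle
"""

totals = summarise(parse_nodes(SAMPLE))
```

On a login node, you could replace the hard-coded excerpt with the text from `subprocess.run(["show_cluster"], capture_output=True, text=True).stdout`. Applied to the full example output above, the sketch reports 717 free CPU cores and 9 free GPUs, which matches the Available row. If the real column layout differs from the one assumed here, adjust the parsing.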

The STATUS field explained

The STATUS field can take one of the following values:

  • Idle - The node is completely free; no jobs are running on it.

  • Running - Some jobs are running on the node, but it still has free resources for new jobs.

  • Busy - The node is fully occupied. It has no free resources, so no new jobs can start on it.

  • Offline - The node is unavailable because of a system issue.

  • Reserved - The node has been booked by other users and is available ONLY to them.
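The rules above can be modelled as a small Python enumeration. This is only an illustration and is not part of any MonARCH tool. `NodeStatus`, `accepts_new_jobs`, `schedulable_nodes` and the `has_reservation` flag are hypothetical names. The logic they encode comes only from the status descriptions above. A given job still may not start on an Idle or Running node: that depends on the free CPU, memory and GPU columns and on the scheduler.

```python
from collections.abc import Iterable
from enum import Enum


class NodeStatus(Enum):
    """Possible values of the STATUS column in show_cluster output."""

    IDLE = "Idle"          # completely free, no jobs running
    RUNNING = "Running"    # some jobs running, free resources remain
    BUSY = "Busy"          # no free resources left
    OFFLINE = "Offline"    # unavailable due to a system issue
    RESERVED = "Reserved"  # booked by other users, available only to them

    def accepts_new_jobs(self, has_reservation: bool = False) -> bool:
        """Return True if a node with this status can take new jobs.

        `has_reservation` is a hypothetical flag meaning the user belongs to
        the reservation; it only affects RESERVED nodes.
        """
        if self is NodeStatus.RESERVED:
            return has_reservation
        return self in (NodeStatus.IDLE, NodeStatus.RUNNING)


def schedulable_nodes(rows: Iterable[tuple[str, str]], has_reservation: bool = False) -> list[str]:
    """Return the names of nodes whose STATUS admits new jobs.

    `rows` holds (node name, STATUS string) pairs, for example the first and
    last columns of show_cluster output. An unknown STATUS raises ValueError.
    """
    return [
        name
        for name, status in rows
        if NodeStatus(status).accepts_new_jobs(has_reservation)
    ]
```

For example, `schedulable_nodes([("gp00", "Running"), ("hs00", "Busy"), ("mi00", "Idle")])` returns `["gp00", "mi00"]`.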