About MonARCH¶
MonARCH is a high performance computing cluster built on Monash’s specialist Research Cloud fabric. It was supplied by Dell and features a Mellanox low-latency network and NVIDIA GPUs.
System configuration¶
The MonARCH cluster serves the university’s HPC users as its primary community. It is distinct and independent from MASSIVE M3, but closely aligned with it. Specifically, MonARCH features:
- two dedicated login nodes and a dedicated data transfer node (like on MASSIVE M3);
- over 60 servers, totalling over 1600 CPU cores;
- 15 GPU nodes, with a mix of NVIDIA Tesla P100 (http://www.nvidia.com/object/tesla-p100.html) and K80 (https://www.nvidia.com/en-gb/data-center/tesla-k80/) cards;
- a SLURM scheduler with service redundancy, offering better stability and new features to improve fair share;
- a website for MonARCH HPC user documentation; and
- a convergence to a single HPC software module environment, shared with MASSIVE M3.
Hardware¶

| Name | CPU | Number of cores / Server | Usable Memory / Server | Notes |
|---|---|---|---|---|
| mi* | Xeon-Gold 6150 @ 2.70GHz | 36 | 158893MB | |
| hi* | Xeon-Gold 6150 @ 2.70GHz | 27 | 131000MB | Same hardware as mi* nodes, but with fewer cores and less memory in the VM |
| ga* | Xeon-Gold 6330 @ 2.00GHz | 56 | 1011964MB | Each server has two A100 GPU devices |
| hm00 | Xeon-Gold 6150 @ 2.70GHz | 26 | 1419500MB | Specialist high-memory (~1.4TB) machine. Please contact support to get access. |
| md* | Xeon Gold 5220R @ 2.20GHz | 48 | 735000MB | The most recent MonARCH nodes, which are bare metal. |
| mk* | Xeon-Platinum 8260 @ 2.50GHz | 48 | 342000MB | |
| ms* | Xeon-Gold 6338 @ 2.00GHz | 64 | 505700MB | The most recent MonARCH nodes |
Login Information¶
MonARCH has two interactive login nodes and one node dedicated to data transfers. The hostnames for these are:

| Hostname | Purpose |
|---|---|
| monarch.erc.monash.edu | This alias will take you to one of the two login nodes below. |
| monarch-login1.erc.monash.edu | The first login node of MonARCH. |
| monarch-login2.erc.monash.edu | The second login node of MonARCH. |
| monarch-dtn.erc.monash.edu | A dedicated data transfer node, ideal for large file transfers. |
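For convenience, the hostnames above can be given short aliases in an SSH client configuration. The following `~/.ssh/config` entry is a sketch, not an official Monash configuration; `username` is a placeholder for your own Monash username:

```
# ~/.ssh/config -- hypothetical convenience entries.
# Replace "username" with your Monash username.
Host monarch
    HostName monarch.erc.monash.edu
    User username

Host monarch-dtn
    HostName monarch-dtn.erc.monash.edu
    User username
```

With this in place, `ssh monarch` connects to a login node and `ssh monarch-dtn` (or `scp`/`rsync` via that alias) reaches the data transfer node.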
MonARCH vs M3¶
MonARCH and M3 share the same user identity system. However, users on one cluster cannot log in to the other unless they belong to an active project on that cluster.
Hyperthreading¶
All nodes on MonARCH V2 have hyperthreading turned off for performance reasons.
Software Stack¶
MonARCH V2 uses the M3 software stack (on /usr/local). Software packages are enabled using environment modules (i.e. the module command). This is explained at https://docs.monarch.erc.monash.edu.au/MonARCH/software/software.html.
SLURM Partitions¶
MonARCH V2’s SLURM scheduler currently uses a simple partition (queue) structure:
- comp for CPU-only jobs of up to seven days;
- gpu for GPU jobs of up to seven days;
- short for jobs of up to 24 hours;
- himem for the high-memory node only. Please contact support to get access to this partition.
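As an illustration of targeting one of these partitions, a batch script for the comp partition might look like the sketch below. The job name, walltime, and resource amounts are placeholder assumptions, not recommended values:

```shell
#!/bin/bash
# Minimal example job script (placeholder values, not an official template).
#SBATCH --job-name=example      # hypothetical job name
#SBATCH --partition=comp        # CPU-only partition, jobs up to seven days
#SBATCH --time=1-00:00:00       # request one day of walltime
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# The script body runs on the allocated node; here it just reports itself.
MSG="Job script for partition comp"
echo "$MSG"
```

Saved as (for example) `job.sh`, it would be submitted with `sbatch job.sh`.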
MonARCH uses SLURM’s QOS (Quality of Service) feature to control access to different features of the cluster. All users belong to a default QOS called normal. Users may be directed to use a different QOS at times (i.e. to use a Partner Share).
How to examine the QOS:

```
sacctmgr show qos normal format="Name,MaxWall,MaxCPUSPerUser,MaxTresPerUser%20"
      Name     MaxWall MaxCPUsPU          MaxTRESPU
    normal  7-00:00:00        64  cpu=64,gres/gpu=3
```
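To make the MaxTRESPU field concrete: it is a comma-separated list of `resource=limit` pairs, so `cpu=64,gres/gpu=3` caps each user at 64 CPUs and 3 GPUs under the normal QOS. The helper below, an illustrative sketch rather than part of SLURM, splits such a string into a mapping:

```python
def parse_tres(tres: str) -> dict[str, int]:
    """Split a SLURM TRES string like 'cpu=64,gres/gpu=3' into a dict."""
    limits = {}
    for pair in tres.split(","):
        name, _, value = pair.partition("=")
        limits[name.strip()] = int(value)
    return limits

# The normal QOS output above corresponds to:
print(parse_tres("cpu=64,gres/gpu=3"))  # {'cpu': 64, 'gres/gpu': 3}
```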