Welcome to the MonARCH documentation!
Important
Rollout of a new operating system by 30 April 2024
A major security uplift of MonARCH is currently underway.
The nine-year-old CentOS operating system on MonARCH is approaching end-of-life (EOL) and will be out of support by the end of Q2 2024. Over the next few weeks, MonARCH will be progressively upgraded to run Rocky Linux, a newer and more secure operating system.
Our focus is on building new software packages for Rocky Linux. Actively used applications in /usr/local are being tested on the new OS and will be reinstalled if they are incompatible. These applications, along with future software requests, will be built for Rocky Linux and installed at: /apps
This upgrade will be conducted in several phases so that you are able to continue running your analyses on MonARCH throughout. We will progressively upgrade existing CentOS compute nodes to Rocky Linux with the aim of meeting our target date of 30 April 2024.
MonARCH will retain the use of:
- SLURM for job scheduling; and
- Environment modules for activating applications.
Please visit this page for updates.
Upcoming information - stay tuned:
- MonARCH Rocky Linux login and data transfer nodes;
- How to submit jobs to Rocky compute nodes; and
- How to request new software for Rocky.
Important
New A100 GPU Nodes – September 2023
We are pleased to announce the availability of two A100 GPU nodes. The settings to use are:
#SBATCH --gres=gpu:A100:1
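As a minimal sketch, the directive above might sit in a batch script like the one below. Only the `--gres` line comes from the announcement; the job name, task count, time limit, and the `report_gpu` helper are illustrative assumptions, and your project may also need a partition or account option that is not shown here.

```shell
#!/bin/bash
# Sketch of a batch script requesting one A100 GPU.
# Only the --gres line is taken from the announcement; every other
# value here is an assumption chosen for illustration.
#SBATCH --job-name=a100-test
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --gres=gpu:A100:1

# Hypothetical helper: print the name of each GPU visible to the job,
# or a short message if nvidia-smi is not installed on this host.
report_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name --format=csv,noheader
    else
        echo "nvidia-smi not found on ${HOSTNAME:-this host}"
    fi
}

report_gpu
```

Submitting the script with `sbatch` and reading the job's output file is a quick way to confirm that the job landed on a node with the expected GPU.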
Important
Hardware Refresh Plan 2021-22 – Update: September 2023
Please be advised of the following update on the MonARCH hardware refresh. Specifically, four nodes will be decommissioned on the 14th of July 2022. We had originally advised a later date for the shutdown. Please see below for the updated schedule:
Compute Nodes | Capability | Decommissioned |
---|---|---|
 | Intel Xeon-E5-2680-v3 | in 2021 |
 | Intel Xeon-E5-2667-v3 | in 2021 |
 | Intel Xeon-E5-2680-v3 & NVIDIA K80 GPUs | in 2022 |
 | Intel Xeon-E5-2680-v4 & NVIDIA P100 GPUs | September 2023 |
Important
Hardware Refresh Plan 2021 – Update: 13 May 2021
Please be advised of the following hardware refresh schedule for 2021. These servers are approaching end-of-life and will be retired this year.
Compute Nodes | Capability | To be retired by the |
---|---|---|
 | Intel Xeon-E5-2680-v3 | end of May 2021 |
 | Intel Xeon-E5-2667-v3 | end of May 2021 |
 | Intel Xeon-E5-2680-v3 & NVIDIA K80 GPUs | ** To be confirmed ** |
 | Intel Xeon-E5-2680-v4 & NVIDIA P100 GPUs | middle of November 2021 |
While this will result in a reduction of total CPU capacity for 2021, retiring these servers is necessary to make room for new and faster compute nodes, planned for Q3/Q4 2021 and 2022.
We will be enabling the appropriate mechanisms (e.g., SLURM reservation) to ensure that these nodes have no running jobs prior to their retirement. Please check your job scripts to ensure they do not specify these nodes using --nodelist.
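One way to carry out this check is sketched below. The function name `find_pinned_scripts` and the example directory are hypothetical. The pattern only looks at `#SBATCH` lines that use `--nodelist` or its short form `-w`, so node lists passed on the `sbatch` command line will not be found.

```shell
#!/bin/bash
# Hypothetical helper: list files under a directory whose #SBATCH lines
# pin the job to specific nodes with --nodelist or the short form -w.
# The directory argument defaults to the current directory.
find_pinned_scripts() {
    local dir="${1:-.}"
    grep -rlE -- '^#SBATCH.*(--nodelist|[[:space:]]-w[[:space:]])' "$dir" 2>/dev/null
}

# Example (the path is an assumption):
#   find_pinned_scripts "$HOME/jobs"
```

Any file the function prints should be reviewed, and the node list removed or updated if it names a server that is due to be retired.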
Important
Scheduled Outages
Planned dates for 2022 Maintenance will be:
To be announced.
See details at: https://docs.monarch.erc.monash.edu/scheduled-maintenance.html
MonARCH (Monash Advanced Research Computing Hybrid) is the next-generation HPC/HTC Cluster, designed from the ground up to address the emergent and future needs of the Monash HPC community.
A key feature of MonARCH is that it is provisioned through R@CMon, the Research Cloud @ Monash facility. Through the use of advanced cloud technology, MonARCH is able to configure and grow dynamically. As with any HPC cluster, MonARCH presents a single point-of-access to computational researchers to run calculations on its constituent servers.
MonARCH will continue to develop over time. It currently consists of the following servers:
- mi* nodes are 36-core Xeon-Gold-6150 @ 2.70GHz servers with 158893MB usable memory.
- hc* nodes are 24-core Xeon-E5-2680-v3 @ 2.50GHz servers with 100550MB usable memory.
- hs* nodes are 16-core Xeon-E5-2667-v3 @ 3.20GHz servers with 100550MB usable memory.
- gp* nodes are 28-core Xeon-E5-2680-v4 @ 2.40GHz servers with 241660MB usable memory. Each server has two P100 GPU cards.
- mk* nodes are 48-core Xeon-Platinum-8260 @ 2.4GHz servers with 342000MB usable memory.
- ge* baremetal nodes are 24-core Xeon-E5-2680-v3 @ 3.3GHz servers with 257669MB usable memory. Each server has eight K80 GPU processors (four boards with two K80 chips each).
- gf* nodes are 24-core Xeon-E5-2680-v3 @ 2.5GHz servers with 235980MB usable memory. Each server has four K80 GPU processors (two boards with two K80 chips each).
- hm00 is a single 36-core Xeon-Gold-6150 @ 2.7GHz server with 1.4TB usable memory.
For data storage, we have deployed a parallel file system service using Intel Enterprise Lustre, providing over 300 TB of usable storage with room for future expansion.
The MonARCH service is operated by the Monash HPC team, with continuing technical and operational support from the Monash Cloud team and the eSolutions Servers-and-Storage and Networks teams.
If you have found MonARCH useful for your research, we would be very grateful if you acknowledged us with text along the lines of:
This research was supported in part by the Monash eResearch Centre and eSolutions-Research Support Services through the use of the MonARCH HPC Cluster.
- About MonARCH
- Requesting an account
- Requesting help on MonARCH
- Connecting to MonARCH
- File Systems on MonARCH
- Copying files to and from MonARCH
- Software on MonARCH
- Running jobs on MonARCH
- Default Values For Selecting Hardware
- Selecting a particular server
- Checking the status of MonARCH
- Quick Start
- Running Simple Batch Jobs
- Running MPI Jobs
- Running Multi-threading Jobs
- Running Interactive Jobs
- Running GPU Jobs
- QoS (Quality of Service)
- How to run jobs with QoS
- Checking job status
- squeue Status Codes and Reasons
- Project Allocation
- More on SLURM
- Lustre on MonARCH