
AutoBench: A Holistic Platform for Automated and Reproducible Benchmarking in HPC Testbeds

DOI: https://dl.acm.org/doi/10.1145/3680256.3721332

Abstract

Benchmarking is indispensable for evaluating HPC systems and architectures, providing critical insights into their performance, efficiency, and operational characteristics. However, the increasing heterogeneity and complexity of modern HPC architectures [1] present significant challenges to obtaining consistent and comprehensive benchmarking insights.

Likewise, commercial HPC environments face similar challenges due to their dynamic and diverse nature. It is therefore crucial to benchmark platforms automatically, considering holistic configuration options across layers such as the operating system and the software stack.

This paper presents AutoBench, an automated benchmarking platform that targets benchmarking on testbed systems at HPC and cloud data centers to address the above challenges. With its multi-layered, customizable configuration options, AutoBench assists benchmarking across diverse systems. In addition, AutoBench enables automation, exploration of optimal configurations across multiple layers, and reproducibility.

We demonstrate how we use this benchmarking tool on the BEAST system [2] at the Leibniz Supercomputing Centre (LRZ) to compare various architectures and their benefits. We also demonstrate that AutoBench can reproduce benchmarks with an acceptable variance of ~5%.

Sample Layered Cluster Configuration

Figure 1 shows a sample cluster configuration folder structure for the beast cluster, including the ice partition, nodes, and scheduler settings detailing the shell preference, job script template, and submission command.

config/
└── cluster/
    └── beast/
        ├── beast.yaml
        ├── 1_systems/
        │   └── ice.yaml
        ├── 2_oss/
        │   └── ice.yaml
        ├── 3_softwares/
        │   └── ice.yaml
        ├── 4_benchmarks/
        │   └── ice.yaml
        └── scheduler/
            ├── config.yaml
            ├── slurm_job_template.txt
            └── flux_job_template.txt

Figure 1: Hierarchical Directory Structure for the BEAST Cluster Featuring the ice Partition and Associated Scheduler Configuration Templates.
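The numbered layer folders (1_systems, 2_oss, 3_softwares, 4_benchmarks) hold the per-partition options that AutoBench later combines into concrete benchmark instances. As a rough illustration only, a layer file such as 2_oss/ice.yaml might enumerate OS and frequency candidates similar to the fields that appear in the generated benchmark shown later; the key names and values below are assumptions, not AutoBench's actual schema.

# Hypothetical sketch of config/cluster/beast/2_oss/ice.yaml;
# key names mirror the "os" block of the generated benchmark below.
os:
  - suse
freqs:
  - 2.00GHz
  - 2.40GHz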

User Workflow

Figure 2 presents the sequence of user workflow steps along with their respective commands.


Figure 2: User workflow commands used to create job scripts and submit them to the scheduler [3][4].
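The scheduler folder from Figure 1 supplies the job script templates that this workflow instantiates before submission. A minimal sketch of what slurm_job_template.txt might contain is given below; the placeholder syntax and variable names are assumptions that simply mirror the fields of the generated benchmark, while the #SBATCH directives are standard Slurm options [3].

#!/bin/bash
#SBATCH --job-name={{ benchmark }}_{{ BID }}
#SBATCH --partition={{ partition }}
#SBATCH --nodes=1
#SBATCH --output={{ benchmark }}_{{ BID }}.out

# Load the software stack selected for this configuration.
module use {{ mount_path }}
module load hpl/{{ hpl_version }}

# Launch the benchmark with the configured process count.
mpirun -np {{ np }} ./xhpl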

Concrete Generated Benchmark

Below is an instance of an HPL benchmark generated for the ice1 node, running SUSE, configured at 2.40 GHz on an Icelake CPU, and using gcc as the compiler.

{
  "BID": 0,
  "cluster": "beast",
  "partition": "ice",
  "system": {
    "target_components": "icelake_cpu"
  },
  "os": {
    "os": "suse",
    "freqs": "2.40GHz"
  },
  "software": {
    "model": "hpl_mpi",
    "compiler": "mpicc",
    "hpl_version": "2.3-gcc-13.2.0-726jfer",
    "model_type": "omp",
    "mount_path": "/home/sw/ice/spack/share/spack/modules",
    "teams": 1
  },
  "benchmark": {
    "benchmark": "hpl",
    "np": 72,
    "nb": 192,
    "configpath": "../configs/cluster/beast/4_benchmarks/hpl_ice"
  }
}
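In the benchmark block, np and nb map onto HPL's standard tuning parameters: the number of MPI ranks, factored into a P x Q process grid, and the block size NB. The abbreviated HPL.dat excerpt below illustrates this mapping; the problem size N and the 8 x 9 grid factorization are illustrative choices, not values reported in the paper.

96000        Ns     (problem size N; illustrative value)
1            # of NBs
192          NBs    (block size, from "nb": 192)
1            # of process grids (P x Q)
8            Ps     (8 x 9 = 72 ranks, matching "np": 72)
9            Qs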

Multistage Software Stack Framework

The code below shows the software stack configuration, deployed using a multistage software stack framework.

clusters:
  - name: beast
    toolkits:
      - zlib
    partitions:
      - name: ice
        build_server: ice1
        compilers:
          - gcc@13.2.0
          - llvm@17.0.4 +flang %gcc@13.2.0
        benchmarks:
          - osu-micro-benchmarks@7.3
          - ior@3.3.0
          - mdtest@1.9.3
          - hpcc@1.5.0 ^netlib-lapack
          - hpcg@3.1
          - babelstream@4.0 +stddata
          - stream@5.10
          - lulesh@2.0.3
          - caliper
          - hpl@2.3 ^openblas
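Each entry under compilers and benchmarks is a Spack spec: @ pins a version, + enables a variant, % selects the compiler, and ^ constrains a dependency. The framework drives Spack itself, but, for example, the llvm compiler and hpl benchmark entries would correspond roughly to manual installs such as the following.

# Illustrative manual equivalents of the spec entries above.
spack install llvm@17.0.4 +flang %gcc@13.2.0
spack install hpl@2.3 ^openblas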

Deployment of AutoBench Infrastructure

To deploy the AutoBench infrastructure on a cluster, we presume that similar nodes (e.g., with similar architectures or hardware components) are grouped into partitions, each running the same OS. The essential software stack for each partition is then built using Spack, followed by installation of the SLURM scheduler [3]. Additionally, the necessary repositories are established and configured with appropriate access levels. CI runners and DCDB are configured for monitoring on frontend/login nodes, and DVFS and similar tools are deployed on compute nodes with elevated user permissions, completing the setup.
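For the partition grouping itself, the scheduler configuration is where similar nodes are bundled; with SLURM [3], a minimal slurm.conf fragment for the ice partition could look like the sketch below. The node range, CPU count, memory size, and time limit are placeholders, not BEAST's actual values.

# Hypothetical slurm.conf fragment; hardware figures are placeholders.
NodeName=ice[1-2] CPUs=72 RealMemory=256000 State=UNKNOWN
PartitionName=ice Nodes=ice[1-2] Default=NO MaxTime=24:00:00 State=UP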

Reproducibility Workflow


References


  1. Schulz, Martin; Kranzlmüller, Dieter; Schulz, Laura Brandon; Trinitis, Carsten; Weidendorfer, Josef. On the Inevitability of Integrated HPC Systems and How they will Change HPC System Operations. Association for Computing Machinery, 2021. DOI.

  2. Raoofy, Amir; Elis, Bengisu; Bode, Vincent; Chung, Minh Thanh; Breiter, Sergej; Schlemon, Maron; Herr, Dennis-Florian; Fuerlinger, Karl; Schulz, Martin; Weidendorfer, Josef. BEAST Lab: A Practical Course on Experimental Evaluation of Diverse Modern HPC Architectures and Accelerators. Journal of Computational Science, 2024.

  3. Slurm Workload Manager - Documentation. Accessed on 06/13/2024. Link.

  4. Patki, Tapasya; Ahn, Dong; Milroy, Daniel; Yeom, Jae-Seung; Garlick, Jim; Grondona, Mark; Herbein, Stephen; Scogland, Thomas. Fluxion: A Scalable Graph-Based Resource Model for HPC Scheduling Challenges. Association for Computing Machinery, 2023. DOI.