In December 2022, we embarked on an ambitious initiative to develop a comprehensive digital twin of the Frontier supercomputer. The twin includes 3D asset modeling with virtual and augmented reality capabilities, telemetry data assimilation, AI/ML integration, simulations, and reinforcement learning for optimization. The initial goal was to develop four main modules:

  1. A transient simulation of the thermo-fluid cooling system from cooling tower to cold plate.
  2. A resource allocator and power simulator, which models workloads and the resulting dynamic power, along with energy conversion losses.
  3. A visual analytics module consisting of both an augmented reality model and a web-based dashboard for launching experiments.
  4. A network digital twin to study dynamic network power and congestion.
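
As a rough illustration of the idea behind module 2, the sketch below estimates dynamic node power from utilization and then adds an energy conversion loss. This is not ExaDigiT code: the `Node` class, the `node_power` and `facility_power` functions, the linear power model, the single conversion stage, and all numeric values are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical compute node described by two power levels, in watts."""
    idle_w: float  # draw at 0% utilization (illustrative value, not measured)
    peak_w: float  # draw at 100% utilization (illustrative value, not measured)


def node_power(node: Node, utilization: float) -> float:
    """Estimate node power by linear interpolation between idle and peak.

    The linear model is an assumption for this sketch; utilization is
    clamped to the range [0, 1].
    """
    u = min(max(utilization, 0.0), 1.0)
    return node.idle_w + u * (node.peak_w - node.idle_w)


def facility_power(node_powers_w: list[float], conversion_efficiency: float) -> tuple[float, float]:
    """Return (power drawn upstream, power lost in conversion), in watts.

    Models one conversion stage with a fixed efficiency in (0, 1].
    """
    if not 0.0 < conversion_efficiency <= 1.0:
        raise ValueError("conversion_efficiency must be in (0, 1]")
    delivered = sum(node_powers_w)
    drawn = delivered / conversion_efficiency
    return drawn, drawn - delivered


if __name__ == "__main__":
    node = Node(idle_w=100.0, peak_w=500.0)
    # Two nodes running workloads at different utilization levels.
    powers = [node_power(node, u) for u in (0.25, 0.75)]
    drawn, loss = facility_power(powers, conversion_efficiency=0.96)
    print(f"delivered={sum(powers):.1f} W, drawn={drawn:.1f} W, loss={loss:.1f} W")
```

A real power simulator would replace the fixed efficiency with load-dependent conversion curves and drive utilization from a workload trace; the sketch only shows how workload, dynamic power, and conversion loss relate.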

Once we were able to model Frontier, we set out to generalize these modules into a framework called ExaDigiT for modeling a variety of supercomputer architectures. This digital twin framework offers insights into operational strategies and “what-if” scenarios, and it elucidates complex, cross-disciplinary transient behaviors. It also serves as a design tool for future system prototyping. ExaDigiT is built on an open software stack with the aim of fostering community-driven development, and we have formed a partnership with supercomputer centers around the world to develop an open framework for modeling supercomputers. The source code and documentation are available here:

ExaDigiT Source Repositories

ExaDigiT Documentation

For more information, contact Wes Brewer at brewerwh@ornl.gov.

Watch a 2-minute demo of ExaDigiT in action

Meetings / Events

Ongoing

  • Monthly Large Group Meeting: fourth Wednesday of each month, 11am ET.
    (Check exadigit.slack.com for schedule and invite.)

2025

2024

  • High Performance Data Center Digital Twins: ExaDigiT Community BoF - SC24, Atlanta, GA (November 20, 2024)
  • High Performance Data-centre Digital Twins: Birds of a Feather - CUG-2024, Perth, WA, Australia (May 6, 2024)

2023

  • Initial Invitational Meeting: SC'23, Denver, CO, USA (November 15, 2023)

Working Groups

  • Application Fingerprinting - Terry Jones (ORNL)
  • AI/ML/RL - Soumyendu Sarkar (HPE)
  • Visual Analytics - Marketa Faltynkova (IT4I)
  • Networking - Puneet Sharma (HPE) & John Holmen (ORNL)
  • Power & Cooling - Adrian Jackson (EPCC)
  • Operational Data Analytics - Woong Shin (ORNL) & Jeff Hanson (HPE)
  • Use Cases & Architectures - Tim Dykes (HPE)
  • Documentation - Gabriel Hautreux (CINES)

Other special interest groups:

  • Scheduling - Matthias Maiterth (ORNL)
  • DT for Operations - Matthias Maiterth (ORNL)
  • Reliability - Pavana Prakash (HPE)
  • Omniverse Integration - John Stone (NVIDIA)

Publications

2025

2024

Participating Organizations

  • CSC Finland
  • IT4Innovations National Supercomputing Center
  • Pawsey Supercomputing Centre
  • San Diego Supercomputing Center
  • Colorado State University
  • Texas State University
  • University of Illinois Chicago
  • Jülich Supercomputing Centre
  • Centre Informatique National de l’Enseignement Supérieur (CINES)
  • University of Basel
  • University of Bologna
  • Edinburgh Parallel Computing Centre
  • Georgia Tech
  • Los Alamos National Laboratory
  • University of North Dakota
  • Lawrence Livermore National Laboratory
  • Cornell Tech
  • Lawrence Berkeley National Laboratory
  • National Renewable Energy Laboratory (NREL)
  • Oak Ridge National Laboratory (ORNL)
  • Argonne National Laboratory (ANL)
  • CERN openlab
  • University of Basel HPC Group
  • University of Trento
  • KTH Royal Institute of Technology
  • Department of Defense High Performance Computing Modernization Program (HPCMP)
  • Hewlett Packard Enterprise
  • NVIDIA
  • Meta Datacenters