In December 2023, we embarked on an ambitious initiative to develop a comprehensive digital twin of the Frontier supercomputer. The twin incorporates 3D asset modeling with virtual and augmented reality capabilities, telemetry data assimilation, AI/ML integration, simulation, and reinforcement learning for optimization. Our initial goal was to develop four main modules:

  1. A transient simulation of the thermo-fluid cooling system from cooling tower to cold plate.
  2. A resource allocator and power simulator that models workloads, the dynamic power they draw, and energy conversion losses.
  3. A visual analytics module consisting of both an augmented reality model and a web-based dashboard for launching experiments.
  4. A network digital twin to study dynamic network power and congestion.

Once we were able to model Frontier, we set out to generalize these modules into a framework, called ExaDigiT, for modeling a variety of supercomputer architectures. This digital twin framework offers insight into operational strategies and “what-if” scenarios, and it elucidates complex, cross-disciplinary transient behaviors. It also serves as a design tool for prototyping future systems. ExaDigiT is built on an open software stack to foster community-driven development, and we have partnered with supercomputer centers around the world to develop it as an open framework for modeling supercomputers. The source code and documentation are available here:

ExaDigiT Source Repositories

ExaDigiT Documentation

For more information, contact Wes Brewer at brewerwh@ornl.gov.

Watch a 2-minute demo of ExaDigiT in action

Meetings / Events

Ongoing

  • Monthly Large Group Meeting: fourth Wednesday of each month, 11am ET.
    (Check exadigit.slack.com for schedule and invite.)

2025

2024

  • High Performance Data Center Digital Twins: ExaDigiT Community BoF - SC24, Atlanta, GA, USA (November 20, 2024)
  • High Performance Data-centre Digital Twins: Birds of a Feather - CUG-2024, Perth, WA, Australia (May 6, 2024)

2023

  • Initial Invitational Meeting: SC'23, Denver, CO, USA (November 15, 2023)

Working Groups

  • Application Fingerprinting - Terry Jones (ORNL)
  • AI/ML/RL - Soumyendu Sarkar (HPE)
  • Visual Analytics - Marketa Faltynkova (IT4I)
  • Networking - Puneet Sharma (HPE) & John Holmen (ORNL)
  • Power & Cooling - Adrian Jackson (EPCC)
  • Operational Data Analytics - Woong Shin (ORNL) & Jeff Hanson (HPE)
  • Use Cases & Architectures - Tim Dykes (HPE)
  • Documentation - Gabriel Hautreux (CINES)

Other special interest groups:

  • Scheduling - Matthias Maiterth (ORNL)
  • DT for Operations - Matthias Maiterth (ORNL)
  • Reliability - Pavana Prakash (HPE)
  • Omniverse Integration - John Stone (NVIDIA)

Publications

2025

2024

Participating Organizations