A massively parallel infrastructure for adaptive multiscale simulations: Modeling RAS initiation pathway for cancer

Francesco Di Natale, Harsh Bhatia, Timothy S. Carpenter, Chris Neale, Sara Kokkila Schumacher, Tomas Oppelstrup, Liam Stanton, Xiaohua Zhang, Shiv Sundram, Thomas R.W. Scogland, Gautham Dharuman, Michael P. Surh, Yue Yang, Claudia Misale, Lars Schneidenbach, Carlos Costa, Changhoan Kim, Bruce D'Amora, Sandrasegaram Gnanakaran, Dwight V. NissleyFred Streitz, Felice C. Lightstone, Peer Timo Bremer, James N. Glosli, Helgi I. Ingólfsson

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Scopus citations

Abstract

Computational models can define the functional dynamics of complex systems in exceptional detail. However, many modeling studies face seemingly incommensurate requirements: to gain meaningful insights into some phenomena requires models with high resolution (microscopic) detail that must nevertheless evolve over large (macroscopic) length- and time-scales. Multiscale modeling has become increasingly important to bridge this gap. Executing complex multiscale models on current petascale computers with high levels of parallelism and heterogeneous architectures is challenging. Many distinct types of resources need to be simultaneously managed, such as GPUs and CPUs, memory size and latencies, communication bottlenecks, and filesystem bandwidth. In addition, robustness to failure of compute nodes, network, and filesystems is critical. We introduce a first-of-its-kind, massively parallel Multiscale Machine-Learned Modeling Infrastructure (MuMMI), which couples a macro scale model spanning micrometer length- and millisecond time-scales with a micro scale model employing high-fidelity molecular dynamics (MD) simulations. MuMMI is a cohesive and transferable infrastructure designed for scalability and efficient execution on heterogeneous resources. A central workflow manager simultaneously allocates GPUs and CPUs while robustly handling failures in compute nodes, communication networks, and filesystems. A hierarchical scheduler controls GPU-accelerated MD simulations and in situ analysis. We present the various MuMMI components, including the macro model, GPU-accelerated MD, in situ analysis of MD data, machine learning selection module, a highly scalable hierarchical scheduler, and detail the central workflow manager that ties these modules together. In addition, we present performance data from our runs on Sierra, in which we validated MuMMI by investigating an experimentally intractable biological system: the dynamic interaction between RAS proteins and a plasma membrane. We used up to 4000 nodes of the Sierra supercomputer, concurrently utilizing over 16,000 GPUs and 176,000 CPU cores, and running up to 36,000 different tasks. This multiscale simulation includes about 120,000 MD simulations aggregating over 200 milliseconds, which is orders of magnitude greater than comparable studies.

Original languageEnglish (US)
Title of host publicationProceedings of SC 2019
Subtitle of host publicationThe International Conference for High Performance Computing, Networking, Storage and Analysis
PublisherIEEE Computer Society
ISBN (Electronic)9781450362290
DOIs
StatePublished - Nov 17 2019
Externally publishedYes
Event2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019 - Denver, United States
Duration: Nov 17 2019Nov 22 2019

Publication series

NameInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print)2167-4329
ISSN (Electronic)2167-4337

Conference

Conference2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019
CountryUnited States
CityDenver
Period11/17/1911/22/19

    Fingerprint

Keywords

  • Adaptive simulations
  • Cancer research
  • Heterogenous architecture
  • Machine learning
  • Massively parallel
  • Multiscale simulations

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Di Natale, F., Bhatia, H., Carpenter, T. S., Neale, C., Schumacher, S. K., Oppelstrup, T., Stanton, L., Zhang, X., Sundram, S., Scogland, T. R. W., Dharuman, G., Surh, M. P., Yang, Y., Misale, C., Schneidenbach, L., Costa, C., Kim, C., D'Amora, B., Gnanakaran, S., ... Ingólfsson, H. I. (2019). A massively parallel infrastructure for adaptive multiscale simulations: Modeling RAS initiation pathway for cancer. In Proceedings of SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis [a57] (International Conference for High Performance Computing, Networking, Storage and Analysis, SC). IEEE Computer Society. https://doi.org/10.1145/3295500.3356197