MELA: A visual analytics tool for studying multifidelity HPC system logs

F. N.U. Shilpika, Bethany Lusch, Murali Emani, Venkatram Vishwanath, Michael E. Papka, Kwan Liu Ma

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Maintaining a robust and reliable supercomputing hardware system requires understanding the events that occur in it, including failures. Toward this goal, we analyze system logs, such as error logs, job logs, and environment logs, from the Argonne Leadership Computing Facility's (ALCF) Theta Cray XC40 supercomputer. These log data incorporate multiple subsystem and component measurements at various fidelity levels and temporal resolutions, forming a very diverse and massive dataset. To identify patterns that characterize system behavior and faults over time and to glean insights from these log data, we have developed a visual analytics tool, MELA.

Original language: English (US)
Title of host publication: Proceedings of DAAC 2019
Subtitle of host publication: 3rd Industry/University Joint International Workshop on Data-Center Automation, Analytics, and Control - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 13-18
Number of pages: 6
ISBN (Electronic): 9781728159911
DOIs: https://doi.org/10.1109/DAAC49578.2019.00008
State: Published - Nov 2019
Externally published: Yes
Event: 3rd IEEE/ACM Industry/University Joint International Workshop on Data-Center Automation, Analytics, and Control, DAAC 2019 - Denver, United States
Duration: Nov 22 2019 → …

Publication series

Name: Proceedings of DAAC 2019: 3rd Industry/University Joint International Workshop on Data-Center Automation, Analytics, and Control - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 3rd IEEE/ACM Industry/University Joint International Workshop on Data-Center Automation, Analytics, and Control, DAAC 2019
Country: United States
City: Denver
Period: 11/22/19 → …

Keywords

  • Clustering
  • Error Log Analysis
  • HPC
  • Topics Over Time
  • Visualization

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems and Management
  • Control and Optimization


Cite this

Shilpika, F. N. U., Lusch, B., Emani, M., Vishwanath, V., Papka, M. E., & Ma, K. L. (2019). MELA: A visual analytics tool for studying multifidelity HPC system logs. In Proceedings of DAAC 2019: 3rd Industry/University Joint International Workshop on Data-Center Automation, Analytics, and Control - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 13-18). [8948714] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DAAC49578.2019.00008