Quantifying I/O and Communication Traffic Interference on Dragonfly Networks Equipped with Burst Buffers

Misbah Mubarak, Philip Carns, Jonathan Jenkins, Jianping Kelvin Li, Nikhil Jain, Shane Snyder, Robert Ross, Christopher D. Carothers, Abhinav Bhatele, Kwan-Liu Ma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

HPC systems have shifted to burst buffer storage and high-radix interconnect topologies to meet the challenges of large-scale, data-intensive scientific computing. Both of these technologies have been studied in detail independently, but the interaction between them is not well understood. I/O traffic and communication traffic from concurrently scheduled applications may interfere with each other in unexpected ways, and this behavior may vary considerably depending on resource allocation, scheduling, and routing policies.

In this work, we analyze I/O and network traffic interference on burst-buffer-equipped dragonfly-based systems using the high-resolution packet-level simulations provided by the CODES storage and interconnect simulation framework. The analysis is performed using realistic I/O workload sizes, a variety of resource allocation and network routing strategies employed in production environments, and a dragonfly network configuration modeled after current vendor options. We analyze the impact of interference on both I/O and communication traffic.

We observe that although average network packet latency is stable across a wide variety of configurations, the maximum network packet latency in the presence of concurrent I/O traffic is highly sensitive to subtle policy changes. Our simulations reveal a worst-case single-packet latency 4,700 times the average latency for suboptimal configurations. While a topology-aware mapping of compute nodes to burst buffer storage nodes can minimize the variation in maximum packet latency, it can slow down the I/O traffic by creating contention on the burst buffer nodes. Overall, balancing I/O and network performance requires careful selection of routing, data placement, and job placement policies.
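The trade-off the abstract describes between topology-aware mapping and burst buffer contention can be illustrated with a toy model. The sketch below is not the paper's method or the CODES API; the `Dragonfly` class, the two mapping functions, and the layout (a fixed number of compute and burst buffer nodes per group, jobs placed on contiguous compute nodes) are hypothetical assumptions chosen only to show the effect. It counts how many compute nodes target each burst buffer node and how many compute-to-burst-buffer transfers leave their home group.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Dragonfly:
    """Hypothetical dragonfly layout: every group holds the same number of
    compute nodes and burst buffer (BB) nodes, numbered group by group."""
    groups: int
    compute_per_group: int
    bb_per_group: int

    def group_of_compute(self, node: int) -> int:
        return node // self.compute_per_group

    def group_of_bb(self, bb: int) -> int:
        return bb // self.bb_per_group


def topology_aware_map(net: Dragonfly, node: int) -> int:
    """Hypothetical group-local mapping: a compute node writes to a BB node in its own group."""
    g = net.group_of_compute(node)
    local = node % net.compute_per_group
    return g * net.bb_per_group + local % net.bb_per_group


def striped_map(net: Dragonfly, node: int) -> int:
    """Hypothetical striped mapping: compute nodes are spread round-robin over all BB nodes."""
    return node % (net.groups * net.bb_per_group)


def summarize(net: Dragonfly, job_nodes, mapping):
    """Return (max clients on one BB node, BB nodes used, inter-group transfers)."""
    targets = {n: mapping(net, n) for n in job_nodes}
    load = Counter(targets.values())
    remote = sum(
        1 for n, bb in targets.items()
        if net.group_of_compute(n) != net.group_of_bb(bb)
    )
    return max(load.values()), len(load), remote


if __name__ == "__main__":
    net = Dragonfly(groups=8, compute_per_group=16, bb_per_group=2)
    job = range(32)  # a job placed contiguously in groups 0 and 1
    print("topology-aware:", summarize(net, job, topology_aware_map))
    print("striped:       ", summarize(net, job, striped_map))
```

In this toy configuration the group-local mapping keeps all 32 transfers inside their groups but sends 8 clients to each of only 4 burst buffer nodes, whereas striping spreads the load to 2 clients per node across all 16 burst buffer nodes at the cost of 28 inter-group transfers that share the global links with communication traffic.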

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 204-215
Number of pages: 12
Volume: 2017-September
ISBN (Electronic): 9781538623268
State: Published - Sep 22 2017
Event: 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017 - Honolulu, United States
Duration: Sep 5 2017 to Sep 8 2017

Keywords

  • Burst buffer
  • Checkpoint
  • Discrete-event simulation
  • Dragonfly networks
  • I/O and communication traffic

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing