Low-Overhead Tracing of Large-Scale Parallel Programs

Date

2016-05

Authors

Devale, Sindhu

Abstract

Some parallelization bugs manifest themselves only when a program is executed at scale. Such bugs are notoriously difficult to find, and tracing parallel programs at scale tends to be very expensive, both in execution overhead and in the amount of trace data generated. To make lightweight debugging possible on large-scale systems, I present and evaluate a scalable profiling tool called RTC-Tracer that incrementally compresses the gathered information before it is written to memory or disk. For example, RTC-Tracer can track every function call and return of the Mantevo miniapps running on Stampede with an average execution-time overhead of 1.73x to 2.31x while compressing the collected information by a factor of 100, so that each processor emits only a few kilobytes of trace data per second.

Keywords

Tracing, Large-scale Parallel Programs

Citation

Devale, S. (2016). Low-overhead tracing of large-scale parallel programs (Unpublished thesis). Texas State University, San Marcos, Texas.
