Low-Overhead Tracing of Large-Scale Parallel Programs
Some parallelization bugs only manifest themselves when a program is executed at scale. Such bugs are notoriously difficult to find, and tracing parallel programs at scale tends to be very expensive, both in execution overhead and in the amount of trace data generated. To make lightweight debugging possible on large-scale systems, I present and evaluate a scalable profiling tool called RTC-Tracer that incrementally compresses the gathered information before it is written to memory or disk. For example, RTC-Tracer can track every function call and return of the Mantevo miniapps running on Stampede with an average execution-time overhead of 1.73x to 2.31x while compressing the collected information by a factor of 100, resulting in only a few kilobytes per second of trace data being emitted by each processor.