Evaluating Shared-Cache Performance with Microbenchmarks and Reuse Distance Analysis
Abstract
The emergence of multicore architectures has opened up new opportunities for thread-level parallelism and dramatically increased the theoretical peak performance of current systems. However, achieving a high fraction of peak performance requires careful orchestration of many architecture-sensitive parameters. In particular, the presence of shared caches on multicore architectures makes it necessary to consider issues of parallelism and data locality in concert.
This research evaluates the shared-cache performance of several scientific kernels. A synthetic microbenchmark, together with hardware performance counter measurements, is used to estimate cache sharing among multiple threads in parallel applications. A novel reuse-distance-based algorithm is developed to identify correlations between reuse distance patterns and shared-cache utilization.
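For readers unfamiliar with the metric, the following is a minimal sketch of the textbook definition of reuse distance: the number of distinct addresses touched between two consecutive accesses to the same address. It is not the algorithm developed in this work. The function names `reuse_distances` and `lru_hit_fraction`, the symbolic trace format, and the fully associative LRU cache model are all assumptions made for illustration.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    The reuse distance is the number of distinct addresses touched since
    the previous access to the same address. First-time accesses have no
    previous access and are reported as None.
    """
    stack = OrderedDict()  # addresses ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Count the distinct addresses used more recently than addr.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


def lru_hit_fraction(trace, capacity):
    """Fraction of accesses that hit in a hypothetical fully associative
    LRU cache holding `capacity` addresses (an assumed, simplified model)."""
    dists = reuse_distances(trace)
    hits = sum(1 for d in dists if d is not None and d < capacity)
    return hits / len(dists) if dists else 0.0
```

Under this simplified model, an access hits exactly when its reuse distance is smaller than the cache capacity, which is why reuse distance histograms are commonly used to reason about cache utilization. The quadratic list scan is chosen for clarity; a tree-based structure would be needed for long traces.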