Achille Peternier, Danilo Ansaloni, Daniele Bonetta, Cesare Pautasso, Walter Binder
18th International Conference on Parallel and Distributed Systems (ICPADS), Singapore, pp. 400-407
Modern processor architectures are increasingly complex and heterogeneous, and often require solutions tailored to the specific characteristics of each processor model. In this paper we address this problem by targeting the AMD Bulldozer processor as a case study for hardware-oriented performance optimizations. The Bulldozer architecture features an asymmetric simultaneous multithreading implementation with shared floating-point units (FPUs) and per-core arithmetic logic units (ALUs). Bulld Over, presented in this paper, improves thread scheduling by exploiting this hardware characteristic to increase the performance of floating-point-intensive workloads on Linux-based operating systems. Bulld Over is a user-space monitoring tool that automatically identifies FPU-intensive threads and schedules them more efficiently, without requiring any patches or modifications at the kernel level. Our measurements using standard benchmark suites show that speedups of up to 10 can be achieved by simply allowing Bulld Over to monitor applications, without any modification of the workload.
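The placement idea described in the abstract can be illustrated with a minimal sketch. It is not the tool's actual policy, which the abstract does not detail. The sketch assumes that logical cores 2m and 2m+1 belong to the same module m and therefore share one FPU, that some monitor supplies a per-thread FPU share between 0 and 1, and that a fixed threshold separates FPU-intensive threads from the rest. The names place_threads, apply_placement, fpu_share, and threshold are hypothetical.

```python
import os
from typing import Dict


def place_threads(fpu_share: Dict[int, float], n_modules: int,
                  threshold: float = 0.5) -> Dict[int, int]:
    """Return a thread-id -> logical-core map.

    Assumption: cores 2m and 2m+1 share the FPU of module m.
    FPU-heavy threads are spread over distinct modules first;
    the remaining threads fill cores from the other end of the list.
    """
    n_cores = 2 * n_modules
    # Even cores first (one per module), then their odd siblings.
    core_order = list(range(0, n_cores, 2)) + list(range(1, n_cores, 2))

    heavy = sorted((t for t, s in fpu_share.items() if s >= threshold),
                   key=lambda t: -fpu_share[t])
    light = [t for t, s in fpu_share.items() if s < threshold]

    placement: Dict[int, int] = {}
    for i, tid in enumerate(heavy):
        placement[tid] = core_order[i % n_cores]
    for j, tid in enumerate(light):
        placement[tid] = core_order[(n_cores - 1 - j) % n_cores]
    return placement


def apply_placement(placement: Dict[int, int]) -> None:
    """Pin each thread to its core from user space (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    for tid, core in placement.items():
        os.sched_setaffinity(tid, {core})


# Hypothetical FPU shares for four threads on a two-module machine.
example = place_threads({101: 0.9, 102: 0.8, 103: 0.1, 104: 0.05}, n_modules=2)
```

In this example the two FPU-heavy threads (101 and 102) end up on cores 0 and 2, which lie in different modules under the stated numbering assumption, so they do not compete for the same FPU. Pinning is done through CPU affinity from user space, so no kernel change is involved.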
PDF: overseer-icpads2012.pdf (462 KB)