PhD Defence: Christian Fobel
PhD candidate Christian Fobel will defend his thesis "Deterministic, Weak-scaling Parallelism for Wire-length and Timing-driven FPGA Placement, Suitable for Multi-core and Many-Core Architectures" on January 9, 2015, at 1:30pm in Reynolds 312.
Title
Deterministic, Weak-scaling Parallelism for Wire-length and Timing-driven FPGA Placement, Suitable for Multi-core and Many-Core Architectures
Abstract
Field-Programmable Gate Arrays (FPGAs) enable rapid-prototyping of digital logic designs in-house, without the significant up-front expense of building custom fabrication facilities. How- ever, as the number of resources on each new generation of FPGAs continues to grow rapidly, enormous pressure is placed on the development of algorithms to reduce hardware compilation times to maintain the competitive advantage of using FPGAs. Approximately half of the compilation time for FPGA designs is spent performing placement. Unfortunately, placement is an NP-hard problem, where the most commonly used heuristics are based on serial simulated annealing, where run-time grows exponentially with the size of the circuit to be placed. There- fore, there is significant incentive to develop parallel placement algorithms that are capable of harnessing the increasingly abundant processing power of modern many-core architectures to improve placement run-times. For most parallel programs, the theoretical ideal is to achieve strong-scaling, where for axed size problem, speed-up increases for each additional parallel worker added. However, in practice, this is rarely achievable. A more realistic goal is to achieve weak-scaling, where so long as the non-parallelizable work of the program scales at a much slower rate than the amount of parallel work as the problem size grows, speed-ups will increase as parallel workers are added towards linear speed-up, as the number of parallel workers tends to infinity. However, none of the existing methods for parallel FPGA placement in literature exhibit weak-scaling, since they all include non-parallelizable work that scales with the size of the problem.
We propose a methodology for parallel FPGA placement based on “best-known" patterns of parallel computation with defined work complexity and span complexity, which lead to weak-scaling. In the context of FPGA placement, this means that our methodology is expected to result in speed-ups increasing towards linear speed-up as the size of net-lists to be placed increases along with the number of available hardware threads to execute parallel workers. Moreover, by composing our approach using parallel patterns, we make it possible to parallelize placement while maintaining determinism. Using parallel patterns, we also propose the rst scalable algorithms for timing analysis, which are ideally suited for modern many-core architectures, such as GPUs.
Our experimental results demonstrate that our proposed wire-length driven placement tool achieves a mean speed-up of 19 on a commodity GPU over a state-of-art academic placer, while improving solution quality by 5%. Our proposed timing driven placer exhibits a mean speed-up of 31 over a state-of-the-art, academic timing driven placer. Moreover, our timing driven placer improves critical-path delay by 20% compared to our proposed wire-length driven method. In addition, our experimental results show increasing speed-ups as the size of the problem grows, validating our complexity analysis predicting weak-scaling. Since both the peak computational throughput of modern multi-core and many-core architectures, and the number of resources available on modern FPGAs are growing rapidly, weak-scaling is an ideal fit for a parallel FPGA placement algorithm to provide sustainable performance for future FPGA designs and many-core architectures.
Advisors: Gary Grewal & Deborah Stacey