MS Presentation: Yi He
Efficient Resilience Technique and Analysis Framework for Deep Learning Accelerators
Deep learning (DL) accelerators are specialized hardware designed to execute DL workloads, and they are becoming increasingly prominent. Hardware resilience is essential for DL accelerators because, although DL workloads exhibit inherent tolerance to errors, this tolerance alone may not be sufficient to meet the resilience requirements of a wide range of applications, especially safety-critical applications such as self-driving cars. In this talk, we first present an efficient resilience analysis framework, called FIdelity, which combines (1) systematic hardware analysis that leverages the unique structures and design principles of DL accelerators, and (2) high-level software error injection, to model hardware errors in software DL frameworks with high fidelity and high efficiency. We validate the accuracy of our FIdelity framework through a case study on Nvidia's open-source accelerator NVDLA, and demonstrate that it achieves up to 1800X speedup compared to existing error analysis techniques.
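To make the software-level injection step concrete, here is a minimal sketch assuming a PyTorch model. PyTorch, the toy network, and helper names such as flip_bit and make_injection_hook are illustrative assumptions, not FIdelity's actual interface. The idea it shows: a hardware error is modeled as a single-bit flip in a software-visible tensor, injected through a forward hook, so no slow RTL-level simulation is needed during the injection campaign.

```python
# Minimal sketch of software-level error injection (illustrative only,
# not FIdelity's implementation): model a hardware error as a single-bit
# flip in one element of a layer's output tensor.

import random
import struct

import torch
import torch.nn as nn


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value via its integer representation."""
    as_int = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))[0]


def make_injection_hook(bit: int):
    """Return a forward hook that flips `bit` in one random output element."""
    def hook(module, inputs, output):
        out = output.clone()
        flat = out.view(-1)
        idx = random.randrange(flat.numel())
        flat[idx] = flip_bit(flat[idx].item(), bit)
        return out  # replaces the module's output
    return hook


# Usage: inject a single-bit error into one conv layer's output, then
# check whether the top-1 prediction changes (a silent data corruption
# at the application level) or the error is masked.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(),
                      nn.Linear(8 * 30 * 30, 10))
model.eval()
x = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    golden = model(x).argmax().item()

handle = model[0].register_forward_hook(make_injection_hook(bit=30))
with torch.no_grad():
    faulty = model(x).argmax().item()
handle.remove()

print("mismatch" if faulty != golden else "masked")
```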
Next, we present a new, lightweight retraining-based resilience technique, inspired by the resilience analysis results obtained by applying our FIdelity framework to several representative CNN workloads. This technique achieves a large (8.7X) improvement in resilience while introducing zero system-level cost and complexity.
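The retraining idea can be sketched in the same setting. The following is a hedged illustration of generic fault-aware fine-tuning; the loop structure, hyperparameters, and the reuse of make_injection_hook from the sketch above are assumptions for illustration, and the talk's actual lightweight technique may differ. The point it conveys: briefly retraining while transient errors are injected lets the weights adapt to the error patterns identified by the analysis, with no added hardware.

```python
# Hedged sketch of generic fault-aware fine-tuning (not necessarily the
# talk's exact technique): briefly retrain while injecting transient
# bit-flip errors, reusing make_injection_hook from the sketch above.

import random

import torch
import torch.nn as nn


def fault_aware_finetune(model, loader, epochs=2, lr=1e-4, bit=20):
    # Note: flipping high exponent bits (e.g., bit 30) can destabilize
    # training; a mantissa bit is used here for the sketch.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            # Pick one conv/linear layer per batch and inject a
            # transient single-bit error into its output.
            layers = [m for m in model.modules()
                      if isinstance(m, (nn.Conv2d, nn.Linear))]
            handle = (random.choice(layers)
                      .register_forward_hook(make_injection_hook(bit)))
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            handle.remove()  # error is transient: one batch only
    return model
```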
Yi He
Yi's advisor is Prof. Yanjing Li.