mNPUsim: Evaluating the Effect of Sharing Resources with Multi-Core NPUs
Published in IISWC, 2023
*Soojin Hwang, *Sunho Lee, Jungwoo Kim, Hongbeen Kim, Jaehyuk Huh, "mNPUsim: Evaluating the Effect of Sharing Resources with Multi-Core NPUs", the 2023 IEEE International Symposium on Workload Characterization ( IISWC ), October 2023 (* co-first authors)
Multi-core neural processing units (NPUs) have emerged to scale the computation capability of NPUs to efficiently support diverse machine learning tasks. In such multi-core NPUs, workloads in different cores can affect other co-runners by incurring contentions on shared resources such as external memory bandwidth and memory management unit (MMU) for address translation. However, many recent NPU studies use a single-core NPU framework without considering dynamic effect by the shared resources. For this study, we develop a new multi-core NPU simulator to assess the effect of resource sharing accurately. Using the simulator, this paper reports the sharing behaviors of multi-core NPUs with respect to overall throughput and performance variance caused by co-runners. The evaluation of the dual and quad core NPUs shows that sharing MMU and memory bandwidth in general is beneficial for throughput, with minor degradation in fairness. The evaluation also shows that page table walkers in MMU are one of the critical shareable resources. Due to the bursty nature of NPU memory accesses, sharing of walker bandwidth across multiple cores can significantly improve the performance. The study extends the evaluation of scalability with multi-core NPUs, investigating the effect of mapping heterogeneous models to multiple NPUs.
Errata: Figure 2(a)