The development of modern heterogeneous accelerators such as GPUs has fueled the rapid growth of artificial intelligence (AI). Recent years have seen increasing adoption of AI in the nuclear physics (AI4NP) domain. While most AI4NP studies focus on feasibility analysis, we target their performance on modern GPUs equipped with Tensor Cores.
We first benchmark the throughput of hyper-parameterized end-to-end fully connected (FC) and convolutional neural network (CNN) models. We then examine several AI4NP applications and compare their peak performance against these benchmarks. We study the performance gain and accuracy loss introduced by the Tensor Cores' low-precision floating-point formats (TF32, FP16, and BF16). We conduct our experiments on NVIDIA T4 and A100 GPUs with the PyTorch and TensorFlow Keras frameworks.
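As a rough illustration of the accuracy-loss side of this trade-off, the sketch below emulates the reduced mantissa width of TF32 (10 mantissa bits) and BF16 (7 mantissa bits) by truncating FP32 values before an FC-layer-sized matrix multiply and measuring the deviation from the FP32 result. This is a NumPy emulation of rounding effects only, not of Tensor Core throughput; the matrix sizes are illustrative and not taken from the experiments described above.

```python
import numpy as np

def truncate(a: np.ndarray, mantissa_bits: int) -> np.ndarray:
    """Emulate a reduced-precision float by truncating FP32 mantissa bits.

    FP32 has 23 mantissa bits; TF32 keeps 10, BF16 keeps 7.
    (Truncation rather than round-to-nearest, so the measured
    errors are slightly pessimistic.)
    """
    mask = np.uint32((0xFFFFFFFF << (23 - mantissa_bits)) & 0xFFFFFFFF)
    bits = a.astype(np.float32).view(np.uint32) & mask
    return bits.view(np.float32)

rng = np.random.default_rng(0)
# One FC-layer-sized matmul; sizes are hypothetical.
x = rng.standard_normal((256, 1024)).astype(np.float32)
w = rng.standard_normal((1024, 512)).astype(np.float32)
ref = x @ w  # FP32 reference

def rel_err(mantissa_bits: int) -> float:
    """Max relative error of the matmul with truncated inputs."""
    out = truncate(x, mantissa_bits) @ truncate(w, mantissa_bits)
    return float(np.abs(out - ref).max() / np.abs(ref).max())

print(f"TF32-like (10 mantissa bits): {rel_err(10):.1e}")
print(f"BF16-like ( 7 mantissa bits): {rel_err(7):.1e}")
```

On a real GPU, PyTorch exposes the corresponding switch for matmuls via `torch.backends.cuda.matmul.allow_tf32`; the emulation here only shows why fewer mantissa bits enlarge the error while accumulation in FP32 keeps it bounded.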
This work explores the behavior of different GPU hardware platforms and AI software tools, and can serve as a guide for performance optimization in larger-scale deployments of AI4NP applications.
Consider for long presentation: No