March 20, 2018
Introduction
This is a continuation of the previous article. This time I'll be writing about matrix multiplication benchmarks, including the TensorCore (FP16=>FP32), which was first introduced in the Volta architecture.
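As a rough illustration of what "FP16=>FP32" means, here is a minimal pure-Python sketch. It assumes the usual mixed-precision contract, where inputs are rounded to FP16 and the running sum is kept in FP32. It emulates only the numerics, not TensorCore's actual summation order or speed, and the helper names (`to_fp16`, `to_fp32`, `matmul_mixed`) are hypothetical.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def matmul_mixed(a, b, acc_round=to_fp32):
    """C = A @ B with inputs rounded to FP16 and the running sum rounded by acc_round.

    acc_round=to_fp32 approximates the FP16=>FP32 (FP32 accumulate) mode;
    acc_round=to_fp16 approximates pure FP16 arithmetic.
    """
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    n, k, m = len(a16), len(b16), len(b16[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                # An FP16 x FP16 product fits exactly in FP32 (and in a Python float),
                # so only the accumulation step is rounded here.
                acc = acc_round(acc + a16[i][p] * b16[p][j])
            c[i][j] = acc
    return c

# Summing 4096 ones: FP16 accumulation stalls at 2048, FP32 accumulation does not.
ones_row = [[1.0] * 4096]
ones_col = [[1.0] for _ in range(4096)]
print(matmul_mixed(ones_row, ones_col)[0][0])                     # 4096.0
print(matmul_mixed(ones_row, ones_col, acc_round=to_fp16)[0][0])  # 2048.0
```

The contrast in the last two lines is the reason the FP32 accumulator matters: in FP16, 2048 + 1 rounds back to 2048, so long dot products silently lose contributions.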
Matrix Multiplication with TensorCore
NVIDIA TitanV's TensorCore (FP16=>FP32) and other FP32 benchmarks.
I used the same setup as before: Ubuntu 16.04, Python 3.5, NVIDIA driver 390.30, CUDA 9.0, cuDNN 7, and TensorFlow 1.6.
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
*Comparing TensorCore against FP32 when it should be compared against FP16, and exaggerating the gain the way NVIDIA's official site does, is not really fair. Please note, however, that this graph also compares FP32 against TensorCore (FP16).
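For reading benchmark numbers like these, matrix multiplication throughput is conventionally reported by counting 2·N³ floating-point operations for an N×N by N×N product (N³ multiplies plus N³ adds). The sketch below assumes that convention; it is not the author's benchmark script, and `matmul_tflops` and `best_time` are hypothetical helpers.

```python
import time

def matmul_tflops(n: int, seconds: float) -> float:
    """Throughput in TFLOPS of one (n x n) @ (n x n) product.

    Each of the n*n outputs needs n multiplies and n adds,
    so the conventional operation count is 2 * n**3.
    """
    return 2 * n**3 / seconds / 1e12

def best_time(fn, repeats: int = 5) -> float:
    """Best wall-clock time over `repeats` calls of fn, after one untimed warm-up call."""
    fn()  # warm-up (e.g. kernel selection, memory allocation), not timed
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()  # on a GPU, fn must block until the computation has actually finished
        best = min(best, time.perf_counter() - t0)
    return best

# Hypothetical numbers: an 8192 x 8192 matmul that takes 0.1 s
print(matmul_tflops(8192, 0.1))  # roughly 11 TFLOPS
```

Taking the best of several runs after a warm-up is one common choice; it reduces noise from one-time setup costs but reports a best case rather than an average.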