Benchmark CIFAR10 on TensorFlow with ROCm on AMD GPUs vs CUDA9 and cuDNN7 on NVIDIA GPUs
I’m going to continue my description of the CIFAR10 benchmark, from where I left off.
Average examples per second
I took the CIFAR10 dataset, which is widely used in competitions and benchmarks worldwide, and measured training speed with the public release of TensorFlow.
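The metric reported in this benchmark, average examples per second, is simply the batch size divided by the mean time per training step. A minimal sketch of that calculation (the function name and the numbers below are illustrative, not taken from the benchmark runs):

```python
def examples_per_second(batch_size, step_times):
    """Throughput as reported in TensorFlow's CIFAR10 tutorial logs:
    images processed per second, averaged over the measured steps."""
    avg_step_time = sum(step_times) / len(step_times)
    return batch_size / avg_step_time

# Illustrative numbers: a batch of 128 images at 0.1 s per step
# works out to 1280 examples/sec.
print(examples_per_second(128, [0.1, 0.1]))
```

This is why a faster GPU shows up directly as a higher examples/sec figure: the batch size is fixed, so only the per-step time changes.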
In this article I’ll only be writing about CIFAR10.
I used the following environment when performing the benchmarks (the benchmark source code, the CIFAR10 tutorial from the TensorFlow models repository, is linked at the end of this article):
- Ubuntu 16.04.4 x64
- Driver: 390.30, CUDA 9.0, cuDNN 7
The results were close to those of the previous benchmark on matrix operations. With HIP-TensorFlow 1.0.1 I got strange results, with the previous-generation RX580 beating the Vega64, so there was apparently some issue on the AMD side; but after moving to the new ROCm release and TensorFlow 1.3, the speed issues almost completely vanished.
I found it interesting that the overclocked Sapphire Nitro+ Vega64 was slower than the Frontier Edition, but perhaps there are some differences in the way that they manage memory and the like. Regardless, the fact that I was able to record speeds at approximately the theoretical limit made the results of this benchmark quite fascinating.
I hope to perform tests on PlaidML with a new version of ROCm for another article in the near future.
- HIP-TensorFlow https://github.com/ROCmSoftwarePlatform/hiptensorflow
- ROCm https://github.com/RadeonOpenCompute/ROCm
- MIOpen https://gpuopen.com/compute-product/miopen/
- ROCm-TensorFlow https://github.com/ROCmSoftwarePlatform/tensorflow
- vertex.ai official site http://vertex.ai/
- vertex.ai PlaidML http://vertex.ai/blog/announcing-plaidml
- PlaidML Github https://github.com/plaidml/plaidml
- CIFAR10 tutorial on TensorFlow https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10