Below you will find pages that utilize the taxonomy term “AMD”
Post
Comparing benchmarks between ROCm and NVIDIA (Inference)
Introduction As the ROCm libraries have been updated, the throughput of both inference and training has improved. In this post we would like to show a comparison between the AMD GPGPU environment on ROCm 2.6 and an NVIDIA RTX 2080 Ti on CUDA 10 for machine learning inference tasks. Hardware and software Here are the hardware and software configurations we used for these benchmarks.
The hardware and software of the ROCm environment are as follows:
OS: Ubuntu 16.04 GPU: AMD Radeon VII or AMD RX Vega 64 ROCm: 2.6
Post
CenterNet on AMD Radeon GPU
Introduction A better object detection paper than M2Det has appeared, so I checked whether it runs on a Radeon GPU. Like M2Det, CenterNet comes from Chinese authors. According to the paper, it is the most accurate and lightest of the three models: YoloV3 < M2Det < CenterNet.
CenterNet: Keypoint Triplets for Object Detection https://arxiv.org/abs/1904.08189
PyTorch Implementation https://github.com/xingyizhou/CenterNet/blob/master/readme/INSTALL.md
Keras Implementation https://github.com/see--/keras-centernet
Installation Check Clone the repository and install the required packages.
Post
BERT on AMD Radeon GPU
Introduction People working on natural language processing often ask whether BERT runs on an AMD Radeon GPU, so I will describe how I verified it.
Results Confirming Installation Official Documentation: https://github.com/google-research/bert
It seems that it should work on TensorFlow 1.11 through 1.12.0. Since the ROCm 2.3 update has arrived, I will set it up on an experimental machine and run it on a Radeon VII.
Originally, on the TensorFlow 1.
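Before running the BERT scripts, it is worth confirming that the ROCm build of TensorFlow actually sees the Radeon GPU. A minimal sketch, assuming the tensorflow-rocm package is installed (the BERT code itself is unchanged):

# Sanity check that ROCm TensorFlow sees the Radeon GPU before a long BERT run.
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)  # expect 1.11-1.12 for the BERT reference code
for d in device_lib.list_local_devices():
    print(d.name, d.device_type)  # a GPU entry should appear here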
Post
How to set up Caffe on an AMD Radeon GPU
Introduction Currently, the PyTorch/Caffe2 stack is at the cutting edge, but some of the sources on GitHub are still based on old Caffe versions. We will look at how to set up hipCaffe (ROCm-Caffe) on Ubuntu 16.04 + ROCm.
Installation Requirements - Ubuntu 16.04 - Complete set of basic packages required for the build - ROCm driver - MIOpen library (AMD's deep learning primitives library, the cuDNN counterpart) - OpenCV 2 or OpenCV 3 - hipCaffe
Here is a script that compiles with one command line.
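The compile script itself is truncated from this excerpt, so here instead is a minimal post-build sanity check through pycaffe; it assumes the build finished (including the make pycaffe target) and that hipCaffe's python directory is on PYTHONPATH:

# Post-build smoke test for hipCaffe's Python bindings.
# Assumes hipCaffe/python is on PYTHONPATH after the pycaffe build step.
import caffe

caffe.set_mode_gpu()  # route computation through ROCm/MIOpen
caffe.set_device(0)   # first Radeon GPU
print("hipCaffe GPU mode enabled")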
Post
PyTorch-ROCm on AMD Radeon GPU
Introduction PyTorch supports ROCm 2.1!
This is an installation guide for running on AMD Radeon GPUs.
Installation The AMDGPU driver stack (ROCm 2.1) supports PyTorch 1.x.x; it was announced that PyTorch is now officially supported.
https://rocm.github.io/dl.html
Deep Learning on ROCm TensorFlow: TensorFlow for ROCm – latest supported version 1.13
MIOpen: Open-source deep learning library for AMD GPUs – latest supported version 1.7.1
PyTorch: PyTorch for ROCm – latest supported version 1.0
Installation Issues (2019/03/01) The official page only describes Docker-based installation methods; there is no documentation for installing from scratch.
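Whether PyTorch is installed through the official Docker image or built from source, a short check confirms that the ROCm build exposes the GPU. A minimal sketch; note that the ROCm build reports HIP devices through the torch.cuda interface:

# Verify that the ROCm build of PyTorch can see the Radeon GPU.
# The HIP backend is exposed through the torch.cuda namespace.
import torch

print(torch.__version__)
print(torch.cuda.is_available())          # True if the GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. a Vega-series name
    x = torch.randn(1024, 1024, device="cuda")
    print((x @ x).sum().item())           # tiny matmul as a smoke test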
Post
Improved way to install tensorflow-rocm
Introduction A summary of how to get TensorFlow working with AMD Radeon GPUs.
Installation Dependencies
sudo apt update
sudo apt -y install software-properties-common curl wget # for add-apt-repository
Install Python 3.5.2
Python 3.6/Python 3.7 can be unstable with AMD GPUs, so we will be using Python 3.5.2. On Ubuntu 18, since the default version is Python 3.6, executing the following script will configure 3.5.2:
PYTHON35=false
if [[ `python3 --version` == *"3.5"* ]] ; then echo 'python3.
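After the interpreter is pinned, tensorflow-rocm can be installed with pip3 install tensorflow-rocm. A minimal check that the right Python is active and the GPU is usable, assuming that install has completed:

# Confirm the pinned interpreter and that tensorflow-rocm sees the GPU.
import sys
import tensorflow as tf

assert sys.version_info[:2] == (3, 5), "expected the pinned Python 3.5"
print(tf.__version__)
print(tf.test.is_gpu_available())  # True if the Radeon GPU is usable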
Post
A verification of "Fast Style Transfer" using TensorFlow 1.3 on ROCm with AMD Radeon Vega 56
Introduction This time, I am going to run “style transfer”, which is popular in the field of image generation and image style transfer, using TensorFlow 1.3 on ROCm with an AMD Radeon Vega 56.
System requirements AMD (TF 1.3): Ubuntu 16.04.4 x64, TensorFlow 1.3, Python 3.5, Driver: ROCm 1.7.137
I used the following Fast Style Transfer source code: https://github.com/lengstrom/fast-style-transfer.git
Thank you, Logan Engstrom.
Setup TensorFlow on a Radeon GPU HIP-TensorFlow 1.
Post
Benchmark CIFAR10 on TensorFlow with ROCm on AMD GPUs vs CUDA 9 and cuDNN 7 on NVIDIA GPUs
Introduction I’m going to continue my description of the CIFAR10 benchmark from where I left off.
Related articles
Mar 7, 2018 Benchmarks on MATRIX MULTIPLICATION | A comparison between AMD Vega and NVIDIA GeForce series
Mar 20, 2018 Benchmarks on MATRIX MULTIPLICATION | TitanV TensorCore (FP16=>FP32)
CIFAR10 Average examples per second
Introduction I took the CIFAR10 dataset, which is widely used around the world in competitions and benchmarks, and used the public release of TensorFlow to measure its training speed.
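For reference, "examples per second" can be measured with a simple timing loop. Below is a minimal TF 1.x sketch that times a toy convnet on random CIFAR10-shaped batches; it is not the exact model used in the benchmark:

# Rough examples/sec measurement on random 32x32x3 batches.
import time
import numpy as np
import tensorflow as tf

BATCH = 128
images = tf.placeholder(tf.float32, [BATCH, 32, 32, 3])
net = tf.layers.conv2d(images, 64, 5, padding="same", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, 2, 2)
logits = tf.layers.dense(tf.layers.flatten(net), 10)
loss = tf.reduce_mean(logits)  # dummy loss, enough to time a backward pass
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(BATCH, 32, 32, 3).astype(np.float32)
    sess.run(train_op, {images: batch})  # warm-up
    start = time.time()
    for _ in range(100):
        sess.run(train_op, {images: batch})
    print("%.1f examples/sec" % (100 * BATCH / (time.time() - start)))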
Post
Semantic Segmentation on an AMD Radeon GPU with TensorFlow 1.3
Introduction (Images: YoloV2 object detection vs. FCN semantic segmentation.) The field of semantic segmentation has many popular networks, including U-Net (2015), FCN (2015), PSPNet (2017), and others. In this study, we used an AMD Radeon GPU to run these networks.
We used ROCm-TensorFlow 1.3 and ROCm 1.7.137 as our operating framework.
*We reused the source code from the following repository by hellochick:
https://github.com/hellochick/semantic-segmentation-tensorflow
Setup TensorFlow 1.3 on an AMD Radeon GPU HIP-TensorFlow 1.0.1 was recently updated to TensorFlow 1.
Post
Benchmarks on MATRIX MULTIPLICATION | TitanV TensorCore (FP16=>FP32)
Introduction This is continued from the last article. I’ll be writing about matrix multiplication benchmarks, including the TensorCore (FP16=>FP32), which was first incorporated in the Volta architecture.
Matrix Multiplication with TensorCore NVIDIA TitanV’s TensorCore (FP16=>FP32) and other FP32 benchmarks. I used the same setup as before: Ubuntu 16.04 with Python 3.5, NVIDIA driver 390.30, CUDA 9.0, cuDNN 7, and TensorFlow 1.6.
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/ *It is not really fair to compare TensorCore against FP32 where the comparison should be against FP16, as NVIDIA’s official site does with its exaggerated descriptions; please note that this graph does compare FP32 against TensorCore (FP16).
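As a rough illustration of the FP32 vs FP16 comparison discussed above, matmul throughput can be timed per dtype. A minimal TF 1.x sketch; whether the FP16 path actually engages the TensorCores depends on the CUDA/cuBLAS build, so treat the numbers as indicative only:

# Time large matrix multiplications in FP32 vs FP16 (TF 1.x style).
import time
import tensorflow as tf

N = 4096
for dtype in (tf.float32, tf.float16):
    tf.reset_default_graph()
    a = tf.random_uniform([N, N], dtype=dtype)
    b = tf.random_uniform([N, N], dtype=dtype)
    c = tf.matmul(a, b)
    with tf.Session() as sess:
        sess.run(c)  # warm-up
        start = time.time()
        for _ in range(10):
            sess.run(c)
        dt = (time.time() - start) / 10
        print("%s: %.1f GFLOPS" % (dtype.name, 2.0 * N ** 3 / dt / 1e9))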
Post
VGG-19 on Keras/PlaidML backend
PlaidML, which is rumored to be faster than HIP-TensorFlow Introduction Hello!
HIP-TensorFlow is a library implemented as a CUDA simulation layer over TensorFlow, but since it is still under development and based on an old version of TensorFlow, there is a speed gap compared with the latest NVIDIA + TensorFlow for deep learning. Also, since it runs at the same speed on an RX 580 as on superior GPUs like the Vega 56 and Vega 64, it is still an immature library in that it cannot demonstrate the potential of the Vega series.
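Switching Keras to the PlaidML backend only requires setting the backend before Keras is imported. A minimal VGG-19 timing sketch, assuming plaidml-keras is installed and plaidml-setup has been run once to select the Radeon device:

# Run VGG-19 inference on the PlaidML backend (OpenCL, so no ROCm needed).
import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"  # must precede keras import

import time
import numpy as np
from keras.applications.vgg19 import VGG19

model = VGG19(weights=None)  # random weights: this is a speed test only
batch = np.random.rand(8, 224, 224, 3).astype("float32")
model.predict(batch)         # warm-up / kernel compilation
start = time.time()
model.predict(batch)
print("batch of 8 in %.3f s" % (time.time() - start))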
Post
Benchmarks on MATRIX MULTIPLICATION | A comparison between AMD Vega and NVIDIA GeForce series
Introduction ACUBE Corp. graciously allowed us to borrow a Radeon Pro WX9100, so we have decided to make a report on the card and a record of the results here on our company blog. We would like to extend our heartfelt gratitude to ACUBE Corp. for this opportunity.
This report focuses on the Radeon Pro WX9100 card and makes comparisons with the Radeon RX 560/580 and Radeon Vega 56/64/Frontier Edition from the same manufacturer, as well as with the GeForce series from NVIDIA.
Post
Wednesday, March 7, 2018, The World's First AMD GPU-based Cloud Instances for Deep Learning
IRVINE, CALIF. (PRWEB) MARCH 06, 2018 California startup Pegara, Inc. launched the world’s first set of deep learning instances based on GPUs from American chip maker AMD through its “GPU EATER” heterogeneous cloud computing service. Although much of today’s deep learning research is conducted using GPUs (graphics processing units) from NVIDIA, together with libraries it provides such as CUDA and cuDNN, the revision of the company’s EULA (End User License Agreement) for its consumer graphics drivers around December 2017 raised strong concerns among researchers and developers at universities and enterprises, both domestic and overseas, about the potential termination of research projects and delays in their practical application.