PyTorch-ROCm on AMD Radeon GPU
Introduction
PyTorch supports ROCm 2.1!
This is an installation guide for running PyTorch on AMD Radeon GPUs.
Installation
The AMDGPU driver with ROCm 2.1 supports PyTorch 1.x.x, and it was announced that PyTorch is officially supported:
https://rocm.github.io/dl.html
Deep Learning on ROCm:
- TensorFlow: TensorFlow for ROCm – latest supported version 1.13
- MIOpen: Open-source deep learning library for AMD GPUs – latest supported version 1.7.1
- PyTorch: PyTorch for ROCm – latest supported version 1.0
Installation Issues (2019/03/01)
The official page only describes Docker-based installation methods, and there is no documentation for installing from scratch.
https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html
The page above has installation details, but it lacks instructions for installing from scratch, and the build steps it documents are out of date:
python tools/amd_build/build_pytorch_amd.py
python tools/amd_build/build_caffe2_amd.py
The hipify step (which converts CUDA code into HIP code) also differs from what the documentation describes.
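As a toy illustration (this is not the real hipify tool; the mapping below is a small hand-picked subset), the translation is essentially a systematic renaming of CUDA APIs to their HIP equivalents:

# Toy sketch of a hipify-style translation, renaming CUDA APIs to HIP ones.
# The real tooling (e.g. build_amd.py) is far more thorough than this.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaStream_t": "hipStream_t",
}

def toy_hipify(source: str) -> str:
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(toy_hipify("cudaMalloc(&buf, n); cudaMemcpy(dst, src, n, cudaMemcpyDeviceToHost);"))
# -> hipMalloc(&buf, n); hipMemcpy(dst, src, n, hipMemcpyDeviceToHost);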
Since the Dockerfile is maintained in the following repository, we can derive the latest installation instructions from it.
https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/Dockerfile
We have put together an installer based on the information in that Dockerfile. The steps at the link below install AMDGPU ROCm-PyTorch 1.1.0a on Ubuntu 16.04 with Python 3.5 or Python 3.6.
curl -sL http://install.aieater.com/setup_pytorch_rocm | bash -
You have to rebuild from source for each graphics card type.
gfx806 (RX550/560/570/580)
gfx900 (Vega Frontier Edition/Vega56/Vega64/WX9100/MI25)
gfx906 (RadeonVII/MI50/MI60)
The script above prompts you to select a graphics card type during installation.
In the snippet below, we examine the contents of the installer.
AMDGPU ROCm-PyTorch 1.1.0a installation script
# curl -sL http://install.aieater.com/setup_pytorch_rocm | bash -
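# Add the ROCm apt repository and install ROCm plus the build dependencies.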
apt-get update && apt-get install -y --no-install-recommends curl && \
curl -sL http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | apt-key add - && \
sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list' && \
apt-get update && apt-get install -y --no-install-recommends \
libelf1 \
build-essential \
bzip2 \
ca-certificates \
cmake \
ssh \
apt-utils \
pkg-config \
g++-multilib \
gdb \
git \
less \
libunwind-dev \
libfftw3-dev \
libelf-dev \
libncurses5-dev \
libomp-dev \
libpthread-stubs0-dev \
make \
miopen-hip \
miopengemm \
python3-dev \
python3-future \
python3-yaml \
python3-pip \
vim \
libssl-dev \
libboost-dev \
libboost-system-dev \
libboost-filesystem-dev \
libopenblas-dev \
rpm \
wget \
net-tools \
iputils-ping \
libnuma-dev \
rocm-dev \
rocrand \
rocblas \
rocfft \
hipsparse \
hip-thrust \
rccl && \
curl -sL https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \
sh -c 'echo deb [arch=amd64] http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main > /etc/apt/sources.list.d/llvm7.list' && \
sh -c 'echo deb-src http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main >> /etc/apt/sources.list.d/llvm7.list' && \
apt-get update && apt-get install -y --no-install-recommends clang-7
apt-get clean && \
rm -rf /var/lib/apt/lists/*
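# Work around case-sensitive find_dependency(hip) lookups in the ROCm cmake config files.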
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocsparse/lib/cmake/rocsparse/rocsparse-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocfft/lib/cmake/rocfft/rocfft-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/miopen/lib/cmake/miopen/miopen-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
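# HIP/ROCm environment variables to be appended to ~/.profile.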
prf=`cat <<'EOF'
export HIP_VISIBLE_DEVICES=0
export HCC_HOME=/opt/rocm/hcc
export ROCM_PATH=/opt/rocm
export ROCM_HOME=/opt/rocm
export HIP_PATH=/opt/rocm/hip
export PATH=/usr/local/bin:$HCC_HOME/bin:$HIP_PATH/bin:$ROCM_PATH/bin:/opt/rocm/opencl/bin/x86_64:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/opencl/lib/x86_64
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
export HIP_PLATFORM="hcc"
export KMTHINLTO="1"
export CUPY_INSTALL_USE_HIP=1
export MAKEFLAGS=-j8
export __HIP_PLATFORM_HCC__
export HIP_PLATFORM=hcc
export PLATFORM=hcc
export USE_ROCM=1
export MAX_JOBS=2
EOF
`
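# Ask which GPU architecture to build for (sets HCC_AMDGPU_TARGET).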
GFX=gfx900
echo "Select a GPU type."
select INS in RX500Series\(RX550/RX560/RX570/RX580/RX590\) Vega10Series\(Vega56/64/WX9100/FE/MI25\) Vega20Series\(RadeonVII/MI50/MI60\) Default
do
case $INS in
RX500Series\(RX550/RX560/RX570/RX580/RX590\))
GFX=gfx806
break;;
Vega10Series\(Vega56/64/WX9100/FE/MI25\))
GFX=gfx900
break;;
Vega20Series\(RadeonVII/MI50/MI60\))
GFX=gfx906
break;;
Default)
break;;
*) echo "ERROR: Invalid selection"
;;
esac
done
export HCC_AMDGPU_TARGET=$GFX
echo "$prf" >> ~/.profile
source ~/.profile
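# Python-side build and runtime dependencies.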
pip3 install cython pillow h5py numpy scipy requests sklearn matplotlib editdistance pandas portpicker jupyter setuptools pyyaml typing enum34 hypothesis
update-alternatives --install /usr/bin/gcc gcc /usr/bin/clang-7 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/clang++-7 50
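# Clone the ROCm fork of PyTorch (the official repository does not currently build).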
# git clone https://github.com/pytorch/pytorch.git
git clone https://github.com/ROCmSoftwarePlatform/pytorch.git pytorch-rocm
cd pytorch-rocm
git checkout e6991ed29fec9a7b7ffb09b6ec58fb9d3fec3d22 # 1.1.0a0+e6991ed
git submodule init
git submodule update
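# hipify the source tree (build_pytorch_amd.py and build_caffe2_amd.py are now merged into build_amd.py) and build.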
#python3 tools/amd_build/build_pytorch_amd.py
#python3 tools/amd_build/build_caffe2_amd.py
python3 tools/amd_build/build_amd.py
python3 setup.py install
pip3 install torchvision
cd ~/
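# Verify that the GPU is visible and that PyTorch sees it as a CUDA (HIP) device.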
clinfo | grep ' Name:'
python3 -c "import torch;print('CUDA(hip) is available',torch.cuda.is_available());print('cuda(hip)_device_num:',torch.cuda.device_count());print('Radeon device:',torch.cuda.get_device_name(torch.cuda.current_device()))"
Compared with the old documented instructions, the lines below are what changed in the script above.
#python3 tools/amd_build/build_pytorch_amd.py
#python3 tools/amd_build/build_caffe2_amd.py
python3 tools/amd_build/build_amd.py
The two hipify scripts have been merged into a single build_amd.py because the toolchain is still under active development.
The latest build from the official PyTorch repository (https://github.com/pytorch/pytorch.git) does not work. Use the PyTorch-ROCm fork (https://github.com/ROCmSoftwarePlatform/pytorch.git) instead; the following commit builds successfully:
git checkout e6991ed29fec9a7b7ffb09b6ec58fb9d3fec3d22 # 1.1.0a0+e6991ed
CUDA to HIP
CUDA code can run on an AMD Radeon GPU by compiling it into HIP code with hipify. Because of this, the compute device is still specified as 'cuda'. A 'hip' device string also exists, but specifying 'hip' does not work; you must specify 'cuda'.
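For example (a minimal sketch; this is standard PyTorch code, and nothing ROCm-specific is needed):

import torch

# On ROCm, HIP is exposed through PyTorch's CUDA API, so the device
# name stays "cuda" even though the hardware is a Radeon GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 4, device=device)  # allocated on the Radeon GPU
y = (x @ x).cpu()                     # computed on the GPU, copied back to the host
print(device, y.shape)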
Benchmarking
Benchmarking MobileNet on PyTorch: https://github.com/marvis/pytorch-mobilenet
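The numbers below are per-model forward-pass timings (mean and standard deviation). As a minimal sketch of this kind of measurement (the input size 3x224x224 and the warm-up/run counts are assumptions, not the exact code from the repositories above):

import time
import torch
import torchvision.models as models

def bench(model, nb_batches=16, runs=10, device="cuda"):
    model = model.to(device).eval()
    x = torch.randn(nb_batches, 3, 224, 224, device=device)
    with torch.no_grad():
        for _ in range(3):                  # warm-up iterations
            model(x)
        torch.cuda.synchronize()            # flush pending GPU (HIP) work
        times = []
        for _ in range(runs):
            t0 = time.time()
            model(x)
            torch.cuda.synchronize()
            times.append(time.time() - t0)
    return sum(times) / len(times)

print("resnet18:", bench(models.resnet18()))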
AMD Radeon VII + ROCm 2.1 + ROCm-PyTorch 1.1.0a (times in seconds; sd = standard deviation)
use_gpu: True, nb_batches: 1
resnet18 : 0.005838 (sd 0.000290)
alexnet : 0.001124 (sd 0.000137)
vgg16 : 0.001759 (sd 0.000033)
squeezenet : 0.003084 (sd 0.000115)
mobilenet : 0.007428 (sd 0.000213)
use_gpu: True, nb_batches: 16
resnet18 : 0.005712 (sd 0.000202)
alexnet : 0.001107 (sd 0.000019)
vgg16 : 0.002957 (sd 0.001784)
squeezenet : 0.006802 (sd 0.003843)
mobilenet : 0.007036 (sd 0.000301)
The following Japanese website provides code for comparing different models on CPU and GPU:
https://qiita.com/yu4u/items/c6e24d862325fac96f61
The results are below.
Ubuntu 16.04, CPU: i7-7700 3.60GHz, GPU: GeForce GTX 1080, PyTorch 0.1.11
use_gpu: True, nb_batches: 1
resnet18 : 0.001915 (sd 0.000057)
alexnet : 0.000691 (sd 0.000005)
vgg16 : 0.002390 (sd 0.002091)
squeezenet : 0.002086 (sd 0.000104)
mobilenet : 0.048602 (sd 0.000380)
use_gpu: True, nb_batches: 16
resnet18 : 0.006055 (sd 0.005111)
alexnet : 0.000744 (sd 0.000014)
vgg16 : 0.025156 (sd 0.029848)
squeezenet : 0.012983 (sd 0.000024)
mobilenet : 0.064022 (sd 0.000411)
use_gpu: False, nb_batches: 1
resnet18 : 0.218282 (sd 0.002961)
alexnet : 0.081834 (sd 0.000445)
vgg16 : 1.484166 (sd 0.001384)
squeezenet : 0.102657 (sd 0.002118)
mobilenet : 0.141093 (sd 0.005197)
use_gpu: False, nb_batches: 16
resnet18 : 0.896854 (sd 0.004594)
alexnet : 0.283497 (sd 0.003010)
vgg16 : 5.622119 (sd 0.020102)
squeezenet : 0.514910 (sd 0.004134)
mobilenet : 0.892604 (sd 0.017502)
Benchmarks for the GeForce GTX 1080 have been published, but they were taken on a PyTorch version older than 1.0, so the comparison is not a fair one; we plan to redo the benchmarks later. Also, we have confirmed that GPU processes can become zombies under ROCm 2.1 / ROCm 2.2, so stability has regressed slightly, much as it did in ROCm 1.7.
Compile Time Warning
When compiling ROCm-PyTorch from source, there are quite a few warnings about loop unrolling, but the compilation itself completes without problems. At runtime, the following warning appears:
warning: <unknown>:0:0: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering
The warning implies a performance penalty, and we confirmed that it has a noticeable impact on performance.
Current Impression
Because loop unrolling can still fail after transcompilation, ROCm-PyTorch carries a performance penalty; depending on the model, the optimization may not take place, and results vary considerably. Use it with caution.
References
- TensorFlow-ROCm / HipCaffe / PyTorch-ROCm / Caffe2 installation https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html
- TensorFlow-ROCm https://github.com/ROCmSoftwarePlatform/tensorflow-upstream
- PyTorch-ROCm https://github.com/ROCmSoftwarePlatform/pytorch.git
- PyTorch Official https://github.com/pytorch/pytorch.git
- PyTorch discussion https://discuss.pytorch.org/t/pytorch-with-rocm-benchmarks/31535
- Qiita @yu4u https://qiita.com/yu4u/items/c6e24d862325fac96f61
- ROCm https://github.com/ROCmSoftwarePlatform
- MIOpen https://gpuopen.com/compute-product/miopen/
- GPUEater's TensorFlow-ROCm summary https://github.com/aieater/rocm_tensorflow_info
Are you interested in working with us?
We are actively looking for new members to develop and improve the GPUEater cloud platform. For more information, please check here.