PyTorch-ROCm on AMD Radeon GPU
Introduction
PyTorch supports ROCm 2.1!
This is an installation guide for running PyTorch on AMD Radeon GPUs.
Installation
The AMDGPU driver with ROCm 2.1 supports PyTorch 1.x.x, and it was announced that PyTorch is officially supported:
https://rocm.github.io/dl.html
Deep Learning on ROCm:
- TensorFlow: TensorFlow for ROCm – latest supported version 1.13
- MIOpen: Open-source deep learning library for AMD GPUs – latest supported version 1.7.1
- PyTorch: PyTorch for ROCm – latest supported version 1.0
Installation Issues (2019/03/01)
The official page only describes Docker-based installation methods, and there is no documentation for installing from scratch.
https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html
The page above has installation details, but it lacks instructions for installing from scratch, and the build steps it documents are out of date:
python tools/amd_build/build_pytorch_amd.py
python tools/amd_build/build_caffe2_amd.py
The hipify step (which converts CUDA code into HIP code) also differs from what the documentation describes.
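As a toy illustration (this is not the real hipify tool; the mapping below is a small hand-picked subset), the translation is essentially a systematic renaming of CUDA APIs to their HIP equivalents:

# Toy sketch of a hipify-style translation, renaming CUDA APIs to HIP ones.
# The real tooling (e.g. build_amd.py) is far more thorough than this.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaStream_t": "hipStream_t",
}

def toy_hipify(source: str) -> str:
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(toy_hipify("cudaMalloc(&buf, n); cudaMemcpy(dst, src, n, cudaMemcpyDeviceToHost);"))
# -> hipMalloc(&buf, n); hipMemcpy(dst, src, n, hipMemcpyDeviceToHost);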
Since the Dockerfile is maintained in the following repository, we can derive the latest installation instructions from it.
https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/Dockerfile
We have put together an installer based on the information in that Dockerfile. The steps at the link below install AMDGPU ROCm-PyTorch 1.1.0a on Ubuntu 16.04 with Python 3.5 or Python 3.6.
curl -sL http://install.aieater.com/setup_pytorch_rocm | bash -
You have to rebuild from source for each graphics card type.
gfx806 (RX550/560/570/580)
gfx900 (Vega Frontier Edition/Vega56/Vega64/WX9100/MI25)
gfx906 (RadeonVII/MI50/MI60)
The script above prompts you to select a graphics card type during installation.
In the snippet below, we examine the contents of the installer.
AMDGPU ROCm-PyTorch 1.1.0a installation script
# curl -sL http://install.aieater.com/setup_pytorch_rocm | bash -
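# Add the ROCm apt repository and install ROCm plus the build dependencies.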
apt-get update && apt-get install -y --no-install-recommends curl && \
curl -sL http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | apt-key add - && \
sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list' && \
apt-get update && apt-get install -y --no-install-recommends \
libelf1 \
build-essential \
bzip2 \
ca-certificates \
cmake \
ssh \
apt-utils \
pkg-config \
g++-multilib \
gdb \
git \
less \
libunwind-dev \
libfftw3-dev \
libelf-dev \
libncurses5-dev \
libomp-dev \
libpthread-stubs0-dev \
make \
miopen-hip \
miopengemm \
python3-dev \
python3-future \
python3-yaml \
python3-pip \
vim \
libssl-dev \
libboost-dev \
libboost-system-dev \
libboost-filesystem-dev \
libopenblas-dev \
rpm \
wget \
net-tools \
iputils-ping \
libnuma-dev \
rocm-dev \
rocrand \
rocblas \
rocfft \
hipsparse \
hip-thrust \
rccl && \
curl -sL https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \
sh -c 'echo deb [arch=amd64] http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main > /etc/apt/sources.list.d/llvm7.list' && \
sh -c 'echo deb-src http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main >> /etc/apt/sources.list.d/llvm7.list' && \
apt-get update && apt-get install -y --no-install-recommends clang-7
apt-get clean && \
rm -rf /var/lib/apt/lists/*
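# Work around case-sensitive find_dependency(hip) lookups in the ROCm cmake config files.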
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocsparse/lib/cmake/rocsparse/rocsparse-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocfft/lib/cmake/rocfft/rocfft-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/miopen/lib/cmake/miopen/miopen-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
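# HIP/ROCm environment variables to be appended to ~/.profile.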
prf=`cat <<'EOF'
export HIP_VISIBLE_DEVICES=0
export HCC_HOME=/opt/rocm/hcc
export ROCM_PATH=/opt/rocm
export ROCM_HOME=/opt/rocm
export HIP_PATH=/opt/rocm/hip
export PATH=/usr/local/bin:$HCC_HOME/bin:$HIP_PATH/bin:$ROCM_PATH/bin:/opt/rocm/opencl/bin/x86_64:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/opencl/lib/x86_64
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
export HIP_PLATFORM="hcc"
export KMTHINLTO="1"
export CUPY_INSTALL_USE_HIP=1
export MAKEFLAGS=-j8
export __HIP_PLATFORM_HCC__
export HIP_PLATFORM=hcc
export PLATFORM=hcc
export USE_ROCM=1
export MAX_JOBS=2
EOF
`
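# Ask which GPU architecture to build for (sets HCC_AMDGPU_TARGET).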
GFX=gfx900
echo "Select a GPU type."
select INS in RX500Series\(RX550/RX560/RX570/RX580/RX590\) Vega10Series\(Vega56/64/WX9100/FE/MI25\) Vega20Series\(RadeonVII/MI50/MI60\) Default
do
case $INS in
RX500Series\(RX550/RX560/RX570/RX580/RX590\))
GFX=gfx806
break;;
Vega10Series\(Vega56/64/WX9100/FE/MI25\))
GFX=gfx900
break;;
Vega20Series\(RadeonVII/MI50/MI60\))
GFX=gfx906
break;;
Default)
break;;
*) echo "ERROR: Invalid selection"
;;
esac
done
export HCC_AMDGPU_TARGET=$GFX
echo "$prf" >> ~/.profile
source ~/.profile
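# Python-side build and runtime dependencies.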
pip3 install cython pillow h5py numpy scipy requests sklearn matplotlib editdistance pandas portpicker jupyter setuptools pyyaml typing enum34 hypothesis
update-alternatives --install /usr/bin/gcc gcc /usr/bin/clang-7 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/clang++-7 50
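# Clone the ROCm fork of PyTorch (the official repository does not currently build).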
# git clone https://github.com/pytorch/pytorch.git
git clone https://github.com/ROCmSoftwarePlatform/pytorch.git pytorch-rocm
cd pytorch-rocm
git checkout e6991ed29fec9a7b7ffb09b6ec58fb9d3fec3d22 # 1.1.0a0+e6991ed
git submodule init
git submodule update
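# hipify the source tree (build_pytorch_amd.py and build_caffe2_amd.py are now merged into build_amd.py) and build.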
#python3 tools/amd_build/build_pytorch_amd.py
#python3 tools/amd_build/build_caffe2_amd.py
python3 tools/amd_build/build_amd.py
python3 setup.py install
pip3 install torchvision
cd ~/
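# Verify that the GPU is visible and that PyTorch sees it as a CUDA (HIP) device.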
clinfo | grep ' Name:'
python3 -c "import torch;print('CUDA(hip) is available',torch.cuda.is_available());print('cuda(hip)_device_num:',torch.cuda.device_count());print('Radeon device:',torch.cuda.get_device_name(torch.cuda.current_device()))"
Compared with the old documented instructions, the lines below are what changed in the script above.
#python3 tools/amd_build/build_pytorch_amd.py
#python3 tools/amd_build/build_caffe2_amd.py
python3 tools/amd_build/build_amd.py
The two hipify scripts have been merged into a single build_amd.py because the toolchain is still under active development.
The latest build from the official PyTorch repository (https://github.com/pytorch/pytorch.git) does not work. Use the PyTorch-ROCm fork (https://github.com/ROCmSoftwarePlatform/pytorch.git) instead; the following commit builds successfully:
git checkout e6991ed29fec9a7b7ffb09b6ec58fb9d3fec3d22 # 1.1.0a0+e6991ed
CUDA to HIP
CUDA code can run on an AMD Radeon GPU by compiling it into HIP code with hipify. Because of this, the compute device is still specified as 'cuda'. A 'hip' device string also exists, but specifying 'hip' does not work; you must specify 'cuda'.
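For example (a minimal sketch; this is standard PyTorch code, and nothing ROCm-specific is needed):

import torch

# On ROCm, HIP is exposed through PyTorch's CUDA API, so the device
# name stays "cuda" even though the hardware is a Radeon GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 4, device=device)  # allocated on the Radeon GPU
y = (x @ x).cpu()                     # computed on the GPU, copied back to the host
print(device, y.shape)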
Benchmarking
Benchmarking MobileNet on PyTorch: https://github.com/marvis/pytorch-mobilenet
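The numbers below are per-model forward-pass timings (mean and standard deviation). As a minimal sketch of this kind of measurement (the input size 3x224x224 and the warm-up/run counts are assumptions, not the exact code from the repositories above):

import time
import torch
import torchvision.models as models

def bench(model, nb_batches=16, runs=10, device="cuda"):
    model = model.to(device).eval()
    x = torch.randn(nb_batches, 3, 224, 224, device=device)
    with torch.no_grad():
        for _ in range(3):                  # warm-up iterations
            model(x)
        torch.cuda.synchronize()            # flush pending GPU (HIP) work
        times = []
        for _ in range(runs):
            t0 = time.time()
            model(x)
            torch.cuda.synchronize()
            times.append(time.time() - t0)
    return sum(times) / len(times)

print("resnet18:", bench(models.resnet18()))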
AMD Radeon VII + ROCm 2.1 + ROCm-PyTorch 1.1.0a (times in seconds; sd = standard deviation)
use_gpu: True, nb_batches: 1
resnet18 : 0.005838 (sd 0.000290)
alexnet : 0.001124 (sd 0.000137)
vgg16 : 0.001759 (sd 0.000033)
squeezenet : 0.003084 (sd 0.000115)
mobilenet : 0.007428 (sd 0.000213)
use_gpu: True, nb_batches: 16
resnet18 : 0.005712 (sd 0.000202)
alexnet : 0.001107 (sd 0.000019)
vgg16 : 0.002957 (sd 0.001784)
squeezenet : 0.006802 (sd 0.003843)
mobilenet : 0.007036 (sd 0.000301)
The following Japanese website provides code for comparing different models on CPU and GPU:
https://qiita.com/yu4u/items/c6e24d862325fac96f61
The results are below.
Ubuntu 16.04, CPU: i7-7700 3.60GHz, GPU: GeForce GTX 1080, PyTorch 0.1.11
use_gpu: True, nb_batches: 1
resnet18 : 0.001915 (sd 0.000057)
alexnet : 0.000691 (sd 0.000005)
vgg16 : 0.002390 (sd 0.002091)
squeezenet : 0.002086 (sd 0.000104)
mobilenet : 0.048602 (sd 0.000380)
use_gpu: True, nb_batches: 16
resnet18 : 0.006055 (sd 0.005111)
alexnet : 0.000744 (sd 0.000014)
vgg16 : 0.025156 (sd 0.029848)
squeezenet : 0.012983 (sd 0.000024)
mobilenet : 0.064022 (sd 0.000411)
use_gpu: False, nb_batches: 1
resnet18 : 0.218282 (sd 0.002961)
alexnet : 0.081834 (sd 0.000445)
vgg16 : 1.484166 (sd 0.001384)
squeezenet : 0.102657 (sd 0.002118)
mobilenet : 0.141093 (sd 0.005197)
use_gpu: False, nb_batches: 16
resnet18 : 0.896854 (sd 0.004594)
alexnet : 0.283497 (sd 0.003010)
vgg16 : 5.622119 (sd 0.020102)
squeezenet : 0.514910 (sd 0.004134)
mobilenet : 0.892604 (sd 0.017502)
Benchmarks for the GeForce GTX 1080 have been published, but they were taken on a PyTorch version older than 1.0, so the comparison is not a fair one; we plan to redo the benchmarks later. Also, we have confirmed that GPU processes can become zombies under ROCm 2.1 / ROCm 2.2, so stability has regressed slightly, much as it did in ROCm 1.7.
Compile Time Warning
When compiling ROCm-PyTorch from source, there are quite a few warnings about loop unrolling, but the compilation itself completes without problems. At runtime, the following warning appears:
warning: <unknown>:0:0: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering
The warning implies a performance penalty, and we confirmed that it has a noticeable impact on performance.
Current Impression
Because loop unrolling can still fail after transcompilation, ROCm-PyTorch carries a performance penalty; depending on the model, the optimization may not take place, and results vary considerably. Use it with caution.
References
- TensorFlow-ROCm / HipCaffe / PyTorch-ROCm / Caffe2 installation https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html
- TensorFlow-ROCm https://github.com/ROCmSoftwarePlatform/tensorflow-upstream
- PyTorch-ROCm https://github.com/ROCmSoftwarePlatform/pytorch.git
- PyTorch Official https://github.com/pytorch/pytorch.git
- PyTorch discussion https://discuss.pytorch.org/t/pytorch-with-rocm-benchmarks/31535
- Qiita @yu4u https://qiita.com/yu4u/items/c6e24d862325fac96f61
- ROCm https://github.com/ROCmSoftwarePlatform
- MIOpen https://gpuopen.com/compute-product/miopen/
- GPUEater's TensorFlow-ROCm summary https://github.com/aieater/rocm_tensorflow_info
Are you interested in working with us?
We are actively looking for new members to develop and improve the GPUEater cloud platform. For more information, please check here.