CenterNet on AMD Radeon GPU

June 12, 2019

Introduction

I found an object detection paper that reports better results than M2Det, so I checked whether the model runs on a Radeon GPU. Like M2Det, CenterNet was written by Chinese researchers. According to the paper, it is both the most accurate and the lightest of the three models: YOLOv3 < M2Det < CenterNet.

CenterNet: Keypoint Triplets for Object Detection https://arxiv.org/abs/1904.08189

PyTorch Implementation https://github.com/xingyizhou/CenterNet/blob/master/readme/INSTALL.md

Keras Implementation https://github.com/see--/keras-centernet

Installation Check

Clone the Keras implementation and install the required packages.

git clone https://github.com/see--/keras-centernet
cd keras-centernet
sudo pip3 install -r requirements.txt

The required packages are:

 Keras==2.2.4
 opencv-python==3.4.3.18
 tqdm==4.26.0
 youtube-dl==2019.4.30
 pytest==4.4.1
 Pillow==6.0.0
 matplotlib==3.0.3
 Cython==0.29.7
 pycocotools==2.0.0

The Keras backend is tensorflow-rocm 1.13.3. youtube-dl is not needed for this test.
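To confirm that an environment matches the pinned versions listed above, a small check like the following can help. This is only a sketch: `parse_requirements`, `installed_version`, and `find_mismatches` are hypothetical helper names and are not part of keras-centernet. It assumes the package names in the list are the names pip installed them under.

```python
# Pinned versions copied from the requirements list in this post.
REQUIREMENTS = """\
Keras==2.2.4
opencv-python==3.4.3.18
tqdm==4.26.0
youtube-dl==2019.4.30
pytest==4.4.1
Pillow==6.0.0
matplotlib==3.0.3
Cython==0.29.7
pycocotools==2.0.0
"""


def parse_requirements(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for older Pythons such as 3.5

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """List (name, wanted, found) for every package whose version differs."""
    mismatches = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches.append((name, wanted, found))
    return mismatches


if __name__ == "__main__":
    for name, wanted, found in find_mismatches(parse_requirements(REQUIREMENTS)):
        print("{}: wanted {}, found {}".format(name, wanted, found))
```

A package that is not installed is reported with `None` as the found version. Since youtube-dl is not needed here, a mismatch on that line can be ignored.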

Check that ROCm TensorFlow recognizes the GPU (Radeon VII) with the following command.

python3 -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()"

johndoe@thiguhag:~$ python3 -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()"
2019-04-15 23:10:40.484698: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-15 23:10:40.485199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1531] Found device 0 with properties:
name: Vega 20
AMDGPU ISA: gfx906
memoryClockRate (GHz) 1.802
pciBusID 0000:03:00.0
Total memory: 15.98GiB
Free memory: 15.73GiB
2019-04-15 23:10:40.485213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0
2019-04-15 23:10:40.485374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-15 23:10:40.485391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059]      0
2019-04-15 23:10:40.485395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0:   N
2019-04-15 23:10:40.485421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/device:GPU:0 with 15306 MB memory) -> physical GPU (device: 0, name: Vega 20, pci bus id: 0000:03:00.0)
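The one-liner above relies on TensorFlow's log messages to show the device. As a rough sketch, the same check can be written as a script that prints only the GPU entries returned by `device_lib.list_local_devices()`. The helper name `gpu_summary` is made up for this example, and the script assumes the TensorFlow 1.x style import used in this post.

```python
def gpu_summary(devices):
    """Return (name, memory in MB) for every device whose type is 'GPU'."""
    return [
        (d.name, d.memory_limit // (1024 * 1024))  # memory_limit is in bytes
        for d in devices
        if d.device_type == "GPU"
    ]


if __name__ == "__main__":
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed")
    else:
        gpus = gpu_summary(device_lib.list_local_devices())
        print(gpus if gpus else "No GPU visible to TensorFlow")
```

If the list comes back empty on a ROCm machine, the driver or the tensorflow-rocm installation is the first thing to check.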

Run the sample. You do not need to fetch the weights file yourself: the script below includes a downloader and obtains the weights automatically, so it is very easy to run.

PYTHONPATH=. python3 keras_centernet/bin/ctdet_image.py --fn assets/demo2.jpg --inres 512,512

johndoe@thiguhag:~/keras-centernet$ PYTHONPATH=. python3 keras_centernet/bin/ctdet_image.py --fn assets/demo2.jpg --inres 512,512
Using TensorFlow backend.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-06-12 04:33:20.069565: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-06-12 04:33:20.125171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1531] Found device 0 with properties:
name: Vega 20
AMDGPU ISA: gfx906
memoryClockRate (GHz) 1.802
pciBusID 0000:03:00.0
Total memory: 15.98GiB
Free memory: 15.73GiB
2019-06-12 04:33:20.125200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0
2019-06-12 04:33:20.125210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-12 04:33:20.125214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059]      0
2019-06-12 04:33:20.125218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0:   N
2019-06-12 04:33:20.125250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15306 MB memory) -> physical GPU (device: 0, name: Vega 20, pci bus id: 0000:03:00.0)
Image saved to: output/ctdet.demo2.jpg
100%|██████████| 1/1 [00:08<00:00,

(Image: output of the sample run)

It ran without any problems.
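To try the same check on several images, the command used above can be wrapped in a short script. This is a sketch under assumptions: it uses only the `--fn` and `--inres` flags shown in this post, it assumes the script is called once per image, and `build_command` and `run_all` are hypothetical helper names. Image paths are taken from the command line.

```python
import os
import subprocess
import sys


def build_command(image_path, inres=(512, 512)):
    """Build the argument list for one ctdet_image.py run."""
    return [
        "python3",
        "keras_centernet/bin/ctdet_image.py",
        "--fn", image_path,
        "--inres", "{},{}".format(*inres),
    ]


def run_all(image_paths, repo_dir="."):
    """Run the demo once per image from the keras-centernet directory."""
    env = dict(os.environ, PYTHONPATH=".")  # same as PYTHONPATH=. in the shell
    for path in image_paths:
        subprocess.check_call(build_command(path), cwd=repo_dir, env=env)


if __name__ == "__main__":
    # Example: python3 run_images.py assets/demo2.jpg
    run_all(sys.argv[1:])
```

Each call starts a new Python process, so the model is loaded again for every image. That is acceptable for a quick operation check but not for measuring speed.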

References

TensorFlow-ROCm / HipCaffe / PyTorch-ROCm / Caffe2 installation https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html
ROCm https://github.com/ROCmSoftwarePlatform
MIOpen https://gpuopen.com/compute-product/miopen/
GPUEater tensorflow-rocm installer https://github.com/aieater/rocm_tensorflow_info
CenterNet: Keypoint Triplets for Object Detection https://arxiv.org/abs/1904.08189
PyTorch-CenterNet https://github.com/xingyizhou/CenterNet/blob/master/readme/INSTALL.md
Keras-CenterNet https://github.com/see--/keras-centernet
