POSTS
BERT on AMD Radeon GPU
Introduction
People working on natural language processing often ask whether BERT runs on AMD Radeon GPUs, so here is a write-up of how I verified it.
Confirming Installation
Official Documentation: https://github.com/google-research/bert
According to the documentation, BERT should work with TensorFlow 1.11 through 1.12.0. Since the ROCm 2.3 update has arrived, I will set up an experimental machine and run it on a Radeon VII.
The machine originally ran TensorFlow 1.12.0 + ROCm 2.2, so I updated only the drivers, with either
curl -sL http://install.aieater.com/setup_rocm | bash -
or
sudo apt upgrade -y
Confirm that ROCm-TensorFlow detects the Radeon VII with the following command.
python3 -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()"
ImportError: /usr/local/lib/python3.5/dist-packages/tensorflow/python/../libtensorflow_framework.so: undefined symbol: hipModuleGetGlobal
Since this raised an error, I removed the package with pip3 uninstall tensorflow-rocm and reinstalled it with pip3 install tensorflow-rocm==1.12.0 -U.
ImportError: /usr/local/lib/python3.5/dist-packages/tensorflow/python/../libtensorflow_framework.so: undefined symbol: hipModuleGetGlobal
Since the same error occurred, I suspected a bug somewhere around HIP and installed the latest version instead:
pip3 install tensorflow-rocm -U
This installed tensorflow-rocm 1.13.1.
johndoe@thiguhag:~$ python3 -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()"
2019-04-15 23:10:40.484698: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-15 23:10:40.485199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1531] Found device 0 with properties:
name: Vega 20
AMDGPU ISA: gfx906
memoryClockRate (GHz) 1.802
pciBusID 0000:03:00.0
Total memory: 15.98GiB
Free memory: 15.73GiB
2019-04-15 23:10:40.485213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0
2019-04-15 23:10:40.485374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-15 23:10:40.485391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059] 0
2019-04-15 23:10:40.485395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0: N
2019-04-15 23:10:40.485421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/device:GPU:0 with 15306 MB memory) -> physical GPU (device: 0, name: Vega 20, pci bus id: 0000:03:00.0)
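As a side note, device lines like the one above can be checked programmatically instead of by eye. The helper below is a hypothetical sketch (not part of TensorFlow) that extracts the GPU name and usable memory from a "Created TensorFlow device" log line:

```python
import re

# Hypothetical helper (not part of TensorFlow): pull the device id, memory,
# and GPU name out of a "Created TensorFlow device" log line.
DEVICE_RE = re.compile(
    r"Created TensorFlow device \((?P<dev>/device:GPU:\d+) with (?P<mem>\d+) MB memory\)"
    r" -> physical GPU \(device: \d+, name: (?P<name>[^,]+),"
)

def parse_device_line(line):
    m = DEVICE_RE.search(line)
    if not m:
        return None
    return {"device": m.group("dev"),
            "memory_mb": int(m.group("mem")),
            "name": m.group("name")}

line = ("2019-04-15 23:10:40.485421: I tensorflow/core/common_runtime/gpu/"
        "gpu_device.cc:1189] Created TensorFlow device (/device:GPU:0 with "
        "15306 MB memory) -> physical GPU (device: 0, name: Vega 20, "
        "pci bus id: 0000:03:00.0)")
print(parse_device_line(line))
# {'device': '/device:GPU:0', 'memory_mb': 15306, 'name': 'Vega 20'}
```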
The GPU was recognized without issue. The officially specified version is 1.11, but let's run BERT on 1.13.1 for now.
Clone the BERT repository, fetch the GLUE dataset download script, and download the BERT-Base checkpoint. Unzip the model, set the environment variables, and launch fine-tuning with python3.
git clone https://github.com/google-research/bert.git && cd bert
wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
python3 download_glue_data.py
wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip
export BERT_BASE_DIR=uncased_L-12_H-768_A-12
export GLUE_DIR=glue_data
python3 run_classifier.py \
--task_name=MRPC \
--do_train=true \
--do_eval=true \
--data_dir=$GLUE_DIR/MRPC \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=/tmp/mrpc_output/
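For reference, the step count reported in the results below (global_step = 343) follows directly from these flags: run_classifier.py derives it from the training-set size, the batch size, and the epoch count. A minimal sketch of that arithmetic, assuming MRPC's 3,668 training examples (per the GLUE benchmark):

```python
# Sketch of how run_classifier.py derives the number of training steps.
# Assumes MRPC has 3,668 training examples, as in the GLUE benchmark.
num_train_examples = 3668
train_batch_size = 32
num_train_epochs = 3.0

num_train_steps = int(num_train_examples / train_batch_size * num_train_epochs)
print(num_train_steps)  # 343
```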
Results
.
.
.
INFO:tensorflow:Graph was finalized.
2019-04-15 22:24:02.703807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0
2019-04-15 22:24:02.703846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-15 22:24:02.703851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059] 0
2019-04-15 22:24:02.703855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0: N
2019-04-15 22:24:02.703874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15306 MB memory) -> physical GPU (device: 0, name: Vega 20, pci bus id: 0000:03:00.0)
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /tmp/mrpc_output/model.ckpt-343
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-04-15-13:24:06
INFO:tensorflow:Saving dict for global step 343: eval_accuracy = 0.8627451, eval_loss = 0.3959406, global_step = 343, loss = 0.3959406
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 343: /tmp/mrpc_output/model.ckpt-343
INFO:tensorflow:evaluation_loop marked as finished
INFO:tensorflow:**** Eval results ****
INFO:tensorflow: eval_accuracy = 0.8627451
INFO:tensorflow: eval_loss = 0.3959406
INFO:tensorflow: global_step = 343
INFO:tensorflow: loss = 0.3959406
BERT ran successfully. Note, however, that Keras-based programs can get stuck and cause problems if the TensorFlow version does not match, so pinning the version matters.
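Since the exact version matters, one hypothetical way to guard a script is to check the installed TensorFlow version against the range verified in this article (1.11.0 through 1.13.1 is simply my tested range, not an official constraint):

```python
# Hypothetical guard: refuse to run outside the version range tested above.
# Note: a sketch only; pre-release suffixes like "1.12.0rc1" are not handled.
def version_tuple(v):
    # "1.12.0" -> (1, 12, 0)
    return tuple(int(x) for x in v.split(".")[:3])

def tf_version_ok(installed, minimum="1.11.0", maximum="1.13.1"):
    return version_tuple(minimum) <= version_tuple(installed) <= version_tuple(maximum)

print(tf_version_ok("1.12.0"))  # True
print(tf_version_ok("2.0.0"))   # False
```

In practice you would pass tf.VERSION (TensorFlow 1.x) as the installed version.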
References
- TensorFlow-ROCm / HipCaffe / PyTorch-ROCm / Caffe2 installation https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html
- ROCm https://github.com/ROCmSoftwarePlatform
- MIOpen https://gpuopen.com/compute-product/miopen/
- GPUEater tensorflow-rocm installer https://github.com/aieater/rocm_tensorflow_info
- Google Research BERT https://github.com/google-research/bert
Are you interested in working with us?
We are actively looking for new members to develop and improve the GPUEater cloud platform. For more information, please check here.