BERT on AMD Radeon GPUs

April 15, 2019

Introduction

People working in natural language processing often ask me whether BERT runs on AMD Radeon GPUs, so here is an account of what happened when I tested it.

Result

It works.

Verification

First, read through the official documentation: https://github.com/google-research/bert

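In short, anything that can run TensorFlow 1.11 seems to be enough, so I installed 1.12.0. A ROCm 2.3 update had been released, so I set it up on the test machine and ran everything on a Radeon VII.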

Starting from a machine that already had TensorFlow 1.12.0 + ROCm 2.2, I ran:

curl -sL http://install.aieater.com/setup_rocm | bash -

sudo apt upgrade -y

This updated only the driver.

The following command checks that the Radeon VII is visible from ROCm TensorFlow:

python3 -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()"

ImportError: /usr/local/lib/python3.5/dist-packages/tensorflow/python/../libtensorflow_framework.so: undefined symbol: hipModuleGetGlobal

That produced the error above, so I removed the package with pip3 uninstall tensorflow-rocm and reinstalled it with pip3 install tensorflow-rocm==1.12.0 -U.

ImportError: /usr/local/lib/python3.5/dist-packages/tensorflow/python/../libtensorflow_framework.so: undefined symbol: hipModuleGetGlobal

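The same error came back. I suspected that a bug or some incompatibility had been introduced on the HIP side, but tried installing the latest version anyway: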

pip3 install tensorflow-rocm -U

This installed tensorflow-rocm 1.13.1.
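The ImportError shown earlier names the symbol the dynamic loader could not resolve. As a minimal sketch, the helper below pulls that symbol out of the message so it can be logged or searched for. The names `missing_symbol` and `check_tensorflow_import` are hypothetical and not part of TensorFlow or ROCm.

```python
import re

# Matches the tail of a dynamic-loader message such as
# "...libtensorflow_framework.so: undefined symbol: hipModuleGetGlobal"
_UNDEFINED_SYMBOL = re.compile(r"undefined symbol:\s*(\S+)")


def missing_symbol(message):
    """Return the unresolved symbol named in an error message, or None."""
    match = _UNDEFINED_SYMBOL.search(message)
    return match.group(1) if match else None


def check_tensorflow_import():
    """Import TensorFlow; return the unresolved symbol if that is why it failed.

    Returns None when the import succeeds or fails for another reason
    (for example, when TensorFlow is not installed at all).
    """
    try:
        import tensorflow  # noqa: F401  (imported only to trigger the loader)
    except ImportError as exc:
        return missing_symbol(str(exc))
    return None
```

Run against the error text in this post, `missing_symbol` returns `hipModuleGetGlobal`, which is the name to look for when comparing the installed HIP runtime with the one the wheel was built against.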

johndoe@thiguhag:~$ python3 -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()"
2019-04-15 23:10:40.484698: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-15 23:10:40.485199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1531] Found device 0 with properties: 
name: Vega 20
AMDGPU ISA: gfx906
memoryClockRate (GHz) 1.802
pciBusID 0000:03:00.0
Total memory: 15.98GiB
Free memory: 15.73GiB
2019-04-15 23:10:40.485213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0
2019-04-15 23:10:40.485374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-15 23:10:40.485391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059]      0 
2019-04-15 23:10:40.485395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0:   N 
2019-04-15 23:10:40.485421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/device:GPU:0 with 15306 MB memory) -> physical GPU (device: 0, name: Vega 20, pci bus id: 0000:03:00.0)

The GPU was recognized. The officially specified version is 1.11, but I went ahead and ran BERT anyway.
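The one-liner above relies on TensorFlow's startup log to show the device. If a script needs the same check, a sketch along the following lines filters the device list for GPU entries. The helper names `gpu_names` and `local_gpu_names` are hypothetical; `device_lib.list_local_devices()` is the same call used above, imported lazily so that the pure filtering function can be used without TensorFlow installed.

```python
def gpu_names(devices):
    """Return the names of the entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_gpu_names():
    """List GPU device names visible to the installed TensorFlow."""
    # Imported here so that importing this module does not require TensorFlow.
    from tensorflow.python.client import device_lib

    return gpu_names(device_lib.list_local_devices())
```

On the machine described here, `local_gpu_names()` should include an entry for `/device:GPU:0`; an empty list would mean TensorFlow fell back to the CPU.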

Download the GLUE datasets with the download script, then download and unzip the BERT model checkpoint. Point the script at the extracted directory and launch it with python3.

git clone https://github.com/google-research/bert.git && cd bert

wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
python3 download_glue_data.py

wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip

export BERT_BASE_DIR=uncased_L-12_H-768_A-12
export GLUE_DIR=glue_data
python3 run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/tmp/mrpc_output/
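Before launching a run like the one above, it can help to confirm that the downloads landed where the flags point. The sketch below is a hypothetical pre-flight check, not part of the BERT repository. The file names are assumptions based on the contents of the unzipped checkpoint archive and on the MRPC directory written by download_glue_data.py. Note that bert_model.ckpt is a checkpoint prefix rather than a single file, so the check looks for bert_model.ckpt.index instead.

```python
import os

# Assumed file names: from the unzipped uncased_L-12_H-768_A-12 archive and
# from the MRPC directory produced by download_glue_data.py.
BERT_FILES = ("vocab.txt", "bert_config.json", "bert_model.ckpt.index")
MRPC_FILES = ("train.tsv", "dev.tsv")


def missing_inputs(bert_base_dir, glue_dir):
    """Return the expected input paths that do not exist on disk."""
    expected = [os.path.join(bert_base_dir, name) for name in BERT_FILES]
    expected += [os.path.join(glue_dir, "MRPC", name) for name in MRPC_FILES]
    return [path for path in expected if not os.path.isfile(path)]
```

With the environment variables from the commands above, `missing_inputs(os.environ["BERT_BASE_DIR"], os.environ["GLUE_DIR"])` returns an empty list when everything is in place.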

Output

.
.
.
INFO:tensorflow:Graph was finalized.
2019-04-15 22:24:02.703807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0
2019-04-15 22:24:02.703846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-15 22:24:02.703851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059]      0 
2019-04-15 22:24:02.703855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0:   N 
2019-04-15 22:24:02.703874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15306 MB memory) -> physical GPU (device: 0, name: Vega 20, pci bus id: 0000:03:00.0)
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /tmp/mrpc_output/model.ckpt-343
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-04-15-13:24:06
INFO:tensorflow:Saving dict for global step 343: eval_accuracy = 0.8627451, eval_loss = 0.3959406, global_step = 343, loss = 0.3959406
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 343: /tmp/mrpc_output/model.ckpt-343
INFO:tensorflow:evaluation_loop marked as finished
INFO:tensorflow:**** Eval results ****
INFO:tensorflow:  eval_accuracy = 0.8627451
INFO:tensorflow:  eval_loss = 0.3959406
INFO:tensorflow:  global_step = 343
INFO:tensorflow:  loss = 0.3959406

It ran without problems. One drawback: changing the TensorFlow version tends to break Keras-based programs that worked before, so the upgrade does have side effects.
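As a sanity check on the eval log, the numbers can be reproduced with a little arithmetic. The sketch below assumes that the step count is the truncated value of examples / batch size * epochs, and that MRPC has 3,668 training pairs and 408 dev pairs. Neither figure is stated in this post, so treat both as assumptions.

```python
def num_train_steps(num_examples, batch_size, num_epochs):
    """Total optimizer steps, truncated to an integer."""
    return int(num_examples / batch_size * num_epochs)


# Assumption: MRPC split sizes (not stated in this post).
MRPC_TRAIN_EXAMPLES = 3668
MRPC_DEV_EXAMPLES = 408

# Same batch size and epoch count as the run_classifier.py flags above.
steps = num_train_steps(MRPC_TRAIN_EXAMPLES, batch_size=32, num_epochs=3.0)
print(steps)  # 343

# eval_accuracy from the log, converted back to a count of dev examples.
correct = round(0.8627451 * MRPC_DEV_EXAMPLES)
print(correct)  # 352
```

Under those assumptions, the result matches global_step = 343 in the log, and the reported accuracy corresponds to 352 of 408 dev examples classified correctly.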

References

TensorFlow-ROCm / HipCaffe / PyTorch-ROCm / Caffe2 installation https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html
ROCm https://github.com/ROCmSoftwarePlatform
MIOpen https://gpuopen.com/compute-product/miopen/
GPUEater tensorflow-rocm installer https://github.com/aieater/rocm_tensorflow_info
Google Research BERT https://github.com/google-research/bert
