The title is long, but it says exactly what this post is about...
Environment
I recently bought an ideapad 710S Plus.
It costs under 100,000 yen, yet it packs a Core i7 and a GeForce 940MX.
The GeForce 940MX supports CUDA, so I decided to try installing the GPU version of TensorFlow.
What I did
Anaconda was already installed,
and TensorFlow was already working on Python 3.6.
CUDA
I downloaded the CUDA Toolkit from NVIDIA
and installed CUDA 8.0 for Windows.
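A quick sanity check after installing CUDA is whether its bin directory actually ended up on PATH, since that is where the CUDA DLLs live. This is a minimal sketch that assumes the installer defined the CUDA_PATH environment variable (the CUDA Windows installer normally does); cuda_bin_dir and is_on_path are names made up for this example.

```python
import os


def cuda_bin_dir(environ):
    """Return <CUDA_PATH>/bin, or None if CUDA_PATH is not set.

    Assumption: the CUDA installer defined CUDA_PATH
    (typically ...\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0).
    """
    root = environ.get("CUDA_PATH")
    return os.path.join(root, "bin") if root else None


def is_on_path(directory, path_value, sep=os.pathsep):
    """True if `directory` appears in a PATH-style string.

    normcase makes the comparison case-insensitive on Windows.
    """
    target = os.path.normcase(os.path.normpath(directory))
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in path_value.split(sep)
        if entry
    )


if __name__ == "__main__":
    bin_dir = cuda_bin_dir(os.environ)
    if bin_dir is None:
        print("CUDA_PATH is not set")
    else:
        print(bin_dir, "on PATH:", is_on_path(bin_dir, os.environ.get("PATH", "")))
```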
Visual Studio
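Partway through the CUDA installation, the installer complained that Visual Studio was missing, so I installed Visual Studio 2017.
It still complained, and only then did I notice that CUDA 8.0 supports Visual Studio only up to 2015.
So I went as far as signing up for the Dev Essentials program and installed Visual Studio Community 2015 from the older-versions downloads.
In hindsight this was probably unnecessary: running TensorFlow seems to require only the Visual C++ Redistributable for Visual Studio 2015.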
cuDNN
I downloaded cuDNN 5.1 from NVIDIA.
"Installing" it really just means copying the downloaded files into the CUDA Toolkit directory.
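The copy step can also be scripted. This is a minimal sketch, assuming the cuDNN zip for Windows has been extracted so that its top-level cuda folder contains bin, include and lib\x64; copy_cudnn is a hypothetical helper, not part of any NVIDIA tool. Writing into Program Files needs an administrator prompt.

```python
import os
import shutil

# Assumed layout inside the extracted cuDNN archive's "cuda" folder:
#   bin/cudnn64_5.dll, include/cudnn.h, lib/x64/cudnn.lib
SUBDIRS = ("bin", "include", os.path.join("lib", "x64"))


def copy_cudnn(cudnn_root, cuda_root):
    """Copy every file in cudnn_root/{bin,include,lib/x64} into the matching
    directory under cuda_root. Returns the list of destination paths."""
    copied = []
    for sub in SUBDIRS:
        src_dir = os.path.join(cudnn_root, sub)
        if not os.path.isdir(src_dir):
            continue  # this part is missing from the archive; skip it
        dst_dir = os.path.join(cuda_root, sub)
        os.makedirs(dst_dir, exist_ok=True)
        for name in os.listdir(src_dir):
            src = os.path.join(src_dir, name)
            if os.path.isfile(src):
                # copy2 keeps timestamps and returns the destination path
                copied.append(shutil.copy2(src, os.path.join(dst_dir, name)))
    return copied
```

For example, copy_cudnn(r"C:\tmp\cudnn\cuda", os.environ["CUDA_PATH"]), where the first argument is wherever the archive was unpacked (a hypothetical path).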
Installing tensorflow-gpu
Install it with pip.
pip install tensorflow-gpu
It doesn't work
At this point I started Python and ran
import tensorflow
which gave:
ImportError: Traceback (most recent call last):
  File "C:\Users\tono\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 18, in swig_import_helper
    return importlib.import_module(mname)
  File "C:\Users\tono\Anaconda3\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 648, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 560, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\tono\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\tono\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 21, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\tono\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 20, in swig_import_helper
    return importlib.import_module('_pywrap_tensorflow_internal')
  File "C:\Users\tono\Anaconda3\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'
It failed...
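"DLL load failed" does not say which DLL is missing. A rough way to narrow it down is to check which candidate DLLs can be found in a PATH directory. This sketch only scans PATH, a simplification of the real Windows DLL search order, and the candidate list is my assumption about what a TensorFlow 1.x GPU build loads: msvcp140.dll (VC++ 2015 redistributable), cudart64_80.dll (CUDA 8.0), and cudnn64_5.dll / cudnn64_6.dll (cuDNN 5.x / 6). find_dlls is a made-up helper.

```python
import os

# Assumed candidates for a TensorFlow 1.x GPU build on Windows.
CANDIDATES = ["msvcp140.dll", "cudart64_80.dll", "cudnn64_5.dll", "cudnn64_6.dll"]


def find_dlls(names, path_value=None, sep=os.pathsep):
    """Map each DLL name to the first PATH directory that contains it, else None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    dirs = [d for d in path_value.split(sep) if d]
    return {
        name: next((d for d in dirs if os.path.isfile(os.path.join(d, name))), None)
        for name in names
    }


if __name__ == "__main__":
    for name, where in find_dlls(CANDIDATES).items():
        print(f"{name:18} {where or 'NOT FOUND'}")
```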
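What I tried
I searched for things like "tensorflow gpu DLL load failed" and tried everything that came up:
- Reviewed the environment variables and added every CUDA directory I could think of to PATH.
- Kept a separate copy of the cuDNN files in addition to the ones copied into the CUDA directory.
- Uninstalled Visual Studio and installed the Visual C++ Redistributable instead.
- Ran pip with the install command given here instead (with 35 replaced by 36).
- Uninstalled Anaconda, installed Python 3.5 directly, and repeated all of the above.
None of it worked...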
In the end
While digging around, I found in the release notes that TensorFlow v1.3.0 had been released just four days earlier.
A plain pip install apparently pulls in this version.
The notes say:
All our prebuilt binaries have been built with cuDNN 6. We anticipate releasing TensorFlow 1.4 with cuDNN 7.
So I have to use cuDNN 6?!
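Put differently, the cuDNN DLL that TensorFlow looks for depends on the release. A small illustrative lookup covering only the two cases in this post: 1.2.1 worked here with CUDA 8.0 and cuDNN 5.1, and the 1.3.0 note quoted above says cuDNN 6. CUDA 8.0 for 1.3 is my assumption, since the quoted note does not mention CUDA; this is not an official compatibility table.

```python
# Illustrative table: only the two releases discussed in this post.
# TF 1.3 -> CUDA 8.0 is an assumption; the quoted note only mentions cuDNN 6.
REQUIREMENTS = {
    "1.2": {"cuda": "8.0", "cudnn": "5.1"},
    "1.3": {"cuda": "8.0", "cudnn": "6"},
}


def expected_dlls(tf_version):
    """Return the CUDA runtime and cuDNN DLL names a given TF release would load."""
    major_minor = ".".join(tf_version.split(".")[:2])  # "1.2.1" -> "1.2"
    req = REQUIREMENTS[major_minor]  # KeyError for releases not in the table
    cuda_tag = req["cuda"].replace(".", "")  # "8.0" -> "80"
    cudnn_major = req["cudnn"].split(".")[0]  # "5.1" -> "5"
    return [f"cudart64_{cuda_tag}.dll", f"cudnn64_{cudnn_major}.dll"]


print(expected_dlls("1.2.1"))  # ['cudart64_80.dll', 'cudnn64_5.dll']
print(expected_dlls("1.3.0"))  # ['cudart64_80.dll', 'cudnn64_6.dll']
```

With only cuDNN 5.1 installed, cudnn64_6.dll is nowhere to be found, which would be consistent with the "DLL load failed" error under 1.3.0.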
Plenty of other things seem to have changed too, so I uninstalled 1.3 and installed 1.2.1.
pip install tensorflow-gpu==1.2.1
This time it ran without problems.
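To confirm that TensorFlow is actually using the GPU rather than quietly running on the CPU, you can list the devices it sees. This sketch uses device_lib.list_local_devices() from tensorflow.python.client, an internal module available in TensorFlow 1.x; gpu_descriptions is a made-up helper. Judging by the GPU log further down, the description on this machine would likely mention GeForce 940MX.

```python
def gpu_descriptions(devices):
    """Return physical_device_desc for every GPU in a list of device records."""
    return [d.physical_device_desc for d in devices if d.device_type == "GPU"]


def main():
    # Imported here so that a broken install is reported by the handler below.
    from tensorflow.python.client import device_lib

    gpus = gpu_descriptions(device_lib.list_local_devices())
    print(gpus if gpus else "No GPU visible to TensorFlow (CPU-only build?)")


if __name__ == "__main__":
    try:
        main()
    except ImportError as exc:  # also catches ModuleNotFoundError
        print("import tensorflow failed:", exc)
```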
So, the all-important performance...
I ran the LSTM sample from the book 詳解ディープラーニング, and...
CPU: Core i7-7500U (2.70 GHz, 4 MB cache)
C:\Users\tono\Anaconda3\python.exe C:/Users/tono/Python/deeplearning-tensorflow-keras/5/tensorflow/01_00_sin_simple_lstm_tensorflow.r1.2.py
2017-08-21 08:57:50
2017-08-21 08:58:04.323024: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:58:04.323316: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:58:04.323621: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:58:04.323925: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:58:04.324944: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:58:04.325351: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:58:04.325618: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:58:04.326075: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
epoch: 0 validation loss: 0.334986
epoch: 1 validation loss: 0.246942
epoch: 2 validation loss: 0.162596
epoch: 3 validation loss: 0.0796628
epoch: 4 validation loss: 0.0457967
epoch: 5 validation loss: 0.0302476
epoch: 6 validation loss: 0.0269227
epoch: 7 validation loss: 0.0115688
epoch: 8 validation loss: 0.00999371
epoch: 9 validation loss: 0.00668167
epoch: 10 validation loss: 0.0143003
epoch: 11 validation loss: 0.00699237
epoch: 12 validation loss: 0.00582822
epoch: 13 validation loss: 0.00351853
epoch: 14 validation loss: 0.00194349
epoch: 15 validation loss: 0.0013511
epoch: 16 validation loss: 0.00132878
epoch: 17 validation loss: 0.00183425
epoch: 18 validation loss: 0.00177953
epoch: 19 validation loss: 0.00229132
epoch: 20 validation loss: 0.0011444
epoch: 21 validation loss: 0.00208757
epoch: 22 validation loss: 0.00160754
epoch: 23 validation loss: 0.00150985
epoch: 24 validation loss: 0.00124751
epoch: 25 validation loss: 0.00146131
epoch: 26 validation loss: 0.0017651
epoch: 27 validation loss: 0.0014507
epoch: 28 validation loss: 0.00116534
epoch: 29 validation loss: 0.00121309
epoch: 30 validation loss: 0.00196409
epoch: 31 validation loss: 0.00180024
early stopping
2017-08-21 08:58:09
Run time: 19 seconds
GPU: GeForce 940MX (1 GHz)
C:\Users\tono\Anaconda3\python.exe C:/Users/tono/Python/deeplearning-tensorflow-keras/5/tensorflow/01_00_sin_simple_lstm_tensorflow.r1.2.py
2017-08-21 08:51:34
2017-08-21 08:51:45.876056: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:51:45.876336: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:51:45.876610: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:51:45.876886: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:51:45.877157: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:51:45.877434: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:51:45.877713: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:51:45.877983: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-21 08:51:46.605969: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce 940MX
major: 5 minor: 0 memoryClockRate (GHz) 0.993
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.66GiB
2017-08-21 08:51:46.606313: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-08-21 08:51:46.606469: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0: Y
2017-08-21 08:51:46.606636: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0)
2017-08-21 08:51:48.491741: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\pool_allocator.cc:247] PoolAllocator: After 1408 get requests, put_count=1404 evicted_count=1000 eviction_rate=0.712251 and unsatisfied allocation rate=0.784091
2017-08-21 08:51:48.492035: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
epoch: 0 validation loss: 0.334986
epoch: 1 validation loss: 0.246942
epoch: 2 validation loss: 0.162596
epoch: 3 validation loss: 0.0796628
2017-08-21 08:51:49.293386: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\pool_allocator.cc:247] PoolAllocator: After 3521 get requests, put_count=3594 evicted_count=1000 eviction_rate=0.278242 and unsatisfied allocation rate=0.26981
2017-08-21 08:51:49.293886: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
epoch: 4 validation loss: 0.0457967
epoch: 5 validation loss: 0.0302476
epoch: 6 validation loss: 0.0269227
epoch: 7 validation loss: 0.0115688
epoch: 8 validation loss: 0.00999371
epoch: 9 validation loss: 0.00668168
epoch: 10 validation loss: 0.0143003
epoch: 11 validation loss: 0.00699238
epoch: 12 validation loss: 0.00582822
epoch: 13 validation loss: 0.00351853
epoch: 14 validation loss: 0.00194349
epoch: 15 validation loss: 0.0013511
epoch: 16 validation loss: 0.00132878
epoch: 17 validation loss: 0.00183425
epoch: 18 validation loss: 0.00177953
epoch: 19 validation loss: 0.00229132
epoch: 20 validation loss: 0.0011444
epoch: 21 validation loss: 0.00208757
epoch: 22 validation loss: 0.00160755
epoch: 23 validation loss: 0.00150985
epoch: 24 validation loss: 0.00124751
epoch: 25 validation loss: 0.00146131
epoch: 26 validation loss: 0.0017651
epoch: 27 validation loss: 0.0014507
epoch: 28 validation loss: 0.00116533
epoch: 29 validation loss: 0.00121309
epoch: 30 validation loss: 0.00196409
epoch: 31 validation loss: 0.00180024
early stopping
2017-08-21 08:51:55
Run time: 21 seconds
It actually got slower!!!!!
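The run times come from the start and end timestamps each run printed. The sketch below recomputes them and also splits each run at TensorFlow's first log line. Treating that line as the rough boundary between setup and training is my assumption; the start and end stamps have only whole-second resolution.

```python
from datetime import datetime


def parse(ts):
    """Parse '2017-08-21 08:57:50' or '2017-08-21 08:58:04.323024'."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(ts, fmt)


def split_run(start, first_tf_log, end):
    """Seconds before TensorFlow's first log line, after it, and in total."""
    s, t, e = parse(start), parse(first_tf_log), parse(end)
    return ((t - s).total_seconds(), (e - t).total_seconds(), (e - s).total_seconds())


# Timestamps copied from the two logs in this post.
cpu = split_run("2017-08-21 08:57:50", "2017-08-21 08:58:04.323024", "2017-08-21 08:58:09")
gpu = split_run("2017-08-21 08:51:34", "2017-08-21 08:51:45.876056", "2017-08-21 08:51:55")
print("CPU: %.1f s before, %.1f s after, %.0f s total" % cpu)
# CPU: 14.3 s before, 4.7 s after, 19 s total
print("GPU: %.1f s before, %.1f s after, %.0f s total" % gpu)
# GPU: 11.9 s before, 9.1 s after, 21 s total
```

Read this way, the part after TensorFlow starts logging took longer on the GPU, which fits the guess below that this small model simply doesn't need much computation.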
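But on other models it really is faster
The model above predicts a sine wave, so it may simply not need much computation.
On other tasks, such as MNIST classification with a BiRNN and the Adding task with an RNN Encoder-Decoder,
the GPU was roughly 3 to 4 times faster.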
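The two logs above come from different builds (the paths show windows vs windows-gpu), that is, separate CPU and GPU installs. To compare using a single tensorflow-gpu install instead, one common approach (my suggestion, not what this post did) is to hide the GPU through the CUDA_VISIBLE_DEVICES environment variable so TensorFlow falls back to the CPU. A minimal sketch; cpu_only_env and run_script are hypothetical helpers.

```python
import os
import subprocess
import sys


def cpu_only_env(base=None):
    """Return a copy of the environment in which no CUDA device is visible."""
    env = dict(os.environ if base is None else base)
    # CUDA stops enumerating devices at the first invalid index, so "-1" hides all GPUs.
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def run_script(script, cpu_only=False):
    """Run a Python script with the current interpreter; return its exit code."""
    env = cpu_only_env() if cpu_only else dict(os.environ)
    return subprocess.run([sys.executable, script], env=env).returncode
```

For example, run_script("01_00_sin_simple_lstm_tensorflow.r1.2.py", cpu_only=True) for the CPU run and cpu_only=False for the GPU run, timing each the same way.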
So even a GPU of this class makes a real difference. Glad I bought it.