How to set up TensorFlow/Keras to run on an NVIDIA GPU

We will use virtualenv for this setup. Just follow the commands below:
1. Install Virtualenv:
sudo pip install virtualenv
2. Commands to set up the virtual environment
- mkdir virtualws
- virtualenv virtualws/keras_demo (for Python 3: virtualenv -p python3 virtualws/keras_demo)
- cd virtualws/keras_demo/bin
- source activate
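To confirm that the activation worked, a small check can be run with the environment's own interpreter. This is only a sketch: the helper name in_virtualenv is made up for illustration, and it relies on the sys.real_prefix / sys.base_prefix attributes that virtualenv and venv set.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("interpreter:", sys.executable)
print("inside a virtualenv:", in_virtualenv())
```

If the environment is active, the interpreter path should point into virtualws/keras_demo.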
3. Install TensorFlow and Keras
- pip install tensorflow_gpu
- pip install keras
Note: we installed "tensorflow_gpu" (the GPU build), not the CPU-only "tensorflow" package.
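To see which versions pip actually installed, the following sketch can be run inside the virtualenv. It assumes Python 3.8 or newer (for importlib.metadata); the helper name installed_version is hypothetical.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used in the pip commands above.
for name in ("tensorflow-gpu", "keras"):
    print(name, "->", installed_version(name))
```

The TensorFlow version printed here is the one to compare against when picking CUDA and cuDNN versions later.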
4. Install the driver for the NVIDIA GPU:

Remove old versions:

sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"

sudo apt-get --purge remove "*nvidia*"

sudo rm -rf /usr/local/cuda*
4.1 Method 1
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt-get install nvidia-xxx
Note: "xxx" is the driver version number; it can be checked here.
Alternatively, you can download the driver here (.run file) and install it manually with the commands:
sudo chmod +x xxx.run
sudo ./xxx.run
4.2 Method 2
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
Then open Software & Updates => Additional Drivers and select the driver there.
Note: choose the version that matches your Linux kernel version (check it with "uname -r").
4.3 Method 3
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt-get install nvidia-driver-xxx
Note: with this method you should use the gcc-5 and g++-5 compilers:
sudo apt-get install gcc-5 g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5
Reference: https://askubuntu.com/questions/26498/how-to-choose-the-default-gcc-and-g-version
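To check that the default compiler really is gcc 5 after running update-alternatives, something like the sketch below could be used. It calls "gcc -dumpversion"; the helper names are hypothetical.

```python
import shutil
import subprocess


def major_version(version_text):
    """Extract the leading major number from a version string such as '5.4.0'."""
    return int(version_text.strip().split(".")[0])


def default_gcc_major():
    """Return the major version of the default gcc, or None if it cannot be read."""
    if shutil.which("gcc") is None:
        return None
    try:
        result = subprocess.run(["gcc", "-dumpversion"], stdout=subprocess.PIPE,
                                universal_newlines=True, check=True)
        return major_version(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


print("default gcc major version:", default_gcc_major())
```

A result of 5 would mean that the alternative configured above is the active one.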
Then verify the driver using: lsmod | grep nvidia
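The same check can be scripted. The sketch below parses lsmod-style text and keeps only modules whose own name starts with "nvidia" (slightly stricter than grep, which also matches the "Used by" column). The sample text and its sizes are made up for illustration.

```python
def nvidia_modules(lsmod_output):
    """Return the names of loaded kernel modules whose name starts with 'nvidia'."""
    names = []
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


# Hypothetical lsmod output; real module sizes and use counts will differ.
SAMPLE = """Module                  Size  Used by
nvidia_uvm            786432  0
nvidia_drm             45056  3
nvidia_modeset       1048576  1 nvidia_drm
nvidia              17600512  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        172032  1 nvidia_drm
"""

print(nvidia_modules(SAMPLE))
```

An empty list on a real system would suggest the driver module is not loaded (for example, before a reboot).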
5. Install NVIDIA CUDA Toolkit and cuDNN
To find out which versions of the NVIDIA CUDA Toolkit and cuDNN are needed, we create a simple demo that uses TensorFlow/Keras.
Create a test.py file with the content below and run it with: python test.py

# Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.utils import to_categorical
from keras.datasets import mnist

# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Preprocess input data: add a channel axis (channels first) and scale to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# Preprocess class labels: one-hot encode the 10 digit classes
Y_train = to_categorical(y_train, 10)
Y_test = to_categorical(y_test, 10)

# Define model architecture
model = Sequential()

model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(1, 28, 28), data_format='channels_first'))
model.add(Conv2D(32, (3, 3), activation='relu', data_format='channels_first'))
model.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_first'))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Fit model on training data ("epochs" replaces the old "nb_epoch" argument)
model.fit(X_train, Y_train,
          batch_size=32, epochs=10, verbose=1)

# Evaluate model on test data; score is [loss, accuracy]
score = model.evaluate(X_test, Y_test, verbose=0)
print(score)

Python will raise an error saying that it could not load a library file with a specific version number in its name. Based on that version, download the matching NVIDIA CUDA Toolkit and cuDNN releases.
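As an illustration of how to read that error, the sketch below pulls the library name and version out of a message. The example message is an assumption about what the error typically looks like, and the version in it is only a placeholder; your own message decides which versions to download. Roughly, the version on libcublas or libcudart points at the CUDA Toolkit version, and the one on libcudnn at the cuDNN major version.

```python
import re

# Matches names such as "libcublas.so.9.0" or "libcudnn.so.7".
_LIB_PATTERN = re.compile(r"(lib[\w+-]+)\.so\.(\d+(?:\.\d+)*)")


def missing_library(error_text):
    """Return (library_name, version) found in an error message, or None."""
    match = _LIB_PATTERN.search(error_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Illustrative message only; the version in a real error may differ.
EXAMPLE = ("ImportError: libcublas.so.9.0: cannot open shared object file: "
           "No such file or directory")

print(missing_library(EXAMPLE))
```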
If you downloaded the NVIDIA CUDA Toolkit as a ".run" file, use the commands:
sudo chmod +x xxx.run
sudo ./xxx.run -silent (to skip the EULA/prompts)
or: sudo ./xxx.run (to answer YES/NO to the EULA/prompts yourself)

In the interactive case the EULA appears first: press ENTER until it reaches 100%, then choose YES for every question except the one asking whether to install the NVIDIA driver (if you already have the driver, choose NO).
Then open the hidden file .bashrc at /home/your_user/.bashrc and append the lines:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
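The ${VAR:+:${VAR}} form in these exports may look odd. It appends ":" plus the old value only when the variable is already set and non-empty, which avoids a trailing ":" (an empty entry, which is normally treated as the current directory). The sketch below mimics that behaviour; the function name prepend_path is hypothetical.

```python
def prepend_path(new_dir, existing):
    """Mimic NEW${VAR:+:${VAR}}: add ':' + existing only if existing is non-empty."""
    return new_dir + (":" + existing if existing else "")


# Variable unset or empty: no trailing colon is produced.
print(prepend_path("/usr/local/cuda/bin", ""))
# Variable already set: the new directory is placed in front of the old value.
print(prepend_path("/usr/local/cuda/bin", "/usr/local/bin:/usr/bin"))
```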

To install cuDNN, unpack the downloaded archive and run the commands:
cd to_unzip_folder
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
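One way to double-check which cuDNN version was copied is to read the version macros from the header. This sketch assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (newer cuDNN releases keep them in a separate cudnn_version.h instead). The sample values are made up.

```python
import os
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+" + macro + r"\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Hypothetical header excerpt; the numbers are placeholders.
SAMPLE = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 0\n#define CUDNN_PATCHLEVEL 5\n"
print(cudnn_version(SAMPLE))

# Path taken from the copy commands above.
HEADER = "/usr/local/cuda/include/cudnn.h"
if os.path.exists(HEADER):
    with open(HEADER) as handle:
        print("installed cuDNN:", cudnn_version(handle.read()))
```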
6. Install dependencies
Run the command:
sudo apt-get install libcupti-dev
Again open .bashrc and append the line:
export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
Note: to monitor the GPU while training is running, use the command: watch -n 1 nvidia-smi
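If you would rather log the numbers than watch them, nvidia-smi can print them in CSV form through its --query-gpu and --format options. The sketch below is one possible wrapper; the helper names and the sample line are hypothetical, and it returns None when nvidia-smi is missing or reports values that are not plain numbers.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_line(line):
    """Parse one CSV line such as '35, 1024, 8192' into a dict of integers."""
    util, used, total = (int(field.strip()) for field in line.split(","))
    return {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}


def gpu_stats():
    """Return one dict per GPU, or None if nvidia-smi cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(QUERY, stdout=subprocess.PIPE,
                                universal_newlines=True, check=True)
        return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


print(gpu_stats())
```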
That is all.
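A quick way to confirm that TensorFlow itself can see the GPU is to list its local devices. This is a sketch using device_lib from tensorflow.python.client, which is an internal module, so treat it as an assumption that it is available in your version (newer TensorFlow 2.x releases also offer tf.config.list_physical_devices('GPU')). The helper name gpu_devices is hypothetical.

```python
def gpu_devices():
    """Return the names of GPU devices TensorFlow sees, or None if it fails to import."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        # TensorFlow is missing, or one of its CUDA/cuDNN libraries failed to load.
        return None
    return [device.name
            for device in device_lib.list_local_devices()
            if device.device_type == "GPU"]


devices = gpu_devices()
if devices is None:
    print("TensorFlow could not be imported")
elif devices:
    print("GPU devices:", devices)
else:
    print("TensorFlow imported, but no GPU device was found")
```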
Run the demo above again. The training process is faster: