Part 8 : Install TensorFlow 2 GPU using Docker on Ubuntu

In previous posts, we built simple neural networks by hand. Fortunately, there are libraries that build network architectures and calculate gradients automatically. TensorFlow is one of the most famous. I will explain how to install this Python library on Ubuntu 18.04.

Why use Docker?

Neural network calculations are primarily based on matrix operations, which are most efficiently performed on GPUs. In order to use your computer’s GPU with TensorFlow, it is necessary to install two libraries on your machine:

CUDA (Compute Unified Device Architecture): a parallel computing platform developed by NVIDIA for general computing on GPUs
cuDNN (CUDA Deep Neural Network): a GPU-accelerated library of primitives used to accelerate deep learning frameworks such as TensorFlow or PyTorch.

As you can see, there are a lot of prerequisites before you can install TensorFlow. You can follow the official procedure to install CUDA from the NVIDIA website here. However, I learnt the hard way that it is easy to mess up your computer and your graphics card while installing all these libraries and drivers. That’s why I would highly recommend installing TensorFlow inside a Docker container.

A Docker container is essentially a self-contained environment, isolated from the rest of your system, that ships with all the dependencies necessary for a smooth installation. Here is a graphical explanation of the installation.

[Figure: tensorflow_docker]

Let’s install!

First of all, check the instructions on the official TensorFlow page.

1. Install Docker

Please follow these instructions.

Optional: uninstall old Docker versions

sudo apt-get remove docker docker-engine docker.io containerd runc

Install required packages

sudo apt-get update
sudo apt-get install ca-certificates curl gnupg lsb-release

Add Docker’s official GPG key

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

Set up the repository

echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install the latest version of Docker Engine

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

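Alternatively, set up Docker-CE with the convenience script (instead of the manual steps above) and enable the Docker service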

curl https://get.docker.com | sh && sudo systemctl --now enable docker

Add the current user to the Docker group (use Docker without sudo)

sudo groupadd docker
sudo usermod -aG docker $USER

Then log out and log back in to activate the changes (or reboot)

Test that you can use Docker without sudo

docker run hello-world

Check that you have installed Docker 19.03 or higher.

docker -v
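
The version check can also be scripted. Here is a minimal Python sketch, assuming that `docker -v` prints a line of the form `Docker version X.Y.Z, build ...`; the function names are my own and not part of any Docker tooling.

```python
import re
import shutil
import subprocess

MIN_DOCKER = (19, 3)  # minimum version required in this tutorial


def parse_docker_version(text):
    """Extract (major, minor) from the output of `docker -v`."""
    match = re.search(r"Docker version (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def docker_is_recent_enough(text, minimum=MIN_DOCKER):
    """Compare the parsed version with the minimum, as tuples."""
    return parse_docker_version(text) >= minimum


if __name__ == "__main__":
    if shutil.which("docker") is None:
        print("docker not found on PATH")
    else:
        out = subprocess.run(["docker", "-v"], capture_output=True, text=True).stdout
        try:
            verdict = "OK" if docker_is_recent_enough(out) else "too old"
        except ValueError:
            verdict = "could not parse version"
        print(out.strip(), "->", verdict)
```

Comparing tuples rather than strings avoids the classic mistake of ranking "9.x" above "19.x".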

2. Install the latest NVIDIA drivers

Take note of your GPU brand and model

lspci | grep -i nvidia
# OR
sudo lshw -C display

Verify you have a CUDA-capable GPU by checking if it is listed here.

Verify you have a supported version of Linux

uname -m && cat /etc/*release

You should check that you are running on a 64-bit system (x86_64).

Verify the system has gcc installed

gcc --version

Verify the system has correct Linux kernel headers

# print the version of the running Linux kernel
uname -r
# install the matching Linux kernel headers
sudo apt-get install linux-headers-$(uname -r)
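
The architecture, compiler and kernel header checks above can be gathered in one small report. This is only a sketch: the function name is hypothetical, and it assumes that installed kernel headers are reachable through the `/lib/modules/<release>/build` link, which you should verify on your own system.

```python
import platform
import shutil
from pathlib import Path


def prerequisite_report():
    """Return a dict of booleans mirroring the manual checks."""
    release = platform.release()  # same value as `uname -r`
    return {
        "64-bit x86": platform.machine() == "x86_64",  # same value as `uname -m`
        "gcc on PATH": shutil.which("gcc") is not None,
        # Assumption: installed headers are reachable through this link.
        "kernel headers": Path("/lib/modules", release, "build").exists(),
    }


if __name__ == "__main__":
    for check, ok in prerequisite_report().items():
        print(f"{check}: {'OK' if ok else 'missing'}")
```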

Install the CUDA repository public GPG key

distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
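
To see what the `distribution` variable contains, here is a hedged Python equivalent of the shell one-liner. The function name and the `keep_dot` parameter are my own; the sample text is a shortened, illustrative `/etc/os-release`.

```python
def distribution_id(os_release_text, keep_dot=False):
    """Build the repository name from the contents of /etc/os-release."""
    fields = {}
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    name = fields["ID"] + fields["VERSION_ID"]
    # The CUDA repository URL uses the name without the dot (the sed step).
    return name if keep_dot else name.replace(".", "")


if __name__ == "__main__":
    sample = 'NAME="Ubuntu"\nVERSION_ID="18.04"\nID=ubuntu\n'
    print(distribution_id(sample))                 # form used for the CUDA repository
    print(distribution_id(sample, keep_dot=True))  # form used in step 3
```

Note that step 3 builds the same variable without the `sed` call, so the dot is kept there.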

Update the apt repository cache and install the driver

sudo apt-get update
sudo apt-get -y install cuda-drivers

If you installed CUDA (not the case here), add CUDA to the PATH variable

export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}

Make this change permanent by adding it to your .bashrc file

echo 'export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}' >> ~/.bashrc
source ~/.bashrc

Check that the NVIDIA Persistence Daemon is active

systemctl status nvidia-persistenced

Disable the udev rule because it could interfere with the driver

# copy the udev rule
sudo cp /lib/udev/rules.d/40-vm-hotadd.rules /etc/udev/rules.d

# edit the udev rule
sudo vim /etc/udev/rules.d/40-vm-hotadd.rules

Comment out this line:

SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory[0-9]*", TEST=="state", ATTR{state}!="online", ATTR{state}="online"
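
If you prefer not to edit the file by hand, the same change can be sketched in Python. The function below is a hypothetical helper that only transforms text; writing the result back to `/etc/udev/rules.d/40-vm-hotadd.rules` still requires root privileges.

```python
def comment_out_memory_rule(rules_text):
    """Return the rules text with the memory hot-add rule commented out."""
    result = []
    for line in rules_text.splitlines():
        stripped = line.lstrip()
        # Lines that are already commented start with '#', so they are skipped.
        is_target = (stripped.startswith('SUBSYSTEM=="memory"')
                     and 'ACTION=="add"' in stripped)
        result.append("# " + line if is_target else line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    rule = ('SUBSYSTEM=="memory", ACTION=="add", '
            'DEVPATH=="/devices/system/memory/memory[0-9]*", TEST=="state", '
            'ATTR{state}!="online", ATTR{state}="online"')
    print(comment_out_memory_rule(rule + "\n"), end="")
```

Running the function twice leaves the text unchanged, so it is safe to reapply.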

Verify the installation and write down the driver version (in my case 515)

cat /proc/driver/nvidia/version
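
The major driver version is needed again for the package name in the persistence daemon step, so it can be handy to extract it programmatically. This sketch assumes the file contains a line starting with `NVRM version:`; the sample line is illustrative and the function name is my own.

```python
import re
from pathlib import Path

VERSION_FILE = Path("/proc/driver/nvidia/version")


def driver_major(version_text):
    """Return the major driver version found on the NVRM line."""
    for line in version_text.splitlines():
        if line.startswith("NVRM version:"):
            match = re.search(r"(\d+)\.\d+", line)
            if match:
                return int(match.group(1))
    raise ValueError("no NVRM version line found")


if __name__ == "__main__":
    if VERSION_FILE.exists():
        text = VERSION_FILE.read_text()
    else:
        # Illustrative sample, used when no NVIDIA driver is loaded.
        text = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  515.65.01  (sample)\n"
    print(f"libnvidia-cfg1-{driver_major(text)}")
```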

Enable NVIDIA persistence mode for GPU

sudo -i
nvidia-smi -pm 1
exit

Enable the persistence daemon permanently

sudo apt install libnvidia-cfg1-515 # replace 515 with your driver version
sudo nvidia-persistenced --user USER #replace USER with your username
sudo reboot

Install third-party libraries

sudo apt-get install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev libfreeimage-dev

Alternative installation of the NVIDIA driver on Ubuntu

sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo reboot

3. Install the NVIDIA Container toolkit

Please follow these instructions.

Set up the package repository and the GPG key

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Install the nvidia-docker2 package

sudo apt-get update
sudo apt-get install -y nvidia-docker2

Restart the Docker daemon

sudo systemctl restart docker

Test the installation of the NVIDIA Container toolkit

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

You should see some information about your GPU and the CUDA version installed.
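
If you want this information in a script-friendly form, `nvidia-smi` can also print selected fields as CSV. The sketch below parses such output; the helper name is hypothetical and the sample line is illustrative, not output from a real machine.

```python
import csv
import io
import shlex

# Same test container as above, but asking nvidia-smi for CSV output.
CONTAINER_QUERY = [
    "docker", "run", "--rm", "--gpus", "all",
    "nvidia/cuda:11.0.3-base-ubuntu20.04",
    "nvidia-smi", "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_rows(csv_text):
    """Turn the CSV output into one dict per GPU."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        {"name": row[0], "driver": row[1], "memory": row[2]}
        for row in reader
        if len(row) == 3
    ]


if __name__ == "__main__":
    print("Command to run:", shlex.join(CONTAINER_QUERY))
    # Illustrative line only; real output depends on your GPU and driver.
    print(parse_gpu_rows("NVIDIA GeForce GTX 1080, 515.65.01, 8192 MiB\n"))
```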

4. Install the TensorFlow Docker images with GPU support

Pull the image with GPU support

docker pull tensorflow/tensorflow:latest-gpu

Run the Docker image

docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices('GPU')); print(tf.test.is_built_with_cuda())"

This should print the TensorFlow version, the list of GPUs visible to TensorFlow (empty if none is detected) and whether TensorFlow was built with CUDA support.
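
The one-liner becomes easier to read and reuse as a small script. Here is a sketch, meant to be run with the Python interpreter inside the container; the function name is my own, and the `ImportError` fallback is only there so that the script also runs on a host without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # e.g. when run on the host instead of in the container
    tf = None


def gpu_report():
    """Collect the version, the visible GPUs and the CUDA build flag."""
    if tf is None:
        return {"tensorflow": None, "gpus": [], "built_with_cuda": False}
    return {
        "tensorflow": tf.__version__,
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
        "built_with_cuda": tf.test.is_built_with_cuda(),
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

An empty `gpus` list together with `built_with_cuda: True` usually points to a problem with the driver or the `--gpus all` option rather than with the image.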

Please have a look at my Docker cheat sheet for more information about Docker.

5. Run a TensorFlow container

Create a new container from the TensorFlow image.

docker run -it --rm tensorflow/tensorflow:latest-gpu

You should now be logged in to the new container. You can explore it using ls, cd, etc. and leave it with exit. Now let’s see a more practical example. First, let’s create a directory to exchange files between your machine and the container. In another terminal, run this to create a new directory for the Docker workspace:

mkdir ~/docker_ws

Then start a container with Jupyter that mounts this directory:

docker run -u $(id -u):$(id -g) --gpus all -it --rm --name my_tf_container -v ~/docker_ws:/notebooks -p 8888:8888 -p 6006:6006 tensorflow/tensorflow:latest-gpu-py3-jupyter

Let’s explain the different options.

-u $(id -u):$(id -g)       # assign a user and a group ID
--gpus all                 # allow GPU support
-it                        # run an interactive container inside a terminal
--rm                       # automatically clean up the container and remove its file system after the container exits
--name my_tf_container     # give it a friendly name
-v ~/docker_ws:/notebooks  # share a directory between the host and the container
-p 8888:8888               # forward port 8888 for the Jupyter notebook server
-p 6006:6006               # forward port 6006 for TensorBoard
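
To make the role of each option explicit, the same command can be assembled piece by piece. This is a minimal sketch with a hypothetical helper; it only builds and prints the command, it does not start a container.

```python
import os
import shlex


def jupyter_run_command(workspace="~/docker_ws", name="my_tf_container",
                        image="tensorflow/tensorflow:latest-gpu-py3-jupyter"):
    """Build the argument list for the docker run command of this section."""
    host_dir = os.path.expanduser(workspace)
    return [
        "docker", "run",
        "-u", f"{os.getuid()}:{os.getgid()}",  # run as the current host user
        "--gpus", "all",                       # expose all GPUs
        "-it", "--rm",                         # interactive, removed on exit
        "--name", name,
        "-v", f"{host_dir}:/notebooks",        # shared directory
        "-p", "8888:8888",                     # Jupyter
        "-p", "6006:6006",                     # TensorBoard
        image,
    ]


if __name__ == "__main__":
    print(shlex.join(jupyter_run_command()))
```

Passing the list to `subprocess.run` would launch the container; printing it first is a cheap way to check the expanded paths and IDs.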

Once the container is running, you should see a URL to copy and paste into your browser that looks like http://127.0.0.1:8888/?token=xxxxxxxxxx. You should then see a list of TensorFlow tutorials, as shown below.

[Figure: TensorFlow tutorials]

Finally, you can run a command inside a running Docker container with docker exec:

docker exec -it my_tf_container tensorboard --logdir tf_logs/

You should be able to access the TensorBoard page via this URL: http://localhost:6006/ (see also this tutorial).

Play around with the tutorials and enjoy!

Pierre Aumjaud
