Bluefog Docker Usage
Bluefog provides Docker images that bundle all necessary dependencies, so that the system environment is isolated. The project contains two types of dockerfiles: dockerfile.cpu and dockerfile.gpu are used for actual deployment, while dockerfile.cpu.test and dockerfile.gpu.test are used for internal development. This document focuses on the usage of dockerfile.cpu and dockerfile.gpu. For the development dockerfiles, see the GitHub Wiki for more details.
Installing Docker Image
The docker images can be built from scratch or be obtained from Docker Hub.
Downloading Docker Image From Docker Hub
Download docker image with CUDA support:
sudo docker pull bluefoglib/bluefog:gpu-0.2.2
Download docker image with only CPU support:
sudo docker pull bluefoglib/bluefog:cpu-0.2.2
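The two pull commands differ only in the variant part of the tag. As a minimal sketch, the tag can be composed from a variant and a version. The helper name bluefog_image is hypothetical, and using the presence of nvidia-smi to choose the variant is an assumption rather than something Bluefog prescribes. The script only prints the pull command; it does not run it.

```shell
# Compose the Docker Hub image reference for a Bluefog variant and version.
# Hypothetical helper. Usage: bluefog_image <cpu|gpu> <version>
bluefog_image() {
    echo "bluefoglib/bluefog:$1-$2"
}

# Assumption: treat the host as GPU-capable only if nvidia-smi is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    variant=gpu
else
    variant=cpu
fi

# Print the pull command instead of executing it.
echo "sudo docker pull $(bluefog_image "$variant" 0.2.2)"
```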
Building Your Own Docker Image
Build docker image with CUDA support:
sudo docker build -t bluefog_gpu . -f dockerfile.gpu
Build docker image with only CPU support:
sudo docker build -t bluefog_cpu . -f dockerfile.cpu
Running Docker Container
The commands below use the docker images built from scratch as examples. If you downloaded the docker image from Docker Hub, make sure to use the corresponding image name (for example, bluefoglib/bluefog:gpu-0.2.2) in these commands.
Run docker container with CUDA support:
sudo docker run --privileged -it --gpus all --name bluefog_gpu_deploy --shm-size=64g --network=host -v /mnt/share/ssh:/root/.ssh bluefog_gpu:latest
Run docker container with only CPU support:
sudo docker run --privileged -it --name bluefog_cpu_deploy --network=host -v /mnt/share/ssh:/root/.ssh bluefog_cpu:latest
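The two run commands share most of their flags; the GPU variant adds --gpus all and --shm-size=64g. The sketch below assembles the command for either variant and for any image name, which may help when switching between a locally built image and one pulled from Docker Hub. The function name bluefog_run_cmd is hypothetical, and the command is printed rather than executed. The flag order differs slightly from the commands above, which does not change their meaning.

```shell
# Print (do not execute) the docker run command for a given variant and image.
# Hypothetical helper. Usage: bluefog_run_cmd <cpu|gpu> <image>
bluefog_run_cmd() {
    variant="$1"
    image="$2"
    extra=""
    if [ "$variant" = "gpu" ]; then
        # The GPU variant additionally exposes all GPUs and enlarges /dev/shm.
        extra=" --gpus all --shm-size=64g"
    fi
    echo "sudo docker run --privileged -it${extra} --name bluefog_${variant}_deploy --network=host -v /mnt/share/ssh:/root/.ssh ${image}"
}

# Example: run command for the GPU image pulled from Docker Hub.
bluefog_run_cmd gpu bluefoglib/bluefog:gpu-0.2.2
```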
Clean up docker system after running:
sudo docker system prune
Nvidia Container Runtime
The following error may pop up when running a docker container with GPUs.
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
To run docker with GPUs properly, the Nvidia container runtime must be installed. On Ubuntu, use the following commands. The GPU driver is also required.
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install nvidia-container-runtime
sudo service docker restart
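The distribution variable in the commands above is built by sourcing /etc/os-release and concatenating its ID and VERSION_ID fields. The sketch below reproduces that expression against a sample file, so the result can be inspected without touching the host configuration. The sample file contents (Ubuntu 20.04) are an assumption chosen for illustration.

```shell
# Hypothetical stand-in for /etc/os-release, so the sketch is host-independent.
os_release_file="$(mktemp)"
cat > "$os_release_file" <<'EOF'
ID=ubuntu
VERSION_ID="20.04"
EOF

# Same expression as in the install commands, pointed at the sample file.
# The subshell keeps ID and VERSION_ID out of the current environment.
distribution=$(. "$os_release_file"; echo $ID$VERSION_ID)
echo "$distribution"   # prints ubuntu20.04

rm -f "$os_release_file"
```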
More details can be found on https://github.com/NVIDIA/nvidia-container-runtime and https://nvidia.github.io/nvidia-container-runtime.
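Since both the GPU driver and the container runtime are required, a quick pre-flight check can be useful before starting a GPU container. The sketch below only tests whether the relevant executables are on PATH. The helper name check_tool is hypothetical, and it is an assumption that the presence of nvidia-smi and nvidia-container-runtime is a sufficient indicator on your system.

```shell
# Report whether a command is available on PATH.
# Hypothetical helper. Usage: check_tool <command-name>
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# nvidia-smi ships with the GPU driver.
check_tool nvidia-smi || echo "install the GPU driver first"
# nvidia-container-runtime is installed by the package of the same name.
check_tool nvidia-container-runtime || echo "install the Nvidia container runtime"
```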
Running Examples in Docker Containers
The docker images already include a few examples for the Bluefog library and some unit tests.
Run the unit tests in the docker container:
./run_unittest.sh
Run an example in the docker container:
bfrun -np 4 python examples/pytorch_mnist.py