Horovod is an open-source software framework for fast, efficient distributed deep learning training with TensorFlow, Keras, PyTorch, and Apache MXNet. It can scale a single-GPU training script to run on multiple GPUs or hosts with minimal code changes.
This instructor-led, live training (online or onsite) is aimed at developers and data scientists who wish to use Horovod to run distributed deep learning training and scale it across multiple GPUs in parallel.
By the end of this training, participants will be able to:
Set up the necessary development environment to start running deep learning training jobs.
Install and configure Horovod to train models with TensorFlow, Keras, PyTorch, and Apache MXNet.
Scale deep learning training with Horovod to run on multiple GPUs.
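To give a sense of what "minimal code changes" can look like, here is a rough sketch of a PyTorch training loop adapted for Horovod. It is an illustration under assumptions, not course material: it assumes Horovod is installed with PyTorch support, the model and data are synthetic placeholders, and the names scaled_lr, shard_indices, and train are hypothetical helpers introduced only for this example.

```python
# Sketch only: assumes torch and horovod (built with PyTorch support) are installed.
# scaled_lr, shard_indices and train are hypothetical names used for illustration.

def scaled_lr(base_lr, num_workers):
    """Linear learning-rate scaling, a common heuristic for data-parallel training."""
    return base_lr * num_workers


def shard_indices(num_samples, rank, num_workers):
    """Round-robin split: the sample indices that one worker processes."""
    return list(range(rank, num_samples, num_workers))


def train(epochs=1, base_lr=0.01):
    # Imported here so the helpers above remain usable without Horovod installed.
    import torch
    import horovod.torch as hvd

    hvd.init()                                    # 1. initialize Horovod
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(hvd.local_rank())   # 2. pin one GPU per process

    torch.manual_seed(0)                          # same synthetic data on every worker
    model = torch.nn.Linear(10, 1)                # placeholder model
    data, targets = torch.randn(64, 10), torch.randn(64, 1)
    if use_cuda:
        model, data, targets = model.cuda(), data.cuda(), targets.cuda()

    optimizer = torch.optim.SGD(model.parameters(),
                                lr=scaled_lr(base_lr, hvd.size()))
    # 3. wrap the optimizer so gradients are averaged across all workers
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())
    # 4. start every worker from the same initial state (taken from rank 0)
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    # Each worker trains on its own shard of the data.
    idx = shard_indices(len(data), hvd.rank(), hvd.size())
    x, y = data[idx], targets[idx]
    loss = None
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    return loss.item()
```

A script that calls train() would typically be launched with one process per GPU, for example: horovodrun -np 4 python train.py (train.py being a hypothetical file name). The numbered comments mark the lines that differ from an ordinary single-GPU script.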
Format of the Course
Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.
Course Customization Options
This course is focused on Horovod, but other software tools and frameworks such as TensorFlow, Keras, PyTorch, and Apache MXNet may be required. Please let us know if you have specific requirements or preferences.
To request customized training for this course, please contact us.