DCGAN Tutorial
If you haven’t already, we advise you to first read through the Getting Started guide before stepping through this tutorial.
In this tutorial, we will port the DCGAN model to DeepSpeed using custom (user-defined) optimizers and a multi-engine setup!
Running Original DCGAN
Please go through the original tutorial for the Celebrities dataset first using the original code. Then run bash gan_baseline_run.sh.
Enabling DeepSpeed
The code may be obtained here.
Argument Parsing
The first step in applying DeepSpeed is to add DeepSpeed configuration arguments to the DCGAN model, using the deepspeed.add_config_arguments() function as shown below.
import deepspeed

def main():
    parser = get_argument_parser()
    parser = deepspeed.add_config_arguments(parser)
    args = parser.parse_args()
    train(args)
Initialization
We use deepspeed.initialize to create two model engines (one for the discriminator network and one for the generator network, along with their respective optimizers) as follows:
model_engineD, optimizerD, _, _ = deepspeed.initialize(args=args, model=netD, model_parameters=netD.parameters(), optimizer=optimizerD)
model_engineG, optimizerG, _, _ = deepspeed.initialize(args=args, model=netG, model_parameters=netG.parameters(), optimizer=optimizerG)
Note that DeepSpeed automatically takes care of distributed training, so we set ngpu=0 to disable PyTorch's default data-parallel mode.
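For reference, here is a minimal sketch of how the two networks and their optimizers might be constructed before the deepspeed.initialize calls above. The Generator and Discriminator classes and the Adam hyperparameters follow the original DCGAN tutorial; the exact names are assumptions for illustration.

import torch.optim as optim

ngpu = 0  # disable PyTorch's built-in DataParallel; DeepSpeed handles distribution
netG = Generator(ngpu)
netD = Discriminator(ngpu)

# Same Adam hyperparameters as the original DCGAN tutorial
optimizerD = optim.Adam(netD.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=0.0002, betas=(0.5, 0.999))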
Discriminator Training
We modify the backward pass for the discriminator as follows:
model_engineD.backward(errD_real)
model_engineD.backward(errD_fake)
This includes the gradients from both the real and fake mini-batches in the optimizer update.
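For context, here is a sketch of how one discriminator step might look inside the training loop. The criterion, the label tensors, and the variable names are assumptions based on the original DCGAN loop.

# All-real mini-batch
output = model_engineD(real_images)
errD_real = criterion(output, real_labels)
model_engineD.backward(errD_real)

# All-fake mini-batch; detach so this pass does not touch the generator
fake = model_engineG(noise)
output = model_engineD(fake.detach())
errD_fake = criterion(output, fake_labels)
model_engineD.backward(errD_fake)

# Single optimizer update using gradients from both mini-batches
model_engineD.step()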
Generator Training
We modify the backward pass for the generator as follows:
model_engineG.backward(errG)
Note: When using gradient accumulation, calling backward on the generator would also accumulate gradients on the discriminator, because errG is computed from a forward pass through the discriminator. To avoid this, set requires_grad=False on the netD parameters before running the generator backward pass, as sketched below.
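Putting this together, here is a sketch of the generator step with the requires_grad guard described above; the criterion and variable names are illustrative.

# Freeze the discriminator so the generator backward does not
# accumulate gradients on netD through errG's dependency on it
for p in netD.parameters():
    p.requires_grad = False

output = model_engineD(fake)
errG = criterion(output, real_labels)  # generator aims to make fakes look real
model_engineG.backward(errG)
model_engineG.step()

# Unfreeze the discriminator for the next iteration
for p in netD.parameters():
    p.requires_grad = True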
Configuration
The next step in using DeepSpeed is to create a configuration JSON file (gan_deepspeed_config.json). This file contains DeepSpeed-specific parameters defined by the user, e.g., batch size, optimizer, scheduler, and other parameters.
{
  "train_batch_size": 64,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0002,
      "betas": [0.5, 0.999],
      "eps": 1e-8
    }
  },
  "steps_per_print": 10
}
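Note that train_batch_size is the effective global batch size: it equals the per-GPU micro-batch size times the number of GPUs times any gradient accumulation steps. For example, with 16 GPUs and no gradient accumulation, each GPU processes a micro-batch of 4 samples per step.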
Run DCGAN Model with DeepSpeed Enabled
To start training the DCGAN model with DeepSpeed, we execute the following command, which uses all detected GPUs by default.
deepspeed gan_deepspeed_train.py --dataset celeba --cuda --deepspeed_config gan_deepspeed_config.json --tensorboard_path './runs/deepspeed'
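To restrict training to a subset of the available GPUs, the DeepSpeed launcher's --num_gpus flag can be used, for example:

deepspeed --num_gpus=1 gan_deepspeed_train.py --dataset celeba --cuda --deepspeed_config gan_deepspeed_config.json --tensorboard_path './runs/deepspeed'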
Performance Comparison
We use a total batch size of 64 and train on 16 GPUs of a DGX-2 node for 1 epoch, which yields a 3x speed-up. The summary of the results is given below:

- Baseline total wall clock time for 1 epoch: 393 secs
- DeepSpeed total wall clock time for 1 epoch: 128 secs