Geometric Deep Learning
Generative Adversarial Networks

Gudmundur Einarsson
Technical University of Denmark

October 3rd 2018

GANs, a very hot topic!

Points I want to try to address

  • What is a GAN?

  • What are the challenges of using GANs? E.g. training them

  • Examples of applications

  • Why is it interesting for us?

  • Check out iangoodfellow.com for nice presentations

What is the problem GANs solve?

  • GANs are a way to create very good generative models

  • They properly model the data manifold

  • Achieved by using an adversary
    • Learns to discriminate between real and generated samples
    • Pushes generator to become better

Let’s see the difference

Why the hype?

Named GANs

  • There are around 500 new GAN papers every month

  • Check out the GAN model zoo on GitHub for inspiration

  • This is simply too much material to cover

  • We will focus on the original paper

  • If time permits, look at GANerated Hands as an example application

The GAN paper

  • Basic GAN idea
    • Game between a generator \(G\) and a discriminator \(D\)
    • Both are generic; in the original paper both are MLPs (see the sketch after this list)
    • This generic setup is one of the reasons for their popularity
  • \(D\) is a differentiable function that discriminates between real and generated data; it has parameters \(\boldsymbol{\theta}^{(D)}\)
    • \(\mathbf{x}\) is the input
    • \(D(\mathbf{x})\) is the probability of the sample being real
    • Has cost function \(J^{(D)}(\boldsymbol{\theta}^{(D)},\boldsymbol{\theta}^{(G)})\)
  • \(G\) is a differentiable function, has parameters \(\boldsymbol{\theta}^{(G)}\)
    • \(\mathbf{z}\) is a sample from some prior
    • \(G(\mathbf{z})\) gives an \(\mathbf{x}\) from \(p_{\text{model}}\)
    • \(\mathbf{z}\) is very flexible; noise can be injected in many places
    • Has cost function \(J^{(G)}(\boldsymbol{\theta}^{(D)},\boldsymbol{\theta}^{(G)})\)
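
To make the two players concrete, here is a minimal sketch of \(G\) and \(D\) as small MLPs in PyTorch. This is my own illustration, not code from the paper; the layer widths, the 100-dimensional prior for \(\mathbf{z}\), and the 784-dimensional data space are arbitrary choices.

```python
import torch.nn as nn

class Generator(nn.Module):
    """G: maps a prior sample z to a data-space sample G(z) from p_model."""
    def __init__(self, z_dim=100, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, data_dim), nn.Tanh(),  # data assumed scaled to [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """D: maps a sample x to D(x), the probability that x is real."""
    def __init__(self, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)
```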

How do we train these?

  • Multiple different approaches

  • Simultaneous SGD
    • Minibatch of \(\mathbf{x}\) values from the dataset
    • Minibatch of \(\mathbf{z}\) values from the model prior
  • Take simultaneous gradient steps
    • update \(\boldsymbol{\theta}^{(D)}\) to reduce \(J^{(D)}\)
    • update \(\boldsymbol{\theta}^{(G)}\) to reduce \(J^{(G)}\)
  • Can use any gradient-based optimization, e.g. Adam (a training-loop sketch follows below)
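
A minimal sketch of one simultaneous gradient step, reusing the Generator and Discriminator classes sketched above and Adam as the optimizer. The batch size, learning rate, and the use of binary cross-entropy with the non-saturating generator target (both discussed on the following slides) are illustrative assumptions, not prescriptions from the paper.

```python
import torch
import torch.nn as nn

G, D = Generator(), Discriminator()                 # from the sketch above
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(x_real):
    n = x_real.size(0)
    z = torch.randn(n, 100)                         # minibatch of z from the prior

    # Update theta^(D) to reduce J^(D): real samples -> 1, generated samples -> 0
    opt_D.zero_grad()
    loss_D = bce(D(x_real), torch.ones(n, 1)) + bce(D(G(z).detach()), torch.zeros(n, 1))
    loss_D.backward()
    opt_D.step()

    # Update theta^(G) to reduce J^(G) (non-saturating target: generated -> 1)
    opt_G.zero_grad()
    loss_G = bce(D(G(z)), torch.ones(n, 1))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```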

Cost function for \(D\)

\[ J^{(D)}(\boldsymbol{\theta}^{(D)},\boldsymbol{\theta}^{(G)}) = -\frac{1}{2}\mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\log D(\mathbf{x}) -\frac{1}{2}\mathbb{E}_{\mathbf{z}}\log (1- D(G(\mathbf{z}))) \]

  • This is the standard cross-entropy for a binary classifier with a sigmoid output

  • Trained with two minibatches of data (see the sketch below)
    • One from the dataset, labeled as ones
    • One generated, labeled as zeroes
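
A direct translation of \(J^{(D)}\) into code, as a sketch; with real labels 1 and fake labels 0 this matches (up to the factor \(\frac{1}{2}\)) the BCE terms used in the training loop above. The small eps is my own addition for numerical stability.

```python
import torch

def d_loss(D, x_real, x_fake, eps=1e-8):
    # J^(D) = -1/2 E_x[log D(x)] - 1/2 E_z[log(1 - D(G(z)))]
    real_term = torch.log(D(x_real) + eps).mean()        # minibatch from the dataset
    fake_term = torch.log(1 - D(x_fake) + eps).mean()    # minibatch from the generator
    return -0.5 * (real_term + fake_term)
```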

Cost function for \(G\)

  • There are more choices here; the simplest is the zero-sum game \[ J^{(G)} = -J^{(D)} \]

  • This allows us to summarize all into one value function \[ V\left( \boldsymbol{\theta}^{(D)},\boldsymbol{\theta}^{(G)} \right) = -J^{(D)}\left( \boldsymbol{\theta}^{(D)},\boldsymbol{\theta}^{(G)} \right) \]

  • This allows us to solve a minimax problem, which is how zero-sum games are solved \[ \boldsymbol{\theta}^{(G)*} = \text{arg} \min_{\boldsymbol{\theta}^{(G)}}\max_{\boldsymbol{\theta}^{(D)}}V\left( \boldsymbol{\theta}^{(D)},\boldsymbol{\theta}^{(G)} \right) \]

  • Mostly interesting for theoretical analysis (the value function is written out below)
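
Substituting the expression for \(J^{(D)}\) from the previous slide, the value function that \(D\) ascends and \(G\) descends can be written out as

\[ V\left( \boldsymbol{\theta}^{(D)},\boldsymbol{\theta}^{(G)} \right) = \frac{1}{2}\mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\log D(\mathbf{x}) + \frac{1}{2}\mathbb{E}_{\mathbf{z}}\log (1- D(G(\mathbf{z}))) \]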

Practical problems and other cost functions

  • If the discriminator becomes too good, the gradient for the generator vanishes; training gets stuck and we see no further improvement.

  • Heuristic change: flip the target used to construct the cross-entropy, instead of flipping the sign (code sketch after this list) \[ J^{(G)} = -\frac{1}{2}\mathbb{E}_{\mathbf{z}}\log D(G(\mathbf{z})) \]

  • Generator now maximizes the probability of the discriminator being wrong, instead of minimizing the probability of the discriminator being correct.

  • We can no longer summarize everything into one single value function
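
A sketch of both generator costs side by side, reusing the notation from the slides above; the eps term is again just a numerical-stability assumption of mine.

```python
import torch

def g_loss_minimax(D, G, z, eps=1e-8):
    # Zero-sum version J^(G) = -J^(D): gradient vanishes once D becomes confident
    return 0.5 * torch.log(1 - D(G(z)) + eps).mean()

def g_loss_heuristic(D, G, z, eps=1e-8):
    # J^(G) = -1/2 E_z[log D(G(z))]: maximize the probability of D being wrong
    return -0.5 * torch.log(D(G(z)) + eps).mean()
```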

DCGANs

  • Radford et al. 2015
    • Deep convolutional GANs
  • Three new key insights
    • Batch normalization, in both \(G\) and \(D\); normalize the two minibatches separately
    • Based on the all-convolutional net (Springenberg et al. 2015); to increase the spatial dimensions they use transposed convolutions with stride greater than 1 (sketched below).
    • They use Adam instead of SGD with momentum.
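
A minimal sketch of one DCGAN-style generator block, showing how a transposed convolution with stride 2 doubles the spatial dimensions and where batch normalization sits; the channel counts and kernel size are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn as nn

# One upsampling block: a 4x4 transposed convolution with stride 2 doubles H and W
block = nn.Sequential(
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(),
)

x = torch.randn(1, 256, 8, 8)   # an 8x8 feature map
print(block(x).shape)           # torch.Size([1, 128, 16, 16])
```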

Tips and tricks

  • Train with labels
    • Make the discriminator recognize specific classes of objects
  • One-sided label smoothing (sketched after this list)
    • Change the real labels from 1 to \(0.9\)
  • Virtual batch normalization
    • Use a reference batch, sampled at the beginning of training, together with the current batch to estimate the normalization statistics.
    • Otherwise the two different minibatches cause too much fluctuation in the normalization parameters.
  • In practice, \(D\) is usually deeper and has more filters per layer
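
A sketch of one-sided label smoothing for the discriminator update, assuming the BCE-based loss from the training-loop sketch earlier: only the real targets are softened to 0.9, the generated targets stay at 0.

```python
import torch

def smoothed_targets(n, smooth=0.9):
    # One-sided: real labels become 0.9, fake labels remain exactly 0
    real_labels = torch.full((n, 1), smooth)
    fake_labels = torch.zeros(n, 1)
    return real_labels, fake_labels
```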

Example use case, CVPR 2018

Whole Pipeline

GeoConGAN

  • Image-to-image translation network

  • Translate synthetic images to realistic looking images

  • We have ground-truth annotations for the synthetic data; they just don’t look real enough

Generating realistic hands

Geometric Consistency, important for consistent annotations

See in action

What do the authors release?

  • Implementation in Caffe

  • Heatmaps are 128 by 128

  • They only release the forward pass, for tracking

Issues for GANs

  • GANs are not the holy grail of deep learning

  • They have some very hard problems to solve

Issues, how to evaluate performance?

Samples from “Large Scale GAN Training for High Fidelity Natural Image Synthesis”

Are these just look-ups from the training data?

Future paper to look into

  • More examples where GANs are used to synthesize training data

  • CycleGAN, maybe InfoGAN

  • Focus on higher-resolution image generation