Geometric Deep Learning
(Introduction) & Applications

Gudmundur Einarsson
Technical University of Denmark

July 10th 2018

Who am I

From Iceland, live in Denmark
Postdoc at DTU Compute focusing on geometric deep learning (GDL)
- Image Analysis & Computer Graphics Group
PhD in Applied Mathematics defended in April 2018
- Focus on Statistical Learning, in particular sparse classification
Spending the summer at deCODE genetics here in Iceland

Collaborators in GDL

Rasmus R. Paulsen
- PhD supervisor
Line K. H. Clemmensen
- PhD co-supervisor
Others in our reading group (geo-dl.compute.dtu.dk)

Prediction of Facial Landmarks

We are interested in modeling the external anatomy of the face and ear
Used for acoustic simulations to find optimal microphone placements for hearing aids
Accurate landmarks for phenotyping (what is a phenotype?)
GDL caught our attention

Outline

What is geometric about GDL?
Why do we need GDL?
Different Data, Different Problems and Different Approaches
Preliminary Results for landmark prediction on faces
Deep Learning and Genetics

What does geometric mean?

What do we refer to as geometric?

The geometry is part of the data itself, in contrast to images, which provide a single view and can be regarded as a Euclidean grid
Different input data
- 3D-point clouds, e.g. from 3D scans
- Volumetric data, e.g. discretizations of meshes or medical scans
- Meshes, e.g. Computer Aided Design drawings or Computer Graphics objects
Usually surfaces embedded in 3D
- Non-Euclidean geometries: how do we calculate distances?
- Other graph structures, e.g. social networks or epidemiological networks.
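To make the difference between the first representations concrete, here is a minimal NumPy sketch: a triangle mesh stored as a vertex array plus a face array, and a point cloud obtained by sampling points uniformly on its surface (area-weighted choice of triangle, uniform barycentric coordinates). The square mesh and the function names are hypothetical and only illustrate the data structures.

```python
import numpy as np

# A triangle mesh: vertex positions (V x 3) and faces as vertex indices (F x 3).
# Hypothetical example: a unit square in the z = 0 plane split into two triangles.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2],
                  [0, 2, 3]])


def triangle_areas(vertices, faces):
    """Area of each triangle: half the norm of the cross product of two edges."""
    a, b, c = (vertices[faces[:, i]] for i in range(3))
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)


def sample_point_cloud(vertices, faces, n_points, rng):
    """Sample points uniformly on the surface: pick faces by area, then a uniform point inside."""
    areas = triangle_areas(vertices, faces)
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    r1 = np.sqrt(rng.random(n_points))   # square-root trick gives uniform density in the triangle
    r2 = rng.random(n_points)
    a, b, c = (vertices[faces[face_idx, i]] for i in range(3))
    return ((1 - r1)[:, None] * a
            + (r1 * (1 - r2))[:, None] * b
            + (r1 * r2)[:, None] * c)


rng = np.random.default_rng(0)
sampled = sample_point_cloud(vertices, faces, 1000, rng)
print(sampled.shape)  # (1000, 3): an unordered set of 3D points, no connectivity
```

The mesh keeps connectivity (which vertices form a face), while the sampled point cloud keeps only positions, which is why the two representations call for different methods.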

3D point clouds

Stanford Bunny Point Cloud

Volumetric data

Stanford Bunny Volumetric

Meshes

Stanford Bunny Mesh

What are the applications?

Classification
- Big catalogues of CAD models
Segmentation
- Semantic segmentation and scene segmentation
Dense & Sparse point correspondences
- Landmark annotations
- Shape analysis

Why do we need DL for this?

Better and faster methods for e.g. classification
Generalizing DL methods to these kinds of input is challenging
We want improvements similar to image and text based problems
Meet the increase in acquisition of 3D data
- Scanning of museum artifacts
- 3D-face and body scans
- Archeology scans
- Quality assurance in factories, real-time scanning is on the way

Structured Light Scanning

Scanning of Polar Bear Skulls

Different Approaches for Different Data

Different representations of data call for different approaches
- Different problems to tackle!
We need invariances to different properties of the data, e.g.:
- Point Clouds should be invariant to permutations of the individual points
- Volumetric data should be invariant to orientation
- Meshes should be invariant to changes in the triangulation of the faces

Point Clouds Example Approaches

PointNet and PointNet++ from Stanford
Implemented for 3D classification and segmentation
Each point is treated independently as a 3D point in the input

PointNet Requirements

Unordered: invariant to the N! permutations of the N input points
Interaction among points: need to capture combinatorial interactions in local structures
Invariance under transformations: rotations and translations should not affect our predictions/classifications
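The nuisance transformations listed above can be written down explicitly, which is handy both for data augmentation and for checking whether a descriptor really is invariant. A minimal NumPy sketch; the helper names and the two example descriptors are hypothetical, and the check is a numerical spot test, not a proof.

```python
import numpy as np


def random_rotation(rng):
    """Random 3D rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the rotations are not biased
    if np.linalg.det(q) < 0:     # flip one axis to get a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q


def perturb(points, rng):
    """Permute, rotate and translate an (N, 3) point cloud."""
    shuffled = points[rng.permutation(len(points))]
    return shuffled @ random_rotation(rng).T + rng.normal(size=3)


def is_invariant(descriptor, points, rng, trials=10):
    """True if the descriptor gives (numerically) the same output on perturbed copies."""
    reference = descriptor(points)
    return all(np.allclose(reference, descriptor(perturb(points, rng)))
               for _ in range(trials))


def sorted_pairwise_distances(points):
    """Invariant to permutation, rotation and translation, but costs O(N^2)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sort(np.linalg.norm(diff, axis=-1).ravel())


def flattened_coordinates(points):
    """Depends on both point order and pose, so it is not invariant."""
    return points.ravel()


rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 3))
print(is_invariant(sorted_pairwise_distances, cloud, rng))  # True
print(is_invariant(flattened_coordinates, cloud, rng))      # False
```

The invariant descriptor here is far too expensive and too weak for real use; it only shows what the requirement means operationally.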

Strategies for learning with unordered data

Sort input in canonical order
- In high-dimensional space, no ordering is stable under point perturbations
Treat input as a sequence for an RNN and permute training data
- Hard to scale to long sequences: works for N = 10 to 100, but point clouds usually have at least 1000 points (often far more)
Simple symmetric functions for information aggregation
- The authors chose this!
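Why a canonical order fails can be seen even in 3D. The sketch below uses lexicographic sorting as an assumed choice of "canonical" order (NumPy `np.lexsort`); moving one point by 2e-6 makes two points swap places, so the sorted input vector changes by 2.0.

```python
import numpy as np


def canonical_order(points):
    """Sort points lexicographically by (x, y, z), one candidate 'canonical' order."""
    keys = points.T[::-1]          # np.lexsort treats the LAST key as the primary one
    return points[np.lexsort(keys)]


points = np.array([[0.0, 1.0, 0.0],
                   [1e-6, -1.0, 0.0]])
perturbed = points.copy()
perturbed[0, 0] += 2e-6            # nudge the first point by 2e-6 along x

before = canonical_order(points).ravel()
after = canonical_order(perturbed).ravel()
print(np.abs(points - perturbed).max())   # 2e-06: the point set barely moved
print(np.abs(before - after).max())       # 2.0: the ordered input vector jumps
```

A network fed the ordered vector would see a large input change for a negligible geometric change, which is exactly the instability mentioned above.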

PointNet Idea

Approximate a general function defined on a point set by applying a symmetric function to transformed elements of the set:
\[ f(\{ x_1, \ldots, x_n \}) \approx g(h(x_1), \ldots, h(x_n)) \]
\(f\) takes a set as input, so it must be invariant to permutations of its elements
\(h\) is a multi-layer perceptron
\(g\) is a composition of a single-variable function and max pooling
With several different \(h\) functions we can create a global descriptor of the point cloud
Global descriptors are used for classification
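The decomposition above can be sketched in a few lines of NumPy. This is an untrained toy, assuming random weights, arbitrary layer sizes (3 -> 32 -> 64) and a hypothetical linear head `gamma` in the role of the single-variable function applied after max pooling; it only demonstrates that the output does not change when the points are shuffled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights for the shared MLP h: R^3 -> R^64.
W1, b1 = rng.normal(size=(3, 32)), rng.normal(size=32)
W2, b2 = rng.normal(size=(32, 64)), rng.normal(size=64)
# Hypothetical linear head applied after pooling (10 classes).
W_head = rng.normal(size=(64, 10))


def h(points):
    """The same MLP applied to every point independently: (N, 3) -> (N, 64)."""
    hidden = np.maximum(points @ W1 + b1, 0.0)   # ReLU
    return np.maximum(hidden @ W2 + b2, 0.0)


def global_descriptor(points):
    """Symmetric aggregation: channel-wise max over the points -> (64,)."""
    return h(points).max(axis=0)


def gamma(descriptor):
    """Function applied to the pooled descriptor; here one linear layer giving class scores."""
    return descriptor @ W_head


def classify(points):
    return gamma(global_descriptor(points))


cloud = rng.normal(size=(1024, 3))
shuffled = cloud[rng.permutation(len(cloud))]
print(np.allclose(classify(cloud), classify(shuffled)))  # True: point order does not matter
```

Each of the 64 output channels of `h` acts as one of the "several different \(h\) functions"; max pooling over the points is what makes the composition symmetric.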

Other Details for PointNet

An affine transformation is predicted by the network to align the input to a canonical pose
The same kind of alignment is also applied to features deeper in the network
Semantic segmentation
- Feed the global descriptor back to each point
- Combine local and global features for per-point classification
Theoretical justification: universal approximation of continuous set functions
Trained on ModelNet40, 12k man-made CAD models from 40 categories
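Two of these details are simple to state in code. A minimal NumPy sketch, assuming the transform matrix has already been predicted (in PointNet a small sub-network produces it; here we simply pass in a rotation) and with hypothetical feature sizes. `orthogonality_penalty` is the regularizer \(\|I - AA^T\|_F^2\) that the PointNet paper adds to keep a predicted feature transform close to orthogonal, and `segmentation_input` shows the concatenation of per-point and global features used for segmentation.

```python
import numpy as np


def apply_transform(points, A):
    """Apply a k x k transform to every row of an (N, k) array (input points or features)."""
    return points @ A.T


def orthogonality_penalty(A):
    """||I - A A^T||_F^2: zero exactly when A is orthogonal."""
    k = A.shape[0]
    return float(np.sum((np.eye(k) - A @ A.T) ** 2))


def segmentation_input(local_feats, global_feat):
    """Append the shared global descriptor to every per-point feature: (N, C), (K,) -> (N, C + K)."""
    n = local_feats.shape[0]
    return np.concatenate([local_feats, np.tile(global_feat, (n, 1))], axis=1)


theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
print(orthogonality_penalty(R))       # ~0: a rotation is orthogonal
print(orthogonality_penalty(2 * R))   # ~27: (1 - 4)^2 on each of the 3 diagonal entries

rng = np.random.default_rng(0)
aligned = apply_transform(rng.normal(size=(2048, 3)), R)
per_point = rng.normal(size=(2048, 64))   # hypothetical local features
global_desc = rng.normal(size=1024)       # hypothetical global descriptor
print(segmentation_input(per_point, global_desc).shape)  # (2048, 1088)
```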

PointNet Architecture

PointNet Architecture

PointNet Results (Kinect left, CAD right)

Results from PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Volumetric and Multi-View Example Approaches

Results from Volumetric and Multi-View CNNs for Object Classification on 3D Data

Main Ideas for Volumetric and Multi-View Approaches

Auxiliary tasks
- Predict labels from subvolumes
- Helps to prevent overfitting
Data augmentation
Multi-Orientation Pooling
- Layer that aggregates information from many different orientations of the input
- Also, multi-view pooling
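A minimal sketch of multi-orientation pooling, assuming a point cloud is voxelized at several rotations about the z-axis and that one shared extractor (here a hypothetical random linear layer with ReLU, standing in for a trained 3D CNN) is applied to each copy; the per-orientation features are then max-pooled element-wise. The number of orientations, grid size and feature size are arbitrary choices.

```python
import numpy as np

RES, N_VIEWS, FEAT = 16, 12, 32
rng = np.random.default_rng(0)
# Hypothetical stand-in for a trained 3D CNN: one random linear layer + ReLU, shared by all views.
W = rng.normal(size=(RES ** 3, FEAT)) / RES ** 1.5


def voxelize(points, res=RES):
    """Binary occupancy grid for points inside [-1, 1]^3."""
    idx = np.clip(((points + 1.0) / 2.0 * res).astype(int), 0, res - 1)
    grid = np.zeros((res, res, res))
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid


def rotation_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def features(grid):
    return np.maximum(grid.ravel() @ W, 0.0)


def multi_orientation_descriptor(points, n_views=N_VIEWS):
    """Shared extractor on rotated copies of the shape, then element-wise max over orientations."""
    per_view = [features(voxelize(points @ rotation_z(2 * np.pi * k / n_views).T))
                for k in range(n_views)]
    return np.max(per_view, axis=0)


# The xy-radius stays below 1, so rotations about z keep the points inside [-1, 1]^3.
cloud = rng.uniform(-0.7, 0.7, size=(2000, 3))
print(multi_orientation_descriptor(cloud).shape)  # (32,)
```

Because every orientation passes through the same weights and the pooling is a max, the descriptor summarizes all orientations in one fixed-size vector.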

Changing Voxel Resolution for Rendering and Volumetric Approaches

Results from Volumetric and Multi-View CNNs for Object Classification on 3D Data
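The cost side of the resolution trade-off is easy to quantify: the number of cells grows as \(r^3\), while a surface only occupies roughly \(r^2\) of them, so the grids become increasingly sparse. A NumPy sketch on a hypothetical sphere of sampled points; it prints counts only and makes no claim about classification accuracy.

```python
import numpy as np


def voxelize(points, res):
    """Binary occupancy grid (res x res x res) for points inside [-1, 1]^3."""
    idx = np.clip(((points + 1.0) / 2.0 * res).astype(int), 0, res - 1)
    grid = np.zeros((res, res, res), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid


rng = np.random.default_rng(0)
# Hypothetical shape: 100k points on a sphere of radius 0.9 (a surface, like a 3D scan).
directions = rng.normal(size=(100_000, 3))
sphere = 0.9 * directions / np.linalg.norm(directions, axis=1, keepdims=True)

for res in (8, 16, 32, 64):
    grid = voxelize(sphere, res)
    print(f"res={res:3d}  cells={res ** 3:8d}  occupied={grid.sum():6d}  "
          f"occupied fraction={grid.mean():.3f}")
```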

Approaches for Meshes

Invariance to isometry

For point correspondences between two meshes, e.g. two scanned humans
We need an intrinsic operator that depends only on the Riemannian metric of the manifold
Bronstein et al. build on the Laplace-Beltrami operator (LBO)
- The LBO admits an eigendecomposition on a smooth compact manifold
- Its eigenfunctions generalize the Fourier series to non-Euclidean domains
- This allows convolution to be defined on meshes
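The spectral construction can be sketched on a tiny mesh. This NumPy example assumes the standard cotangent discretization of the LBO with a lumped (diagonal) mass matrix \(M\), one common choice and not necessarily the one used by Bronstein et al. It solves \(L\phi = \lambda M \phi\) and then convolves by multiplying the generalized Fourier coefficients with a filter \(g(\lambda)\). The octahedron and the heat-kernel-like filter \(e^{-0.5\lambda}\) are hypothetical choices.

```python
import numpy as np


def cotangent_laplacian(vertices, faces):
    """Cotangent stiffness matrix L (V x V) and lumped vertex areas (diagonal of the mass matrix M)."""
    n = len(vertices)
    L = np.zeros((n, n))
    mass = np.zeros(n)
    for tri in faces:
        for k in range(3):
            i, j, o = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            # The angle at vertex o is opposite edge (i, j) and adds cot(angle) / 2 to its weight.
            u, v = vertices[i] - vertices[o], vertices[j] - vertices[o]
            w = 0.5 * np.dot(u, v) / np.linalg.norm(np.cross(u, v))
            L[i, j] -= w
            L[j, i] -= w
            L[i, i] += w
            L[j, j] += w
        a, b, c = vertices[tri]
        mass[tri] += np.linalg.norm(np.cross(b - a, c - a)) / 6.0  # a third of the triangle area
    return L, mass


def laplace_beltrami_basis(vertices, faces):
    """Solve L phi = lambda M phi; eigenvectors are orthonormal w.r.t. M (Phi^T M Phi = I)."""
    L, mass = cotangent_laplacian(vertices, faces)
    s = 1.0 / np.sqrt(mass)
    evals, psi = np.linalg.eigh(s[:, None] * L * s[None, :])  # symmetric form M^-1/2 L M^-1/2
    return evals, s[:, None] * psi, mass


def spectral_filter(f, evals, phi, mass, g):
    """Spectral convolution: forward transform, multiply by g(lambda), inverse transform."""
    f_hat = phi.T @ (mass * f)        # generalized Fourier coefficients
    return phi @ (g(evals) * f_hat)


# Hypothetical test mesh: a regular octahedron (6 vertices, 8 triangles).
vertices = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                     [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
faces = np.array([[0, 2, 4], [2, 1, 4], [1, 3, 4], [3, 0, 4],
                  [2, 0, 5], [1, 2, 5], [3, 1, 5], [0, 3, 5]])

evals, phi, mass = laplace_beltrami_basis(vertices, faces)
print(np.round(evals, 6))   # approximately [0, 2, 2, 2, 3, 3]

f = vertices[:, 2]          # a function on the mesh: the height z
smoothed = spectral_filter(f, evals, phi, mass, lambda lam: np.exp(-0.5 * lam))
```

On the octahedron the height function lies in the eigenspace with \(\lambda = 2\), so this filter simply scales it by \(e^{-1}\).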

Decomposition is global

Figure: a function, a filter, and the same filter on a different mesh
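The globality can be seen on the simplest possible "mesh", a path graph, using its combinatorial Laplacian as a stand-in for a discretized LBO (an assumption made to keep the sketch short). A heat-kernel filter \(e^{-t\lambda}\) applied to a function concentrated on one vertex produces a response on every vertex, because the eigenvectors are supported on the whole graph. The basis is also tied to the particular graph, so spectral filter coefficients learned on one mesh do not directly carry over to another.

```python
import numpy as np


def path_graph_laplacian(n):
    """Combinatorial Laplacian D - A of the path graph 0-1-...-(n-1)."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


n = 8
L = path_graph_laplacian(n)
evals, phi = np.linalg.eigh(L)             # global basis: eigenvectors span the whole graph

impulse = np.zeros(n)
impulse[0] = 1.0                           # function concentrated on vertex 0
t = 2.0
response = phi @ (np.exp(-t * evals) * (phi.T @ impulse))   # heat-kernel filter exp(-t * lambda)

print(np.round(response, 4))
print(bool(np.all(response > 0)))          # True: even the far end of the path is affected
```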

Solved with Localised Approaches

Different Localised Approaches
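One simple way to get localised filters, shown only as an illustration and not necessarily one of the approaches in the figure, is to make the filter a polynomial of degree \(K\) in the Laplacian, \(g(L) = \sum_{k=0}^{K} \theta_k L^k\) (the idea behind Chebyshev-polynomial graph filters). Since \(L^k\) only connects vertices at most \(k\) edges apart, the response stays within \(K\) hops, and no eigendecomposition is needed. A NumPy sketch on a hypothetical path graph with made-up coefficients \(\theta\):

```python
import numpy as np


def path_graph_laplacian(n):
    """Combinatorial Laplacian D - A of the path graph 0-1-...-(n-1)."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def polynomial_filter(L, f, theta):
    """Compute g(L) f = sum_k theta[k] L^k f with repeated matrix-vector products."""
    out = np.zeros_like(f)
    power = f.copy()               # holds L^k f, starting with k = 0
    for coeff in theta:
        out += coeff * power
        power = L @ power
    return out


L = path_graph_laplacian(10)
impulse = np.zeros(10)
impulse[0] = 1.0
response = polynomial_filter(L, impulse, theta=[1.0, -0.5, 0.25])  # degree K = 2
print(np.nonzero(response)[0])   # [0 1 2]: only vertices within 2 hops of vertex 0
```

The same coefficients \(\theta\) can be applied to the Laplacian of any other mesh, which addresses the transfer problem of purely spectral filters.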

Back to Landmark Annotations

Our Pipeline

Two stacked hourglass networks
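The slides only state that two stacked hourglass networks are used. Hourglass networks for landmark and pose estimation commonly output one heatmap per landmark, and a 2D position is then read off as the location of the maximum. The sketch below assumes that convention; the heatmaps are synthetic Gaussians, and how 2D positions are turned into 3D landmarks on the scan is not covered here.

```python
import numpy as np


def heatmaps_to_landmarks(heatmaps):
    """(L, H, W) heatmaps -> (L, 2) integer (row, col) positions of each map's maximum."""
    n, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(n, -1).argmax(axis=1)
    return np.stack(np.unravel_index(flat_idx, (h, w)), axis=1)


# Synthetic stand-in for network output: 2 landmarks on a 64 x 64 grid.
yy, xx = np.mgrid[0:64, 0:64]
centres = [(10, 20), (40, 50)]
maps = np.stack([np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * 2.0 ** 2))
                 for r, c in centres])
print(heatmaps_to_landmarks(maps))  # rows (10, 20) and (40, 50)
```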

Performance

Performance on different landmarks
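The slides do not specify the error measure behind the per-landmark performance. A common choice, assumed here, is the Euclidean distance between predicted and annotated 3D positions (in the units of the scan, e.g. millimetres), summarised per landmark. A NumPy sketch with synthetic data and hypothetical landmark names:

```python
import numpy as np


def landmark_errors(pred, truth):
    """Euclidean error per scan and landmark: (S, L, 3) arrays -> (S, L)."""
    return np.linalg.norm(pred - truth, axis=-1)


def summarise(errors, names):
    """Mean and standard deviation of the error for each landmark over all scans."""
    return {name: (float(errors[:, i].mean()), float(errors[:, i].std()))
            for i, name in enumerate(names)}


# Synthetic example: 2 scans, 3 hypothetical landmarks, every prediction off by (3, 4, 0).
names = ["landmark_a", "landmark_b", "landmark_c"]
truth = np.zeros((2, 3, 3))
pred = truth + np.array([3.0, 4.0, 0.0])
print(summarise(landmark_errors(pred, truth), names))
# {'landmark_a': (5.0, 0.0), 'landmark_b': (5.0, 0.0), 'landmark_c': (5.0, 0.0)}
```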

Applications to MRI

Not restricted to 3D scans of faces

Genetics

Applications for Genetics (if time allows)

Data-driven phenotypes (e.g. Big Five)
Phenotypes from images
Modern social phenotypes

Leaders in the Field

These individuals have paved the way in collaboration with their research groups.

Professor Leonidas Guibas, Stanford
- Pioneered approaches for volumetric and point-cloud data
Professor Michael Bronstein, University of Lugano & Intel Perceptual Computing
- Convolution on meshes and dense mesh correspondences

Material for those interested

Geometric Deep Learning SIGGRAPH ASIA 2016 course notes
Geometric Deep Learning Webpage
geo-dl.compute.dtu.dk, our reading group

Thanks! Questions?

Let me know if you come to Denmark!
