# Geometric Deep Learning

(Introduction) & Applications

### Gudmundur Einarsson

Technical University of Denmark

### July 10th 2018

## Who am I

- From Iceland, live in Denmark
- Postdoc at DTU Compute focusing on geometric deep learning (GDL)
  - Image Analysis & Computer Graphics Group

- PhD in Applied Mathematics defended in April 2018
  - Focus on Statistical Learning, in particular sparse classification

- During the summer at deCODE genetics here in Iceland

## Collaborators in GDL

- Rasmus R. Paulsen
- Line K. H. Clemmensen
- Others in the reading group (**geo-dl.compute.dtu.dk**)

## Prediction of Facial Landmarks

- We are interested in modeling the external anatomy of the face and ear
- Used in acoustic simulations for optimal placement of microphones for hearing aids
- Accurate landmarks for phenotyping (what is a phenotype?)
- GDL caught our attention

## Outline

- What is *geometric* about GDL?
- Why do we need GDL?
- Different Data, Different Problems and Different Approaches
- Preliminary Results for landmark prediction on faces
- Deep Learning and Genetics

## What does geometric mean?

## What do we refer to as geometric?

- Geometry is associated with the data itself, in contrast to images, which provide a single view and can be regarded as a Euclidean grid
- Different input data
  - 3D point clouds, e.g. from 3D scans
  - Volumetric data, e.g. discretizations of meshes or medical scans
  - Meshes, e.g. Computer Aided Design drawings or Computer Graphics objects

- Usually surfaces embedded in 3D
- Non-Euclidean geometries: how do we calculate distances?
- Other graph structures, e.g. social networks or epidemiological networks
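
To make the distance question concrete, here is a minimal sketch assuming the simplest curved surface embedded in 3D, a unit sphere (chosen here for illustration; it is not an example from the talk). It compares the straight-line distance through space with the geodesic distance along the surface.

```python
import numpy as np

def euclidean_distance(p, q):
    """Straight-line (chord) distance through the ambient 3D space."""
    return float(np.linalg.norm(p - q))

def geodesic_distance_on_unit_sphere(p, q):
    """Great-circle distance between two points on the unit sphere."""
    # Clip guards against rounding pushing the cosine outside [-1, 1].
    cos_angle = np.clip(np.dot(p, q), -1.0, 1.0)
    return float(np.arccos(cos_angle))

north_pole = np.array([0.0, 0.0, 1.0])
south_pole = np.array([0.0, 0.0, -1.0])

chord = euclidean_distance(north_pole, south_pole)               # through the sphere
arc = geodesic_distance_on_unit_sphere(north_pole, south_pole)   # along the surface
```

The two distances differ (2 versus pi for antipodal points), which is why intrinsic methods measure along the surface rather than through the embedding space.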

## 3D point clouds

*Stanford Bunny Point Cloud*
## Volumetric data

*Stanford Bunny Volumetric*
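
A volumetric representation can be obtained by discretizing points into an occupancy grid. The sketch below is one simple way to do this, assuming a random toy cloud instead of the bunny; the function name `voxelize` and the normalization to the bounding cube are illustrative choices, not the method used for the figure.

```python
import numpy as np

def voxelize(points, resolution):
    """Convert an (N, 3) point cloud into a binary occupancy grid."""
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    # Scale into the unit cube, guarding against a degenerate cloud.
    scaled = (points - mins) / max(extent, 1e-12)
    # Points on the upper boundary are clamped into the last voxel.
    idx = np.minimum((scaled * resolution).astype(int), resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1000, 3))   # toy point cloud
grid = voxelize(cloud, resolution=16)
```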
## Meshes

*Stanford Bunny Mesh*
## What are the applications?

- Classification
  - Big catalogues of CAD models

- Segmentation
  - Semantic segmentation and scene segmentation

- Dense & sparse point correspondences
  - Landmark annotations
  - Shape analysis

## Why do we need DL on this?

- Better and faster methods, e.g. for classification
- Generalizing DL methods to these kinds of input is challenging
- We want improvements similar to those seen in image- and text-based problems
- Keep up with the growing acquisition of 3D data
  - Scanning of museum artifacts
  - 3D face and body scans
  - Archaeological scans
  - Quality assurance in factories; real-time scanning is on the way

## Structured Light Scanning

*Scanning of Polar Bear Skulls*
## Different Approaches for Different Data

- Different representations of data call for different approaches
  - Different problems to tackle!

- We need invariance to **different properties** of the data, e.g.:
  - Point clouds: predictions should be invariant to **permutations** of the individual points
  - Volumetric data: predictions should be invariant to **orientation**
  - Meshes: predictions should be invariant to changes in the **triangulation** of faces

## Point Clouds Example Approaches

- PointNet and PointNet++ from Stanford
- Implemented for 3D classification and segmentation
- Each point is treated independently as a 3D point in the input

## PointNet Requirements

- **Unordered**: invariant to the *N!* permutations of the input
- **Interaction among points**: need to capture combinatorial interactions in local structures
- **Invariance under transformations**: rotation and translation should not affect our predictions/classifications

## Strategies for learning with unordered data

- **Sort** input in canonical order
  - No ordering in high-dimensional space is stable under point perturbations

- Treat input as a **sequence** for an RNN and permute the training data
  - Hard to scale to long sequences: works for *N = 10-100*, but point clouds usually have at least 1000 points (often far more)

- Simple **symmetric functions** for information aggregation
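
A small sketch of the first and last strategies, assuming a random toy point cloud (all names and values here are illustrative). Symmetric aggregations such as max and mean do not depend on point order, whereas a sort-based ordering can flip under a tiny perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 3))    # toy point cloud, N = 1000
shuffled = rng.permutation(points)     # same set of points, different row order

# Symmetric functions: the result is the same for any ordering of the points.
same_max = np.array_equal(points.max(axis=0), shuffled.max(axis=0))
# The mean agrees up to floating-point summation order.
same_mean = np.allclose(points.mean(axis=0), shuffled.mean(axis=0))

# Sorting: two points with almost equal first coordinate swap places
# after a perturbation of size 2e-9.
pair = np.array([[0.5, 0.0, 0.0],
                 [0.5 + 1e-9, 1.0, 1.0]])
nudged = pair.copy()
nudged[0, 0] += 2e-9
order_before = np.argsort(pair[:, 0])
order_after = np.argsort(nudged[:, 0])
```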

## PointNet Idea

- Approximate a general function on a point set by applying a symmetric function to transformed elements of the set:
  \[
  f(\{ x_1, \ldots, x_n \}) \approx g(h(x_1), \ldots, h(x_n))
  \]
- \(f\) takes in a set, so it is invariant to permutations
- \(h\) is a multi-layer perceptron
- \(g\) is a composition of a single-variable function and max pooling
- With several different \(h\) functions we can create a global descriptor of the point cloud
- Global descriptors are used for classification
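
The approximation above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the PointNet implementation: the weights are random and untrained, the layer sizes (64 and 128) are arbitrary, and only the max-pooling part of \(g\) is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# h: a tiny MLP shared across all points (random, untrained weights).
W1, b1 = rng.normal(size=(3, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 128)), np.zeros(128)

def h(points):
    """Apply the same two-layer perceptron to every point: (N, 3) -> (N, 128)."""
    hidden = np.maximum(points @ W1 + b1, 0.0)   # ReLU
    return np.maximum(hidden @ W2 + b2, 0.0)

def global_descriptor(points):
    """Max-pooling part of g: channel-wise maximum over all points."""
    return h(points).max(axis=0)

cloud = rng.normal(size=(1024, 3))               # toy point cloud
descriptor = global_descriptor(cloud)
permuted = global_descriptor(rng.permutation(cloud))
```

Because the maximum is taken over the point axis, `descriptor` and `permuted` agree (up to floating-point rounding) no matter how the rows of `cloud` are ordered.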

## Other Details for PointNet

- An affine transformation is predicted for canonical alignment
  - Also applied to features in deeper layers
- Semantic segmentation
  - Feed global descriptors back to the points
  - Combine local and global features for per-point classification

- Theoretical justification: **universal approximation** of continuous set functions
- Trained on *ModelNet40*, 12k man-made CAD models from 40 categories
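
The segmentation idea of feeding the global descriptor back to the points can be sketched as a concatenation. This is a simplified illustration: the per-point features are random placeholders, and the global descriptor is pooled from the same features, whereas the actual network uses features from different depths.

```python
import numpy as np

def concat_local_global(local_features):
    """Append the max-pooled global descriptor to every per-point feature row."""
    n_points = local_features.shape[0]
    global_feature = local_features.max(axis=0)                      # (C,)
    tiled = np.broadcast_to(global_feature, (n_points, global_feature.size))
    return np.concatenate([local_features, tiled], axis=1)           # (N, 2C)

rng = np.random.default_rng(0)
local_features = rng.normal(size=(1024, 64))   # hypothetical per-point features
combined = concat_local_global(local_features)
```

Each row of `combined` now carries both what is known about that point and a summary of the whole cloud, which is what a per-point classifier would consume.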

## PointNet Architecture

*PointNet Architecture*
## PointNet Results: Kinect Left, CAD Right

Results from *PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation*
## Volumetric and Multi-View Example Approaches

Results from *Volumetric and Multi-View CNNs for Object Classification on 3D Data*
## Main Ideas for Volumetric and Multi-View Approaches

- Auxiliary tasks
  - Predict labels from subvolumes
  - Helps to prevent overfitting

- Data augmentation
- Multi-orientation pooling
  - A layer that aggregates information from many different views
  - Also, multi-view pooling
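
A rough sketch of orientation pooling on a voxel grid, under strong simplifying assumptions: `toy_features` is a hypothetical stand-in for a 3D CNN, and only the four 90-degree rotations about one axis are considered. It is meant to show the aggregation step, not the architecture from the paper.

```python
import numpy as np

def toy_features(volume):
    """Hypothetical stand-in for a 3D CNN: occupancy profile along each axis."""
    return np.concatenate([
        volume.sum(axis=(1, 2)),
        volume.sum(axis=(0, 2)),
        volume.sum(axis=(0, 1)),
    ])

def multi_orientation_pool(volume, n_rotations=4):
    """Max-pool features over 90-degree rotations about the third axis."""
    features = [toy_features(np.rot90(volume, k, axes=(0, 1)))
                for k in range(n_rotations)]
    return np.max(np.stack(features), axis=0)

rng = np.random.default_rng(0)
volume = (rng.random((8, 8, 8)) > 0.7).astype(float)   # toy cubic occupancy grid
pooled = multi_orientation_pool(volume)
pooled_rotated = multi_orientation_pool(np.rot90(volume, 1, axes=(0, 1)))
```

Since the four rotations form a closed set, rotating the input by 90 degrees only reorders the views, so `pooled` and `pooled_rotated` are identical.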

## Changing Voxel Resolution for Rendering and Volumetric Approaches

Results from *Volumetric and Multi-View CNNs for Object Classification on 3D Data*
## Invariance to isometry

- For point correspondences between two meshes, e.g. two scanned humans
- We need an intrinsic operator that depends only on the Riemannian metric of the manifold
- Bronstein et al. propose the Laplace-Beltrami operator (LBO)
- LBO admits an eigendecomposition on a smooth compact manifold
- Generalization of Fourier series to non-Euclidean domains
- Allows for defining convolution on meshes
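
To illustrate the Fourier analogy, here is a minimal sketch that filters a signal in the eigenbasis of a Laplacian. As an assumption for brevity it uses the combinatorial graph Laplacian of a small ring graph as a stand-in for a discretized LBO on a mesh; the heat-kernel filter is likewise just an example.

```python
import numpy as np

n = 8
# Ring graph adjacency matrix; graph Laplacian L = D - A.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Eigendecomposition L = Phi diag(lambda) Phi^T gives the "Fourier" basis.
eigenvalues, eigenvectors = np.linalg.eigh(L)

def spectral_filter(signal, transfer):
    """Analysis, per-frequency scaling by transfer(lambda), then synthesis."""
    coeffs = eigenvectors.T @ signal
    return eigenvectors @ (transfer(eigenvalues) * coeffs)

signal = np.zeros(n)
signal[0] = 1.0                                   # impulse at vertex 0
smoothed = spectral_filter(signal, lambda lam: np.exp(-0.5 * lam))  # low-pass
```

The filter is defined purely through the eigenvalues, so the same `transfer` function can be applied to any domain once its eigenbasis is known.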

## Decomposition is global

Function, filter, and the same filter applied to a different mesh
## Solved With Localised Approaches

Different Localised Approaches
## Back to Landmark Annotations

## Our Pipeline

Two stacked hourglass networks
## Applications to MRI

Not restricted to 3D scans of faces
## Applications for Genetics (if time allows)

- Data-driven phenotypes (e.g. Big Five)
- Phenotypes from images
- Modern social phenotypes

## Leaders in the Field

These individuals have paved the way in collaboration with their research groups.

- Professor Leonidas Guibas, Stanford
- Innovated approaches to volumetric and point-cloud data

- Professor Michael Bronstein, University of Lugano & Intel Perceptual Computing
- Convolution on meshes and dense mesh correspondences

## Material for those interested

- Geometric Deep Learning SIGGRAPH ASIA 2016 course notes
- Geometric Deep Learning Webpage
- geo-dl.compute.dtu.dk, our reading group

## Thanks! Questions?

- Let me know if you come to Denmark!