Geometric Deep Learning
(Introduction) & Applications

Gudmundur Einarsson
Technical University of Denmark

July 10th 2018

Who am I

  • From Iceland, live in Denmark
  • Postdoc at DTU Compute focusing on geometric deep learning (GDL)
    • Image Analysis & Computer Graphics Group
  • PhD in Applied Mathematics defended in April 2018
    • Focus on Statistical Learning, in particular sparse classification
  • During the summer at deCODE genetics here in Iceland

Collaborators in GDL

  • Rasmus R. Paulsen
    • PhD supervisor
  • Line K. H. Clemmensen
    • PhD co-supervisor
  • Others in the reading group

Prediction of Facial Landmarks

  • We are interested in modeling the external anatomy of face and ear
  • Used in acoustic simulations to find the optimal placement of microphones for hearing aids
  • Accurate landmarks for phenotyping (what is a phenotype?)
  • GDL caught our attention


Outline

  • What is geometric about GDL?
  • Why do we need GDL?
  • Different Data, Different Problems and Different Approaches
  • Preliminary Results for landmark prediction on faces
  • Deep Learning and Genetics

What does geometric mean?

What do we refer to as geometric?

  • The data carries its own geometry, unlike images, which provide a single view and can be regarded as a Euclidean grid
  • Different input data
    • 3D-point clouds, e.g. from 3D scans
    • Volumetric data, e.g. discretizations of meshes or medical scans
    • Meshes, e.g. Computer Aided Design drawings or Computer Graphics objects
  • Usually surfaces embedded in 3D
    • Non-Euclidean geometries, how to calculate distances?
    • Other graph structures, e.g. social networks or epidemiological networks.
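To make the distance question concrete, here is a minimal sketch on a toy example: a unit circle sampled as a graph, standing in for a surface. It compares the straight-line (Euclidean) distance between two opposite points with the shortest path along the sampled curve, which approximates the geodesic distance. The helper names `circle_graph` and `shortest_path_length` are made up for this illustration.

```python
import heapq
import math

def circle_graph(n):
    """Sample n points on the unit circle and connect each to its two neighbours."""
    pts = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
           for i in range(n)]
    edges = {i: [] for i in range(n)}
    for i in range(n):
        j = (i + 1) % n
        w = math.dist(pts[i], pts[j])  # edge length in the ambient space
        edges[i].append((j, w))
        edges[j].append((i, w))
    return pts, edges

def shortest_path_length(edges, source, target):
    """Dijkstra's algorithm: length of the shortest path along graph edges."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in edges[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

pts, edges = circle_graph(200)
euclidean = math.dist(pts[0], pts[100])         # cuts straight through the circle
geodesic = shortest_path_length(edges, 0, 100)  # stays on the sampled curve
print(euclidean, geodesic)
```

The two opposite points are 2 apart in the plane but roughly π apart along the curve, which is the distance an intrinsic method should see.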

3D point clouds

Stanford Bunny Point Cloud

Volumetric data

Stanford Bunny Volumetric


Meshes

Stanford Bunny Mesh

What are the applications?

  • Classification
    • Big catalogues of CAD models
  • Segmentation
    • Semantic segmentation and scene segmentation
  • Dense & Sparse point correspondences
    • Landmark annotations
    • Shape analysis

Why do we need DL on this?

  • Better and faster methods for e.g. classification
  • Generalizing DL methods to these kinds of input is challenging
  • We want improvements similar to image and text based problems
  • Meet the increase in acquisition of 3D data
    • Scanning of museum artifacts
    • 3D-face and body scans
    • Archaeological scans
    • Quality assurance in factories, real time scanning is on the way

Structured Light Scanning

Scanning of Polar Bear Skulls

Different Approaches for Different Data

  • Different representations of data call for different approaches
    • Different problems to tackle!
  • We need invariances to different properties of the data, e.g.:
    • Methods for point clouds should be invariant to permutations of the individual points
    • Methods for volumetric data should be invariant to orientation
    • Methods for meshes should be invariant to changes in the triangulation of faces

Point Clouds Example Approaches

  • PointNet and PointNet++ from Stanford
  • Implemented for 3D classification and segmentation
  • Each point is treated independently as a 3D point in the input

PointNet Requirements

  • Unordered, invariant to the N! permutations of the input
  • Interaction Among Points, need to capture combinatorial interactions in local structures
  • Invariance Under Transformations, rotation and translation should not affect our predictions/classifications

Strategies for learning with unordered data

  • Sort input in canonical order
    • No ordering is stable in high dimensional space for point perturbations
  • Treat input as a sequence for an RNN and permute training data
    • Hard to scale for long sequences, works for N=10-100, point clouds usually have at least 1000 points (usually way more)
  • Simple symmetric functions for information aggregation
    • Authors choose this!

PointNet Idea

  • Approximate a general function on a point set by applying a symmetric function to transformed elements of the set \[ f(\{x_1, \dots, x_n\}) \approx g(h(x_1), \dots, h(x_n)) \]
  • \(f\) takes in a set, so it is invariant to permutations
  • \(h\) is a multi-layer perceptron
  • \(g\) is a composition of a single-variable function and max pooling
  • With several different \(h\) functions we can create a global descriptor of the point cloud
  • Global descriptors are used for classification
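The idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the weights are random and untrained, the layer sizes (3 → 16 → 32) are arbitrary assumptions, and the single-variable function that follows the max pooling in \(g\) is left out. It only shows that a shared per-point MLP \(h\) followed by max pooling yields a descriptor that does not depend on the order of the points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes; the real network is deeper and trained end to end.
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

def h(points):
    """Shared per-point MLP: the same weights are applied to every point."""
    hidden = np.maximum(points @ W1 + b1, 0.0)  # ReLU
    return np.maximum(hidden @ W2 + b2, 0.0)

def global_descriptor(points):
    """Symmetric aggregation: max over the point axis ignores point order."""
    return h(points).max(axis=0)

cloud = rng.normal(size=(1000, 3))             # toy point cloud, 1000 points
shuffled = cloud[rng.permutation(len(cloud))]  # same points, different order
same = np.allclose(global_descriptor(cloud), global_descriptor(shuffled))
print(same)
```

Each of the 32 output channels plays the role of one \(h\) function followed by a max, so together they form the global descriptor mentioned above.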

Other Details for PointNet

  • An affine transformation is predicted to align the input to a canonical pose
  • The same kind of alignment is also applied to features deeper in the network
  • Semantic segmentation
    • Feed global descriptors to points
    • Combine local and global features for point classification
  • Theoretical justification for universal approximation of continuous set functions
  • Trained on ModelNet40, 12k man-made CAD models from 40 categories
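The semantic segmentation bullets above can be sketched as follows, assuming toy feature sizes. The function name `segmentation_features` is hypothetical. The sketch only shows the feature combination step: the global descriptor is copied onto every point and concatenated with that point's local features, and a per-point classifier (omitted here) would then consume the result.

```python
import numpy as np

def segmentation_features(local_feats, global_feat):
    """Append the same global descriptor to every per-point feature vector."""
    n_points = local_feats.shape[0]
    tiled = np.broadcast_to(global_feat, (n_points, global_feat.shape[0]))
    return np.concatenate([local_feats, tiled], axis=1)

rng = np.random.default_rng(0)
local_feats = rng.normal(size=(5, 4))  # 5 points, 4 local features each (toy sizes)
global_feat = local_feats.max(axis=0)  # max-pooled global descriptor
combined = segmentation_features(local_feats, global_feat)
print(combined.shape)
```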

PointNet Architecture

PointNet Results: Kinect (left), CAD (right)

Results from PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Volumetric and Multi-View Example Approaches

Results from Volumetric and Multi-View CNNs for Object Classification on 3D Data

Main Ideas for Volumetric and Multi-View Approaches

  • Auxiliary tasks
    • Predict labels from subvolumes
    • Helps to prevent overfitting
  • Data augmentation
  • Multi-Orientation Pooling
    • Layer which aggregates information from many different views
    • Also, multi-view pooling

Changing Voxel Resolution for Rendering and Volumetric Approaches

Results from Volumetric and Multi-View CNNs for Object Classification on 3D Data

Approaches for Meshes

Invariance to isometry

  • For point correspondences between two meshes, e.g. two scanned humans
  • We need an intrinsic operator which only depends on the Riemannian metric of the manifold.
  • Bronstein et al. propose the Laplace-Beltrami operator (LBO)
    • LBO admits an eigendecomposition on a smooth compact manifold
    • Generalization of Fourier series to non-Euclidean domains
    • Allows for defining convolution on meshes
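The Fourier analogy can be sketched on a toy domain. As an assumption for illustration, the code uses the combinatorial graph Laplacian of a closed polygon as a stand-in for a discretised LBO (mesh implementations typically use cotangent weights instead). A signal on the vertices is expanded in the Laplacian eigenbasis, its coefficients are scaled by a filter defined on the eigenvalues, and the result is transformed back. The function names are made up for this example.

```python
import numpy as np

def cycle_laplacian(n):
    """Combinatorial Laplacian L = D - A of a closed polygon with n vertices."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_filter(L, signal, response):
    """Filter a vertex signal by scaling its coefficients in the Laplacian eigenbasis."""
    eigvals, eigvecs = np.linalg.eigh(L)  # L is symmetric, so eigh applies
    coeffs = eigvecs.T @ signal           # analogue of a forward Fourier transform
    return eigvecs @ (response(eigvals) * coeffs)

n = 64
L = cycle_laplacian(n)
t = 2 * np.pi * np.arange(n) / n
signal = np.sin(t) + 0.5 * np.sin(12 * t)  # low- plus high-frequency component

# Low-pass filter: keep only eigenfunctions with small eigenvalue (low frequency).
smooth = spectral_filter(L, signal, lambda lam: (lam < 0.5).astype(float))
print(np.allclose(smooth, np.sin(t)))
```

The filter is defined in terms of the eigenbasis of one particular domain, so the whole construction depends on a global decomposition of that domain.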

Decomposition is global

Function, filter, same filter different mesh

Solved With Localised Approaches

Different Localised Approaches

Back to Landmark Annotations

Our Pipeline

Two stacked hourglass networks


Performance on different landmarks

Applications to MRI

Not restricted to 3D scans of faces


Applications for Genetics (if time allows)

  • Data-driven phenotypes (e.g. Big Five)
  • Phenotypes from images
  • Modern social phenotypes

Leaders in the Field

These individuals have paved the way in collaboration with their research groups.

  • Professor Leonidas Guibas, Stanford
    • Innovated approaches to volumetric and point-cloud data
  • Professor Michael Bronstein, University of Lugano & Intel Perceptual Computing
    • Convolution on meshes and dense mesh correspondences

Material for those interested

  • Geometric Deep Learning SIGGRAPH ASIA 2016 course notes
  • Geometric Deep Learning Webpage
  • Our reading group

Thanks! Questions?

  • Let me know if you come to Denmark!