
Geodesic Calculus on Latent Space

Introduction

In “Geodesic Calculus on Latent Spaces,” Hartwig et al. propose a practical Riemannian toolkit for deep generative models: learn a projection onto the data manifold that lives inside a model’s latent space, and then use it to compute geodesics, exponential maps, and related operators in a numerically robust way.

Latent spaces of autoencoders and VAEs are not flat Euclidean clouds; around the data they behave like curved manifolds. Linear interpolation cuts through “holes” (low-density regions), often producing blurry or invalid samples. The paper shows how to build a reliable geometric calculus inside latent space, so that we can move along the manifold (shortest paths, i.e. geodesics, shooting via exponential maps, and so on) without requiring perfectly accurate decoders or explicit charts. This extends a growing line of work on the Riemannian view of deep generative models (“The Riemannian Geometry of Deep Generative Models,” Shao et al.).

The core idea is to treat the true data manifold as an implicit submanifold of the ambient latent space and learn a projection Π onto it (via denoising); then perform discrete Riemannian calculus by repeatedly stepping in latent space and projecting back onto the manifold.

Mathematical essentials

This “project–optimize–project” pattern does not require a perfect decoder metric or explicit tangent frames, only a good projection.
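
Two quantities drive the whole procedure: a denoising loss that trains Π to pull perturbed latent codes back onto the manifold, and a discrete path energy that is minimized between projections. One plausible form of both (the Gaussian noise model and quadratic penalties here are standard choices, not necessarily the paper’s exact formulation) is

$$
\mathcal{L}(\Pi) = \mathbb{E}_{z = E(x),\ \varepsilon \sim \mathcal{N}(0,\sigma^2 I)}\big[\,\lVert \Pi(z+\varepsilon) - z \rVert^2\,\big],
\qquad
E(\gamma) = \sum_{i=0}^{K-1} \lVert z_{i+1} - z_i \rVert^2,
$$

where γ = (z_0, …, z_K) is a discrete path whose endpoints stay fixed and whose interior nodes are re-projected by Π after each gradient step on E(γ).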

What the experiments show

Across synthetic and real datasets (with different autoencoders), the learned Π yields:

How it compares

Where to use it

A minimalist recipe

  1. Train an encoder-decoder (VAE/AE) on your data.
  2. Collect latent codes z = E(x) and train a small MLP Π with the denoising loss above; optionally add fixed-point/idempotence penalties.
  3. Geodesic between (z_a, z_b): initialize γ linearly; iterate (i) gradient descent on the discrete energy E(γ) in Z, (ii) projection of the interior nodes with Π; clamp the endpoints (see the sketch after this list).
  4. Decode the path to visualize a geometry-aware interpolation.
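
The sketch below illustrates steps 3–4 with a minimal PyTorch-style implementation of the project–optimize–project loop. The names (`proj` for the trained Π, the optimizer, the step size, and the path length) are illustrative assumptions, not the paper’s reference code.

```python
import torch

def discrete_geodesic(z_a, z_b, proj, n_points=16, n_iters=200, lr=1e-2):
    """Sketch of a projected discrete geodesic between latent codes z_a, z_b.

    proj: trained projection network Π, mapping latents back toward the manifold.
    Assumes z_a, z_b are 1-D latent tensors; all names here are illustrative.
    """
    # 1) Initialize the path with linear interpolation between the endpoints.
    ts = torch.linspace(0.0, 1.0, n_points).unsqueeze(1)
    path = (1 - ts) * z_a.unsqueeze(0) + ts * z_b.unsqueeze(0)
    interior = path[1:-1].clone().requires_grad_(True)   # only interior nodes move

    opt = torch.optim.Adam([interior], lr=lr)
    for _ in range(n_iters):
        # 2) Gradient step on the discrete path energy E(γ) = Σ ||z_{i+1} - z_i||².
        full = torch.cat([z_a.unsqueeze(0), interior, z_b.unsqueeze(0)], dim=0)
        energy = (full[1:] - full[:-1]).pow(2).sum()
        opt.zero_grad()
        energy.backward()
        opt.step()

        # 3) Project interior nodes back onto the learned manifold with Π.
        with torch.no_grad():
            interior.copy_(proj(interior))

    return torch.cat([z_a.unsqueeze(0), interior.detach(), z_b.unsqueeze(0)], dim=0)

# Usage (assuming a trained encoder E, decoder D, and projection net proj):
# z_a, z_b = E(x_a), E(x_b)
# path = discrete_geodesic(z_a, z_b, proj)
# images = D(path)   # decode the path to visualize the interpolation
```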

Experiments

The experiments conducted in this notebook translate the theoretical framework of the paper into a concrete implementation on the MNIST dataset.

A VAE was first trained to learn a structured 16-dimensional latent representation of handwritten digits, where proximity between points reflects visual similarity. On top of this learned manifold, a projection network Π was trained as a denoising operator that maps any noisy or off-manifold latent code back toward the manifold of valid encodings, effectively approximating the Riemannian projection operator described in the paper.
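
A minimal sketch of how such a projection network can be trained as a latent denoiser follows; the architecture, noise scale, and idempotence weight are illustrative choices rather than the notebook’s exact settings.

```python
import torch
import torch.nn as nn

latent_dim = 16            # matches the VAE latent size used in the notebook
sigma = 0.2                # noise scale (illustrative choice)
lambda_idem = 0.1          # weight of the optional idempotence penalty

# Small MLP Π: latent code in, latent code out.
proj = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, latent_dim),
)
opt = torch.optim.Adam(proj.parameters(), lr=1e-3)

def train_step(z_batch):
    """One denoising step: push noisy latents back toward their clean versions."""
    noisy = z_batch + sigma * torch.randn_like(z_batch)
    denoised = proj(noisy)
    loss = (denoised - z_batch).pow(2).mean()
    # Optional fixed-point / idempotence penalty: encourage Π(Π(z)) ≈ Π(z).
    loss = loss + lambda_idem * (proj(denoised) - denoised.detach()).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# z_batch would be a batch of latent codes z = E(x) from the trained VAE encoder.
```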

Using Π, we computed discrete geodesic interpolations between latent representations of two digits (for example, “2 → 9”) by iteratively optimizing a path to minimize its energy while keeping each intermediate point projected onto the manifold. The resulting geodesic paths produced smoother, semantically coherent digit transitions than conventional linear interpolations.

[Figure: decoded geodesic vs. linear interpolations between digits]

[Figure: path energy over the course of geodesic optimization]

Finally, PCA visualizations of the latent space revealed that the learned geodesic trajectories bend through dense data regions, faithfully following the manifold’s intrinsic geometry rather than cutting across low-density zones. These results empirically confirm the paper’s core idea: that a learned projection operator can endow deep latent spaces with a consistent geometric structure, enabling manifold-aware interpolation and analysis.

[Figure: PCA projection of the latent space with geodesic trajectories]
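
For completeness, here is a sketch of this kind of PCA visualization, assuming `latents` holds the encoded training digits and `path` holds an optimized geodesic as NumPy arrays; both names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# latents: (N, 16) array of encoded digits; path: (K, 16) geodesic from above.
pca = PCA(n_components=2).fit(latents)
latents_2d = pca.transform(latents)
path_2d = pca.transform(path)

plt.scatter(latents_2d[:, 0], latents_2d[:, 1], s=2, alpha=0.2, label="data latents")
plt.plot(path_2d[:, 0], path_2d[:, 1], "r.-", label="geodesic path")
plt.legend()
plt.title("Geodesic bending through dense regions (PCA projection)")
plt.show()
```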

