A geometric encoder's toolkit: deterministic primitives for hyperspherical image encoding
Abstract
The most promising path to a purely geometric image encoder combines wavelet scattering transforms (~82% CIFAR-10 accuracy with zero learned parameters) as the backbone signal decomposition, Hopf fibration for hierarchical sphere encoding, E₈/Leech lattice configurations for optimal anchor placement, and von Mises-Fisher mixtures for soft triangulation. This manifest catalogs every viable mathematical primitive — 35+ structures across six categories — rated by determinism, invertibility, complexity, and composability. The central finding: a multi-stage pipeline of scattering → spherical harmonics → JL projection → L2 normalization onto S^d can produce embeddings rich enough for constellation-based classification, with each stage adding orthogonal geometric information. The ~20% accuracy gap between deterministic features and learned encoders on CIFAR-10 reflects non-geometric intra-class variabilities that no predetermined transform can capture — but deterministic features excel as front-ends that dramatically reduce needed training data.
Claude Created Research Article
MUCH of this is likely partly incorrect, or Claude may have only a vague idea of how to actually code it. I'll need to run each test manually, multiple times.
That means multiple rounds of checks, tests, and validation against this documentation to confirm the structures actually behave as specified.
NORMALLY I drive my own research direction, but today I've handed some control to Claude to see whether it can find a meaningful direction that REDUCES complexity and improves performance rather than adding complexity.
Manifold structures that encode hierarchy on hyperspheres
Hopf fibration: the natural hierarchical decomposition of S³
The Hopf map h: S³ → S² sends unit quaternion q = (a, b, c, d) to:
h(a,b,c,d) = (2(ac+bd), 2(ad−bc), a²+b²−c²−d²), writing z₁ = a+ib, z₂ = c+id (conventions in the literature differ by a rotation of S²)
In complex coordinates (z₁, z₂) ∈ ℂ² with |z₁|²+|z₂|²=1, the map is p(z₁,z₂) = (2z̄₁z₂, |z₁|²−|z₂|²). Every point on the base S² has an entire S¹ fiber as its preimage, creating a natural coarse-to-fine decomposition: the base point encodes global direction while the fiber position encodes local phase. By Adams's theorem, these are the only Hopf fibrations of spheres: S¹→S¹ (real, with S⁰ fibers), S³→S² (complex), S⁷→S⁴ (quaternionic, with S³ fibers), and S¹⁵→S⁸ (octonionic, with S⁷ fibers — limited by non-associativity).
For a constellation system, Hopf fibration enables hierarchical triangulation: place anchors on S² (base), each defining a fiber circle on S³. Coarse classification identifies the nearest base-point anchor; fine discrimination localizes within the fiber. Miyamoto and Costa (2021) proved that Hopf foliations yield constructive spherical codes in ℝ^{2^k} with O(n log n) encoding/decoding complexity. The fibration is fully deterministic, O(1) per point, and while the forward map is not invertible (many-to-one), a chosen section reconstructs up to the fiber coordinate.
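The Hopf map and its fiber invariance are easy to check numerically. Below is a minimal numpy sketch using the complex form p(z₁,z₂) = (2z̄₁z₂, |z₁|²−|z₂|²) from the text; the function name `hopf` and the specific test point are illustrative, not from any library.

```python
import numpy as np

def hopf(q):
    """Hopf map S^3 -> S^2 for a unit quaternion q = (a, b, c, d),
    via the complex form p(z1, z2) = (2 * conj(z1) * z2, |z1|^2 - |z2|^2)
    with z1 = a + ib, z2 = c + id."""
    a, b, c, d = q
    z1, z2 = complex(a, b), complex(c, d)
    w = 2 * z1.conjugate() * z2
    return np.array([w.real, w.imag, abs(z1)**2 - abs(z2)**2])

rng = np.random.default_rng(0)
q = rng.normal(size=4)
q /= np.linalg.norm(q)           # a point on S^3
base = hopf(q)                   # lands on S^2

# Fiber invariance: multiplying (z1, z2) by a common phase e^{it}
# moves along the S^1 fiber without changing the base point.
t = 0.7
z1 = complex(q[0], q[1]) * np.exp(1j * t)
z2 = complex(q[2], q[3]) * np.exp(1j * t)
q_fiber = np.array([z1.real, z1.imag, z2.real, z2.imag])
```

Checking that `hopf(q_fiber)` equals `base` is exactly the coarse/fine split described above: the base coordinate ignores fiber phase entirely.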
Clifford algebras: the algebraic engine for multi-channel geometry
The Clifford algebra Cl(p,q) is generated by basis vectors {e₁,...,eₙ} satisfying eᵢeⱼ + eⱼeᵢ = 2gᵢⱼ, producing a 2ⁿ-dimensional algebra graded by geometric degree. The geometric product uv = u·v + u∧v unifies inner products (scalar similarity) and outer products (oriented planes) in a single operation. Special cases include complex numbers Cl(0,1), quaternions Cl(0,2), and the conformal geometric algebra Cl(4,1) used in computer vision.
Every element of the algebra simultaneously encodes objects at all geometric grades — scalars (magnitudes), vectors (directions), bivectors (oriented planes/rotations), up to the pseudoscalar. Rotations act as sandwich products: v ↦ RvR⁻¹ where R is a rotor (even-grade element). This is fully deterministic and invertible (multivectors with nonzero norm have inverses A⁻¹ = Ã/|A|²). Computational complexity is O(2²ⁿ) naively but O(2ⁿ log 2ⁿ) with structured implementations. For constellation systems, anchor points can be represented as multivectors with distances measured via the geometric product; CliffordNet (Ji, 2025) demonstrated that the full geometric product (not just inner products) provides algebraic completeness for vision tasks.
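As a concrete instance of the sandwich product, here is a minimal sketch of Cl(2,0), the geometric algebra of the Euclidean plane, with multivectors stored as `[scalar, e1, e2, e12]`. The product table is hand-derived from e₁² = e₂² = 1; the function names are illustrative, not from any geometric-algebra library.

```python
import numpy as np

def gp(m, n):
    """Geometric product in Cl(2,0); multivectors are [scalar, e1, e2, e12]."""
    s1, x1, y1, b1 = m
    s2, x2, y2, b2 = n
    return np.array([
        s1*s2 + x1*x2 + y1*y2 - b1*b2,   # scalar part (e12^2 = -1)
        s1*x2 + x1*s2 - y1*b2 + b1*y2,   # e1 part
        s1*y2 + y1*s2 + x1*b2 - b1*x2,   # e2 part
        s1*b2 + b1*s2 + x1*y2 - y1*x2,   # e12 (bivector) part
    ])

def rotor(theta):
    """Rotor R = cos(theta/2) - sin(theta/2) e12, acting as v -> R v R~."""
    return np.array([np.cos(theta / 2), 0.0, 0.0, -np.sin(theta / 2)])

def reverse(m):
    """Reversion: flips the sign of the bivector grade in Cl(2,0)."""
    s, x, y, b = m
    return np.array([s, x, y, -b])

R = rotor(np.pi / 2)
e1 = np.array([0.0, 1.0, 0.0, 0.0])
rotated = gp(gp(R, e1), reverse(R))   # sandwich product: rotate e1 by 90 deg
```

The sandwich `R e₁ R̃` sends e₁ to e₂, a 90° rotation, with no trigonometric special-casing: the same code rotates any multivector.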
Grassmannian, Stiefel, and flag manifolds: subspace geometry
Grassmannian Gr(k,n) is the manifold of k-dimensional subspaces of ℝⁿ, with dimension k(n−k). Points are represented by n×k orthonormal matrices X (modulo right O(k) action), or equivalently by projection matrices P = XXᵀ. Five standard distance metrics exist between subspaces via principal angles θ₁,...,θₖ (computed from SVD of XᵀY):
| Metric | Formula | Key property |
|---|---|---|
| Geodesic | d = √(Σθᵢ²) | True Riemannian distance |
| Chordal | d = √(Σsin²θᵢ) = ‖XXᵀ−YYᵀ‖_F/√2 | Euclidean embedding |
| Projection | d = sin(θ_max) | Operator norm |
| Fubini-Study | d = arccos(∏cosθᵢ) | Projective geometry |
| Binet-Cauchy | d = √(1−∏cos²θᵢ) | Volume-based |
Grassmannian class representations (OpenReview 2023) replace single class vectors with class subspaces on Gr(k,n), improving ImageNet top-1 accuracy from 78.04% to 79.37%. All operations are O(nk²) via SVD.
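The principal-angle machinery above reduces to one SVD. The sketch below computes several of the tabulated distances and checks the chordal identity d = ‖XXᵀ−YYᵀ‖_F/√2 numerically; function names are illustrative.

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles between subspaces given orthonormal basis matrices."""
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def grassmann_distances(X, Y):
    th = principal_angles(X, Y)
    return {
        "geodesic": np.sqrt(np.sum(th**2)),
        "chordal": np.sqrt(np.sum(np.sin(th)**2)),
        "projection": np.sin(th.max()),
        "fubini_study": np.arccos(np.clip(np.prod(np.cos(th)), -1.0, 1.0)),
    }

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.normal(size=(6, 2)))   # a 2-plane in R^6
Y, _ = np.linalg.qr(rng.normal(size=(6, 2)))   # another 2-plane
d = grassmann_distances(X, Y)

# Chordal distance should equal ||X X^T - Y Y^T||_F / sqrt(2)
chordal_frob = np.linalg.norm(X @ X.T - Y @ Y.T) / np.sqrt(2.0)
```

The O(nk²) cost quoted in the text is exactly the cost of the thin SVD of the k×k matrix XᵀY plus the matrix product.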
The Stiefel manifold St(k,n) = {X ∈ ℝⁿˣᵏ | XᵀX = Iₖ} is the space of orthonormal k-frames, with Gr(k,n) = St(k,n)/O(k). Efficient optimization uses the Cayley retraction R_X(ξ) at O(nk²) per step, avoiding expensive matrix exponentials.
Flag manifolds FL(n₁,...,nₖ; n) encode nested subspace hierarchies V₁ ⊂ V₂ ⊂ ... ⊂ Vₖ. They generalize Grassmannians and naturally represent multi-resolution structure. The flag trick (Szwagier & Pennec, 2025) converts Grassmannian optimization into flag optimization with guaranteed nestedness — directly relevant for multi-scale anchors in a constellation system where coarse and fine subspaces must be consistent.
Sphere packing and optimal anchor placement
For placing K constellation anchors on S^d with maximum angular separation, sphere packing theory provides provably optimal configurations:
| Dimension | Kissing number | Structure | Anchor count |
|---|---|---|---|
| 4 (S³) | 24 | 24-cell | 24 maximally separated |
| 8 (S⁷) | 240 | E₈ lattice | 240 universally optimal |
| 24 (S²³) | 196,560 | Leech lattice | 196,560 universally optimal |
The E₈ lattice in dimension 8 achieves provably optimal sphere-packing density (Viazovska 2016; Fields Medal 2022) with 240 nearest neighbors at precisely equal distances. The Leech lattice in dimension 24 achieves 196,560 kissing vectors and is universally optimal (Cohn-Kumar 2007). These provide deterministic, information-theoretically optimal anchor initializations for constellation systems.
For arbitrary dimensions, spherical t-designs guarantee that the average of any polynomial of degree ≤t over the design points equals the sphere average, requiring N ≈ C·t^{d−1} points. Constructive methods include Hopf foliation codes (O(n log n) encoding in ℝ^{2^k}), linear programming bounds via Gegenbauer polynomials, and energy minimization (Thomson problem).
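The 240 E₈ anchor directions can be written down explicitly: the standard root-system construction takes all (±1, ±1, 0⁶) coordinate pairs (112 vectors) plus all (±½)⁸ with an even number of minus signs (128 vectors). The sketch below builds them and verifies the kissing configuration: every pair of distinct normalized anchors is at least 60° apart.

```python
from itertools import combinations, product
import numpy as np

# Type 1: +-1 in two coordinates, zero elsewhere (112 vectors).
roots = []
for i, j in combinations(range(8), 2):
    for si, sj in product((1.0, -1.0), repeat=2):
        v = np.zeros(8)
        v[i], v[j] = si, sj
        roots.append(v)

# Type 2: all coordinates +-1/2 with an even number of minus signs (128 vectors).
for signs in product((0.5, -0.5), repeat=8):
    if sum(s < 0 for s in signs) % 2 == 0:
        roots.append(np.array(signs))
roots = np.array(roots)

# Every root has squared norm 2; scaling by 1/sqrt(2) places 240 anchors on S^7.
anchors = roots / np.sqrt(2.0)
gram = anchors @ anchors.T
np.fill_diagonal(gram, -1.0)   # mask self-similarity before taking the max
max_cos = gram.max()           # cosine of the minimum pairwise angle
```

The maximum off-diagonal cosine comes out to exactly 1/2 (a 60° minimum angle), which is the defining property of the optimal kissing configuration in dimension 8.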
Von Mises-Fisher: the Gaussian of the hypersphere
The vMF distribution on S^{p−1} has density f(x; μ, κ) = Cₚ(κ) exp(κ μᵀx) where Cₚ(κ) = κ^{p/2−1}/((2π)^{p/2} I_{p/2−1}(κ)), μ is the mean direction, and κ ≥ 0 is the concentration parameter (κ=0 gives uniform, κ→∞ gives point mass). MLE for the mean direction is simply the normalized arithmetic mean: μ̂ = Σxᵢ/‖Σxᵢ‖.
A vMF mixture model p(x) = Σⱼ αⱼ f(x; μⱼ, κⱼ) provides the natural density estimator for a constellation system. Each anchor μⱼ with concentration κⱼ defines an influence region on the sphere. Given input x ∈ S^d, the posterior p(j|x) gives soft barycentric coordinates for triangulation — high-κ anchors capture fine detail (narrow cones) while low-κ anchors capture broad regions. The vMF mixture EM algorithm has O(NKp) complexity per iteration. Banerjee et al. (JMLR 2005) established that vMF mixtures are the natural model for L2-normalized high-dimensional data.
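The soft-assignment posterior described above is a log-domain softmax over the anchors. Below is a minimal sketch; it uses `scipy.special.ive` (the exponentially scaled Bessel function) to evaluate log Cₚ(κ) stably, and the 3-anchor constellation at the end is a toy configuration invented for the demo.

```python
import numpy as np
from scipy.special import ive

def log_vmf_const(p, kappa):
    """log C_p(kappa); ive(nu, k) = I_nu(k) * exp(-k) avoids overflow."""
    nu = p / 2.0 - 1.0
    return (nu * np.log(kappa) - (p / 2.0) * np.log(2 * np.pi)
            - (np.log(ive(nu, kappa)) + kappa))

def vmf_posterior(x, mus, kappas, alphas):
    """Soft constellation coordinates p(j | x) for a vMF mixture on S^{p-1}."""
    p = x.shape[0]
    logits = np.array([
        np.log(a) + log_vmf_const(p, k) + k * (mu @ x)
        for mu, k, a in zip(mus, kappas, alphas)
    ])
    logits -= logits.max()              # stabilized softmax
    w = np.exp(logits)
    return w / w.sum()

# Toy 3-anchor constellation on S^2: one narrow cone, two broad ones.
mus = np.eye(3)
kappas = np.array([20.0, 5.0, 5.0])
alphas = np.ones(3) / 3
post = vmf_posterior(mus[0], mus, kappas, alphas)
```

Evaluated at the first anchor's mean direction, the posterior concentrates on that anchor, illustrating the narrow-cone/broad-region behavior the text describes.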
Signal decomposition: converting pixels to geometric features
Wavelet scattering transform: the deterministic encoder baseline
The scattering transform cascades wavelet convolutions with modulus nonlinearities:
S_J[p]x(u) = |||x ∗ ψ_{λ₁}| ∗ ψ_{λ₂}| ... | ∗ ψ_{λₘ}| ∗ φ_{2^J}(u)
where ψ_{j,θ} are complex Morlet wavelets at scale 2^j and orientation θ, φ is a low-pass averaging filter, and |·| is the complex modulus. Orders 0-2 suffice (energy decays exponentially; <1% at order 3). This is fully deterministic, translation-invariant up to scale 2^J, Lipschitz-continuous to deformations (‖S_J(L_τ x) − S_J(x)‖ ≤ C·‖∇τ‖_∞·‖x‖), and approximately energy-preserving (‖Sx‖² ≈ ‖x‖²).
Classification benchmarks without any learning:
| Dataset | Method | Accuracy |
|---|---|---|
| MNIST | Scattering + SVM | ~99.5% |
| CIFAR-10 | Scattering + SVM | ~82-83% |
| CIFAR-10 | Scattering + linear | ~70.5% |
| ImageNet | Scattering only | ~45% top-5 |
| CIFAR-10 | Scattering + WRN hybrid | 93.76% |
The scattering transform is the clear leader among deterministic methods — 82% on CIFAR-10 versus ~62% for HOG+SVM or ~38% for raw pixels+SVM. Approximately invertible (Anglès & Mallat, 2018); not exactly so due to phase loss from the modulus. Complexity is O(N·J·L·M) per image. The Kymatio library (JMLR 2020) provides GPU-accelerated implementations in PyTorch/TensorFlow/JAX.
https://github.com/kymatio/kymatio
For hypersphere interface: L2-normalization maps scattering vectors to S^d as v̂ = Sx/‖Sx‖₂. Since ‖Sx‖² ≈ ‖x‖², the normalization is geometrically meaningful — angular distances on S^d capture shape/texture similarity independent of amplitude.
Gabor filter banks: sampling the rotation group
A Gabor function g(x,y; λ,θ,ψ,σ,γ) = exp(−(x'²+γ²y'²)/(2σ²)) · exp(i(2πx'/λ+ψ)) parameterizes oriented, frequency-tuned detectors. A filter bank of S scales × K orientations samples a discrete subset of the scale-orientation space ℝ⁺ × SO(2). Each orientation θₖ ∈ [0,π) is a point on the circle S¹ ≅ SO(2), making the filter bank a discrete sampling of the similitude group. Fully deterministic, O(N·S·K) complexity, and first-order scattering coefficients are approximately equal to modulus of Gabor responses — making Gabor filters a natural first stage feeding into deeper scattering cascades.
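A Gabor bank plus modulus is, per the text, approximately a first-order scattering stage; composing it with averaging and L2 normalization already lands features on a hypersphere. The sketch below is a deliberately tiny version of that pipeline (small bank, circular FFT convolution, mean-modulus pooling); the function names and parameter choices are illustrative, not a reference implementation.

```python
import numpy as np

def gabor_kernel(size, lam, theta, sigma, gamma=0.5):
    """Complex Gabor filter on a size x size grid (standard construction)."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r, indexing="ij")
    xp = x * np.cos(theta) + y * np.sin(theta)
    yp = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xp**2 + gamma**2 * yp**2) / (2 * sigma**2))
    return env * np.exp(1j * 2 * np.pi * xp / lam)

def gabor_features(img, scales=(4.0, 8.0), n_orient=4):
    """Mean modulus response per (scale, orientation), L2-normalized to S^d."""
    feats = []
    for lam in scales:
        for k in range(n_orient):
            g = gabor_kernel(img.shape[0], lam, np.pi * k / n_orient, lam / 2)
            # circular convolution via FFT, then complex modulus
            resp = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(g))
            feats.append(np.abs(resp).mean())
    v = np.array(feats)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(2)
img = rng.random((32, 32))
v = gabor_features(img)        # 2 scales x 4 orientations = 8-dim, on S^7
```

The modulus-then-average structure is exactly the S×K sampling of scale-orientation space described above, and normalization makes the output directly comparable by angular distance.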
Radon transform: projection geometry meets Grassmannians
The Radon transform Rf(ω,t) = ∫ f(x) δ(x·ω − t) dx computes line integrals parameterized by direction ω ∈ S^{n−1} and displacement t ∈ ℝ. Each direction defines a point in Gr(1,n) — the Grassmannian of lines through the origin. The Fourier slice theorem F̂₁DRf(θ,·) = F̂₂D[f](ξ·cosθ, ξ·sinθ) links projection geometry to frequency analysis. Fully deterministic, exactly invertible via filtered back-projection, O(N^{3/2}) complexity. The ridgelet transform equals a 1D wavelet transform composed with the Radon transform, and first-generation curvelets are local ridgelet transforms — forming a natural composition hierarchy: Radon → Ridgelet → Curvelet.
Persistent homology: topological invariants as geometric signatures
Given a filtration of simplicial complexes (e.g., sublevel sets at increasing thresholds), persistent homology tracks birth and death of topological features: β₀ (connected components), β₁ (loops), β₂ (voids). Persistence diagrams {(bᵢ, dᵢ)} encode feature lifetimes; vectorization via persistence images or persistence landscapes produces fixed-size feature vectors suitable for ML.
Fully deterministic, O(n³) worst-case (near-linear in practice with Ripser). Not invertible — topological features discard geometric detail. Excels in low-data regimes: 95.95% crack detection accuracy with only 1 labeled sample per class (Di Via et al., 2024); 93.11% for malaria diagnosis. On standard benchmarks with full data, TDA alone lags behind learned methods but provides complementary topological information that no other transform captures. Persistence images can be L2-normalized to S^d and composed with any preceding transform — applying TDA to scattering features captures the topology of the representation manifold.
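For intuition, 0-dimensional sublevel persistence can be computed in a few lines with union-find and the elder rule (when two components merge, the younger one dies). The pure-Python sketch below handles the 1D case only; real pipelines would use Ripser or similar, and the function name is illustrative.

```python
import numpy as np

def sublevel_persistence_0d(f):
    """0-dim sublevel-set persistence of a 1D signal via union-find."""
    f = list(f)
    parent, birth = {}, {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(len(f)), key=lambda i: f[i]):
        parent[i], birth[i] = i, f[i]       # i enters the filtration
        for j in (i - 1, i + 1):
            if j in parent:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                if birth[ri] < birth[rj]:   # keep the elder root alive
                    ri, rj = rj, ri
                pairs.append((birth[ri], f[i]))   # younger component dies here
                parent[ri] = rj
    roots = {find(i) for i in parent}
    pairs.extend((birth[r], np.inf) for r in roots)   # essential class
    return [(b, d) for b, d in pairs if d > b]        # drop zero-persistence pairs

# Two local minima (values 0 and 1) separated by a peak at 2:
bars = sublevel_persistence_0d([0.0, 2.0, 1.0, 3.0])
```

The output has one finite bar (1, 2) for the shallower basin and one essential bar (0, ∞) for the global minimum — precisely the birth/death bookkeeping the persistence diagram encodes.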
Curvelet and ridgelet transforms: directional frequency analysis
Curvelets decompose images into multi-scale, multi-directional elements obeying the parabolic scaling law: width ≈ length². This achieves optimal O(K⁻²(log K)³) approximation rate for piecewise C² images with K coefficients — provably better than wavelets' O(K⁻¹). Curvelet coefficients parameterize a space indexed by (scale, angle, position) ∈ ℝ⁺ × S¹ × ℝ². Ridgelets are constant along lines, making them optimal for linear singularities. Both are fully deterministic, exactly invertible (tight frames), and O(N² log N) via fast discrete transforms. Neither has well-established standalone classification benchmarks, but their directional sensitivity complements scattering transforms which capture scale but not fine orientation structure.
Compact geometric representations preserving structure
Random Fourier features: kernel approximation landing on hyperspheres
Rahimi & Recht's (2007) feature map z(x) = √(2/D) [cos(ω₁ᵀx+b₁), ..., cos(ωDᵀx+bD)] approximates shift-invariant kernels via Bochner's theorem, where ωᵢ are drawn from the kernel's spectral distribution and bᵢ ~ Uniform[0,2π]. The critical geometric insight: each (cos(ωᵢᵀx), sin(ωᵢᵀx)) pair lives on a unit circle, so the concatenated vector naturally inhabits a product of circles — and when normalized, a hypersphere. Tancik et al. (NeurIPS 2020) proved this explicitly: the Fourier feature mapping γ(v) = [cos(2πBv), sin(2πBv)]ᵀ maps inputs to a higher-dimensional hypersphere.
Pseudo-deterministic (random ωᵢ drawn once and fixed). Not directly invertible but preserves inner product structure: ⟨z(x),z(y)⟩ ≈ k(x−y) with uniform convergence. Complexity O(Dd) per point; optimized variants (Fastfood, SORF) achieve O(D log d). Requires D = O(dε⁻² log(1/ε)) features for ε-approximation. Highly composable — can be applied after scattering transforms for nonlinear kernel approximation on scattering coefficients.
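The kernel-approximation guarantee is easy to check empirically. The sketch below draws the RFF map once for a Gaussian kernel and compares the feature inner product against the exact kernel value; D and σ are arbitrary demo choices.

```python
import numpy as np

def rff(X, D=10000, sigma=1.0, seed=0):
    """Random Fourier features approximating the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)). W and b are drawn once, then fixed."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))   # spectral distribution
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(3)
X = rng.normal(size=(2, 5))
Z = rff(X)
approx = Z[0] @ Z[1]                                  # <z(x), z(y)>
exact = np.exp(-np.sum((X[0] - X[1])**2) / 2.0)       # k(x - y)
```

With D = 10,000 features the inner product tracks the exact kernel to within a few hundredths, matching the D = O(ε⁻²) scaling quoted above.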
Johnson-Lindenstrauss: the universal dimension reducer
For n points in ℝ^d, a random linear map f(x) = (1/√k)Ax with A having i.i.d. Gaussian/Rademacher entries preserves all pairwise distances to (1±ε) with k = O(ε⁻² log n) dimensions — independent of d. This is the canonical result for geometric dimensionality reduction. After projection, renormalization maps points onto S^{k−1} with angular distortion bounded by O(ε).
Pseudo-deterministic (random A drawn once, then fixed). Not exactly invertible but pseudo-inverse provides approximate reconstruction. The tight lower bound k ≥ Ω(ε⁻² log n) is achieved (Larsen-Nelson 2017). Fast JL via randomized Hadamard achieves O(d log d + k²) per point. Extremely composable: JL ∘ RFF, JL ∘ scattering, cascaded JL₁ ∘ JL₂ are all valid. The natural role in a geometric encoder pipeline is dimensionality reduction after high-dimensional feature extraction.
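A minimal JL sketch, checking the distance-preservation claim directly over all pairs (dimensions and ε here are demo values, not tuned to the theoretical constant):

```python
from itertools import combinations
import numpy as np

def jl_project(X, k, seed=0):
    """Gaussian JL map f(x) = A x / sqrt(k); A is drawn once, then fixed."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(k, X.shape[1]))
    return X @ A.T / np.sqrt(k)

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 2000))     # n = 30 points in d = 2000
Y = jl_project(X, k=600)            # reduce to k = 600, independent of d

# Maximum relative distortion over all pairwise distances
distortions = []
for i, j in combinations(range(len(X)), 2):
    d0 = np.linalg.norm(X[i] - X[j])
    d1 = np.linalg.norm(Y[i] - Y[j])
    distortions.append(abs(d1 - d0) / d0)
max_distortion = max(distortions)
```

Note the projection never looks at the ambient dimension in its guarantee: the same k works for d = 2,000 or d = 2,000,000, which is what makes JL the natural post-extraction compressor in the pipeline.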
Compressed sensing: exploiting sparsity for geometric recovery
If a signal x ∈ ℝ^N is k-sparse, m = O(k log(N/k)) random measurements y = Φx suffice for exact recovery via ℓ₁ minimization, provided Φ satisfies the Restricted Isometry Property: (1−δₖ)‖x‖₂² ≤ ‖Φx‖₂² ≤ (1+δₖ)‖x‖₂² for all k-sparse x. Recovery is fully invertible for sparse signals and robust to noise. The key insight for geometric encoding: if scattering coefficients are approximately sparse in some basis, compressed sensing enables dramatically more compact representations than JL alone (O(k log(N/k)) vs O(ε⁻² log n)).
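Basis pursuit itself is a linear program: splitting x = x⁺ − x⁻ with x⁺, x⁻ ≥ 0 turns min ‖x‖₁ s.t. Φx = y into an LP solvable with `scipy.optimize.linprog`. The sketch below recovers a 3-sparse signal from 25 random measurements of a length-50 vector; the problem sizes are demo values.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
N, m, k = 50, 25, 3
Phi = rng.normal(size=(m, N)) / np.sqrt(m)            # random sensing matrix
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.normal(size=k)
y = Phi @ x_true                                      # m measurements

# min sum(x+ + x-)  s.t.  Phi (x+ - x-) = y,  x+, x- >= 0
c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_hat = res.x[:N] - res.x[N:]
recovery_error = np.linalg.norm(x_hat - x_true)
```

Here m = 25 comfortably exceeds the O(k log(N/k)) threshold, so ℓ₁ minimization recovers the sparse signal essentially exactly, illustrating why sparsity buys more compression than distance preservation alone.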
Cayley-Menger determinants: encoding simplicial geometry from distances
The volume of an n-simplex from pairwise distances is encoded by the bordered distance matrix:
(-1)^{n+1} · 2ⁿ · (n!)² · Vₙ² = det(CM) where CM is the (n+2)×(n+2) matrix with CM₀₀=0, CM₀ᵢ=CMᵢ₀=1, CMᵢⱼ=dᵢⱼ².
This yields Heron's formula for triangles and generalizes to arbitrary dimensions. Fully deterministic, O(n³) for the determinant. For constellation systems, CM determinants encode the complete geometric structure of anchor configurations from pairwise distances — providing a single scalar invariant (simplex volume) that characterizes any subset of anchors. Menger's theorem gives necessary and sufficient conditions for a distance matrix to embed in ℝⁿ, and the closely related result of Gödel (1933) characterizes when points lie on a sphere.
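The determinant formula above can be exercised directly; the sketch below recovers Heron's formula for the 3-4-5 triangle (area 6) from distances alone. The function name is illustrative.

```python
import numpy as np
from math import factorial

def simplex_volume(D):
    """Volume of an n-simplex from its (n+1)x(n+1) distance matrix D,
    via the Cayley-Menger determinant:
    (-1)^{n+1} * 2^n * (n!)^2 * V^2 = det(CM)."""
    n = D.shape[0] - 1                     # simplex dimension
    CM = np.ones((n + 2, n + 2))           # bordered matrix: first row/col of 1s
    CM[0, 0] = 0.0
    CM[1:, 1:] = D**2                      # squared distances
    det = np.linalg.det(CM)
    v2 = det / ((-1) ** (n + 1) * 2**n * factorial(n) ** 2)
    return np.sqrt(max(v2, 0.0))

# 3-4-5 right triangle; its area should be 6.
D = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])
area = simplex_volume(D)
```

The same function, fed the pairwise distances of any subset of constellation anchors, returns the single scalar volume invariant described above.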
Spherical harmonics: native Fourier analysis on spheres
The spherical harmonics Y_l^m(θ,φ) form a complete orthonormal basis for L² functions on S², with degree l capturing angular frequency l and (2l+1) harmonics per degree. Any function f on S² decomposes as f = Σ_{l,m} f_l^m Y_l^m with coefficients f_l^m = ∫ f Y_l^{m*} dΩ. Truncation to degree l_max gives (l_max+1)² coefficients — 9 for l_max=2, 25 for l_max=4. Fully deterministic, theoretically perfectly invertible (approximately so under truncation), with fast transforms at O(l_max² log² l_max). Rotation of spherical harmonic coefficients uses Wigner D-matrices in O(l_max³). For higher-dimensional spheres S^{n−1}, hyperspherical harmonics generalize with dimensionality growing as O(l^{n−2}).
The spherical harmonics are the native frequency decomposition for hypersphere data — analogous to Fourier analysis on ℝⁿ. For a constellation system operating on S^d, expanding the embedding function in hyperspherical harmonics reveals which angular frequencies carry discriminative information.
Invertible geometric transforms bridging sphere and plane
Stereographic projection: the canonical sphere-plane bridge
Forward: σ(x₁,...,x_{n+1}) = (x₁,...,xₙ)/(1−x_{n+1}). Inverse: σ⁻¹(y) = (2y, ‖y‖²−1)/(‖y‖²+1). This is the unique conformal bijection between S^n ∖ {north pole} and ℝⁿ, preserving angles but not distances. Fully deterministic, perfectly invertible, O(n) for both directions. The conformal factor ds²_sphere = 4/(1+‖y‖²)² ds²_flat means that distances near the pole are compressed while distances near the equator are roughly preserved. For a geometric encoder, stereographic projection enables applying Euclidean methods (scattering transforms, wavelets) to data that naturally lives on spheres, or projecting Euclidean features onto spheres.
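Both directions are one-liners, and the round trip can be verified exactly; this is a minimal sketch with illustrative function names.

```python
import numpy as np

def stereo(x):
    """Stereographic projection of S^n minus the north pole onto R^n."""
    return x[:-1] / (1.0 - x[-1])

def stereo_inv(y):
    """Inverse projection R^n -> S^n: (2y, ||y||^2 - 1) / (||y||^2 + 1)."""
    n2 = y @ y
    return np.append(2.0 * y, n2 - 1.0) / (n2 + 1.0)

rng = np.random.default_rng(6)
y = rng.normal(size=3)
x = stereo_inv(y)      # lands exactly on the unit sphere S^3
```

The round trip `stereo(stereo_inv(y))` returns `y` to machine precision, which is the "perfectly invertible" property the text relies on.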
Exponential and logarithmic maps: geodesic linearization
On S^n, the exponential map has the closed form exp_p(v) = cos(‖v‖)·p + sin(‖v‖)·v/‖v‖ and its inverse log_p(q) = arccos(⟨q,p⟩) · (q−⟨q,p⟩p)/‖q−⟨q,p⟩p‖. These provide a local linearization of the manifold: the tangent space T_pS^n is a flat ℝⁿ that approximates S^n near p. For a constellation system, log maps centered at each anchor point provide local Euclidean coordinates — the "distance and direction" from each anchor to the input embedding. These are fully deterministic, locally invertible (within the injectivity radius π on S^n), and O(n).
Parallel transport: preserving geometry across tangent spaces
On S^n along the geodesic from p to q: Γ^q_p(v) = v − ((⟨v,p⟩+⟨v,q⟩)/(1+⟨p,q⟩))·(p+q). This moves tangent vectors between tangent spaces while preserving inner products — an isometric isomorphism between T_pM and T_qM. Essential for comparing local geometric features computed at different anchor points. After computing log maps at different anchors, parallel transport enables coherent aggregation of these local representations. Fully deterministic, exactly invertible (transport along the reversed curve), O(n) on spheres.
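The exp/log/transport toolkit fits in a short sketch. Below, the transport formula is implemented with the parenthesization ((⟨v,p⟩+⟨v,q⟩)/(1+⟨p,q⟩))·(p+q), and the checks confirm the two claimed properties: exp undoes log, and transport is an isometry landing in the tangent space at q. Function names are illustrative.

```python
import numpy as np

def exp_map(p, v):
    """exp_p(v) on S^n for a tangent vector v (with <v, p> = 0)."""
    t = np.linalg.norm(v)
    if t < 1e-15:
        return p.copy()
    return np.cos(t) * p + np.sin(t) * v / t

def log_map(p, q):
    """log_p(q): tangent vector at p pointing along the geodesic to q."""
    c = np.clip(p @ q, -1.0, 1.0)
    w = q - c * p                       # component of q orthogonal to p
    nw = np.linalg.norm(w)
    if nw < 1e-15:
        return np.zeros_like(p)
    return np.arccos(c) * w / nw

def transport(p, q, v):
    """Parallel transport of v in T_p S^n along the geodesic from p to q."""
    return v - ((v @ p + v @ q) / (1.0 + p @ q)) * (p + q)

rng = np.random.default_rng(7)
p = rng.normal(size=4); p /= np.linalg.norm(p)
q = rng.normal(size=4); q /= np.linalg.norm(q)
v = rng.normal(size=4); v -= (v @ p) * p        # project v into T_p

roundtrip = exp_map(p, log_map(p, q))           # should return q
w = transport(p, q, v)                          # should lie in T_q, same norm
```

This is the complete local-coordinate machinery for an anchor: log gives "distance and direction" from the anchor, and transport lets those directions be compared across anchors.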
Möbius transformations: conformal automorphisms of S^d
h_ω(z) = ((1−‖ω‖²)/‖z−ω‖²)·(z−ω) − ω for ω inside the unit ball and z ∈ S^D. These are the angle-preserving self-maps of the sphere, forming the group SO(D+1,1) via Ahlfors-Vahlen matrices with Clifford algebra entries. They expand the region near ω and contract the rest — providing a "geometric attention" mechanism. Fully deterministic given parameters, exactly invertible (the group has explicit inverses), O(D) per evaluation. Rezende et al. (ICML 2020) used Möbius transformations as building blocks for normalizing flows on spheres, composing them with circular splines to model complex distributions.
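Using the form h_ω(z) = ((1−‖ω‖²)/‖z−ω‖²)(z−ω) − ω (as in the spherical normalizing-flow literature), one can verify directly that the map sends the unit sphere to itself and reduces to the identity at ω = 0. A minimal sketch:

```python
import numpy as np

def mobius(z, w):
    """Mobius transform h_w(z) = ((1 - |w|^2) / |z - w|^2) (z - w) - w,
    a conformal self-map of the unit sphere for ||w|| < 1."""
    d = z - w
    return (1.0 - w @ w) / (d @ d) * d - w

rng = np.random.default_rng(8)
z = rng.normal(size=3); z /= np.linalg.norm(z)   # a point on S^2
w = np.array([0.3, -0.2, 0.1])                   # parameter inside the unit ball
h = mobius(z, w)                                 # stays on S^2
```

Sweeping w toward a chosen point on the sphere concentrates mass near it, which is the "geometric attention" behavior described above.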
Procrustes analysis and matrix decompositions as geometric features
Procrustes alignment finds the optimal rotation R = UVᵀ (from the SVD BAᵀ = UΣVᵀ) minimizing ‖RA−B‖_F. This enables aligning input feature configurations to canonical poses before encoding. Fully deterministic, exactly invertible (R⁻¹=Rᵀ), O(d³).
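A minimal sketch of the left-acting orthogonal Procrustes problem, taking the SVD of BAᵀ and confirming it recovers a known rotation exactly (function and variable names are illustrative):

```python
import numpy as np

def procrustes_rotation(A, B):
    """Orthogonal R = U V^T minimizing ||R A - B||_F, via the SVD of B A^T."""
    U, _, Vt = np.linalg.svd(B @ A.T)
    return U @ Vt

rng = np.random.default_rng(9)
A = rng.normal(size=(3, 10))                     # 10 points in R^3, as columns
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # a random orthogonal matrix
B = Q @ A                                        # rotated copy of the points
R = procrustes_rotation(A, B)                    # should recover Q
```

For generic point configurations the minimizer is unique, so R matches the true rotation to machine precision, which is what makes Procrustes a reliable canonicalization step.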
The matrix decompositions each separate distinct geometric properties:
| Decomposition | Formula | Geometric meaning | Deterministic? | Invertible? |
|---|---|---|---|---|
| SVD | A = UΣVᵀ | rotation → scale → rotation | Yes | Yes (pseudo-inverse for rank-deficient) |
| Polar | A = UP | pure rotation × pure stretch | Yes | Yes (if invertible) |
| QR | A = QR | rotation × (scale+shear) | Yes | Yes |
| Schur | A = QTQ* | unitary similarity → eigenstructure | Yes | Yes |
| NMF | V ≈ WH, W,H≥0 | parts-based (additive) | No (iterative) | Approximate |
| Tucker tensor | 𝒯 ≈ 𝒢 ×₁U₁ ×₂U₂ ×₃U₃ | mode subspaces + interactions | Partly (HOSVD) | Approximate |
| CP tensor | 𝒯 ≈ Σᵣ λᵣ aᵣ⊗bᵣ⊗cᵣ | rank-one directional components | Unique (Kruskal) | Approximate |
The polar decomposition A = UP is especially valuable: U ∈ O(n) lives on the orthogonal group (extracting pure rotation from image patches) while P encodes the symmetric stretch tensor. The SVD columns of U and V lie on Stiefel manifolds, naturally connecting matrix features to manifold geometry. Wang et al. (IEEE TPAMI 2021) solved the problem of differentiable SVD gradients, enabling end-to-end training through SVD layers.
Information-theoretic geometry: measuring what encoders preserve
Fisher-Rao metric: the unique invariant distance between distributions
The Fisher information metric gᵢⱼ(θ) = E[∂ᵢ log p(x|θ) · ∂ⱼ log p(x|θ)] is, by Chentsov's theorem, the unique (up to scaling) Riemannian metric on statistical manifolds invariant under sufficient statistics. The critical connection to hyperspheres: the square-root embedding p → √p maps probability distributions to the positive orthant of the unit sphere in L², where d_FR(p,q) = 2·arccos(∫√(pq) dx) — the Fisher-Rao distance equals twice the geodesic distance on the sphere (Hellinger geometry). For discrete distributions over n outcomes, the probability simplex maps isometrically to the positive orthant of S^{n−1}. This provides the information-theoretic foundation for why hyperspherical embeddings are natural: distributions over classes, when encoded via square-root embeddings, live on spheres with the Fisher-Rao metric as the natural distance.
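For discrete distributions the square-root embedding makes the Fisher-Rao distance a two-line computation; the sketch below implements d_FR(p,q) = 2·arccos(Σ√(pᵢqᵢ)) and the illustrative checks confirm it behaves as a metric on the simplex, with maximum value π for distributions of disjoint support.

```python
import numpy as np

def fisher_rao(p, q):
    """Fisher-Rao distance between discrete distributions, via the
    square-root embedding onto the positive orthant of the unit sphere."""
    bc = np.sqrt(p * q).sum()                    # Bhattacharyya coefficient
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
d_pq = fisher_rao(p, q)
```

The √p map sends each distribution to a unit vector with nonnegative entries, so the angle between those vectors (times 2) is exactly the distance above: hyperspherical geometry arrives for free.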
Wasserstein distance on spheres
The Wasserstein-p distance W_p(μ,ν) = (inf_{γ∈Γ(μ,ν)} ∫ d(x,y)^p dγ)^{1/p} measures optimal transport cost. The spherical sliced-Wasserstein distance (Bonet et al., ICLR 2023) extends slicing to probability measures on spheres by projecting onto great circles, enabling efficient O(n log n) comparison of distributions on S^{d−1}. For a constellation system, Wasserstein distance between the distribution of embeddings around different anchor configurations measures the geometric cost of representation changes — providing a principled distance for comparing encoder outputs.
Rate-distortion bounds on geometric compression
Rate-distortion theory provides fundamental limits: for data on a d-dimensional manifold embedded in ℝⁿ, the minimum encoding rate for distortion D scales as R(D) ∝ (d/2)·log(1/D), depending on intrinsic dimension d rather than ambient dimension n. Recent work on β-VAEs (PLoS Computational Biology, 2025) identified three geometric distortions emerging from rate-distortion trade-offs: prototypization (representations collapse toward class prototypes), specialization (rare classes get less representation space), and orthogonalization (class representations pushed apart). These distortions are not bugs but features — they emerge naturally from optimal compression under classification constraints and directly inform constellation anchor behavior.
Mutual information on hyperspheres
Recent theoretical work (arXiv:2602.08105, 2025) proved that contrastive losses naturally produce approximately uniform and aligned embeddings on hyperspheres in sufficiently high-dimensional latent spaces. This provides theoretical justification for the entire geometric encoding paradigm: maximizing mutual information between inputs and representations, subject to capacity constraints, naturally produces hyperspherical embeddings. MINE (Belghazi et al., ICML 2018) and InfoNCE enable measuring I(X;Z) between input X and geometric representation Z, quantifying how much information each pipeline stage preserves.
Composing a multi-stage deterministic encoder pipeline
The 35+ primitives above compose into a practical pipeline. The key design principle: each stage should add orthogonal geometric information while projecting progressively onto S^d.
The reference pipeline architecture
Stage 1 — Multi-scale frequency decomposition (scattering transform). Apply the wavelet scattering transform to order 2 with J=4 scales and L=8 orientations. This produces a high-dimensional feature vector encoding translation-invariant, deformation-stable spectral structure. Output dimension: ~10,000 for 32×32 images. Complexity: roughly O(N·(J·L)²) at order 2. This stage captures local frequency content and multi-scale edge structure.
Stage 2 — Directional enrichment (curvelet coefficients). Extract curvelet energy statistics per scale-orientation band, capturing anisotropic edge information that scattering misses. Concatenate with scattering features. This adds directional frequency structure not captured by isotropic wavelets.
Stage 3 — Topological signature (persistent homology). Compute persistence diagrams from sublevel set filtrations of the input (and optionally of the scattering feature maps). Vectorize via persistence images. This adds topological invariants (Betti numbers, persistence) orthogonal to all frequency-based features.
Stage 4 — Dimensionality reduction (JL projection). Apply a random Gaussian projection from the concatenated high-dimensional feature space to ℝ^k where k = O(ε⁻² log n), preserving all pairwise distances to (1±ε). For n=50,000 training points and ε=0.1, k ≈ 500-1000 suffices. Complexity: O(kd) or O(d log d) with fast JL.
Stage 5 — Spherical projection (L2 normalization). Normalize: v̂ = v/‖v‖₂ ∈ S^{k−1}. The energy preservation of scattering and distance preservation of JL ensure this normalization is geometrically meaningful.
Stage 6 — Constellation triangulation (vMF soft assignment). Compute posterior probabilities p(j|v̂) for each of K learned anchor points μⱼ with concentrations κⱼ on S^{k−1}. This produces a K-dimensional soft assignment vector — the "constellation coordinates" — that localizes the input relative to all anchors simultaneously. Anchor positions initialized from optimal spherical codes (E₈ lattice for dim 8, or numerically optimized configurations for other dimensions).
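Stages 4-6 can be sketched end to end in a few lines. In the demo below, random vectors stand in for the concatenated stage 1-3 features (no scattering/curvelet library is assumed), the JL matrix is drawn once and fixed, and the anchors are random unit vectors rather than optimal codes; all dimensions are toy values.

```python
import numpy as np
from scipy.special import ive

rng = np.random.default_rng(10)
feats = rng.normal(size=(16, 5000))            # stand-in for stage 1-3 features

# Stage 4: JL projection to k dimensions
k = 64
A = rng.normal(size=(k, feats.shape[1]))       # drawn once, then fixed
z = feats @ A.T / np.sqrt(k)

# Stage 5: L2 normalization onto S^{k-1}
z /= np.linalg.norm(z, axis=1, keepdims=True)

# Stage 6: vMF soft assignment against K anchors
K = 8
mus = rng.normal(size=(K, k))
mus /= np.linalg.norm(mus, axis=1, keepdims=True)   # anchor directions
kappas = np.full(K, 10.0)                           # shared concentration

def log_c(p, kap):
    """log of the vMF normalizer C_p(kappa), via the scaled Bessel ive."""
    nu = p / 2.0 - 1.0
    return (nu * np.log(kap) - (p / 2.0) * np.log(2 * np.pi)
            - (np.log(ive(nu, kap)) + kap))

logits = z @ (mus * kappas[:, None]).T + log_c(k, kappas)
logits -= logits.max(axis=1, keepdims=True)
w = np.exp(logits)
coords = w / w.sum(axis=1, keepdims=True)      # constellation coordinates
```

Each input ends up as a K-dimensional soft-assignment row summing to one — the "constellation coordinates" the pipeline is designed to produce.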
Composability graph of key primitives
The following compositions have been validated theoretically or empirically:
- Scattering → JL → S^d: Standard pipeline; scattering energy preservation + JL distance preservation = meaningful angular distances after normalization
- Scattering → RFF → S^d: Nonlinear kernel approximation on scattering coefficients; maps to hypersphere via cos/sin structure
- Radon → Ridgelet → Curvelet: Hierarchical composition via Fourier slice theorem
- Gabor → Scattering: First-order scattering ≈ Gabor modulus response; higher orders add inter-frequency structure
- Any transform → PH: Topology of feature space is always complementary information
- exp/log maps → parallel transport → Procrustes: Full toolkit for manipulating local coordinates on S^d around constellation anchors
- Hopf fibration → vMF mixture: Hierarchical triangulation with density estimation per fiber
What the ~20% CIFAR-10 gap tells us
The best purely deterministic pipeline (scattering + SVM) achieves ~82% on CIFAR-10 versus >99% for learned models — a gap of roughly 17-20 percentage points. This gap quantifies the information that geometric transforms cannot capture: non-geometric intra-class variabilities including object structure, context, clutter, and fine-grained part configurations. However, deterministic features dramatically outperform learned models in the low-data regime — Oyallon et al. (2017) showed scattering hybrids significantly outperform end-to-end learning with only 500 CIFAR-10 samples. The constellation system's role is precisely to bridge this gap: learned anchor points capture the non-geometric structure that deterministic transforms miss, while the geometric pipeline provides the stable, interpretable backbone.
Conclusion: an actionable hierarchy of primitives
Three findings emerge from this survey that reshape the design space for geometric encoders.
First, the scattering transform is not just the best deterministic feature extractor — it is a nearly complete first two stages of any pipeline, achieving 82% CIFAR-10 with zero learned parameters and providing Lipschitz-continuous, energy-preserving features that interface naturally with hyperspheres via L2 normalization. No other deterministic method comes close.
Second, the choice of hypersphere dimension matters profoundly. Dimensions 8 and 24 are special: E₈ and Leech lattice provide universally optimal anchor configurations with 240 and 196,560 maximally separated points respectively. Dimension 4 enables the Hopf fibration S³→S² for hierarchical encoding. The pipeline should target these dimensions for the constellation layer.
Third, information-theoretic geometry provides the missing theoretical framework. The Fisher-Rao metric's connection to the hypersphere via the square-root embedding (d_FR = 2·arccos(∫√(pq)dx)) explains why angular distances on spheres are natural for comparing distributions. Rate-distortion theory predicts that optimal compression under classification constraints naturally produces prototypization, specialization, and orthogonalization — exactly the behaviors a constellation system should exhibit. And the recent proof that contrastive learning naturally produces hyperspherical embeddings validates the entire paradigm from first principles.
The primitives in this manifest are not just a catalog — they form a composable algebra of geometric operations. The scattering transform decomposes signals into geometric invariants. JL and RFF compress while preserving structure. Stereographic projection, exp/log maps, and parallel transport bridge between flat and curved representations. Spherical harmonics and Hopf fibrations decompose sphere structure hierarchically. And vMF mixtures with optimally-placed anchors triangulate the result. Each operation is deterministic, well-understood, and has known bounds on what it preserves and what it discards. The encoder architect's task is to compose them into a pipeline where total information loss, measured by I(X;Z), is minimized subject to the dimensionality constraints of the target hypersphere.
Prompter: AbstractPhil Author: Claude Opus 4.6 - Research
This article is completely AI generated through Claude AI using Claude Opus 4.6 with research mode enabled. The information may be incorrect, invalid, or faulty in one way or another.
Claude drew on the depth of our combined research to chart what it judged the most promising directions for pure image encoding and for exploiting structural behavior.