Wasserstein

Distribution shape, transport cost, self-similarity

distributional · dim distribution space · 6 metrics

What It Measures

How the distribution of values differs from uniform, and how stable that distribution is across the signal.

Bins the data into a 32-bin histogram and treats it as a probability distribution. Then asks three questions: how far is this distribution from uniform (optimal transport cost)? How concentrated is it (peak height)? And does the first half of the signal look like the second half (self-similarity)?
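A minimal sketch of the binning step, assuming the signal is a sequence of byte values (0 to 255) and that the 32 bins are equal-width over that range. The function name `byte_histogram` is illustrative and not taken from the atlas code.

```python
import numpy as np

N_BINS = 32  # bin count stated in the text


def byte_histogram(signal, n_bins=N_BINS):
    """Bin byte-valued data and normalize the counts to probabilities.

    Assumes a non-empty signal with values in [0, 255]; values outside
    that range are ignored by np.histogram.
    """
    data = np.asarray(signal)
    counts, _ = np.histogram(data, bins=n_bins, range=(0, 256))
    return counts / counts.sum()


# Every byte value once: each of the 32 bins receives 8 values.
p = byte_histogram(np.arange(256))
```

Each bin in this sketch covers 8 consecutive byte values, so a signal that visits every byte equally often yields a flat histogram.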

Metrics

concentration

The peak bin height times the number of bins. 1.0 means uniform (De Bruijn scores exactly 1.0 because its construction guarantees every byte pattern appears equally often). Above 1.0 means the distribution has a spike, up to a maximum of 32.0 when all mass sits in a single bin, as it does for a constant signal. Apart from constants, Collatz gap lengths (31.8), Rainfall (31.5), and Forest fire (29.4) are the most concentrated signals in the atlas: their heavy-tailed distributions pile most of their mass into the lowest bin.
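The definition above can be sketched in a few lines, assuming `p` is an already normalized histogram; the function name is illustrative.

```python
import numpy as np


def concentration(p):
    """Peak bin probability times the number of bins."""
    p = np.asarray(p, dtype=float)
    return float(p.max() * p.size)


uniform = np.full(32, 1 / 32)  # flat histogram
spike = np.zeros(32)
spike[0] = 1.0                 # all mass in the lowest bin

flat_score = concentration(uniform)
spike_score = concentration(spike)
```

With 32 bins the score ranges from 1.0 (flat) to 32.0 (single bin), which matches the two extremes in the rankings.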

dist_from_uniform

Earth mover's distance from the uniform distribution: the minimum work (mass times distance) needed to move "dirt" around until the histogram is flat. Constant signals set the ceiling at 0.4844. Among the rest, Collatz gap lengths (0.48) and Rainfall (0.48) are farthest from uniform. Neural net pruned weights (0.46) are close behind because pruning creates a spike at zero. De Bruijn scores 0.0000 (already uniform).
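For one-dimensional histograms, the earth mover's distance reduces to the summed absolute difference of the two cumulative distributions. The sketch below assumes bin positions are scaled to the unit interval (bin width 1/32); that scaling is a guess, chosen because it gives 0.484375 for a single-bin spike, in line with the 0.4844 listed for constants in the rankings. Function names are illustrative.

```python
import numpy as np


def emd_1d(p, q):
    """1D earth mover's distance between two histograms on the same bins.

    Sum of |CDF_p - CDF_q| times the bin width, with the bin width
    assumed to be 1 / n_bins.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum() / p.size)


def dist_from_uniform(p):
    """Transport cost from histogram p to the flat histogram."""
    n = len(p)
    return emd_1d(p, np.full(n, 1.0 / n))


spike_low = np.zeros(32)
spike_low[0] = 1.0    # e.g. a constant 0x00 signal
spike_high = np.zeros(32)
spike_high[-1] = 1.0  # e.g. a constant 0xFF signal

d_low = dist_from_uniform(spike_low)
d_high = dist_from_uniform(spike_high)
d_flat = dist_from_uniform(np.full(32, 1 / 32))
```

Both spikes are equally far from uniform under this scaling, which is why the position of the constant value does not matter.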

entropy

Shannon entropy of the 32-bin histogram, in bits. De Bruijn, circle map quasiperiodic, and phyllotaxis all score about 5.0, at or near the maximum of log2(32) = 5 bits (flat distribution). Collatz gap lengths scores 0.05 (almost all mass in one bin). This is the classical measure of distributional spread, here computed on the same 32-bin histogram that feeds the transport metrics.
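A short sketch of the entropy computation, assuming base-2 logarithms (consistent with the 5-bit maximum for 32 bins) and the usual convention that empty bins contribute nothing. The function name is illustrative.

```python
import numpy as np


def shannon_entropy_bits(p):
    """Shannon entropy in bits of a normalized histogram."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # skip empty bins: 0 * log2(0) is treated as 0
    # Adding 0.0 turns a floating-point negative zero into plain 0.0.
    return float(-(nz * np.log2(nz)).sum()) + 0.0


uniform = np.full(32, 1 / 32)
spike = np.zeros(32)
spike[0] = 1.0

h_flat = shannon_entropy_bits(uniform)
h_spike = shannon_entropy_bits(spike)
```

Without the final `+ 0.0`, a single-bin histogram evaluates to negative zero, which prints as `-0.0`.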

self_similarity

One minus the earth mover's distance between the first-half and second-half histograms. 1.0 means the distribution is perfectly stable over time (logistic period-4, logistic period-2, De Bruijn). Hilbert walk scores 0.60 (its deterministic sweep creates different distributions in the first and second halves). This catches nonstationarity that entropy and concentration miss: a signal can have high entropy overall but low self_similarity if its distribution drifts.
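The split-half comparison might look like the following sketch. It assumes byte-valued input, equal-width bins, a midpoint split, and an earth mover's distance scaled by a bin width of 1/32; all names are illustrative.

```python
import numpy as np


def _hist(x, n_bins=32):
    """Normalized histogram of byte-valued data."""
    counts, _ = np.histogram(x, bins=n_bins, range=(0, 256))
    return counts / counts.sum()


def _emd(p, q):
    """1D earth mover's distance, bin width assumed to be 1 / n_bins."""
    return float(np.abs(np.cumsum(p - q)).sum() / p.size)


def self_similarity(signal):
    """One minus the EMD between first-half and second-half histograms."""
    data = np.asarray(signal)
    half = data.size // 2
    return 1.0 - _emd(_hist(data[:half]), _hist(data[half:]))


# A ramp repeated twice: both halves have the same flat histogram.
stable = self_similarity(np.tile(np.arange(256), 2))

# A single ramp: low bytes first, high bytes second. The overall
# histogram is flat (maximum entropy), yet the halves do not overlap.
drifting = self_similarity(np.arange(256))
```

The single ramp illustrates the point made above: its overall histogram is perfectly flat, but the two halves occupy disjoint bins, so the score drops to 0.5 in this sketch.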

transport_variability

Coefficient of variation of windowed earth mover's distances between consecutive segments. Exponential Chirp (0.21) scores highest — its frequency sweep creates rapidly changing local distributions. Sunspot (0.13) and Pulse-Width Mod (0.13) also score high. Constants and periodic orbits score 0.0 (identical windows). This measures how much the optimal transport cost fluctuates over time — a windowed nonstationarity detector complementing self_similarity's global split. Evolved via ShinkaEvolve.
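A rough sketch of a windowed version, under loudly stated assumptions: the window length (256 samples), non-overlapping windows, the population standard deviation, and the 1/32 bin-width scaling are all hypothetical choices, since the text does not specify them. This is not the evolved implementation.

```python
import numpy as np


def _hist(x, n_bins=32):
    """Normalized histogram of byte-valued data."""
    counts, _ = np.histogram(x, bins=n_bins, range=(0, 256))
    return counts / counts.sum()


def _emd(p, q):
    """1D earth mover's distance, bin width assumed to be 1 / n_bins."""
    return float(np.abs(np.cumsum(p - q)).sum() / p.size)


def transport_variability(signal, window=256):
    """Coefficient of variation of EMDs between consecutive windows.

    `window` is a hypothetical parameter. Returns 0.0 when there are
    fewer than two windows or when all windows are identical.
    """
    data = np.asarray(signal)
    n_win = data.size // window
    hists = [_hist(data[i * window:(i + 1) * window]) for i in range(n_win)]
    dists = np.array([_emd(a, b) for a, b in zip(hists, hists[1:])])
    if dists.size == 0 or dists.mean() == 0:
        return 0.0
    return float(dists.std() / dists.mean())


constant = transport_variability(np.zeros(2048))
periodic = transport_variability(np.tile(np.arange(256), 8))
```

Constant and exactly periodic inputs produce identical windows, so every consecutive distance is zero and the sketch returns 0.0, matching the behavior described above.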

recurrence_distance

Average earth mover's distance between non-adjacent windows whose distributions fall within a recurrence threshold of each other. PID Controller (0.11) and Exponential Chirp (0.10) score highest: their recurring distributional states differ in fine detail. Constants score 0.0. This measures how similar the signal's distribution is when it "returns" to a previously visited distributional state. Evolved via ShinkaEvolve.
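One possible reading of this definition, sketched under assumptions: window pairs count as recurrences when their earth mover's distance is at or below a threshold, and adjacent windows are skipped. The window length, the threshold value, and the function names are hypothetical, and the evolved implementation may differ.

```python
import numpy as np


def _hist(x, n_bins=32):
    """Normalized histogram of byte-valued data."""
    counts, _ = np.histogram(x, bins=n_bins, range=(0, 256))
    return counts / counts.sum()


def _emd(p, q):
    """1D earth mover's distance, bin width assumed to be 1 / n_bins."""
    return float(np.abs(np.cumsum(p - q)).sum() / p.size)


def recurrence_distance(signal, window=256, threshold=0.1):
    """Mean EMD over non-adjacent window pairs within `threshold`.

    `window` and `threshold` are hypothetical parameters. Returns 0.0
    when no pair qualifies.
    """
    data = np.asarray(signal)
    n_win = data.size // window
    hists = [_hist(data[i * window:(i + 1) * window]) for i in range(n_win)]
    close = []
    for i in range(n_win):
        for j in range(i + 2, n_win):  # i + 2 skips adjacent windows
            d = _emd(hists[i], hists[j])
            if d <= threshold:
                close.append(d)
    return float(np.mean(close)) if close else 0.0


constant = recurrence_distance(np.zeros(2048))
```

A constant signal recurs perfectly, so every qualifying pair has distance zero and the result is 0.0.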

Atlas Rankings

concentration
Source	Domain	Value
Constant 0x00	noise	32.0000
Collatz Gap Lengths	number_theory	31.8096
Rainfall (ORD Hourly)	climate	31.4505
···
Gray Code Counter	exotic	1.0000
De Bruijn Sequence	number_theory	1.0000
Phyllotaxis	bio	1.0070

dist_from_uniform
Source	Domain	Value
Constant 0xFF	noise	0.4844
Constant 0x00	noise	0.4844
Collatz Gap Lengths	number_theory	0.4842
···
De Bruijn Sequence	number_theory	0.0000
Gray Code Counter	exotic	0.0000
Circle Map Quasiperiodic	chaos	0.0019

entropy
Source	Domain	Value
Gray Code Counter	exotic	5.0000
De Bruijn Sequence	number_theory	5.0000
Circle Map Quasiperiodic	chaos	4.9996
···
Constant 0xFF	noise	0.0000
Collatz Gap Lengths	number_theory	0.0526
Rainfall (ORD Hourly)	climate	0.1634

recurrence_distance
Source	Domain	Value
PID Controller	exotic	0.1074
Exponential Chirp	exotic	0.0997
Pulse-Width Modulation	waveform	0.0823
···
Constant 0xFF	noise	0.0000
Logistic r=3.5 (Period-4)	chaos	0.0000
Logistic r=3.2 (Period-2)	chaos	0.0000

self_similarity
Source	Domain	Value
Gray Code Counter	exotic	1.0000
De Bruijn Sequence	number_theory	1.0000
Constant 0x00	noise	1.0000
···
Hilbert Walk	exotic	0.5974
Levy Flight	exotic	0.6045
ETH/BTC Ratio	financial	0.6243

transport_variability
Source	Domain	Value
Exponential Chirp	exotic	0.2077
Pulse-Width Modulation	waveform	0.1334
Sunspot Number	astro	0.1252
···
Constant 0xFF	noise	0.0000
Logistic r=3.5 (Period-4)	chaos	0.0000
Logistic r=3.2 (Period-2)	chaos	0.0000

When It Lights Up

Wasserstein self_similarity is the distributional lens's nonstationarity detector. Signals that change character midstream — sensor drift, regime switches, concatenated recordings — score low on self_similarity while potentially scoring high on all other distributional metrics. In the atlas, Wasserstein's concentration axis separates the heavy-tailed cluster (Collatz, rainfall, forest fire) from the uniform-distribution cluster (PRNGs, De Bruijn), while self_similarity provides an orthogonal axis that catches temporal instability invisible to any single-histogram metric.
