Cantor Set

Gaps, dust fraction, ternary self-similarity
distributional | dim fractal (~0.631) | 5 metrics

What It Measures

How gappy the signal's ternary address space is.

Each byte is re-interpreted as a base-3 address into the unit interval, the way the classical Cantor set construction works: each bit selects "left third" or "right third," skipping the middle. The resulting coordinates cluster around Cantor-set-like positions if the data has ternary self-similarity, or spread uniformly if it doesn't. The geometry then measures the gap structure of these coordinates.
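The embedding described above can be sketched as follows. This is a minimal illustration, not the atlas implementation: it assumes each of the 8 bits becomes one ternary digit (0 for "left third", 2 for "right third"), read most-significant bit first. The names `cantor_coordinate` and `cantor_embed` are hypothetical.

```python
def cantor_coordinate(byte: int) -> float:
    """Map one byte to [0, 1] by reading its bits as ternary digits 0 or 2.

    Assumption: bits are consumed MSB-first, so the top bit picks the
    left or right third of the whole interval.
    """
    if not 0 <= byte <= 255:
        raise ValueError("expected a byte value in 0..255")
    x = 0.0
    for k in range(8):
        bit = (byte >> (7 - k)) & 1          # k-th bit, MSB first
        x += 2 * bit * 3.0 ** -(k + 1)       # ternary digit is 0 or 2
    return x


def cantor_embed(data: bytes) -> list[float]:
    """Embed every byte of a signal; duplicates are kept in time order."""
    return [cantor_coordinate(b) for b in data]


if __name__ == "__main__":
    for b in (0x00, 0x7F, 0x80, 0xFF):
        print(f"0x{b:02X} -> {cantor_coordinate(b):.6f}")
```

Under these assumptions no coordinate ever falls in the open middle third (1/3, 2/3), and the largest reachable coordinate is 1 - 3^-8, about 0.9998. The ~0.631 in the header is the classical Cantor dimension log 2 / log 3.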

Metrics

coverage

Fraction of distinct embedded coordinates relative to total data length. Sprott-B, Projectile, and Damped Pendulum score 0.016, the top of the atlas range, which still implies many repeated coordinates (smooth dynamics map many different byte values to the same Cantor address). Fibonacci word scores 0.0001 (its binary structure creates extreme degeneracy in ternary representation). High coverage means the data explores many distinct positions in the Cantor embedding; low coverage means it collapses to a dust-like subset.
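As a rough sketch of the definition above, coverage is the number of distinct coordinates divided by the number of samples. The function name `coverage` and the sample length in the demo are illustrative assumptions, not documented atlas parameters.

```python
def coverage(coords) -> float:
    """Distinct coordinates divided by total number of samples."""
    coords = list(coords)
    if not coords:
        return 0.0
    return len(set(coords)) / len(coords)


if __name__ == "__main__":
    # A constant signal collapses to a single coordinate.
    print(coverage([0.0] * 1000))                       # one distinct value
    # Hypothetical case: 256 distinct values spread over 16,384 samples.
    print(coverage([i % 256 for i in range(16384)]))    # 256 / 16384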

max_gap

The largest gap between adjacent sorted coordinates. L-System Dragon, Morse code, and Rule 110 all reach roughly 1.0 (a gap spanning almost the full interval: the data avoids an entire region of the ternary address space). Collatz gap lengths scores 0.005 (tiny gaps, nearly uniform coverage). A large max_gap means the data has a forbidden zone in its ternary structure, like the middle third removed in the classical Cantor construction.

mean_gap

Average spacing between consecutive sorted coordinates. Accel walk, Kepler exoplanet, and Zipf distribution cluster at 6.1e-5 (tightly packed — many distinct coordinates with small gaps). Collatz gap lengths scores 7.8e-7 (extremely dense). Mean gap complements max_gap: a signal can have large max_gap but small mean_gap if it has one big hole and is densely packed everywhere else.
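The relationship between the two gap metrics can be illustrated with a small sketch. It assumes gaps are taken between neighbours in the sorted coordinate list with duplicates kept (so repeated coordinates contribute zero-length gaps); the atlas may handle duplicates differently. The helper names are hypothetical.

```python
def sorted_gaps(coords) -> list[float]:
    """Differences between neighbours after sorting (duplicates kept)."""
    s = sorted(coords)
    return [b - a for a, b in zip(s, s[1:])]


def max_gap(coords) -> float:
    gaps = sorted_gaps(coords)
    return max(gaps) if gaps else 0.0


def mean_gap(coords) -> float:
    gaps = sorted_gaps(coords)
    return sum(gaps) / len(gaps) if gaps else 0.0


if __name__ == "__main__":
    # Densely packed points in [0, 0.1] plus one isolated point at 1.0:
    # one big hole, small spacing everywhere else.
    coords = [i / 10000 for i in range(1001)] + [1.0]
    print("max_gap :", max_gap(coords))
    print("mean_gap:", mean_gap(coords))
```

In this toy signal the single hole from 0.1 to 1.0 dominates max_gap, while mean_gap stays near 0.001 because the remaining 1000 gaps are tiny.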

bit_plane_autocorrelation

Average lag-1 autocorrelation across the 8 bit planes of the byte stream. Logistic period-2 and constants score 1.0 (each bit plane is perfectly correlated). L-System Dragon and De Bruijn score 0.0001 (bit planes are uncorrelated — the low-order bits change unpredictably). This detects temporal structure in the binary representation that the ternary Cantor embedding misses. Evolved via ShinkaEvolve.
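A minimal sketch of this metric follows. Two conventions here are assumptions rather than documented behaviour: a bit plane with zero variance is scored 1.0 (so constants reach 1.0), and the absolute value of each plane's autocorrelation is averaged (so an alternating period-2 plane also scores near 1.0). The function names are hypothetical.

```python
def lag1_autocorr(bits) -> float:
    """Lag-1 autocorrelation of a 0/1 sequence (biased estimator)."""
    n = len(bits)
    if n < 2:
        return 1.0
    mean = sum(bits) / n
    var = sum((b - mean) ** 2 for b in bits)
    if var == 0:
        return 1.0  # assumed convention: a constant plane is fully predictable
    cov = sum((bits[i] - mean) * (bits[i + 1] - mean) for i in range(n - 1))
    return cov / var


def bit_plane_autocorrelation(data: bytes) -> float:
    """Mean absolute lag-1 autocorrelation over the 8 bit planes."""
    scores = []
    for k in range(8):
        plane = [(b >> k) & 1 for b in data]   # k-th bit of every byte
        scores.append(abs(lag1_autocorr(plane)))
    return sum(scores) / 8


if __name__ == "__main__":
    print(bit_plane_autocorrelation(bytes([0x2A]) * 1000))        # constant
    print(bit_plane_autocorrelation(bytes([0x00, 0xFF] * 500)))   # period-2
```

Both demo signals score at or close to 1.0 under these conventions; a signal whose bits flip unpredictably from sample to sample would score near 0.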

jump_entropy

Shannon entropy of the gap sizes between consecutive Cantor coordinates. DNA Human scores 1.0 (maximally diverse jump sizes). Gzip (0.054) and Pi Digits (0.055) have the lowest entropy: their gap sizes are nearly uniform, meaning the Cantor coordinates change by approximately the same amount at each step. High jump entropy means the signal's ternary representation has diverse inter-step structure. Evolved via ShinkaEvolve.
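The general idea of a jump-size entropy can be sketched as below. The binning scheme, the bin count of 16, the use of time-ordered (unsorted) coordinates, and the normalization by log2 of the bin count are all assumptions made for illustration; this sketch is not expected to reproduce the atlas values, which depend on the evolved implementation.

```python
import math
from collections import Counter


def jump_entropy(coords, bins: int = 16) -> float:
    """Normalized Shannon entropy of |x[i+1] - x[i]| over equal-width bins.

    Assumes coordinates lie in [0, 1], so every jump lies in [0, 1].
    """
    jumps = [abs(b - a) for a, b in zip(coords, coords[1:])]
    if not jumps or bins < 2:
        return 0.0
    # Histogram of jump sizes; a jump of exactly 1.0 goes into the last bin.
    counts = Counter(min(int(j * bins), bins - 1) for j in jumps)
    n = len(jumps)
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    return h / math.log2(bins)


if __name__ == "__main__":
    steady = [i * 0.01 for i in range(50)]     # same step every time
    print(jump_entropy(steady))                # single occupied bin
```

A signal that always moves by the same amount occupies one bin and scores 0; a signal whose jumps are spread evenly over all bins scores 1.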

Atlas Rankings

bit_plane_autocorrelation

Source                        Domain          Value
Constant 0x00                 noise           1.0000
Logistic r=3.2 (Period-2)     chaos           1.0000
PID Controller                exotic          0.9945
···
L-System (Dragon Curve)       exotic          0.0001
De Bruijn Sequence            number_theory   0.0001
Categorical Sensor            exotic          0.0053

coverage

Source                        Domain          Value
Projectile with Drag          motion          0.0156
Shuffled Blocks               exotic          0.0156
Kicked Rotor                  quantum         0.0156
···
Constant 0xFF                 noise           0.0001
Fibonacci Word                exotic          0.0001
Square Wave                   waveform        0.0001

jump_entropy

Source                        Domain          Value
DNA Chimp                     bio             1.0000
DNA Human                     bio             1.0000
DNA Phage Lambda              bio             1.0000
···
Gzip (level 9)                binary          0.0538
Pi Digits                     number_theory   0.0553
XorShift32                    binary          0.0556

max_gap

Source                        Domain          Value
Fibonacci Word                exotic          0.9998
Symbolic Lorenz               exotic          0.9998
Pulse-Width Modulation        waveform        0.9998
···
Constant 0xFF                 noise           0.0000
Collatz Gap Lengths           number_theory   0.0053
Poisson Counts                exotic          0.0095

mean_gap

Source                        Domain          Value
Kepler Exoplanet              astro           0.0001
Sprott-B                      chaos           0.0001
Tohoku Aftershock Intervals   geophysics      0.0001
···
Constant 0xFF                 noise           0.0000
Collatz Gap Lengths           number_theory   0.0000
Poisson Counts                exotic          0.0000

When It Lights Up

Cantor Set geometry is sensitive to structure in the low bits of byte values, because the ternary address depends on the full bit pattern, not just the magnitude. Signals with repetitive low-bit patterns (periodic orbits, symbolic dynamics) produce degenerate Cantor embeddings with extreme gaps. In the atlas, the combination of low coverage and high max_gap identifies signals whose byte values avoid specific ternary regions, which is a different kind of regularity than the distributional uniformity measured by Torus or Wasserstein.
