Zipf–Mandelbrot (16-bit)

Zipf exponent, vocabulary richness, frequency decay
distributionaldim Linguistic5 metrics

What It Measures

The same vocabulary structure as the 8-bit version, but at the resolution of byte pairs.

Packs each pair of consecutive bytes into a 16-bit symbol (65,536 possible values) using a sliding window, then performs the same Zipf-Mandelbrot analysis. The larger alphabet dramatically increases sensitivity to sequential structure: two signals with identical 8-bit histograms can have completely different 16-bit Zipf profiles if their byte-to-byte transitions differ.

Metrics

zipf_alpha

Zipf exponent at 16-bit resolution. Poker hands (2.74) and Sensor event streams (2.56) still top the ranking, but with lower exponents than the 8-bit version — the larger vocabulary dilutes the concentration. Collatz gap lengths drops from 3.62 (8-bit) to 2.39 (16-bit), indicating its dominance is mainly a single-byte phenomenon. Signals where alpha is similar at both resolutions have concentration that extends to pair structure.

zipf_r_squared

Fit quality at 16-bit resolution. Collatz gap lengths (0.99), Poker hands (0.98), and Continued fractions (0.98) score highest — their frequency decay is genuinely power-law even at pair resolution. This is more discriminating than the 8-bit version: many signals that fit Zipf at byte level fall apart at the pair level because their byte transitions are not power-law distributed.

mandelbrot_q

The same plateau parameter. Solar wind IMF and Solar wind speed still score 10.0 — their distributional plateau persists at pair resolution. Devil's staircase drops from nonzero (8-bit) to 0.0 (16-bit): its plateau structure is a single-byte phenomenon that doesn't extend to pairs. Tidal gauge (10.0) joins the top at 16-bit — its slow tidal oscillations create repeated byte pairs that flatten the top of the frequency curve.

gini_coefficient

Frequency inequality at pair resolution. Rainfall (0.97), Forest fire (0.95), and Neural net pruned (0.91) remain the most unequal. The pruned network drops from 0.94 (8-bit) to 0.91 (16-bit) — its zero-dominated weight distribution is slightly less extreme when viewed as pairs. Logistic period-3 scores 0.0 (its 3 repeated values produce 3 repeated pairs, all equally common).

hapax_ratio

Fraction of 16-bit symbols appearing exactly once. Champernowne (0.95) tops the ranking — its digit-concatenation construction creates an enormous number of unique byte pairs. Middle-Square (0.93) and Intermittent silence (0.90) follow. The hapax explosion at 16-bit resolution is expected: with 65,536 possible symbols and finite data, many pairs will be seen only once. The signals at the bottom (Sawtooth wave at 0.0) are those whose pair vocabulary is so constrained that every pair repeats.

bigram_predictability

Identical to the 8-bit variant: it uses the same 16-symbol coarse-graining of the byte stream rather than the native 16-bit alphabet, so Zipf-Mandelbrot (8-bit) and (16-bit) return the same number (r=+1.000 across sources). Kept in both geometries so the temporal-vocabulary block is self-contained; the 8-bit entry above carries the detailed source citations.

entropy_nonstationarity

Identical to the 8-bit variant (same 16-symbol quantization, same window structure). See the 8-bit entry for source citations. The metric's presence in both 8-bit and 16-bit Zipf-Mandelbrot sections reflects their shared _temporal_vocabulary helper, not a genuine independent 16-bit measurement.

Atlas Rankings

gini_coefficient
SourceDomainValue
Devil's Staircaseexotic0.9780
Aubry-André Criticalquantum0.9744
Rainfall (ORD Hourly)climate0.9723
···
Logistic r=3.83 (Period-3 Window)chaos0.0000
Logistic r=3.2 (Period-2)chaos0.0000
Logistic r=3.5 (Period-4)chaos0.0000
hapax_ratio
SourceDomainValue
Champernownenumber_theory0.9461
Copeland-Erdősnumber_theory0.9158
Intermittent Silenceexotic0.9038
···
DNA Centromerebio0.0000
Thue-Morseexotic0.0000
DNA Plasmodiumbio0.0000
mandelbrot_q
SourceDomainValue
ETH/BTC Ratiofinancial100.0000
Pink Noisenoise100.0000
Brownian Walknoise100.0000
···
Solar Flares Daily Peakastro0.0000
Gaussian Collatz Orbitnumber_theory0.0000
Sandpileexotic0.0000
zipf_alpha
SourceDomainValue
Poker Handsexotic2.7422
von Mangoldt Functionnumber_theory2.5923
Sensor Event Streamexotic2.5555
···
Gray Code Counterexotic0.0001
De Bruijn Sequencenumber_theory0.0001
Dice Rollsexotic0.0503
zipf_r_squared
SourceDomainValue
Collatz Gap Lengthsnumber_theory0.9929
Surface Wind (ORD 5-min)climate0.9909
Prime Gapsnumber_theory0.9907
···
Gray Code Counterexotic0.0068
De Bruijn Sequencenumber_theory0.0080
Phyllotaxisbio0.4408

When It Lights Up

The 16-bit version's primary value over the 8-bit is detecting transition structure. A signal with a flat byte histogram (high 8-bit entropy, low 8-bit alpha) can still have extreme 16-bit concentration if certain byte transitions dominate. Comparing alpha and hapax between the two resolutions reveals how much of the signal's vocabulary structure is local (single-byte) versus sequential (pair). In the atlas, Champernowne's jump from low hapax at 8-bit (it uses all digits) to 0.95 hapax at 16-bit (its digit pairs are almost all unique) is the clearest example: its structure is entirely in the sequence of bytes, not in their individual distribution.

Open in Atlas
← Zipf–Mandelbrot (8-bit)Cantor Set →