How the frequencies of byte values decay from most-common to least-common.
Counts the frequency of each of the 256 possible byte values, sorts them from most to least common, and fits the Zipf-Mandelbrot law to test whether frequency drops off as a power of rank. Natural language follows Zipf's law closely (the 2nd most common word appears about half as often as the 1st). Random data has flat frequencies, with no decay. This geometry characterizes the "vocabulary" structure of the byte stream.
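The counting and sorting step can be sketched as follows. This is a minimal illustration, not the framework's own code: the function name `rank_frequencies` is hypothetical, and it assumes that only byte values that actually occur are ranked (values with a zero count are dropped).

```python
from collections import Counter


def rank_frequencies(data: bytes) -> list[float]:
    """Relative frequencies of the byte values present in `data`, most common first.

    Assumption: byte values that never occur are left out of the ranking.
    """
    if not data:
        return []
    counts = Counter(data)  # maps byte value (0-255) -> number of occurrences
    total = len(data)
    # most_common() yields (value, count) pairs sorted by descending count
    return [count / total for _, count in counts.most_common()]


# Example input: 'a' occurs 5 times, 'b' and 'r' twice, 'c' and 'd' once.
freqs = rank_frequencies(b"abracadabra")
```

The resulting list is the rank-frequency curve: index 0 holds rank 1, index 1 holds rank 2, and so on.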
The Zipf exponent, alpha: how steeply does frequency decay with rank? Alpha = 0 means flat (all bytes equally common). Alpha = 1 is Zipf's law (natural language). Poker hands (3.87) and Collatz gap lengths (3.62) score among the highest: extremely steep decay, where a few values dominate completely. Sensor event streams (3.04) are similarly top-heavy. Collatz parity scores 0.0 (only two values, not enough for a power-law fit).
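To make the exponent concrete, the law can be written out in its standard textbook form. The symbols $f(r)$ (frequency at rank $r$) and $C$ (normalizing constant) are notation chosen here for illustration; the framework's exact parameterization is not given in the text.

$$
f(r) = \frac{C}{(r + q)^{\alpha}}
\qquad\Longleftrightarrow\qquad
\log f(r) = \log C - \alpha \log(r + q)
$$

With $q = 0$ the ratio between the second and first rank is $f(2)/f(1) = 2^{-\alpha}$. For $\alpha = 1$ this gives $1/2$, the familiar Zipf halving. For $\alpha = 3.87$ it gives $2^{-3.87} \approx 0.068$, so the second most common value would appear only about 7% as often as the first.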
Fit quality, r_squared: how well does the Zipf-Mandelbrot model actually fit? De Bruijn (1.0) scores a perfect fit: its uniform distribution is a trivial special case (alpha = 0, zero residual). Divisor count (0.99) and logistic near-full chaos (0.99) also fit well. Collatz parity scores 0.0 (too few unique values). A high alpha with a low r_squared means the signal is concentrated but not in a power-law way, which is useful for distinguishing genuine Zipf behavior from arbitrary concentration.
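One common way to obtain both an exponent and a fit quality is ordinary least squares in log-log space. The sketch below takes that approach; it is an assumption about the method, not a description of the framework's fitting code. The name `fit_power_law` and the cutoff of three distinct values are hypothetical, chosen so that two-valued inputs return 0.0 for both numbers, as the text reports for Collatz parity.

```python
import math


def fit_power_law(freqs: list[float], q: float = 0.0) -> tuple[float, float]:
    """Least-squares line through (log(rank + q), log(freq)).

    `freqs` must be positive and sorted from most to least common.
    Returns (alpha, r_squared), where alpha is the negated slope.
    """
    n = len(freqs)
    if n < 3:
        # Assumed convention: too few distinct values for a meaningful fit.
        return 0.0, 0.0

    xs = [math.log(rank + q) for rank in range(1, n + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    alpha = -sxy / sxx
    # A flat curve has no variance to explain: the horizontal line fits exactly.
    r_squared = 1.0 if syy < 1e-12 else (sxy * sxy) / (sxx * syy)
    return alpha, r_squared


# Example: an exact Zipf curve over 50 ranks should give alpha close to 1.
weights = [1.0 / rank for rank in range(1, 51)]
zipf_alpha, zipf_r2 = fit_power_law([w / sum(weights) for w in weights])
```

Treating the flat case as a perfect fit mirrors the De Bruijn result described above (alpha = 0 with r_squared = 1.0).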
The Mandelbrot offset parameter, q: how much do the low ranks deviate from pure Zipf? A large q means the most common values are less dominant than Zipf would predict, so the top of the frequency curve is flattened. Solar wind IMF, Solar wind speed, and Sunspot all score 10.0 (the maximum q: their distributions have a plateau at the top before the power-law tail kicks in). Logistic chaos and constants score 0.0.
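A simple way to estimate the offset is a grid search: try candidate q values, fit a line in log-log space for each, and keep the q with the best fit. The sketch below assumes this approach; the framework may use a different optimizer. The upper bound of 10.0 comes from the maximum score mentioned above, while the function name `fit_zipf_mandelbrot`, the step count, and the handling of degenerate inputs are hypothetical.

```python
import math


def fit_zipf_mandelbrot(freqs, q_max=10.0, steps=100):
    """Grid-search q in [0, q_max]; regress log(freq) on log(rank + q) for each q.

    `freqs` must be positive and sorted from most to least common.
    Returns (alpha, q, r_squared) for the q with the highest r_squared.
    """
    n = len(freqs)
    if n < 3:
        return 0.0, 0.0, 0.0  # assumed convention: too few values to fit

    ys = [math.log(f) for f in freqs]
    mean_y = sum(ys) / n
    syy = sum((y - mean_y) ** 2 for y in ys)
    if syy < 1e-12:
        return 0.0, 0.0, 1.0  # flat curve: alpha = 0 fits exactly, no offset needed

    best = (0.0, 0.0, -1.0)
    for i in range(steps + 1):
        q = q_max * i / steps
        xs = [math.log(rank + q) for rank in range(1, n + 1)]
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        r_squared = (sxy * sxy) / (sxx * syy)
        if r_squared > best[2]:  # ties keep the smaller q
            best = (-sxy / sxx, q, r_squared)
    return best


# Example: a synthetic curve generated with alpha = 1.5 and q = 3.0.
weights = [(rank + 3.0) ** -1.5 for rank in range(1, 101)]
alpha, q, r_squared = fit_zipf_mandelbrot([w / sum(weights) for w in weights])
```

A curve whose best q lands on the upper bound is one where even the largest allowed offset cannot fully absorb the plateau at the top.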
The Gini coefficient, an income-inequality measure, applied to byte frequencies. 0.0 means perfect equality (all bytes equally common). Values approaching 1.0 mean extreme inequality (one byte takes nearly all of the count). Rainfall (0.97), Forest fire (0.95), and Neural net pruned (0.94) are the most unequal: a handful of values dominate. Constants score 0.0 (with only one value present, there is nothing for it to be unequal to).
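The inequality measure can be sketched with the standard sorted-values formula for the Gini coefficient. The function name `gini` is hypothetical, and the sketch assumes the coefficient is computed over the byte values that actually occur, which is consistent with constants scoring 0.0 above.

```python
from collections import Counter


def gini(data: bytes) -> float:
    """Gini coefficient of the counts of the byte values present in `data`.

    Assumption: byte values with a zero count are excluded, so a constant
    stream has a single entity and scores 0.0.
    """
    counts = sorted(Counter(data).values())  # ascending order
    n = len(counts)
    if n < 2:
        return 0.0
    total = sum(counts)
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted ascending
    weighted = sum(i * c for i, c in enumerate(counts, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Example: one value holds 97 of 100 bytes, three others hold 1 each.
skewed = gini(b"a" * 97 + b"bcd")  # approximately 0.72
```

Under this convention the maximum for n distinct values is (n - 1) / n, so the score approaches 1.0 only when many values are present and one of them dominates.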
The hapax ratio: the fraction of distinct byte values that appear exactly once (hapax legomena). Rainfall (0.31), Accel sit (0.30), and EEG tumor (0.26) score highest: many byte values appear only once, indicating a sparse tail. Logistic chaos, Henon map, and Tent map score 0.0 (chaotic maps visit enough values often enough that none occurs only once). Signals with a high hapax ratio have "rare words", a linguistic fingerprint of sparse, heavy-tailed data.
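The definition translates directly into a few lines of code. This is an illustrative sketch; the function name `hapax_ratio` is hypothetical, and returning 0.0 for empty input is an assumed convention.

```python
from collections import Counter


def hapax_ratio(data: bytes) -> float:
    """Fraction of distinct byte values in `data` that occur exactly once."""
    counts = Counter(data)
    if not counts:
        return 0.0  # assumed convention for empty input
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


# Example: 5 distinct values, of which 'c' and 'd' occur once, so the ratio is 0.4.
ratio = hapax_ratio(b"abracadabra")
```

Note that the denominator is the number of distinct values, not the length of the stream, so a long stream can still score high if its tail is full of one-off values.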
| Source | Domain | Value |
|---|---|---|
| Logistic r=3.2 (Period-2) | chaos | 4.0000 |
| Constant 0xFF | noise | 4.0000 |
| Logistic r=3.83 (Period-3 Window) | chaos | 4.0000 |
| ··· | ||
| Wichmann-Hill | binary | 0.0104 |
| XorShift32 | binary | 0.0105 |
| White Noise | noise | 0.0105 |
| Source | Domain | Value |
|---|---|---|
| Nikkei Returns | financial | 1.1226 |
| NASDAQ Returns | financial | 1.1039 |
| NYSE Returns | financial | 1.0961 |
| ··· | ||
| Constant 0xFF | noise | 0.0000 |
| Logistic r=3.2 (Period-2) | chaos | 0.0000 |
| Logistic r=3.83 (Period-3 Window) | chaos | 0.0000 |
| Source | Domain | gini |
|---|---|---|
| Rainfall (ORD Hourly) | climate | 0.9666 |
| Forest Fire | exotic | 0.9519 |
| Neural Net (Pruned 90%) | binary | 0.9376 |
| ··· | ||
| Gray Code Counter | exotic | 0.0000 |
| Logistic r=3.5 (Period-4) | chaos | 0.0000 |
| De Bruijn Sequence | number_theory | 0.0000 |
| Source | Domain | hapax ratio |
|---|---|---|
| Rainfall (ORD Hourly) | climate | 0.3112 |
| Accel Sit | motion | 0.2975 |
| EEG Tumor | medical | 0.2612 |
| ··· | ||
| Logistic Chaos | chaos | 0.0000 |
| Henon Map | chaos | 0.0000 |
| Tent Map | chaos | 0.0000 |
| Source | Domain | q |
|---|---|---|
| Regime Switching | noise | 10.0000 |
| Accel Walk | motion | 10.0000 |
| Accel Jog | motion | 10.0000 |
| ··· | ||
| Devil's Staircase | exotic | 0.0000 |
| Forest Fire | exotic | 0.0000 |
| Beta Noise | noise | 0.0000 |
| Source | Domain | alpha |
|---|---|---|
| Poker Hands | exotic | 3.8635 |
| DNA Thermus | bio | 3.7681 |
| Collatz Gap Lengths | number_theory | 3.5887 |
| ··· | ||
| Gray Code Counter | exotic | 0.0000 |
| De Bruijn Sequence | number_theory | 0.0000 |
| Logistic r=3.74 (Period-5 Window) | chaos | 0.0001 |
| Source | Domain | r_squared |
|---|---|---|
| Gray Code Counter | exotic | 1.0000 |
| De Bruijn Sequence | number_theory | 1.0000 |
| Divisor Count | number_theory | 0.9896 |
| ··· | ||
| Wigner Semicircle | quantum | 0.3129 |
| Clipped Sine | waveform | 0.3524 |
| Weierstrass | exotic | 0.4399 |
Zipf-Mandelbrot (8-bit) is the framework's vocabulary profiler at single-byte resolution. Together, alpha (decay steepness), r_squared (fit quality), and gini (concentration) give a three-dimensional characterization of the frequency curve, where entropy alone collapses it to a single number. In the atlas, rainfall and forest fire cluster together in the high-gini, high-alpha, high-hapax corner: both are "natural language-like" in having a few dominant values and a long sparse tail. PRNGs and De Bruijn occupy the opposite corner: flat frequencies, low gini, zero hapax.