Base Conversion Strategy

The base conversion strategy is a mathematical approach for extracting unbiased random bits from sequences of numbers in custom ranges. This is essential when testing random number generators that don't produce standard 0-based ranges.

The Problem

Consider a random number generator that produces numbers from 1 to 100. If we simply convert each number to binary using fixed-width encoding (e.g., 8 bits), we introduce systematic bias:

Example: the number 1 in 8-bit encoding
Binary: 00000001
Result: 7 leading zeros and a single trailing 1

These leading zeros don't represent true randomness; they're artifacts of the encoding. A statistical test would flag these structural zeros as non-random behavior, even when the generator itself is sound.
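
The size of this bias can be measured directly. The following is a minimal sketch (the function name `bit_one_frequencies` is illustrative, not part of the validator) that assumes a perfectly uniform generator over 1-100 and reports how often each bit position is 1 under 8-bit fixed-width encoding:

```python
def bit_one_frequencies(lo: int, hi: int, width: int) -> list[float]:
    """Fraction of values in [lo, hi] that have a 1 at each bit position.

    Positions are reported most significant bit first. Every value in the
    range is counted once, which models an ideal uniform generator.
    """
    values = range(lo, hi + 1)
    total = len(values)
    freqs = []
    for position in range(width - 1, -1, -1):  # MSB first
        ones = sum((v >> position) & 1 for v in values)
        freqs.append(ones / total)
    return freqs


freqs = bit_one_frequencies(1, 100, 8)
for position, freq in zip(range(7, -1, -1), freqs):
    print(f"bit {position}: P(1) = {freq:.2f}")
```

An unbiased bit stream would show 0.50 at every position. Here the top bit is never set (no value reaches 128) and the next bit is set only for 64-100, so a frequency test would fail regardless of the generator's quality.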

The Solution: Base Conversion

Base conversion treats your entire sequence as a single large number represented in base-N, where N is the size of your range. This extracts only the true entropy without encoding artifacts.

How It Works

For a sequence of numbers [n₁, n₂, n₃, ...] in range [min, max]:

1. Normalize: Convert each number to 0-based: normalized = number - min
2. Calculate range size: range_size = max - min + 1
3. Combine into large integer: Treat the normalized sequence as the digits of a base-range_size number, first number most significant
4. Convert to binary: Extract the binary representation of this large integer
5. Adjust length: Pad with leading zeros so the output has exactly the expected number of bits

Expected bits = ⌈count × log₂(range_size)⌉
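
The expected-bits formula can be evaluated without floating-point rounding. This is a minimal sketch, assuming a hypothetical helper named `expected_bits`; it relies on the fact that ⌈log₂(M)⌉ equals the bit length of M - 1 for any integer M ≥ 1, with M = range_size^count:

```python
def expected_bits(count: int, range_size: int) -> int:
    """Return ceil(count * log2(range_size)) using exact integer arithmetic."""
    if count < 0 or range_size < 1:
        raise ValueError("count must be >= 0 and range_size must be >= 1")
    # There are range_size**count possible sequences. The largest combined
    # value is one less than that, and its bit length is the ceiling we want.
    return (range_size ** count - 1).bit_length()


print(expected_bits(4, 7))    # 12
print(expected_bits(4, 100))  # 27
```

Using integers avoids edge cases where a floating-point logarithm lands just above or below a whole number.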

Concrete Example

Input: [2, 5, 8, 3] with range 2-8 (7 possible values)

Step 1 - Normalize: [0, 3, 6, 1]

Step 2 - Convert to base-7 large number:
value = 0×7³ + 3×7² + 6×7¹ + 1×7⁰ = 0 + 147 + 42 + 1 = 190

Step 3 - Convert to binary:
190₁₀ = 10111110₂

Step 4 - Calculate expected bits:
Expected = ⌈4 × log₂(7)⌉ = ⌈4 × 2.807⌉ = 12 bits

Step 5 - Pad to 12 bits:
Final output: 000010111110 (12 bits)
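
The worked example can be reproduced in a few lines. This is a sketch under stated assumptions rather than the validator's actual code: the name `sequence_to_bits` is hypothetical, and Python's built-in arbitrary-precision `int` plays the role of the big integer.

```python
def sequence_to_bits(numbers, lo, hi):
    """Convert numbers in [lo, hi] to a bit string via base conversion."""
    range_size = hi - lo + 1
    value = 0
    for n in numbers:
        if not lo <= n <= hi:
            raise ValueError(f"{n} is outside the range {lo}-{hi}")
        # Horner's rule: shift the accumulated digits one place in
        # base-range_size, then add the normalized digit.
        value = value * range_size + (n - lo)
    # Exact integer form of ceil(count * log2(range_size)).
    width = (range_size ** len(numbers) - 1).bit_length()
    if width == 0:
        return ""  # empty input or a single-value range carries no bits
    return format(value, "b").zfill(width)


print(sequence_to_bits([2, 5, 8, 3], 2, 8))  # 000010111110
```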

Key Characteristics

Property	Description
Deterministic	Same input sequence always produces same bit output
Consistent Length	All sequences of same length and range produce same number of bits
Entropy Preserving	Uses ⌈count × log₂(range_size)⌉ bits in total, about log₂(range_size) bits per number
Unbiased	No per-number padding zeros or fixed-width encoding artifacts

⚠️ Important: All sequences of the same length in the same range produce the same number of bits, whether they contain all minimum values, all maximum values, or mixed values. This consistency is crucial for unbiased statistical testing.
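
For a small case, the length guarantee can be checked exhaustively. The sketch below uses a compact, hypothetical helper `to_bits` (not the validator's code) and enumerates every 4-number sequence in the range 2-8, confirming that all outputs have the same length and that no two sequences collide:

```python
from itertools import product


def to_bits(numbers, lo, hi):
    """Compact base-conversion helper used only for this check."""
    size = hi - lo + 1
    value = 0
    for n in numbers:
        value = value * size + (n - lo)
    width = (size ** len(numbers) - 1).bit_length()
    return format(value, "b").zfill(width)


lo, hi, count = 2, 8, 4
outputs = [to_bits(seq, lo, hi)
           for seq in product(range(lo, hi + 1), repeat=count)]

lengths = {len(bits) for bits in outputs}
print(lengths)                          # {12}
print(len(outputs), len(set(outputs)))  # 2401 2401
print(to_bits([2, 2, 2, 2], lo, hi))    # 000000000000 (all minimum)
print(to_bits([8, 8, 8, 8], lo, hi))    # 100101100000 (all maximum)
```

Because every one of the 7⁴ = 2401 sequences maps to a distinct 12-bit string, the conversion loses no information.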

Entropy Calculation Examples

Range	Possible Values	Bits per Number	Total Bits (4 numbers)
0-1	2	1.00	4 bits
0-3	4	2.00	8 bits
1-6 (dice)	6	2.58	11 bits
2-8	7	2.81	12 bits
1-100	100	6.64	27 bits

When to Use Base Conversion

Use base conversion when:

Your RNG produces numbers in a specific range (e.g., 1-100, dice rolls 1-6)
Your numbers don't start at 0, or the range size is not a power of two (e.g., 0-99)
You want to test the true randomness without encoding bias

Use fixed bit-width when:

Your numbers start at 0 and fill a power-of-two range (0-255, 0-65535, etc.)
You're working with raw bytes or standard computer representations
You want simpler, faster processing for standard ranges
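
The choice between the two strategies can be expressed as a simple check. This sketch rests on an assumption: that a "standard" range is one that starts at 0 and whose size is a power of two, as the examples 0-255 and 0-65535 suggest. The function name `choose_strategy` and its return strings are illustrative only.

```python
def choose_strategy(lo: int, hi: int) -> str:
    """Suggest an encoding strategy for values in [lo, hi].

    Assumption: fixed-width encoding is only bias-free when the range
    starts at 0 and contains exactly 2**k values.
    """
    range_size = hi - lo + 1
    if range_size < 1:
        raise ValueError("hi must be >= lo")
    # A positive integer is a power of two when it has a single bit set.
    is_power_of_two = (range_size & (range_size - 1)) == 0
    if lo == 0 and is_power_of_two:
        return "fixed-width"
    return "base-conversion"


print(choose_strategy(0, 255))  # fixed-width
print(choose_strategy(1, 100))  # base-conversion
print(choose_strategy(0, 99))   # base-conversion
```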

Technical Implementation

The implementation uses arbitrary-precision arithmetic (BigUint) to handle large sequences without overflow. The conversion is mathematically equivalent to treating your sequence as a single number written in a custom base, then converting that to binary.

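The algorithm ensures a consistent output length: it pads with leading zeros when the binary representation is shorter than expected, and trims excess leading zeros when it is longer. Because the combined value is always smaller than range_size^count, its significant bits never exceed the expected length, so trimming only removes zeros that appear when the raw output is aligned to a byte boundary.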
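
The length adjustment might look like the sketch below. The actual implementation is not shown in this document, so this assumes the big integer is exported as whole big-endian bytes before being turned into bits; `fit_to_length` is a hypothetical name.

```python
def fit_to_length(value: int, expected_len: int) -> str:
    """Render value as exactly expected_len bits, padding or trimming zeros."""
    # Assumed serialization step: whole big-endian bytes, at least one byte.
    n_bytes = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(n_bytes, "big")
    bits = "".join(f"{byte:08b}" for byte in raw)

    if len(bits) < expected_len:
        # Too short: pad with leading zeros.
        return bits.zfill(expected_len)

    # Too long (or exact): drop the surplus, which must be all zeros.
    excess = len(bits) - expected_len
    if "1" in bits[:excess]:
        raise ValueError("value does not fit in expected_len bits")
    return bits[excess:]


print(fit_to_length(190, 12))   # 000010111110 (8 raw bits, padded)
print(fit_to_length(2400, 12))  # 100101100000 (16 raw bits, trimmed)
```

The first call exercises the padding branch and the second the trimming branch: 2400 needs 12 significant bits, but two whole bytes carry 16, so four leading zeros are removed.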
