Base Conversion Strategy

The base conversion strategy is a mathematical approach for extracting random bits from sequences of numbers in custom ranges without introducing encoding bias. This is essential when testing random number generators whose output range does not start at 0 or whose size is not a power of two.

The Problem

Consider a random number generator that produces numbers from 1 to 100. If we simply convert each number to binary using fixed-width encoding (e.g., 8 bits), we introduce systematic bias:

Example: the number 1 in 8-bit encoding
Binary: 00000001
Result: 7 leading zeros followed by a single 1

These leading zeros don't represent true randomness; they're artifacts of the encoding. A statistical test would detect the pattern in these structural zeros and wrongly attribute it to the generator.
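The bias can be measured directly. The sketch below is plain Python, and the helper name `bit_one_fractions` is hypothetical. It encodes every value from 1 to 100 in 8 bits and reports how often each bit position is 1. For an unbiased bit stream, each position would be 1 about half the time.

```python
def bit_one_fractions(lo, hi, width):
    """Fraction of values in [lo, hi] that have a 1 at each bit position (MSB first)."""
    values = range(lo, hi + 1)
    total = len(values)
    fractions = []
    for pos in range(width - 1, -1, -1):  # most significant bit first
        ones = sum((v >> pos) & 1 for v in values)
        fractions.append(ones / total)
    return fractions


fractions = bit_one_fractions(1, 100, 8)
for pos, frac in zip(range(7, -1, -1), fractions):
    print(f"bit {pos}: 1 in {frac:.0%} of values")
```

The top bit is never set, and bit 6 is set for only 37 of the 100 values (64 through 100), so a bit-level test would flag the encoding rather than the generator.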

The Solution: Base Conversion

Base conversion treats your entire sequence as a single large number represented in base-N, where N is the size of your range. This extracts only the true entropy without encoding artifacts.

How It Works

For a sequence of numbers [n₁, n₂, n₃, ...] in range [min, max]:

  1. Normalize: Convert each number to 0-based: normalized = number - min
  2. Calculate range size: range_size = max - min + 1
  3. Combine into large integer: Treat the sequence as a base-range_size number
  4. Convert to binary: Extract the binary representation of this large integer
  5. Adjust length: Pad or trim so the output has the expected number of bits:
     Expected bits = ⌈count × log₂(range_size)⌉
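The five steps can be sketched in a few lines of Python. This is an illustrative sketch, not the validator's actual code: the function name `base_convert_bits` is an assumption, and Python's built-in arbitrary-precision `int` is used for the large integer.

```python
def base_convert_bits(numbers, lo, hi):
    """Return the sequence as a bit string via base conversion (illustrative sketch)."""
    range_size = hi - lo + 1                      # step 2
    if range_size < 2:
        raise ValueError("range must contain at least two values")
    if not numbers:
        return ""
    value = 0
    for n in numbers:
        if not lo <= n <= hi:
            raise ValueError(f"{n} is outside [{lo}, {hi}]")
        # Steps 1 and 3: normalize, then shift the earlier digits up one
        # base-range_size place and add the new digit (Horner's rule).
        value = value * range_size + (n - lo)
    # Step 5: integer form of ceil(count * log2(range_size)), which avoids
    # floating-point rounding for long sequences.
    expected_bits = (range_size ** len(numbers) - 1).bit_length()
    # Step 4: binary representation, left-padded to the expected length.
    return format(value, "b").zfill(expected_bits)


print(base_convert_bits([2, 5, 8, 3], 2, 8))
```

Because the combined value is always smaller than range_size raised to the count, its binary form never exceeds the expected length in this sketch, so only padding is needed.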

Concrete Example

Input: [2, 5, 8, 3] with range 2-8 (7 possible values)

Step 1 - Normalize: [0, 3, 6, 1]

Step 2 - Convert to base-7 large number:
value = 0×7³ + 3×7² + 6×7¹ + 1×7⁰ = 0 + 147 + 42 + 1 = 190

Step 3 - Convert to binary:
190₁₀ = 10111110₂

Step 4 - Calculate expected bits:
Expected = ⌈4 × log₂(7)⌉ = ⌈4 × 2.807⌉ = 12 bits

Step 5 - Pad to 12 bits:
Final output: 000010111110 (12 bits)
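The arithmetic above can be cross-checked with Python built-ins. This shortcut is only a check for this small example: `int(text, base)` accepts bases up to 36 with one character per digit, so it does not generalize to ranges such as 1-100.

```python
digits = [n - 2 for n in [2, 5, 8, 3]]       # normalize: [0, 3, 6, 1]
value = int("".join(map(str, digits)), 7)     # parse "0361" as a base-7 number
bits = format(value, "012b")                  # binary, zero-padded to 12 bits
print(digits, value, bits)                    # [0, 3, 6, 1] 190 000010111110
```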

Key Characteristics

Property             Description
Deterministic        The same input sequence always produces the same bit output
Consistent Length    All sequences of the same length and range produce the same number of bits
Entropy Preserving   Yields about log₂(range_size) bits per number; the total is rounded up to a whole number of bits
Unbiased             No per-number padding zeros or other fixed-width encoding artifacts

⚠️ Important: All sequences in the same range produce the same bit length, whether they contain all minimum values, all maximum values, or mixed values. This consistency is crucial for unbiased statistical testing.
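The length and determinism properties can be checked with a small round trip. The `encode` and `decode` helpers below are hypothetical names for a minimal sketch; decoding is not described in this document and is included only to show that no information is lost.

```python
def encode(numbers, lo, hi):
    """Base-convert a sequence to a fixed-length bit string (sketch)."""
    size = hi - lo + 1
    value = 0
    for n in numbers:
        value = value * size + (n - lo)
    width = (size ** len(numbers) - 1).bit_length()
    return format(value, "b").zfill(width)


def decode(bits, count, lo, hi):
    """Recover the original sequence from the bit string (sketch)."""
    size = hi - lo + 1
    value = int(bits, 2)
    out = []
    for _ in range(count):
        value, digit = divmod(value, size)   # peel off the last base-size digit
        out.append(digit + lo)
    return out[::-1]


all_min = encode([1, 1, 1, 1], 1, 6)
all_max = encode([6, 6, 6, 6], 1, 6)
mixed = encode([3, 1, 6, 2], 1, 6)
print(len(all_min), len(all_max), len(mixed))
print(decode(mixed, 4, 1, 6))
```

All three dice sequences produce 11 bits, and decoding the mixed sequence returns the original numbers.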

Entropy Calculation Examples

Range        Possible Values   Bits per Number   Total for 4 Numbers
0-1          2                 1.00              4 bits
0-3          4                 2.00              8 bits
1-6 (dice)   6                 2.58              11 bits
2-8          7                 2.81              12 bits
1-100        100               6.64              27 bits
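The table values can be reproduced with a short script. The helper name `expected_bits` is an assumption; it computes the ceiling with exact integer arithmetic instead of floating-point logarithms, which is one way to avoid off-by-one results when the range size is an exact power of two.

```python
import math


def expected_bits(range_size, count):
    # Exact integer equivalent of ceil(count * log2(range_size)) for range_size >= 2
    return (range_size ** count - 1).bit_length()


rows = [("0-1", 2), ("0-3", 4), ("1-6 (dice)", 6), ("2-8", 7), ("1-100", 100)]
for label, size in rows:
    per_number = math.log2(size)             # fractional bits carried by one number
    total = expected_bits(size, 4)           # whole bits for a 4-number sequence
    print(f"{label:<12} {size:>4} {per_number:>6.2f} {total:>3} bits")
```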

When to Use Base Conversion

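Use base conversion when:

  - The range size is not a power of two (for example 1-6 or 1-100)
  - The range does not start at 0

Use fixed bit-width when:

  - The range is 0-based and its size is a power of two (for example 0-255), where fixed-width encoding and base conversion give the same bits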

Technical Implementation

The implementation uses arbitrary-precision arithmetic (BigUint) to handle large sequences without overflow. The conversion is mathematically equivalent to treating your sequence as a single number written in a custom base, then converting that to binary.

The algorithm ensures consistent output length by padding with leading zeros if needed, or trimming excess leading zeros if the binary representation is too long (which only occurs when the output is naturally aligned to byte boundaries).
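One way to picture the pad-or-trim step is to assume the big integer is serialized to whole bytes before the bits are read. That assumption, and the helper name `fit_to_length`, are illustrative only; the actual implementation is not shown in this document.

```python
def fit_to_length(value, expected_bits):
    """Pad or trim a byte-serialized integer to exactly expected_bits bits (sketch)."""
    n_bytes = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(n_bytes, "big")             # byte-aligned, as assumed above
    bits = "".join(f"{b:08b}" for b in raw)
    if len(bits) < expected_bits:
        return bits.zfill(expected_bits)             # pad with leading zeros
    excess = len(bits) - expected_bits
    if set(bits[:excess]) - {"0"}:
        raise ValueError("value does not fit in the expected number of bits")
    return bits[excess:]                             # drop surplus leading zeros


print(fit_to_length(190, 12))   # one byte, padded up to 12 bits
print(fit_to_length(5, 3))      # one byte, trimmed down to 3 bits
```

Under this assumption, trimming only ever removes zeros that byte alignment introduced, so no information from the sequence is discarded.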
