Synthesizer: Data Structures
This document explains the core data structures used in the Tokamak Synthesizer for symbol processing and circuit generation.
On This Page
Overview
The Synthesizer uses symbol-based processing instead of value-based computation. This document explains the four core data structures that enable circuit generation:
┌─────────────────────────────────────────────────────────────────────┐
│ Synthesizer Data Structures: From Values to Circuits │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ [1] DataPt (Data Point) │
│ ┌────────────────────────────────────┐ │
│ │ Symbol = Value + Traceability │ │
│ │ { value: 10n, source: 0, wire: 0 }│ │
│ └────────────────────────────────────┘ │
│ ↓ ↓ │
│ │
│ [2] StackPt [3] MemoryPt │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ DataPt[] │ │ Map<time, DataPt>│ │
│ │ (Symbolic │ │ (2D: offset×time)│ │
│ │ Stack) │ │ (Aliasing track) │ │
│ └──────────────┘ └──────────────────┘ │
│ ↓ ↓ │
│ └───────────┬───────────┘ │
│ ↓ │
│ │
│ [4] Placement (Subcircuit Instance) │
│ ┌──────────────────────────────────────┐ │
│ │ name: 'ALU1' │ │
│ │ usage: 'ADD' │ │
│ │ inPts: [DataPt, DataPt] ──> Circuit │ │
│ │ outPts: [DataPt] ───────────────────>│ │
│ └──────────────────────────────────────┘ │
│ │ │
│ └──> Forms DAG (Circuit Graph) │
│ │
└─────────────────────────────────────────────────────────────────────┘
Symbol-Based Processing
Each piece of data is represented as a DataPt (Data Point), which tracks:
- Value: The actual data (bigint)
- Source: Where the data came from (placement ID or external source)
- Wire Index: Which output wire from the source subcircuit
- Metadata: Type, size, and other context information
// EVM: Value-based processing
const result = a + b; // Just the final value
// Synthesizer: Symbol-based processing
const aPt = { source: 0, wireIndex: 0, value: 10n, ... };
const bPt = { source: 0, wireIndex: 1, value: 20n, ... };
const resultPt = synthesizer.placeArith('ADD', [aPt, bPt]); // Tracks entire data flow
DataPt (Data Point)
Concept
DataPt is a symbol that represents data flowing through the circuit. Think of it as a "tracking tag" attached to each piece of data:
┌─────────────────────────────────────────────────────────────┐
│ DataPt: A Symbol Tracking Data Flow │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ source=4 ┌────────────┐ │
│ │ Value │ wireIndex=0 │ Traceability│ │
│ │ 15n │ ─────────────>│ Information │ │
│ └──────────┘ └────────────┘ │
│ ↓ │ │
│ What data? Where from? │
│ (actual value) (placement + wire) │
│ │
└─────────────────────────────────────────────────────────────┘
Key Roles:
- Traceability: Track where data originates (source placement, wire index)
- Circuit Generation: Connect subcircuit outputs to inputs via wire indices
- Value Tracking: Maintain actual values for consistency checks
- External Interface: Bridge between Ethereum state and circuit symbols
Visual: DataPt Journey
External Value DataPt Creation Circuit Usage
────────────── ─────────────── ─────────────
calldata[0] = 5 ─┐
├──> DataPt { ┌──> Used as input
│ value: 5n, │ to ALU1
│ source: 0, ─────┤
│ wireIndex: 0, │
Block info ─┤ type: 'Calldata' └──> Tracked in
│ } circuit graph
Storage value ─┘
Definition
Source: types/synthesizer.ts:48-69
export interface CreateDataPointParams {
// Placement index at which the dataPt comes from
source: number;
// Wire index at which the dataPt comes from (within the source placement)
wireIndex?: number;
// Actual value of the data
value: bigint;
// Size of the data source (in bytes)
sourceSize: number;
// === External Source Information (if applicable) ===
// Address/identifier if data comes from external source
extSource?: string;
// Address/identifier if data goes to external destination
extDest?: string;
// Type of external data (e.g., 'CALLDATA', 'BLOCKHASH', 'Storage')
type?: string;
// Key if the external data comes from or goes to a DB (e.g., storage key)
key?: string;
// Offset if the external data comes from memory
offset?: number;
}
export type DataPt = CreateDataPointParams & { valueHex: string };
Creation
Source: pointers/dataPointFactory.ts:6-16
export class DataPointFactory {
public static create(params: CreateDataPointParams): DataPt {
SynthesizerValidator.validateValue(params.value);
const hex = params.value.toString(16);
const paddedHex = hex.length % 2 === 1 ? '0' + hex : hex;
const valueHex = '0x' + paddedHex;
return {
...params,
valueHex,
};
}
}
Example: DataPt Lifecycle
// Step 1: Load external data as DataPt
const calldataPt = synthesizer.loadEnvInf(
address,
'Calldata',
0x05n, // value
0, // offset
1 // size
);
// Result: { source: 0, wireIndex: 0, value: 5n, type: 'Calldata', ... }
// Step 2: Use DataPt in arithmetic operation
const constantPt = synthesizer.loadAuxin(10n);
// Result: { source: 0, wireIndex: 1, value: 10n, ... }
// Step 3: Generate circuit placement
const resultPt = synthesizer.placeArith('ADD', [calldataPt, constantPt]);
// Result: { source: 4, wireIndex: 0, value: 15n, ... }
// source=4 means this comes from placement #4 (the ADD subcircuit)
// Step 4: Store to external state
synthesizer.storePrvOut(address, 'Storage', resultPt, storageKey);
Key Fields Explained
| Field | Type | Purpose | Example |
|---|---|---|---|
source |
number |
Placement ID where data originates | 4 (from placement #4) |
wireIndex |
number? |
Output wire index from source | 0 (first output) |
value |
bigint |
Actual data value | 15n |
valueHex |
string |
Hex representation | "0x0f" |
sourceSize |
number |
Data size in bytes | 32 |
extSource |
string? |
External source address | "0x123...abc" |
type |
string? |
External data type | "Calldata", "Storage" |
key |
string? |
Storage key (if applicable) | "0x00...01" |
offset |
number? |
Memory offset (if applicable) | 64 |
StackPt (Symbol Stack)
Concept
StackPt is the symbolic equivalent of the EVM stack. While the EVM stack holds actual values for computation, StackPt holds DataPt symbols for tracking data flow:
EVM Stack StackPt (Symbol Stack)
───────────────── ──────────────────────────────
┌──────────────┐ ┌───────────────────────────┐
│ 30 │ ← value │ DataPt { value: 30n, │
├──────────────┤ │ source: 4, │
│ 20 │ │ wireIndex: 0 } │
├──────────────┤ ├───────────────────────────┤
│ 10 │ │ DataPt { value: 20n, │
└──────────────┘ │ source: 0, │
│ wireIndex: 1 } │
Stores values ├───────────────────────────┤
for execution │ DataPt { value: 10n, │
│ source: 0, │
│ wireIndex: 0 } │
└───────────────────────────┘
Stores symbols
for circuit generation
Visual: Parallel Processing
Opcode: 0x01 ADD
┌─────────────────────────────┐ ┌──────────────────────────────────┐
│ EVM Stack (functions.ts) │ │ StackPt (handlers.ts) │
├─────────────────────────────┤ ├──────────────────────────────────┤
│ │ │ │
│ pop: [10, 20] │ │ pop: [DataPt{10}, DataPt{20}] │
│ ↓ │ │ ↓ │
│ compute: 10 + 20 = 30 │ │ place: ALU1(selector, 10, 20) │
│ ↓ │ │ ↓ │
│ push: 30 │ │ push: DataPt{30, source:4} │
│ │ │ │
│ Result: Value only │ │ Result: Value + Traceability │
└─────────────────────────────┘ └──────────────────────────────────┘
Comparison
| Aspect | EVM Stack | StackPt |
|---|---|---|
| Data Type | bigint[] (values) |
DataPt[] (symbols) |
| Purpose | Execute operations | Track data flow |
| Push | Pushes actual value | Pushes symbol reference |
| Pop | Returns value for computation | Returns symbol for circuit generation |
| Operation | Performs arithmetic | Records circuit connections |
Definition
Source: pointers/stackPt.ts:1-56
export type TStackPt = DataPt[];
export class StackPt {
private _stack: TStackPt;
private _len: number;
private _maxHeight: number;
constructor(maxHeight = 1024) {
this._stack = [];
this._len = 0;
this._maxHeight = maxHeight;
}
public push(dataPt: DataPt): void;
public pop(): DataPt;
public popN(num: number): DataPt[];
public swap(position: number): void;
public dup(position: number): void;
// ... more methods
}
Key Differences from EVM Stack
Source Comment: pointers/stackPt.ts:6-26
/**
* Key differences between Stack and StackPt classes
*
* 1. Data Type
* - Stack: bigint[] (stores actual values)
* - StackPt: DataPt[] (stores data pointers)
*
* 2. Purpose
* - Stack: Used in actual EVM execution
* - StackPt: Used for symbolic execution
*
* 3. Operation Handling
* - Stack: Performs operations on actual values (e.g., actual addition)
* - StackPt: Manages pointers for data flow tracking
*
* 4. Usage
* - Stack: Actual transaction processing, contract execution
* - StackPt: Program analysis, optimization, circuit generation
*/
Example: Parallel Stack Processing
// EVM Stack (functions.ts) - Opcode 0x01: ADD
const [a, b] = runState.stack.popN(2);
const result = (a + b) % TWO_POW256;
runState.stack.push(result);
// Stack: [10, 20] → [30]
// StackPt (handlers.ts) - Opcode 0x01: ADD
const [aPt, bPt] = runState.stackPt.popN(2);
const resultPt = synthesizer.placeArith('ADD', [aPt, bPt]);
runState.stackPt.push(resultPt);
// StackPt: [DataPt{10}, DataPt{20}] → [DataPt{30, source:4}]
Memory Management Optimization
/**
* 2. Memory Management
* - Once allocated, array size never decreases
* - During pop operations, _len is decreased instead of actually deleting items
* - This is an optimization strategy to reduce memory reallocation costs
*/
// Implementation
public pop(): DataPt {
if (this._len < 1) {
throw new EvmError(ERROR.STACK_UNDERFLOW);
}
const dataPt = this._stack[this._len - 1];
this._len--; // Decrease length, don't delete item
return dataPt;
}
MemoryPt (Symbol Memory)
Concept
MemoryPt is a 2D data structure (offset × time) that solves the data aliasing problem in memory. Unlike EVM memory which only keeps the latest value, MemoryPt tracks all overlapping writes:
EVM Memory (1D) MemoryPt (2D: offset × time)
─────────────── ────────────────────────────
Memory Array: Timestamp Map:
┌────┬────┬────┬────┐
│ 00 │ 10 │ 20 │ 30 │ ← offset timestamp=0: { offset: 0x00, size: 32, dataPt1 }
└────┴────┴────┴────┘ timestamp=1: { offset: 0x10, size: 32, dataPt2 }
│ timestamp=2: { offset: 0x08, size: 16, dataPt3 }
└─> Only latest value
(old values lost) ↑
All writes preserved with timestamps!
Visual: The Data Aliasing Problem
Time ─────────────────────────────────────────>
t=0: MSTORE(0x00, dataPt1) Write 32 bytes at 0x00-0x20
┌────────────────────────────────┐
│ dataPt1 (value=100) │
└────────────────────────────────┘
0x00 0x20
t=1: MSTORE(0x10, dataPt2) Write 32 bytes at 0x10-0x30 (overlaps!)
┌────────────────────────────────┐
│ dataPt2 (value=200) │
└────────────────────────────────┘
0x10 0x30
t=2: MLOAD(0x00, 32) Load 32 bytes at 0x00-0x20
┌──────────────┬──────────────┐
│ dataPt1 │ dataPt2 │ ← Need BOTH!
│ (bytes 0-15)│ (bytes 0-15)│
└──────────────┴──────────────┘
0x00 0x10 0x20
EVM Result: Only sees final state (loses history)
MemoryPt: Returns DataAliasInfos for circuit generation
How MemoryPt Solves This
| Aspect | EVM Memory | MemoryPt |
|---|---|---|
| Data Structure | Uint8Array (continuous bytes) |
Map<timestamp, MemoryPtEntry> |
| Write | Overwrites bytes directly | Records timestamped data points |
| Read | Returns current byte values | Returns data alias information |
| Aliasing | Lost (only latest value) | Tracked (all overlapping writes) |
Definition
Source: pointers/memoryPt.ts:163-436
export type TMemoryPt = Map<number, MemoryPtEntry>;
export interface MemoryPtEntry {
memOffset: number; // Memory start offset
containerSize: number; // Size of data in bytes
dataPt: DataPt; // Data point reference
}
export class MemoryPt {
_storePt: TMemoryPt;
private _timeStamp: number;
constructor() {
this._storePt = new Map();
this._timeStamp = 0;
}
public write(offset: number, size: number, dataPt: DataPt): void;
public getDataAlias(offset: number, size: number): DataAliasInfos;
// ... more methods
}
Implementation Details
Source Comment: pointers/memoryPt.ts:135-161
/**
* 3. Read/Write Operations
* - Memory: Direct read/write to actual memory
* - MemoryPt:
* - Write: Creates new data pointers and manages overlapping regions
* - Read: Returns data alias information through getDataAlias
*
* 5. Characteristics
* - MemoryPt:
* - Timestamp-based data management
* - Memory region conflict detection
* - Data alias information generation
*/
Example: Overlapping Memory Writes
// Scenario: Overlapping MSTORE operations
1. MSTORE(0x00, dataPt1) // Write 32 bytes at 0x00-0x20, timestamp=0
2. MSTORE(0x10, dataPt2) // Write 32 bytes at 0x10-0x30, timestamp=1
3. MLOAD(0x00) // Load 32 bytes at 0x00-0x20
// MemoryPt tracks both writes:
_storePt = {
0: { memOffset: 0x00, containerSize: 32, dataPt: dataPt1 },
1: { memOffset: 0x10, containerSize: 32, dataPt: dataPt2 }
}
// MLOAD(0x00) generates DataAliasInfos:
[
{
dataPt: dataPt1,
shift: 0, // Use bytes 0-15 from dataPt1
masker: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF00000000000000000000000000000000
},
{
dataPt: dataPt2,
shift: 128, // Use bytes 0-15 from dataPt2 (shifted left by 128 bits)
masker: 0x00000000000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
}
]
// Circuit generation:
// result = (dataPt1 & mask1) | ((dataPt2 >> 128) & mask2)
Data Alias Information
export interface DataAliasInfo {
dataPt: DataPt; // Source data point
shift: number; // Bit shift for alignment
masker: bigint; // Mask for extracting relevant bits
}
export type DataAliasInfos = DataAliasInfo[];
The getDataAlias() method returns information needed to reconstruct memory data from overlapping writes, which is then used to generate circuits (using DecToBit, Accumulator, and bitwise operations).
Placement
Concept
A Placement is an instance of a subcircuit with specific input/output data. Think of it like an object created from a class:
Subcircuit (Template/Class) Placement (Instance/Object)
─────────────────────────── ──────────────────────────
┌─────────────────────┐ ┌────────────────────────┐
│ ALU1.circom │ ─────> │ Placement ID: 4 │
│ │ │ name: 'ALU1' │
│ - Defines circuit │ │ usage: 'ADD' │
│ - Has 803 │ │ inPts: [sel, a, b] │
│ constraints │ │ outPts: [result] │
│ - Can do ADD, MUL, │ └────────────────────────┘
│ SUB, etc. │
└─────────────────────┘ ┌────────────────────────┐
─────> │ Placement ID: 5 │
│ name: 'ALU1' │
│ usage: 'MUL' │
│ inPts: [sel, x, y] │
│ outPts: [product] │
└────────────────────────┘
One template, multiple instances with different data!
Visual: Placement Creation Flow
Opcode Execution Placement Creation Circuit Graph
──────────────── ────────────────── ─────────────
ADD instruction ─> 1. Select subcircuit ─> Placement #4
(ALU1, selector=2) ┌──────────┐
Pop stack: │ ALU1 │
a = DataPt{10} ─> 2. Gather inputs ─> │ ADD │
b = DataPt{20} [sel, aPt, bPt] │ │
│ in: [2, │
Compute: ─> 3. Create output ─> │ 10, 20]│
result = 30 DataPt{30, source:4} │ out: [30]│
└──────────┘
Push result ─> 4. Store placement │
placements.set(4, ...) └─> Used by
next op
Purpose
- Template vs Instance: Subcircuit is a template (e.g., ALU1.circom), Placement is an instance
- ID Assignment: Each placement gets a unique sequential ID (starting from 4)
- Wire Connections:
outPtsfrom one placement becomeinPtsto another - Circuit Building: Placements form a Directed Acyclic Graph (DAG) representing the entire computation
Definition
Source: types/synthesizer.ts:71-76
export type PlacementEntry = {
name: SubcircuitNames; // Type of subcircuit (e.g., 'ALU1', 'AND')
usage: ArithmeticOperator; // Specific operation (e.g., 'ADD', 'MUL')
subcircuitId: SubcircuitId; // Numeric ID from QAP Compiler
inPts: DataPt[]; // Input data points
outPts: DataPt[]; // Output data points
};
export type Placements = Map<number, PlacementEntry>;
Example: Placement Creation
// When synthesizerArith('ADD', [aPt, bPt], resultValue, runState) is called:
// 1. Select subcircuit and create selector
const [subcircuitName, selectorValue] = SUBCIRCUIT_MAPPING['ADD']; // ['ALU1', 2n]
const selectorPt = synthesizer.loadAuxin(selectorValue);
// 2. Create output DataPt
const resultPt = DataPointFactory.create({
source: placementId, // ID of this placement (e.g., 4)
wireIndex: 0, // First output wire
value: resultValue, // Actual result (15n)
sourceSize: 32
});
// 3. Create and store placement
const placement: PlacementEntry = {
name: 'ALU1',
usage: 'ADD',
subcircuitId: 4,
inPts: [selectorPt, aPt, bPt],
outPts: [resultPt]
};
synthesizer.state.placements.set(placementId, placement);
Placement IDs 0-3: Buffer Placements
Special placements act as interfaces between the external world and the circuit:
Buffer Placements: The Bridge Between Worlds
────────────────────────────────────────────
External World Buffer Layer Circuit World
────────────── ──────────── ─────────────
Calldata ┐
Block info ├──> PUB_IN (ID=0) ──> Public symbols ──┐
msg.sender ┘ │
├──> Circuit
Storage ┐ │ operations
Account state ├──> PRV_IN (ID=2) ──> Private symbols ──┘
Private data ┘
┌──> Return data
│ Logs
Circuit result ──> PUB_OUT (ID=1) ──> Public output ────┘
Circuit result ──> PRV_OUT (ID=3) ──> Storage updates
State changes
| ID | Name | Purpose |
|---|---|---|
| 0 | PUB_IN |
Public input buffer (calldata, block info, msg.sender) |
| 1 | PUB_OUT |
Public output buffer (return data, logs) |
| 2 | PRV_IN |
Private input buffer (storage, account state) |
| 3 | PRV_OUT |
Private output buffer (storage updates) |
Visualizing Complete Data Flow
External World Synthesizer Circuit External World
────────────── ─────────────────── ──────────────
calldata[0]=5 ──┐
block.number ──┤──> PUB_IN (ID=0) ──> DataPt{5, source:0, wire:0} ──┐
msg.sender ──┘ │
▼
storage[key]=10 ──┐ ADD (ID=4)
├──> PRV_IN (ID=2) ──> DataPt{10, source:2, wire:0} ──┤
account.balance ──┘ │
▼
DataPt{15, source:4, wire:0}
│
returnData <───────── PUB_OUT (ID=1) <───────────────────────────────────┤
logs <──┘ │
│
storage[key]=15 <───── PRV_OUT (ID=3) <────────────────────────────────────┘