Packet Encoding Reference#
This document provides a complete specification of all packet and data structure encodings used throughout the hs_bridge software and FPGA Verilog code. Understanding these formats is essential for debugging, extending the system, or implementing compatible software.
Host to FPGA Packets#
These are 512-bit command packets created by fpga_compiler.py in hs_bridge and sent to the FPGA via DMA.
Command Packet Format (512 bits)#
┌─────────────────────────────────────────────────────────────┐
│ 512-bit Command Packet │
├───────────────┬─────────────────────────────────────────────┤
│ [511:504] │ Opcode (8 bits) │
│ [503:496] │ Core ID (8 bits) │
│ [495:0] │ Payload (496 bits, opcode-specific) │
└───────────────┴─────────────────────────────────────────────┘
Field Descriptions:
Opcode [511:504]: 8-bit operation type identifier
Core ID [503:496]: Which FPGA core to target (0-31, though typically only core 0 is used)
Payload [495:0]: Operation-specific data (format varies by opcode)
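The three fields above can be packed and unpacked with plain integer shifts. The following is a minimal sketch: `pack_command` and `unpack_command` are hypothetical helper names used for illustration, not functions taken from hs_bridge, and the range checks are an added convenience rather than documented behavior.

```python
OPCODE_SHIFT = 504   # Opcode occupies bits [511:504]
CORE_ID_SHIFT = 496  # Core ID occupies bits [503:496]
PAYLOAD_BITS = 496   # Payload occupies bits [495:0]
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def pack_command(opcode, core_id, payload=0):
    """Assemble a 512-bit command packet as a Python int (hypothetical helper)."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= core_id <= 0xFF:
        raise ValueError("core_id must fit in 8 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload must fit in 496 bits")
    return (opcode << OPCODE_SHIFT) | (core_id << CORE_ID_SHIFT) | payload


def unpack_command(packet):
    """Split a 512-bit command packet back into its three fields."""
    return {
        "opcode": (packet >> OPCODE_SHIFT) & 0xFF,
        "core_id": (packet >> CORE_ID_SHIFT) & 0xFF,
        "payload": packet & PAYLOAD_MASK,
    }


# EXECUTE (opcode 0x01) on core 0, with one timestep in payload bits [495:480]
packet = pack_command(0x01, 0x00, payload=1 << 480)
fields = unpack_command(packet)
```

Keeping the payload as a separate argument lets each opcode-specific builder assemble its own 496-bit field and reuse the same header logic.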
Opcode Definitions#
| Opcode (hex) | Opcode (binary) | Name | Description |
|---|---|---|---|
| 0x00 | 00000000 | INPUT_SPIKES | Inject external axon spikes into BRAM |
| 0x01 | 00000001 | EXECUTE | Run one simulation timestep |
| 0x02 | 00000010 | HBM_WRITE | Write data to HBM memory |
|  |  | HBM_READ | Read data from HBM memory |
| 0x04 | 00000100 | URAM_WRITE | Write neuron states to URAM |
|  |  | URAM_READ | Read neuron states from URAM |
| 0x06 | 00000110 | CONFIG_WRITE | Write configuration registers |
|  |  | CONFIG_READ | Read configuration registers |
|  |  | RESET | Reset FPGA state |
Opcode 0x00: INPUT_SPIKES#
Injects external spike events (axon activations) into BRAM for processing.
Payload Format:
[495:480] = Axon ID (16 bits) - which axon is spiking
[479:464] = Spike time (16 bits) - future timestep (optional, usually 0 for immediate)
[463:0] = Reserved (set to 0)
Example:
# Axon 42 fires at current timestep
opcode = 0x00
core_id = 0x00
axon_id = 42
spike_time = 0
packet = (opcode << 504) | (core_id << 496) | (axon_id << 480) | (spike_time << 464)
What happens:
FPGA’s command_interpreter decodes opcode 0x00
Extracts axon_id from payload
Writes to BRAM address corresponding to axon_id
Sets spike mask bit for this axon
Opcode 0x01: EXECUTE#
Runs the number of simulation timesteps given in the payload (typically one). Each timestep processes all pending spikes, updates neurons, and generates output spikes.
Payload Format:
[495:480] = Number of timesteps (16 bits) - typically 1
[479:0] = Reserved (set to 0)
Example:
# Execute 1 timestep
opcode = 0x01
core_id = 0x00
num_timesteps = 1
packet = (opcode << 504) | (core_id << 496) | (num_timesteps << 480)
What happens:
FPGA triggers execute state machine
Processes all axon spikes (Phase 1: external_events_processor)
Processes all neuron spikes (Phase 2: internal_events_processor)
Increments execRun_ctr (timestep counter)
Returns spike packets to host via output FIFO
Opcode 0x02: HBM_WRITE#
Writes data directly to HBM memory (for initializing network structure).
Payload Format:
[495:464] = HBM address (32 bits) - byte address in HBM
[463:432] = Length (32 bits) - number of bytes to write
[431:176] = Data (256 bits) - payload data (up to 32 bytes)
[175:0] = Reserved
Example:
# Write synapse data to HBM
opcode = 0x02
core_id = 0x00
hbm_addr = 0x00008000 # Synapse region start
length = 32 # 32 bytes (8 synapses)
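data_words = [0x00100064, 0x00110064, ...] # Synapse entries (32-bit words)
# A list cannot be shifted, so pack the words into one 256-bit integer first.
# Placing word 0 in bits [31:0] is an assumption based on the synapse row layout.
data = sum(word << (32 * i) for i, word in enumerate(data_words))
packet = (opcode << 504) | (core_id << 496) | (hbm_addr << 464) | (length << 432) | (data << 176)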
What happens:
command_interpreter routes to HBM write controller
Issues AXI write transaction to HBM
Writes data at specified address
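Because the data field is a single 256-bit value, a list of 32-bit words has to be packed into one integer before it is shifted into place. The sketch below shows one way to do that. The helper names are hypothetical, and placing word 0 in the least significant 32 bits is an assumption that mirrors the synapse row layout; it is not confirmed by the FPGA code.

```python
def pack_words(words):
    """Pack up to eight 32-bit words into one 256-bit integer.

    ASSUMPTION: word 0 lands in bits [31:0] of the data field.
    """
    if len(words) > 8:
        raise ValueError("at most 8 words (32 bytes) fit in the data field")
    value = 0
    for i, word in enumerate(words):
        value |= (word & 0xFFFFFFFF) << (32 * i)
    return value


def build_hbm_write(core_id, hbm_addr, words):
    """Build an HBM_WRITE (opcode 0x02) packet from a list of 32-bit words."""
    length = 4 * len(words)  # the length field counts bytes
    data = pack_words(words)
    return ((0x02 << 504) | (core_id << 496)
            | ((hbm_addr & 0xFFFFFFFF) << 464)
            | (length << 432) | (data << 176))


# Two illustrative synapse words written at the start of the synapse region
packet = build_hbm_write(0, 0x00008000, [0x00100064, 0x00110064])
```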
Opcode 0x04: URAM_WRITE#
Writes neuron state (membrane potential) to URAM.
Payload Format:
[495:480] = Neuron ID (16 bits) - which neuron to write
[479:444] = Voltage (36 bits) - membrane potential value (signed)
[443:0] = Reserved
Example:
# Set neuron 100 voltage to 1000
opcode = 0x04
core_id = 0x00
neuron_id = 100
voltage = 1000 # 36-bit signed value
packet = (opcode << 504) | (core_id << 496) | (neuron_id << 480) | (voltage << 444)
What happens:
command_interpreter routes to URAM write controller
Calculates URAM bank (neuron_id >> 13) and local address (neuron_id & 0x1FFF)
Performs read-modify-write to update only target neuron (2 neurons per URAM word)
Writes back updated 72-bit URAM word
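One detail worth making explicit: the voltage is signed, and a negative Python integer shifted into the packet would make the whole packet negative. The sketch below masks the value to its 36-bit two's-complement form first and also reproduces the bank and local address split described above. All function names are hypothetical, and the position of each neuron within the 72-bit URAM word is deliberately left out because this document does not specify it.

```python
VOLTAGE_BITS = 36
VOLTAGE_MASK = (1 << VOLTAGE_BITS) - 1


def encode_voltage(voltage):
    """Convert a signed integer to its 36-bit two's-complement field value."""
    lo = -(1 << (VOLTAGE_BITS - 1))
    hi = (1 << (VOLTAGE_BITS - 1)) - 1
    if not lo <= voltage <= hi:
        raise ValueError("voltage does not fit in 36 signed bits")
    return voltage & VOLTAGE_MASK


def decode_voltage(field):
    """Sign-extend a 36-bit field back to a Python int."""
    field &= VOLTAGE_MASK
    if field >> (VOLTAGE_BITS - 1):
        return field - (1 << VOLTAGE_BITS)
    return field


def uram_location(neuron_id):
    """Return (bank, local_address) using the split described above."""
    return neuron_id >> 13, neuron_id & 0x1FFF


def build_uram_write(core_id, neuron_id, voltage):
    """Build a URAM_WRITE (opcode 0x04) packet."""
    return ((0x04 << 504) | (core_id << 496)
            | ((neuron_id & 0xFFFF) << 480)
            | (encode_voltage(voltage) << 444))


packet = build_uram_write(0, 100, -1000)
bank, local_addr = uram_location(100)
```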
Opcode 0x06: CONFIG_WRITE#
Writes to configuration registers (threshold, leak parameters, etc.).
Payload Format:
[495:480] = Register address (16 bits)
[479:416] = Value (64 bits) - configuration value
[415:0] = Reserved
Register Map:
| Address | Name | Description |
|---|---|---|
| 0x0000 | THRESHOLD | Spike threshold (36 bits) |
|  | LEAK_ENABLE | Enable voltage leak (1 bit) |
|  | LEAK_SHIFT | Leak divisor (shift amount) |
|  | RESET_VOLTAGE | Voltage after spike |
Example:
# Set threshold to 2000
opcode = 0x06
core_id = 0x00
reg_addr = 0x0000 # THRESHOLD register
value = 2000
packet = (opcode << 504) | (core_id << 496) | (reg_addr << 480) | (value << 416)
FPGA to Host Packets#
These are packets sent from FPGA back to the host, retrieved by fpga_controller.flush_spikes().
Spike Packet Format (512 bits)#
┌─────────────────────────────────────────────────────────────┐
│ 512-bit Spike Packet │
├───────────────┬─────────────────────────────────────────────┤
│ [511:496] │ Tag = 0xEEEE (identifies as spike packet) │
│ [495:480] │ Spike count (16 bits) - number of valid │
│ │ spikes in this packet (0-14) │
│ [479:32] │ Spike data: 14 slots × 32 bits each │
│ │ Each slot: [31:24] = reserved │
│ │ [23] = valid bit │
│ │ [22:6] = neuron ID (17 bits) │
│ │ [5:0] = sub-timestep (6 bits) │
│ [31:0] │ Timestep (32 bits) - execRun_ctr value │
└───────────────┴─────────────────────────────────────────────┘
Field Descriptions:
Tag [511:496]: Always 0xEEEE, which identifies this as a spike packet
Spike count [495:480]: Number of valid spikes in this packet (up to 14)
Spike slots [479:32]: Up to 14 spike entries
Valid bit [23]: 1 = valid spike, 0 = empty slot
Neuron ID [22:6]: Which neuron spiked (0-131,071)
Sub-timestep [5:0]: Fine-grained timing within timestep (usually 0)
Timestep [31:0]: When these spikes occurred (execRun_ctr value)
Example Packet:
Tag: 0xEEEE
Spike count: 3
Spike 0: neuron_id=42, valid=1, sub_ts=0
Spike 1: neuron_id=1000, valid=1, sub_ts=0
Spike 2: neuron_id=5123, valid=1, sub_ts=0
Spikes 3-13: valid=0 (empty)
Timestep: 1500
Encoded as:
[511:496] = 0xEEEE
[495:480] = 3 (spike count)
[479:448] = 0x00800A80 # Spike 0: valid=1, neuron 42 (0x2A) << 6
[447:416] = 0x0080FA00 # Spike 1: valid=1, neuron 1000 (0x3E8) << 6
[415:384] = 0x008500C0 # Spike 2: valid=1, neuron 5123 (0x1403) << 6
[383:32] = 0 (empty slots)
[31:0] = 1500 (timestep)
Python Parsing:
def parse_spike_packet(packet_512bit):
    tag = (packet_512bit >> 496) & 0xFFFF
    if tag != 0xEEEE:
        return None # Not a spike packet
    spike_count = (packet_512bit >> 480) & 0xFFFF
    timestep = packet_512bit & 0xFFFFFFFF
    spikes = []
    for i in range(14):
        spike_word = (packet_512bit >> (32 + i*32)) & 0xFFFFFFFF
        valid = (spike_word >> 23) & 0x1
        if valid:
            neuron_id = (spike_word >> 6) & 0x1FFFF
            sub_ts = spike_word & 0x3F
            spikes.append({'neuron_id': neuron_id, 'timestep': timestep, 'sub_ts': sub_ts})
    return spikes
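For testing host code without hardware, it can help to build spike packets in software. The sketch below is a hypothetical encoder that follows the slot layout above, paired with a small decoder so the round trip can be checked. Placing spike 0 in the top slot [479:448] follows the worked example; whether the FPGA always fills slots in that order is an assumption.

```python
SPIKE_TAG = 0xEEEE
NUM_SLOTS = 14


def encode_spike_slot(neuron_id, sub_ts=0):
    """Build one 32-bit slot: valid [23], neuron ID [22:6], sub-timestep [5:0]."""
    return (1 << 23) | ((neuron_id & 0x1FFFF) << 6) | (sub_ts & 0x3F)


def encode_spike_packet(neuron_ids, timestep):
    """Build a 512-bit spike packet; spike 0 goes in the top slot [479:448]."""
    if len(neuron_ids) > NUM_SLOTS:
        raise ValueError("a packet holds at most 14 spikes")
    packet = (SPIKE_TAG << 496) | (len(neuron_ids) << 480) | (timestep & 0xFFFFFFFF)
    for i, neuron_id in enumerate(neuron_ids):
        packet |= encode_spike_slot(neuron_id) << (448 - 32 * i)
    return packet


def decode_neuron_ids(packet):
    """Return neuron IDs from valid slots, scanning from the top slot down."""
    ids = []
    for i in range(NUM_SLOTS):
        slot = (packet >> (448 - 32 * i)) & 0xFFFFFFFF
        if (slot >> 23) & 0x1:
            ids.append((slot >> 6) & 0x1FFFF)
    return ids


packet = encode_spike_packet([42, 1000, 5123], timestep=1500)
recovered = decode_neuron_ids(packet)
```

Since a parser only needs the valid bit to decide whether a slot holds a spike, the slot order affects only the order in which spikes are reported.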
HBM Memory Structures#
HBM stores the network structure (pointers and synapses). All addresses are byte addresses.
Memory Map#
┌──────────────────────────────────────────────────────────┐
│ HBM Memory Layout (8 GB total, 2 GB used) │
├────────────────┬─────────────────────────────────────────┤
│ 0x00000000 │ Region 1: Axon Pointers │
│ - 0x00003FFF │ Size: 16 KB (16,384 bytes) │
│ │ Format: 32-bit pointers × 512 axons │
├────────────────┼─────────────────────────────────────────┤
│ 0x00004000 │ Region 2: Neuron Pointers │
│ - 0x00007FFF │ Size: 512 KB │
│ │ Format: 32-bit pointers × 131,072 neurons│
├────────────────┼─────────────────────────────────────────┤
│ 0x00008000 │ Region 3: Synapses │
│ - 0x7FFFFFFF │ Size: ~2 GB (variable, network-dependent)│
│ │ Format: Variable-length synapse lists │
└────────────────┴─────────────────────────────────────────┘
Pointer Format (32 bits)#
Pointers are stored in Regions 1 and 2, mapping axon/neuron IDs to their synapse lists.
┌───────────────────────────────────────────────────────────┐
│ 32-bit Pointer Entry │
├────────────────┬──────────────────────────────────────────┤
│ [31:23] │ Length (9 bits) - number of synapse rows│
│ [22:0] │ Start Address (23 bits) - HBM row index │
│ │ (actual byte address = 0x8000 + addr×32)│
└────────────────┴──────────────────────────────────────────┘
Example:
Axon 5 pointer = 0x00801234
Length = 0x001 (1 row = 8 synapses)
Start address = 0x001234 (row index)
Actual HBM address = 0x8000 + (0x1234 × 32) = 0x2C680
Python Encoding:
def encode_pointer(start_row, num_rows):
    """
    start_row: Row index in synapse region (not byte address)
    num_rows: Number of consecutive rows (each row = 8 synapses)
    """
    length = num_rows & 0x1FF # 9 bits
    address = start_row & 0x7FFFFF # 23 bits
    pointer = (length << 23) | address
    return pointer

def decode_pointer(pointer):
    length = (pointer >> 23) & 0x1FF
    start_row = pointer & 0x7FFFFF
    byte_address = 0x8000 + (start_row * 32)
    return {'num_rows': length, 'start_row': start_row, 'byte_address': byte_address}
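In practice a pointer is usually derived from a synapse count rather than a row count. The sketch below rounds the count up to whole rows and rejects values that would overflow the 9-bit length field (511 rows, or 4,088 synapses) instead of silently truncating them. The function names are hypothetical and the overflow checks are an added safeguard, not documented behavior.

```python
SYNAPSES_PER_ROW = 8
MAX_ROWS = 0x1FF          # 9-bit length field
MAX_START_ROW = 0x7FFFFF  # 23-bit start address field
SYNAPSE_REGION_BASE = 0x8000
ROW_BYTES = 32


def pointer_for_synapses(start_row, num_synapses):
    """Encode a pointer covering enough rows to hold num_synapses entries."""
    num_rows = -(-num_synapses // SYNAPSES_PER_ROW)  # ceiling division
    if num_rows > MAX_ROWS:
        raise ValueError("synapse list needs more than 511 rows")
    if not 0 <= start_row <= MAX_START_ROW:
        raise ValueError("start_row does not fit in 23 bits")
    return (num_rows << 23) | start_row


def pointer_byte_range(pointer):
    """Return (first_byte, end_byte_exclusive) of the rows a pointer covers."""
    num_rows = (pointer >> 23) & 0x1FF
    start_row = pointer & 0x7FFFFF
    first = SYNAPSE_REGION_BASE + start_row * ROW_BYTES
    return first, first + num_rows * ROW_BYTES


# 20 synapses do not fill a whole number of rows, so 3 rows are reserved
pointer = pointer_for_synapses(start_row=0x1234, num_synapses=20)
first, end = pointer_byte_range(pointer)
```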
Synapse Format (32 bits)#
Synapses are stored in Region 3, organized as rows of 8 synapses each (256 bits = 32 bytes per row).
┌───────────────────────────────────────────────────────────┐
│ 32-bit Synapse Entry │
├────────────────┬──────────────────────────────────────────┤
│ [31:29] │ OpCode (3 bits) │
│ │ 000 = Regular synapse │
│ │ 100 = Output spike (send to host) │
│ │ 101 = Recurrent connection │
│ [28:16] │ Target Address (13 bits) │
│ │ For synapse: target neuron ID │
│ │ For output: neuron to monitor │
│ [15:0] │ Weight (16 bits, signed fixed-point) │
│ │ Interpretation: weight / 32768 │
└────────────────┴──────────────────────────────────────────┘
OpCode Details:
| OpCode | Binary | Meaning | Target Field | Weight Field |
|---|---|---|---|---|
| 0 | 000 | Regular synapse | Neuron ID (13 bits, 0-8191 within group) | Synaptic weight (signed 16-bit) |
| 4 | 100 | Output spike | Neuron ID to report | Unused (set to 0) |
| 5 | 101 | Recurrent | Global neuron ID (13 bits) | Synaptic weight |
Weight Encoding:
Weights are 16-bit signed integers representing fixed-point values:
Range: -32,768 to +32,767
Interpretation: weight_value / 32768.0
Examples:
0x7FFF (32767) → +0.9999… ≈ +1.0
0x4000 (16384) → +0.5
0x0400 (1024) → +0.03125
0x0000 (0) → 0.0
0xFC00 (-1024) → -0.03125
0x8000 (-32768) → -1.0
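The conversion between real-valued weights and the 16-bit field can be sketched as below. The helper names are hypothetical. Saturating at the range limits is one reasonable choice for out-of-range inputs (for example, +1.0 cannot be represented exactly), not something this document prescribes.

```python
def float_to_weight(value):
    """Quantize a float to a signed 16-bit weight, saturating at the limits."""
    raw = int(round(value * 32768.0))
    return max(-32768, min(32767, raw))


def weight_to_float(weight):
    """Interpret a signed 16-bit weight as a fixed-point value."""
    return weight / 32768.0


def weight_to_field(weight):
    """Two's-complement bit pattern stored in synapse bits [15:0]."""
    return weight & 0xFFFF


def field_to_weight(field):
    """Sign-extend the 16-bit field back to a signed integer."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


half = float_to_weight(0.5)                 # 16384
inhibitory = weight_to_field(float_to_weight(-0.03125))  # 0xFC00
```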
Example Synapses:
# Regular synapse: target neuron 42, weight +1000 (≈0.0305)
synapse_1 = (0b000 << 29) | (42 << 16) | 1000
# = 0x002A03E8
# Output spike: report neuron 100
synapse_2 = (0b100 << 29) | (100 << 16) | 0
# = 0x80640000
# Negative weight synapse: target neuron 10, weight -500 (inhibitory)
synapse_3 = (0b000 << 29) | (10 << 16) | ((-500) & 0xFFFF)
# = 0x000AFE0C
Python Encoding:
def encode_synapse(opcode, target, weight):
    """
    opcode: 0=regular, 4=output, 5=recurrent
    target: neuron ID (13-bit field, 0-8191; wider values are truncated by the mask)
    weight: signed integer (-32768 to 32767)
    """
    opcode_bits = (opcode & 0x7) << 29
    target_bits = (target & 0x1FFF) << 16
    weight_bits = weight & 0xFFFF
    synapse = opcode_bits | target_bits | weight_bits
    return synapse

def decode_synapse(synapse):
    opcode = (synapse >> 29) & 0x7
    target = (synapse >> 16) & 0x1FFF
    weight = synapse & 0xFFFF
    # Sign extend weight if necessary
    if weight & 0x8000: # Negative
        weight = weight - 65536
    return {'opcode': opcode, 'target': target, 'weight': weight}
Synapse Row (256 bits = 8 synapses):
Row at HBM address 0x8000:
[255:224] = Synapse 7
[223:192] = Synapse 6
[191:160] = Synapse 5
[159:128] = Synapse 4
[127:96] = Synapse 3
[95:64] = Synapse 2
[63:32] = Synapse 1
[31:0] = Synapse 0
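The row layout above can be sketched as a pair of pack and unpack helpers. The names are hypothetical. Two points are assumptions rather than documented behavior: unused slots are padded with zero (which decodes as a regular synapse to neuron 0 with weight 0), and the row is serialized little-endian so that synapse 0 occupies the first four bytes.

```python
def pack_synapse_row(synapses):
    """Pack up to 8 32-bit synapse words into one 256-bit row.

    Synapse 0 goes in bits [31:0]; unused slots are left as zero (assumption).
    """
    if len(synapses) > 8:
        raise ValueError("a row holds at most 8 synapses")
    row = 0
    for i, synapse in enumerate(synapses):
        row |= (synapse & 0xFFFFFFFF) << (32 * i)
    return row


def unpack_synapse_row(row):
    """Split a 256-bit row into its 8 synapse words, index 0 first."""
    return [(row >> (32 * i)) & 0xFFFFFFFF for i in range(8)]


def row_to_bytes(row):
    """ASSUMPTION: little-endian byte order, so synapse 0 is bytes 0-3."""
    return row.to_bytes(32, byteorder="little")


# The three example synapses from this section, packed into one row
row = pack_synapse_row([0x002A03E8, 0x80640000, 0x000AFE0C])
raw = row_to_bytes(row)
```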
BRAM Memory Structures#
BRAM stores spike masks for external events (axon spikes).
BRAM Organization#
┌──────────────────────────────────────────────────────────┐
│ BRAM: 32,768 rows × 256 bits per row = 1 MB │
├────────────────┬─────────────────────────────────────────┤
│ Address │ Content │
├────────────────┼─────────────────────────────────────────┤
│ 0x0000 │ Axon/Event 0 spike mask │
│ 0x0001 │ Axon/Event 1 spike mask │
│ ... │ ... │
│ 0x7FFF │ Axon/Event 32,767 spike mask │
└────────────────┴─────────────────────────────────────────┘
Spike Mask Format (256 bits)#
Each row contains a 256-bit bitmask indicating which neuron groups should receive this spike.
┌───────────────────────────────────────────────────────────┐
│ 256-bit Spike Mask (one BRAM row) │
├────────────────┬──────────────────────────────────────────┤
│ [255:240] │ Group 15 mask (16 bits) │
│ [239:224] │ Group 14 mask (16 bits) │
│ ... │ ... │
│ [31:16] │ Group 1 mask (16 bits) │
│ [15:0] │ Group 0 mask (16 bits) │
└────────────────┴──────────────────────────────────────────┘
Each 16-bit group mask:
Bit 0: First neuron in group should receive spike
Bit 1: Second neuron in group should receive spike
…
Bit 15: 16th neuron in group should receive spike
Note: This is a coarse-grained mask. For fine-grained connectivity, the spike is processed further:
BRAM mask identifies which groups get the spike
For each group, HBM is read to get the full synapse list
Synapse list specifies exact target neurons and weights
Example:
Axon 5 fires, BRAM row 5 contains:
Group 0 mask: 0x000F (neurons 0-3 in group 0)
Group 1 mask: 0x0000 (no neurons in group 1)
Group 2 mask: 0x8000 (neuron 15 in group 2)
Groups 3-15: 0x0000
This means axon 5 spike should be delivered to:
- Neurons 0, 1, 2, 3 in group 0
- Neuron 15 in group 2
Python Encoding:
def encode_bram_mask(group_masks):
    """
    group_masks: list of 16 integers (16-bit masks for each group)
    Returns: 256-bit value
    """
    mask = 0
    for i, group_mask in enumerate(group_masks):
        mask |= (group_mask & 0xFFFF) << (i * 16)
    return mask

def decode_bram_mask(mask_256bit):
    """
    mask_256bit: 256-bit value
    Returns: list of 16 group masks
    """
    group_masks = []
    for i in range(16):
        group_mask = (mask_256bit >> (i * 16)) & 0xFFFF
        group_masks.append(group_mask)
    return group_masks
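When the connectivity is known as a list of (group, neuron) targets rather than per-group masks, the 256-bit value can be built bit by bit. The sketch below uses hypothetical helper names and reproduces the axon 5 example from this section.

```python
def mask_from_targets(targets):
    """Build a 256-bit spike mask from (group, neuron_in_group) pairs."""
    mask = 0
    for group, neuron in targets:
        if not (0 <= group < 16 and 0 <= neuron < 16):
            raise ValueError("group and neuron index must both be in 0-15")
        # Group g occupies bits [16g+15 : 16g]; neuron n is bit n of that group
        mask |= 1 << (group * 16 + neuron)
    return mask


def targets_from_mask(mask):
    """List the (group, neuron_in_group) pairs whose bits are set."""
    return [(bit // 16, bit % 16) for bit in range(256) if (mask >> bit) & 1]


# Axon 5 example: neurons 0-3 in group 0 and neuron 15 in group 2
mask = mask_from_targets([(0, 0), (0, 1), (0, 2), (0, 3), (2, 15)])
targets = targets_from_mask(mask)
```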
PCIe Layer#
All communication between host and FPGA travels over PCIe using Transaction Layer Packets (TLPs).
PCIe TLP Format#
hs_bridge and the FPGA do NOT directly create PCIe TLPs - the PCIe hardware handles this automatically. However, understanding the format is useful for debugging.
Memory Write TLP (Host → FPGA MMIO):
┌─────────────────────────────────────────────────────────────┐
│ PCIe Memory Write TLP │
├────────────────────┬────────────────────────────────────────┤
│ Header (3-4 DWords)│ │
│ [127:125] │ Fmt = 010 (write with data, 32-bit addr)│
│ [124:120] │ Type = 00000 (memory write) │
│ [95:64] │ Address (32 bits) - FPGA MMIO address │
│ [9:0] │ Length (10 bits) - DWords to transfer │
├────────────────────┼────────────────────────────────────────┤
│ Data (N DWords) │ Payload data (up to 4096 bytes) │
└────────────────────┴────────────────────────────────────────┘
Memory Read TLP (FPGA → Host Memory via DMA):
┌─────────────────────────────────────────────────────────────┐
│ PCIe Memory Read TLP │
├────────────────────┬────────────────────────────────────────┤
│ Header (4 DWords) │ │
│ [127:125] │ Fmt = 001 (read request, 64-bit addr) │
│ [124:120] │ Type = 00000 (memory read) │
│ [63:0]             │ Address (64 bits) - host DDR4 address  │
│ [9:0] │ Length (10 bits) - DWords requested │
└────────────────────┴────────────────────────────────────────┘
Completion TLP (Host → FPGA, returning DMA data):
┌─────────────────────────────────────────────────────────────┐
│ PCIe Completion TLP │
├────────────────────┬────────────────────────────────────────┤
│ Header (3 DWords) │ │
│ [127:125] │ Fmt = 010 (completion with data) │
│ [124:120] │ Type = 01010 (completion) │
│ [9:0]              │ Length (10 bits) - DWords of data      │
├────────────────────┼────────────────────────────────────────┤
│ Data (N DWords) │ Requested data from host memory │
└────────────────────┴────────────────────────────────────────┘
Key Points:
DWord: 32-bit (4-byte) word
Addressing: Can be 32-bit or 64-bit depending on format
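Maximum payload: up to 4096 bytes (4 KB) per TLP under the PCIe specification; the Max_Payload_Size negotiated for a given link is typically smaller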
Ordering: Memory writes are posted (no response), reads require completions
hs_bridge’s Role:
hs_bridge does NOT create TLPs directly
When hs_bridge writes to an MMIO address, the OS kernel driver and PCIe hardware create the TLP
When FPGA does DMA, the FPGA’s PCIe hard block creates Memory Read TLPs automatically
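When inspecting captured traffic (for example from a protocol analyzer or an integrated logic analyzer), the Fmt, Type, and Length fields can be read from the first header DWord. The sketch below assumes that DWord is available as a 32-bit integer with Fmt in its top three bits, and it covers only Fmt values 000 to 011. The function names are hypothetical; nothing here is part of hs_bridge.

```python
def decode_tlp_dw0(dw0):
    """Decode Fmt, Type and Length from the first header DWord of a TLP."""
    fmt = (dw0 >> 29) & 0x7
    tlp_type = (dw0 >> 24) & 0x1F
    length = dw0 & 0x3FF
    return {
        "fmt": fmt,
        "type": tlp_type,
        "has_data": bool(fmt & 0b010),              # Fmt bit 1: payload present
        "header_dwords": 4 if fmt & 0b001 else 3,   # Fmt bit 0: 64-bit address
        "length_dwords": length if length else 1024,  # 0 encodes 1024 DWords
    }


def dwords_for_bits(num_bits):
    """Number of 32-bit DWords needed to carry num_bits of payload."""
    return (num_bits + 31) // 32


# A memory write with a 3-DWord header whose payload is 16 DWords,
# the size of one 512-bit packet
dw0 = (0b010 << 29) | (0b00000 << 24) | dwords_for_bits(512)
info = decode_tlp_dw0(dw0)
```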
Summary: Packet Flow#
Host to FPGA Flow:#
1. Python (hs_bridge):
packet = create_512bit_command(opcode=0x01, ...)
2. Write to system memory (DDR4):
dma_buffer[0] = packet
3. Tell FPGA via MMIO (creates PCIe Memory Write TLP):
fpga.write_register(DMA_ADDR_REG, physical_address)
4. FPGA reads via DMA (creates PCIe Memory Read TLP):
FPGA → PCIe: "Send me data from address X"
5. Host responds (PCIe Completion TLP):
Host → FPGA: "Here's the 512-bit packet"
6. FPGA decodes:
Extracts opcode, routes to appropriate module
FPGA to Host Flow:#
1. Neuron spikes:
URAM threshold check → spike detected
2. Spike collection:
Spike FIFO gathers spikes from all neuron groups
3. Packet assembly:
spike_fifo_controller creates 512-bit spike packet
4. Write to output FIFO:
Buffered in FPGA FIFO
5. DMA to host (creates PCIe Memory Write TLP):
FPGA → Host memory: Write spike packet to DMA buffer
6. Host retrieves:
fpga_controller.flush_spikes() reads from DMA buffer
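The last step of the flow above amounts to walking a byte buffer in 64-byte strides and keeping the entries tagged as spike packets. The sketch below is not the implementation of fpga_controller.flush_spikes(); `iter_spike_packets` is a hypothetical name, and both the back-to-back packing and the little-endian byte order are assumptions that would need to be checked against the actual DMA buffer layout.

```python
PACKET_BYTES = 64  # one 512-bit packet
SPIKE_TAG = 0xEEEE


def iter_spike_packets(buffer, byteorder="little"):
    """Yield each 512-bit spike packet found in a raw byte buffer.

    ASSUMPTION: packets are stored back to back, 64 bytes each, and
    byteorder matches how the FPGA writes them to host memory.
    """
    usable = len(buffer) - len(buffer) % PACKET_BYTES  # ignore a trailing partial packet
    for offset in range(0, usable, PACKET_BYTES):
        packet = int.from_bytes(buffer[offset:offset + PACKET_BYTES], byteorder)
        if ((packet >> 496) & 0xFFFF) == SPIKE_TAG:
            yield packet


# Two fake entries: one tagged as a spike packet, one all zeros
spike = (SPIKE_TAG << 496) | (1 << 480) | 7
buffer = spike.to_bytes(PACKET_BYTES, "little") + bytes(PACKET_BYTES)
found = list(iter_spike_packets(buffer))
```

Each yielded integer can then be passed to a parser such as parse_spike_packet.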
Quick Reference Tables#
Command Opcodes#
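| Code | Name | Payload |
|---|---|---|
| 0x00 | INPUT_SPIKES | Axon ID (16 bits), spike time (16 bits) |
| 0x01 | EXECUTE | Number of timesteps (16 bits) |
| 0x02 | HBM_WRITE | HBM address (32 bits), length (32 bits), data (256 bits) |
| 0x04 | URAM_WRITE | Neuron ID (16 bits), voltage (36 bits, signed) |
| 0x06 | CONFIG_WRITE | Register address (16 bits), value (64 bits) |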
Synapse OpCodes#
| Code | Binary | Meaning |
|---|---|---|
| 0 | 000 | Regular synapse |
| 4 | 100 | Output spike (send to host) |
| 5 | 101 | Recurrent connection |
Memory Regions#
| Region | Base Address | Size | Contents |
|---|---|---|---|
| Axon Ptrs | 0x00000000 | 16 KB | Axon → synapse pointers |
| Neuron Ptrs | 0x00004000 | 512 KB | Neuron → synapse pointers |
| Synapses | 0x00008000 | ~2 GB | Synapse lists |
This reference should provide all the information needed to encode/decode packets and data structures used throughout the hs_bridge and FPGA implementation.