Packet Encoding Reference#

This document provides a complete specification of all packet and data structure encodings used throughout the hs_bridge software and FPGA Verilog code. Understanding these formats is essential for debugging, extending the system, or implementing compatible software.


Table of Contents#

  1. Host to FPGA Packets

  2. FPGA to Host Packets

  3. HBM Memory Structures

  4. BRAM Memory Structures

  5. PCIe Layer


Host to FPGA Packets#

These are 512-bit command packets created by fpga_compiler.py in hs_bridge and sent to the FPGA via DMA.

Command Packet Format (512 bits)#

┌─────────────────────────────────────────────────────────────┐
│                   512-bit Command Packet                    │
├───────────────┬─────────────────────────────────────────────┤
│ [511:504]     │ Opcode (8 bits)                             │
│ [503:496]     │ Core ID (8 bits)                            │
│ [495:0]       │ Payload (496 bits, opcode-specific)         │
└───────────────┴─────────────────────────────────────────────┘

Field Descriptions:

  • Opcode [511:504]: 8-bit operation type identifier

  • Core ID [503:496]: Which FPGA core to target (0-31, though typically only core 0 is used)

  • Payload [495:0]: Operation-specific data (format varies by opcode)


Opcode Definitions#

Opcode (hex)

Opcode (binary)

Name

Description

0x00

8'h00

INPUT_SPIKES

Inject external axon spikes into BRAM

0x01

8'h01

EXECUTE

Run one simulation timestep

0x02

8'h02

HBM_WRITE

Write data to HBM memory

0x03

8'h03

HBM_READ

Read data from HBM memory

0x04

8'h04

URAM_WRITE

Write neuron states to URAM

0x05

8'h05

URAM_READ

Read neuron states from URAM

0x06

8'h06

CONFIG_WRITE

Write configuration registers

0x07

8'h07

CONFIG_READ

Read configuration registers

0xC8

8'hC8

RESET

Reset FPGA state


Opcode 0x00: INPUT_SPIKES#

Injects external spike events (axon activations) into BRAM for processing.

Payload Format:

[495:480] = Axon ID (16 bits) - which axon is spiking
[479:464] = Spike time (16 bits) - future timestep (optional, usually 0 for immediate)
[463:0]   = Reserved (set to 0)

Example:

# Axon 42 fires at current timestep
opcode = 0x00
core_id = 0x00
axon_id = 42
spike_time = 0

packet = (opcode << 504) | (core_id << 496) | (axon_id << 480) | (spike_time << 464)

What happens:

  1. FPGA’s command_interpreter decodes opcode 0x00

  2. Extracts axon_id from payload

  3. Writes to BRAM address corresponding to axon_id

  4. Sets spike mask bit for this axon


Opcode 0x01: EXECUTE#

Runs one simulation timestep (processes all pending spikes, updates neurons, generates output spikes).

Payload Format:

[495:480] = Number of timesteps (16 bits) - typically 1
[479:0]   = Reserved (set to 0)

Example:

# Execute 1 timestep
opcode = 0x01
core_id = 0x00
num_timesteps = 1

packet = (opcode << 504) | (core_id << 496) | (num_timesteps << 480)

What happens:

  1. FPGA triggers execute state machine

  2. Processes all axon spikes (Phase 1: external_events_processor)

  3. Processes all neuron spikes (Phase 2: internal_events_processor)

  4. Increments execRun_ctr (timestep counter)

  5. Returns spike packets to host via output FIFO


Opcode 0x02: HBM_WRITE#

Writes data directly to HBM memory (for initializing network structure).

Payload Format:

[495:464] = HBM address (32 bits) - byte address in HBM
[463:432] = Length (32 bits) - number of bytes to write
[431:176] = Data (256 bits) - payload data (up to 32 bytes)
[175:0]   = Reserved

Example:

# Write synapse data to HBM
opcode = 0x02
core_id = 0x00
hbm_addr = 0x00008000  # Synapse region start
length = 32  # 32 bytes (8 synapses)
data = [0x00100064, 0x00110064, ...]  # Synapse entries

packet = (opcode << 504) | (core_id << 496) | (hbm_addr << 464) | (length << 432) | (data << 176)

What happens:

  1. command_interpreter routes to HBM write controller

  2. Issues AXI write transaction to HBM

  3. Writes data at specified address


Opcode 0x04: URAM_WRITE#

Writes neuron state (membrane potential) to URAM.

Payload Format:

[495:480] = Neuron ID (16 bits) - which neuron to write
[479:444] = Voltage (36 bits) - membrane potential value (signed)
[443:0]   = Reserved

Example:

# Set neuron 100 voltage to 1000
opcode = 0x04
core_id = 0x00
neuron_id = 100
voltage = 1000  # 36-bit signed value

packet = (opcode << 504) | (core_id << 496) | (neuron_id << 480) | (voltage << 444)

What happens:

  1. command_interpreter routes to URAM write controller

  2. Calculates URAM bank (neuron_id >> 13) and local address (neuron_id & 0x1FFF)

  3. Performs read-modify-write to update only target neuron (2 neurons per URAM word)

  4. Writes back updated 72-bit URAM word


Opcode 0x06: CONFIG_WRITE#

Writes to configuration registers (threshold, leak parameters, etc.).

Payload Format:

[495:480] = Register address (16 bits)
[479:416] = Value (64 bits) - configuration value
[415:0]   = Reserved

Register Map:

Address

Name

Description

0x0000

THRESHOLD

Spike threshold (36 bits)

0x0001

LEAK_ENABLE

Enable voltage leak (1 bit)

0x0002

LEAK_SHIFT

Leak divisor (shift amount)

0x0003

RESET_VOLTAGE

Voltage after spike

Example:

# Set threshold to 2000
opcode = 0x06
core_id = 0x00
reg_addr = 0x0000  # THRESHOLD register
value = 2000

packet = (opcode << 504) | (core_id << 496) | (reg_addr << 480) | (value << 416)

FPGA to Host Packets#

These are packets sent from FPGA back to the host, retrieved by fpga_controller.flush_spikes().

Spike Packet Format (512 bits)#

┌─────────────────────────────────────────────────────────────┐
│                   512-bit Spike Packet                      │
├───────────────┬─────────────────────────────────────────────┤
│ [511:496]     │ Tag = 0xEEEE (identifies as spike packet)   │
│ [495:480]     │ Spike count (16 bits) - number of valid     │
│               │ spikes in this packet (0-14)                │
│ [479:32]      │ Spike data: 14 slots × 32 bits each         │
│               │ Each slot: [31:24] = reserved               │
│               │            [23]    = valid bit              │
│               │            [22:6]  = neuron ID (17 bits)    │
│               │            [5:0]   = sub-timestep (6 bits)  │
│ [31:0]        │ Timestep (32 bits) - execRun_ctr value      │
└───────────────┴─────────────────────────────────────────────┘

Field Descriptions:

  • Tag [511:496]: Always 0xEEEE to identify this as a spike packet

  • Spike count [495:480]: Number of valid spikes in this packet (1-14)

  • Spike slots [479:32]: Up to 14 spike entries

    • Valid bit [23]: 1 = valid spike, 0 = empty slot

    • Neuron ID [22:6]: Which neuron spiked (0-131,071)

    • Sub-timestep [5:0]: Fine-grained timing within timestep (usually 0)

  • Timestep [31:0]: When these spikes occurred (execRun_ctr value)

Example Packet:

Tag: 0xEEEE
Spike count: 3
Spike 0: neuron_id=42,   valid=1, sub_ts=0
Spike 1: neuron_id=1000, valid=1, sub_ts=0
Spike 2: neuron_id=5123, valid=1, sub_ts=0
Spikes 3-13: valid=0 (empty)
Timestep: 1500

Encoded as:

[511:496] = 0xEEEE
[495:480] = 3 (spike count)
[479:448] = 0x00800150  # Spike 0: neuron 42 (0x2A)
[447:416] = 0x00803E80  # Spike 1: neuron 1000 (0x3E8)
[415:384] = 0x00814046  # Spike 2: neuron 5123 (0x1403)
[383:32]  = 0 (empty slots)
[31:0]    = 1500 (timestep)

Python Parsing:

def parse_spike_packet(packet_512bit):
    tag = (packet_512bit >> 496) & 0xFFFF
    if tag != 0xEEEE:
        return None  # Not a spike packet

    spike_count = (packet_512bit >> 480) & 0xFFFF
    timestep = packet_512bit & 0xFFFFFFFF

    spikes = []
    for i in range(14):
        spike_word = (packet_512bit >> (32 + i*32)) & 0xFFFFFFFF
        valid = (spike_word >> 23) & 0x1
        if valid:
            neuron_id = (spike_word >> 6) & 0x1FFFF
            sub_ts = spike_word & 0x3F
            spikes.append({'neuron_id': neuron_id, 'timestep': timestep, 'sub_ts': sub_ts})

    return spikes

HBM Memory Structures#

HBM stores the network structure (pointers and synapses). All addresses are byte addresses.

Memory Map#

┌──────────────────────────────────────────────────────────┐
│ HBM Memory Layout (8 GB total, 2 GB used)               │
├────────────────┬─────────────────────────────────────────┤
│ 0x00000000     │ Region 1: Axon Pointers                 │
│ - 0x00003FFF   │ Size: 16 KB (16,384 bytes)              │
│                │ Format: 32-bit pointers × 512 axons     │
├────────────────┼─────────────────────────────────────────┤
│ 0x00004000     │ Region 2: Neuron Pointers               │
│ - 0x00007FFF   │ Size: 512 KB                            │
│                │ Format: 32-bit pointers × 131,072 neurons│
├────────────────┼─────────────────────────────────────────┤
│ 0x00008000     │ Region 3: Synapses                      │
│ - 0x7FFFFFFF   │ Size: ~2 GB (variable, network-dependent)│
│                │ Format: Variable-length synapse lists   │
└────────────────┴─────────────────────────────────────────┘

Pointer Format (32 bits)#

Pointers are stored in Regions 1 and 2, mapping axon/neuron IDs to their synapse lists.

┌───────────────────────────────────────────────────────────┐
│                  32-bit Pointer Entry                     │
├────────────────┬──────────────────────────────────────────┤
│ [31:23]        │ Length (9 bits) - number of synapse rows│
│ [22:0]         │ Start Address (23 bits) - HBM row index │
│                │ (actual byte address = 0x8000 + addr×32)│
└────────────────┴──────────────────────────────────────────┘

Example:

Axon 5 pointer = 0x00201234
  Length = 0x001 (1 row = 8 synapses)
  Start address = 0x01234 (row index)
  Actual HBM address = 0x8000 + (0x1234 × 32) = 0x2A680

Python Encoding:

def encode_pointer(start_row, num_rows):
    """
    start_row: Row index in synapse region (not byte address)
    num_rows: Number of consecutive rows (each row = 8 synapses)
    """
    length = num_rows & 0x1FF  # 9 bits
    address = start_row & 0x7FFFFF  # 23 bits
    pointer = (length << 23) | address
    return pointer

def decode_pointer(pointer):
    length = (pointer >> 23) & 0x1FF
    start_row = pointer & 0x7FFFFF
    byte_address = 0x8000 + (start_row * 32)
    return {'num_rows': length, 'start_row': start_row, 'byte_address': byte_address}

Synapse Format (32 bits)#

Synapses are stored in Region 3, organized as rows of 8 synapses each (256 bits = 32 bytes per row).

┌───────────────────────────────────────────────────────────┐
│                  32-bit Synapse Entry                     │
├────────────────┬──────────────────────────────────────────┤
│ [31:29]        │ OpCode (3 bits)                          │
│                │   000 = Regular synapse                  │
│                │   100 = Output spike (send to host)      │
│                │   101 = Recurrent connection             │
│ [28:16]        │ Target Address (13 bits)                 │
│                │   For synapse: target neuron ID          │
│                │   For output: neuron to monitor          │
│ [15:0]         │ Weight (16 bits, signed fixed-point)     │
│                │   Interpretation: weight / 32768         │
└────────────────┴──────────────────────────────────────────┘

OpCode Details:

OpCode

Binary

Meaning

Target Field

Weight Field

0

3'b000

Regular synapse

Neuron ID (13 bits, 0-8191 within group)

Synaptic weight (signed 16-bit)

4

3'b100

Output spike

Neuron ID to report

Unused (set to 0)

5

3'b101

Recurrent

Global neuron ID (13 bits)

Synaptic weight

Weight Encoding:

Weights are 16-bit signed integers representing fixed-point values:

  • Range: -32,768 to +32,767

  • Interpretation: weight_value / 32768.0

  • Examples:

    • 0x7FFF (32767) → +0.9999… ≈ +1.0

    • 0x4000 (16384) → +0.5

    • 0x0400 (1024) → +0.03125

    • 0x0000 (0) → 0.0

    • 0xFC00 (-1024) → -0.03125

    • 0x8000 (-32768) → -1.0

Example Synapses:

# Regular synapse: target neuron 42, weight +1000 (≈0.0305)
synapse_1 = (0b000 << 29) | (42 << 16) | 1000
# = 0x002A03E8

# Output spike: report neuron 100
synapse_2 = (0b100 << 29) | (100 << 16) | 0
# = 0x80640000

# Negative weight synapse: target neuron 10, weight -500 (inhibitory)
synapse_3 = (0b000 << 29) | (10 << 16) | ((-500) & 0xFFFF)
# = 0x000AFE0C

Python Encoding:

def encode_synapse(opcode, target, weight):
    """
    opcode: 0=regular, 4=output, 5=recurrent
    target: neuron ID (0-8191 for regular, 0-131071 for global)
    weight: signed integer (-32768 to 32767)
    """
    opcode_bits = (opcode & 0x7) << 29
    target_bits = (target & 0x1FFF) << 16
    weight_bits = weight & 0xFFFF
    synapse = opcode_bits | target_bits | weight_bits
    return synapse

def decode_synapse(synapse):
    opcode = (synapse >> 29) & 0x7
    target = (synapse >> 16) & 0x1FFF
    weight = synapse & 0xFFFF
    # Sign extend weight if necessary
    if weight & 0x8000:  # Negative
        weight = weight - 65536
    return {'opcode': opcode, 'target': target, 'weight': weight}

Synapse Row (256 bits = 8 synapses):

Row at HBM address 0x8000:
  [255:224] = Synapse 7
  [223:192] = Synapse 6
  [191:160] = Synapse 5
  [159:128] = Synapse 4
  [127:96]  = Synapse 3
  [95:64]   = Synapse 2
  [63:32]   = Synapse 1
  [31:0]    = Synapse 0

BRAM Memory Structures#

BRAM stores spike masks for external events (axon spikes).

BRAM Organization#

┌──────────────────────────────────────────────────────────┐
│ BRAM: 32,768 rows × 256 bits per row = 1 MB             │
├────────────────┬─────────────────────────────────────────┤
│ Address        │ Content                                 │
├────────────────┼─────────────────────────────────────────┤
│ 0x0000         │ Axon/Event 0 spike mask                 │
│ 0x0001         │ Axon/Event 1 spike mask                 │
│ ...            │ ...                                     │
│ 0x7FFF         │ Axon/Event 32,767 spike mask            │
└────────────────┴─────────────────────────────────────────┘

Spike Mask Format (256 bits)#

Each row contains a 256-bit bitmask indicating which neuron groups should receive this spike.

┌───────────────────────────────────────────────────────────┐
│              256-bit Spike Mask (one BRAM row)            │
├────────────────┬──────────────────────────────────────────┤
│ [255:240]      │ Group 15 mask (16 bits)                  │
│ [239:224]      │ Group 14 mask (16 bits)                  │
│ ...            │ ...                                      │
│ [31:16]        │ Group 1 mask (16 bits)                   │
│ [15:0]         │ Group 0 mask (16 bits)                   │
└────────────────┴──────────────────────────────────────────┘

Each 16-bit group mask:

  • Bit 0: First neuron in group should receive spike

  • Bit 1: Second neuron in group should receive spike

  • Bit 15: 16th neuron in group should receive spike

Note: This is a coarse-grained mask. For fine-grained connectivity, the spike is processed further:

  1. BRAM mask identifies which groups get the spike

  2. For each group, HBM is read to get the full synapse list

  3. Synapse list specifies exact target neurons and weights

Example:

Axon 5 fires, BRAM row 5 contains:
  Group 0 mask: 0x000F (neurons 0-3 in group 0)
  Group 1 mask: 0x0000 (no neurons in group 1)
  Group 2 mask: 0x8000 (neuron 15 in group 2)
  Groups 3-15: 0x0000

This means axon 5 spike should be delivered to:
  - Neurons 0, 1, 2, 3 in group 0
  - Neuron 15 in group 2

Python Encoding:

def encode_bram_mask(group_masks):
    """
    group_masks: list of 16 integers (16-bit masks for each group)
    Returns: 256-bit value
    """
    mask = 0
    for i, group_mask in enumerate(group_masks):
        mask |= (group_mask & 0xFFFF) << (i * 16)
    return mask

def decode_bram_mask(mask_256bit):
    """
    mask_256bit: 256-bit value
    Returns: list of 16 group masks
    """
    group_masks = []
    for i in range(16):
        group_mask = (mask_256bit >> (i * 16)) & 0xFFFF
        group_masks.append(group_mask)
    return group_masks

PCIe Layer#

All communication between host and FPGA travels over PCIe using Transaction Layer Packets (TLPs).

PCIe TLP Format#

hs_bridge and the FPGA do NOT directly create PCIe TLPs - the PCIe hardware handles this automatically. However, understanding the format is useful for debugging.

Memory Write TLP (Host → FPGA MMIO):

┌─────────────────────────────────────────────────────────────┐
│                   PCIe Memory Write TLP                     │
├────────────────────┬────────────────────────────────────────┤
│ Header (3-4 DWords)│                                        │
│ [127:125]          │ Fmt = 010 (write with data, 32-bit addr)│
│ [124:120]          │ Type = 00000 (memory write)            │
│ [95:64]            │ Address (32 bits) - FPGA MMIO address  │
│ [9:0]              │ Length (10 bits) - DWords to transfer  │
├────────────────────┼────────────────────────────────────────┤
│ Data (N DWords)    │ Payload data (up to 4096 bytes)        │
└────────────────────┴────────────────────────────────────────┘

Memory Read TLP (FPGA → Host Memory via DMA):

┌─────────────────────────────────────────────────────────────┐
│                   PCIe Memory Read TLP                      │
├────────────────────┬────────────────────────────────────────┤
│ Header (4 DWords)  │                                        │
│ [127:125]          │ Fmt = 001 (read request, 64-bit addr)  │
│ [124:120]          │ Type = 00000 (memory read)             │
│ [95:0]             │ Address (64 bits) - host DDR4 address  │
│ [9:0]              │ Length (10 bits) - DWords requested    │
└────────────────────┴────────────────────────────────────────┘

Completion TLP (Host → FPGA, returning DMA data):

┌─────────────────────────────────────────────────────────────┐
│                   PCIe Completion TLP                       │
├────────────────────┬────────────────────────────────────────┤
│ Header (3 DWords)  │                                        │
│ [127:125]          │ Fmt = 010 (completion with data)       │
│ [124:120]          │ Type = 01010 (completion)              │
│ [9:0]              │ Byte count (10 bits)                   │
├────────────────────┼────────────────────────────────────────┤
│ Data (N DWords)    │ Requested data from host memory        │
└────────────────────┴────────────────────────────────────────┘

Key Points:

  • DWord: 32-bit (4-byte) word

  • Addressing: Can be 32-bit or 64-bit depending on format

  • Maximum payload: 4096 bytes (4 KB) per TLP

  • Ordering: Memory writes are posted (no response), reads require completions

hs_bridge’s Role:

  • hs_bridge does NOT create TLPs directly

  • When hs_bridge writes to an MMIO address, the OS kernel driver and PCIe hardware create the TLP

  • When FPGA does DMA, the FPGA’s PCIe hard block creates Memory Read TLPs automatically


Summary: Packet Flow#

Host to FPGA Flow:#

1. Python (hs_bridge):
   packet = create_512bit_command(opcode=0x01, ...)

2. Write to system memory (DDR4):
   dma_buffer[0] = packet

3. Tell FPGA via MMIO (creates PCIe Memory Write TLP):
   fpga.write_register(DMA_ADDR_REG, physical_address)

4. FPGA reads via DMA (creates PCIe Memory Read TLP):
   FPGA → PCIe: "Send me data from address X"

5. Host responds (PCIe Completion TLP):
   Host → FPGA: "Here's the 512-bit packet"

6. FPGA decodes:
   Extracts opcode, routes to appropriate module

FPGA to Host Flow:#

1. Neuron spikes:
   URAM threshold check → spike detected

2. Spike collection:
   Spike FIFO gathers spikes from all neuron groups

3. Packet assembly:
   spike_fifo_controller creates 512-bit spike packet

4. Write to output FIFO:
   Buffered in FPGA FIFO

5. DMA to host (creates PCIe Memory Write TLP):
   FPGA → Host memory: Write spike packet to DMA buffer

6. Host retrieves:
   fpga_controller.flush_spikes() reads from DMA buffer

Quick Reference Tables#

Command Opcodes#

Code

Name

Payload

0x00

INPUT_SPIKES

[495:480]=axon_id

0x01

EXECUTE

[495:480]=num_timesteps

0x02

HBM_WRITE

[495:464]=addr, [463:432]=len, [431:176]=data

0x04

URAM_WRITE

[495:480]=neuron_id, [479:444]=voltage

0x06

CONFIG_WRITE

[495:480]=reg_addr, [479:416]=value

Synapse OpCodes#

Code

Binary

Meaning

0

000

Regular synapse

4

100

Output spike (send to host)

5

101

Recurrent connection

Memory Regions#

Region

Base Address

Size

Contents

Axon Ptrs

0x00000000

16 KB

Axon → synapse pointers

Neuron Ptrs

0x00004000

512 KB

Neuron → synapse pointers

Synapses

0x00008000

~2 GB

Synapse lists


This reference should provide all the information needed to encode/decode packets and data structures used throughout the hs_bridge and FPGA implementation.