# Packet Encoding Reference

This document specifies the packet and data structure encodings used throughout the hs_bridge software and the FPGA Verilog code. Understanding these formats is essential for debugging, extending the system, or implementing compatible software.

---

## Table of Contents

1. [Host to FPGA Packets](#host-to-fpga-packets)
2. [FPGA to Host Packets](#fpga-to-host-packets)
3. [HBM Memory Structures](#hbm-memory-structures)
4. [BRAM Memory Structures](#bram-memory-structures)
5. [PCIe Layer](#pcie-layer)
6. [Summary: Packet Flow](#summary-packet-flow)
7. [Quick Reference Tables](#quick-reference-tables)

---

## Host to FPGA Packets

These are 512-bit command packets created by `fpga_compiler.py` in hs_bridge and sent to the FPGA via DMA.

### **Command Packet Format (512 bits)**

```
┌─────────────────────────────────────────────────────────────┐
│                   512-bit Command Packet                    │
├───────────────┬─────────────────────────────────────────────┤
│ [511:504]     │ Opcode (8 bits)                             │
│ [503:496]     │ Core ID (8 bits)                            │
│ [495:0]       │ Payload (496 bits, opcode-specific)         │
└───────────────┴─────────────────────────────────────────────┘
```

**Field Descriptions:**

- **Opcode [511:504]:** 8-bit operation type identifier
- **Core ID [503:496]:** Which FPGA core to target (0-31, though typically only core 0 is used)
- **Payload [495:0]:** Operation-specific data (format varies by opcode)

---

### **Opcode Definitions**

| Opcode (hex) | Verilog literal | Name | Description |
|--------------|-----------------|------|-------------|
| `0x00` | `8'h00` | INPUT_SPIKES | Inject external axon spikes into BRAM |
| `0x01` | `8'h01` | EXECUTE | Run one simulation timestep |
| `0x02` | `8'h02` | HBM_WRITE | Write data to HBM memory |
| `0x03` | `8'h03` | HBM_READ | Read data from HBM memory |
| `0x04` | `8'h04` | URAM_WRITE | Write neuron states to URAM |
| `0x05` | `8'h05` | URAM_READ | Read neuron states from URAM |
| `0x06` | `8'h06` | CONFIG_WRITE | Write configuration registers |
| `0x07` | `8'h07` | CONFIG_READ | Read configuration registers |
| `0xC8` | `8'hC8` | RESET | Reset FPGA state |

---

### **Opcode 0x00: INPUT_SPIKES**

Injects external spike events (axon activations) into BRAM for processing.

**Payload Format:**

```
[495:480] = Axon ID (16 bits) - which axon is spiking
[479:464] = Spike time (16 bits) - future timestep (optional, usually 0 for immediate)
[463:0]   = Reserved (set to 0)
```

**Example:**

```python
# Axon 42 fires at current timestep
opcode = 0x00
core_id = 0x00
axon_id = 42
spike_time = 0
packet = (opcode << 504) | (core_id << 496) | (axon_id << 480) | (spike_time << 464)
```

**What happens:**

1. FPGA's command_interpreter decodes opcode 0x00
2. Extracts axon_id from payload
3. Writes to BRAM address corresponding to axon_id
4. Sets spike mask bit for this axon

---

### **Opcode 0x01: EXECUTE**

Runs one simulation timestep (processes all pending spikes, updates neurons, generates output spikes).

**Payload Format:**

```
[495:480] = Number of timesteps (16 bits) - typically 1
[479:0]   = Reserved (set to 0)
```

**Example:**

```python
# Execute 1 timestep
opcode = 0x01
core_id = 0x00
num_timesteps = 1
packet = (opcode << 504) | (core_id << 496) | (num_timesteps << 480)
```

**What happens:**

1. FPGA triggers execute state machine
2. Processes all axon spikes (Phase 1: external_events_processor)
3. Processes all neuron spikes (Phase 2: internal_events_processor)
4. Increments execRun_ctr (timestep counter)
5. Returns spike packets to host via output FIFO

---

### **Opcode 0x02: HBM_WRITE**

Writes data directly to HBM memory (for initializing network structure).

**Payload Format:**

```
[495:464] = HBM address (32 bits) - byte address in HBM
[463:432] = Length (32 bits) - number of bytes to write
[431:176] = Data (256 bits) - payload data (up to 32 bytes)
[175:0]   = Reserved
```

**Example:**

```python
# Write synapse data to HBM
opcode = 0x02
core_id = 0x00
hbm_addr = 0x00008000  # Synapse region start
length = 32            # 32 bytes (8 synapses)
synapses = [0x00100064, 0x00110064]  # Synapse entries (remaining entries elided)

# Pack the 32-bit entries into one integer for the 256-bit data field.
# Assumption: entry 0 occupies the least significant 32 bits, as in a synapse row.
data = 0
for i, word in enumerate(synapses):
    data |= (word & 0xFFFFFFFF) << (32 * i)

packet = (opcode << 504) | (core_id << 496) | (hbm_addr << 464) | (length << 432) | (data << 176)
```

**What happens:**

1. command_interpreter routes to HBM write controller
2. Issues AXI write transaction to HBM
3. Writes data at specified address

---

### **Opcode 0x04: URAM_WRITE**

Writes neuron state (membrane potential) to URAM.

**Payload Format:**

```
[495:480] = Neuron ID (16 bits) - which neuron to write
[479:444] = Voltage (36 bits) - membrane potential value (signed)
[443:0]   = Reserved
```

**Example:**

```python
# Set neuron 100 voltage to 1000
opcode = 0x04
core_id = 0x00
neuron_id = 100
voltage = 1000  # 36-bit signed value
# Mask to 36 bits so that a negative voltage does not spill into higher fields
packet = (opcode << 504) | (core_id << 496) | (neuron_id << 480) | ((voltage & 0xFFFFFFFFF) << 444)
```

**What happens:**

1. command_interpreter routes to URAM write controller
2. Calculates URAM bank (neuron_id >> 13) and local address (neuron_id & 0x1FFF)
3. Performs read-modify-write to update only target neuron (2 neurons per URAM word)
4. Writes back updated 72-bit URAM word

---

### **Opcode 0x06: CONFIG_WRITE**

Writes to configuration registers (threshold, leak parameters, etc.).

**Payload Format:**

```
[495:480] = Register address (16 bits)
[479:416] = Value (64 bits) - configuration value
[415:0]   = Reserved
```

**Register Map:**

| Address | Name | Description |
|---------|------|-------------|
| `0x0000` | THRESHOLD | Spike threshold (36 bits) |
| `0x0001` | LEAK_ENABLE | Enable voltage leak (1 bit) |
| `0x0002` | LEAK_SHIFT | Leak divisor (shift amount) |
| `0x0003` | RESET_VOLTAGE | Voltage after spike |

**Example:**

```python
# Set threshold to 2000
opcode = 0x06
core_id = 0x00
reg_addr = 0x0000  # THRESHOLD register
value = 2000
packet = (opcode << 504) | (core_id << 496) | (reg_addr << 480) | (value << 416)
```

---

## FPGA to Host Packets

These are packets sent from the FPGA back to the host, retrieved by `fpga_controller.flush_spikes()`.
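Before any field can be extracted, the raw bytes read back from the DMA buffer have to be cut into 64-byte (512-bit) packets. The sketch below shows one way to do that. The helper name `split_packets` and the little-endian default are assumptions made for illustration: the source does not state how a 512-bit word is laid out in host memory, so the byte order should be checked against the hardware.

```python
PACKET_BYTES = 64  # one 512-bit packet


def split_packets(buf, byteorder="little"):
    """Cut a raw byte buffer into 512-bit integers, one per packet.

    byteorder is an assumption (hypothetical default); verify it against
    the actual DMA layout before relying on it.
    """
    if len(buf) % PACKET_BYTES != 0:
        raise ValueError("buffer length must be a multiple of 64 bytes")
    return [
        int.from_bytes(buf[off:off + PACKET_BYTES], byteorder)
        for off in range(0, len(buf), PACKET_BYTES)
    ]


# Usage: a buffer holding one packet whose top 16 bits are 0xEEEE
raw = (0xEEEE << 496).to_bytes(PACKET_BYTES, "little")
packets = split_packets(raw)
print(hex(packets[0] >> 496))  # 0xeeee
```

Each returned integer can then be handled with plain shifts and masks, in the same style as the command packet examples.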
### **Spike Packet Format (512 bits)**

```
┌─────────────────────────────────────────────────────────────┐
│                    512-bit Spike Packet                     │
├───────────────┬─────────────────────────────────────────────┤
│ [511:496]     │ Tag = 0xEEEE (identifies as spike packet)   │
│ [495:480]     │ Spike count (16 bits) - number of valid     │
│               │ spikes in this packet (0-14)                │
│ [479:32]      │ Spike data: 14 slots × 32 bits each         │
│               │   Each slot: [31:24] = reserved             │
│               │              [23]    = valid bit            │
│               │              [22:6]  = neuron ID (17 bits)  │
│               │              [5:0]   = sub-timestep (6 bits)│
│ [31:0]        │ Timestep (32 bits) - execRun_ctr value      │
└───────────────┴─────────────────────────────────────────────┘
```

**Field Descriptions:**

- **Tag [511:496]:** Always `0xEEEE` to identify this as a spike packet
- **Spike count [495:480]:** Number of valid spikes in this packet (at most 14)
- **Spike slots [479:32]:** Up to 14 spike entries
  - **Valid bit [23]:** 1 = valid spike, 0 = empty slot
  - **Neuron ID [22:6]:** Which neuron spiked (0-131,071)
  - **Sub-timestep [5:0]:** Fine-grained timing within timestep (usually 0)
- **Timestep [31:0]:** When these spikes occurred (execRun_ctr value)

**Example Packet:**

```
Tag: 0xEEEE
Spike count: 3
Spike 0: neuron_id=42, valid=1, sub_ts=0
Spike 1: neuron_id=1000, valid=1, sub_ts=0
Spike 2: neuron_id=5123, valid=1, sub_ts=0
Spikes 3-13: valid=0 (empty)
Timestep: 1500
```

**Encoded as:**

Each slot word is `(valid << 23) | (neuron_id << 6) | sub_ts`.

```
[511:496] = 0xEEEE
[495:480] = 3 (spike count)
[479:448] = 0x00800A80  # Spike 0: neuron 42 (0x2A)
[447:416] = 0x0080FA00  # Spike 1: neuron 1000 (0x3E8)
[415:384] = 0x008500C0  # Spike 2: neuron 5123 (0x1403)
[383:32]  = 0 (empty slots)
[31:0]    = 1500 (timestep)
```

**Python Parsing:**

```python
def parse_spike_packet(packet_512bit):
    tag = (packet_512bit >> 496) & 0xFFFF
    if tag != 0xEEEE:
        return None  # Not a spike packet

    spike_count = (packet_512bit >> 480) & 0xFFFF
    timestep = packet_512bit & 0xFFFFFFFF

    spikes = []
    # Slots are scanned from the lowest bits upward. With spike 0 stored in
    # [479:448] as in the example above, results come back in reverse slot order.
    for i in range(14):
        spike_word = (packet_512bit >> (32 + i*32)) & 0xFFFFFFFF
        valid = (spike_word >> 23) & 0x1
        if valid:
            neuron_id = (spike_word >> 6) & 0x1FFFF
            sub_ts = spike_word & 0x3F
            spikes.append({'neuron_id': neuron_id, 'timestep': timestep, 'sub_ts': sub_ts})

    return spikes
```

---

## HBM Memory Structures

HBM stores the network structure (pointers and synapses). All addresses are byte addresses.

### **Memory Map**

```
┌────────────────────────────────────────────────────────────┐
│ HBM Memory Layout (8 GB total, 2 GB used)                  │
├────────────────┬───────────────────────────────────────────┤
│ 0x00000000     │ Region 1: Axon Pointers                   │
│ - 0x00003FFF   │ Size: 16 KB (16,384 bytes)                │
│                │ Format: 32-bit pointers × 512 axons       │
├────────────────┼───────────────────────────────────────────┤
│ 0x00004000     │ Region 2: Neuron Pointers                 │
│ - 0x00007FFF   │ Size: 512 KB                              │
│                │ Format: 32-bit pointers × 131,072 neurons │
├────────────────┼───────────────────────────────────────────┤
│ 0x00008000     │ Region 3: Synapses                        │
│ - 0x7FFFFFFF   │ Size: ~2 GB (variable, network-dependent) │
│                │ Format: Variable-length synapse lists     │
└────────────────┴───────────────────────────────────────────┘
```

---

### **Pointer Format (32 bits)**

Pointers are stored in Regions 1 and 2, mapping axon/neuron IDs to their synapse lists.
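To read or write the pointer for a particular axon or neuron, its entry has to be located inside the pointer region first. The sketch below assumes that entries are stored back to back, 4 bytes each, and indexed directly by ID; the source does not state this explicitly. The region bounds are the address ranges from the memory map, and the helper name `pointer_entry_address` is hypothetical.

```python
AXON_PTR_BASE = 0x00000000    # Region 1 start (memory map)
AXON_PTR_END = 0x00003FFF     # Region 1 end
NEURON_PTR_BASE = 0x00004000  # Region 2 start
NEURON_PTR_END = 0x00007FFF   # Region 2 end
POINTER_BYTES = 4             # one 32-bit pointer per entry


def pointer_entry_address(index, base, end):
    """Byte address of pointer `index` within a pointer region.

    Assumes contiguous 4-byte entries indexed by ID (not confirmed
    by the source). Rejects entries beyond the region's address range.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    addr = base + index * POINTER_BYTES
    if addr + POINTER_BYTES - 1 > end:
        raise ValueError("pointer entry falls outside the region")
    return addr


print(hex(pointer_entry_address(5, AXON_PTR_BASE, AXON_PTR_END)))      # 0x14
print(hex(pointer_entry_address(5, NEURON_PTR_BASE, NEURON_PTR_END)))  # 0x4014
```

The range check is deliberate: it keeps a bad ID from silently producing an address in a neighbouring region.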
```
┌───────────────────────────────────────────────────────────┐
│                   32-bit Pointer Entry                    │
├────────────────┬──────────────────────────────────────────┤
│ [31:23]        │ Length (9 bits) - number of synapse rows │
│ [22:0]         │ Start Address (23 bits) - HBM row index  │
│                │ (actual byte address = 0x8000 + addr×32) │
└────────────────┴──────────────────────────────────────────┘
```

**Example:**

```
Axon 5 pointer = 0x00801234
  Length = 0x001 (1 row = 8 synapses)
  Start address = 0x001234 (row index)
  Actual HBM address = 0x8000 + (0x1234 × 32) = 0x2C680
```

**Python Encoding:**

```python
def encode_pointer(start_row, num_rows):
    """
    start_row: Row index in synapse region (not byte address)
    num_rows: Number of consecutive rows (each row = 8 synapses)
    """
    length = num_rows & 0x1FF       # 9 bits
    address = start_row & 0x7FFFFF  # 23 bits
    pointer = (length << 23) | address
    return pointer


def decode_pointer(pointer):
    length = (pointer >> 23) & 0x1FF
    start_row = pointer & 0x7FFFFF
    byte_address = 0x8000 + (start_row * 32)
    return {'num_rows': length, 'start_row': start_row, 'byte_address': byte_address}
```

---

### **Synapse Format (32 bits)**

Synapses are stored in Region 3, organized as rows of 8 synapses each (256 bits = 32 bytes per row).
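Because a pointer counts whole rows, a synapse list has to be grouped into 8-entry rows before it is written. The following sketch packs 32-bit synapse words into 256-bit row values and lists the byte address of each row a pointer covers. It assumes that synapse 0 of a row sits in bits [31:0] and that a short final row is padded with zero words; the zero padding in particular is not confirmed by the source, since a zero word decodes as a regular synapse to target 0 with weight 0. Both helper names are hypothetical.

```python
SYNAPSES_PER_ROW = 8
SYNAPSE_REGION_BASE = 0x8000  # start of Region 3
ROW_BYTES = 32                # 256 bits per row


def pack_synapse_rows(synapse_words):
    """Group 32-bit synapse words into 256-bit rows.

    Assumes entry 0 of each row occupies bits [31:0]; unused slots in the
    last row stay zero (an assumption, see the lead-in).
    """
    rows = []
    for start in range(0, len(synapse_words), SYNAPSES_PER_ROW):
        row = 0
        chunk = synapse_words[start:start + SYNAPSES_PER_ROW]
        for slot, word in enumerate(chunk):
            row |= (word & 0xFFFFFFFF) << (32 * slot)
        rows.append(row)
    return rows


def row_byte_addresses(start_row, num_rows):
    """Byte address of every row covered by a pointer: 0x8000 + row * 32."""
    return [SYNAPSE_REGION_BASE + (start_row + i) * ROW_BYTES
            for i in range(num_rows)]


rows = pack_synapse_rows([0x002A03E8] * 10)  # 10 synapses need 2 rows
print(len(rows))                                       # 2
print([hex(a) for a in row_byte_addresses(0, len(rows))])  # ['0x8000', '0x8020']
```

The number of rows returned is the value that belongs in the pointer's 9-bit length field, which caps a single list at 511 rows.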
```
┌───────────────────────────────────────────────────────────┐
│                   32-bit Synapse Entry                    │
├────────────────┬──────────────────────────────────────────┤
│ [31:29]        │ OpCode (3 bits)                          │
│                │   000 = Regular synapse                  │
│                │   100 = Output spike (send to host)      │
│                │   101 = Recurrent connection             │
│ [28:16]        │ Target Address (13 bits)                 │
│                │   For synapse: target neuron ID          │
│                │   For output: neuron to monitor          │
│ [15:0]         │ Weight (16 bits, signed fixed-point)     │
│                │   Interpretation: weight / 32768         │
└────────────────┴──────────────────────────────────────────┘
```

**OpCode Details:**

| OpCode | Binary | Meaning | Target Field | Weight Field |
|--------|--------|---------|--------------|--------------|
| 0 | `3'b000` | Regular synapse | Neuron ID (13 bits, 0-8191 within group) | Synaptic weight (signed 16-bit) |
| 4 | `3'b100` | Output spike | Neuron ID to report | Unused (set to 0) |
| 5 | `3'b101` | Recurrent | Global neuron ID (13 bits) | Synaptic weight |

**Weight Encoding:**

Weights are 16-bit signed integers representing fixed-point values:

- **Range:** -32,768 to +32,767
- **Interpretation:** `weight_value / 32768.0`
- **Examples:**
  - `0x7FFF` (32767) → +0.99997 (≈ +1.0)
  - `0x4000` (16384) → +0.5
  - `0x0400` (1024) → +0.03125
  - `0x0000` (0) → 0.0
  - `0xFC00` (-1024) → -0.03125
  - `0x8000` (-32768) → -1.0

**Example Synapses:**

```python
# Regular synapse: target neuron 42, weight +1000 (≈0.0305)
synapse_1 = (0b000 << 29) | (42 << 16) | 1000
# = 0x002A03E8

# Output spike: report neuron 100
synapse_2 = (0b100 << 29) | (100 << 16) | 0
# = 0x80640000

# Negative weight synapse: target neuron 10, weight -500 (inhibitory)
synapse_3 = (0b000 << 29) | (10 << 16) | ((-500) & 0xFFFF)
# = 0x000AFE0C
```

**Python Encoding:**

```python
def encode_synapse(opcode, target, weight):
    """
    opcode: 0=regular, 4=output, 5=recurrent
    target: neuron ID (13-bit field, 0-8191; larger values are truncated)
    weight: signed integer (-32768 to 32767)
    """
    opcode_bits = (opcode & 0x7) << 29
    target_bits = (target & 0x1FFF) << 16
    weight_bits = weight & 0xFFFF
    synapse = opcode_bits | target_bits | weight_bits
    return synapse


def decode_synapse(synapse):
    opcode = (synapse >> 29) & 0x7
    target = (synapse >> 16) & 0x1FFF
    weight = synapse & 0xFFFF

    # Sign extend weight if necessary
    if weight & 0x8000:  # Negative
        weight = weight - 65536

    return {'opcode': opcode, 'target': target, 'weight': weight}
```

**Synapse Row (256 bits = 8 synapses):**

```
Row at HBM address 0x8000:
[255:224] = Synapse 7
[223:192] = Synapse 6
[191:160] = Synapse 5
[159:128] = Synapse 4
[127:96]  = Synapse 3
[95:64]   = Synapse 2
[63:32]   = Synapse 1
[31:0]    = Synapse 0
```

---

## BRAM Memory Structures

BRAM stores spike masks for external events (axon spikes).

### **BRAM Organization**

```
┌──────────────────────────────────────────────────────────┐
│ BRAM: 32,768 rows × 256 bits per row = 1 MB              │
├────────────────┬─────────────────────────────────────────┤
│ Address        │ Content                                 │
├────────────────┼─────────────────────────────────────────┤
│ 0x0000         │ Axon/Event 0 spike mask                 │
│ 0x0001         │ Axon/Event 1 spike mask                 │
│ ...            │ ...                                     │
│ 0x7FFF         │ Axon/Event 32,767 spike mask            │
└────────────────┴─────────────────────────────────────────┘
```

---

### **Spike Mask Format (256 bits)**

Each row contains a 256-bit bitmask indicating which neuron groups should receive this spike.

```
┌───────────────────────────────────────────────────────────┐
│             256-bit Spike Mask (one BRAM row)             │
├────────────────┬──────────────────────────────────────────┤
│ [255:240]      │ Group 15 mask (16 bits)                  │
│ [239:224]      │ Group 14 mask (16 bits)                  │
│ ...            │ ...                                      │
│ [31:16]        │ Group 1 mask (16 bits)                   │
│ [15:0]         │ Group 0 mask (16 bits)                   │
└────────────────┴──────────────────────────────────────────┘
```

**Each 16-bit group mask:**

- Bit 0: First neuron in group should receive spike
- Bit 1: Second neuron in group should receive spike
- ...
- Bit 15: 16th neuron in group should receive spike

**Note:** This is a coarse-grained mask. For fine-grained connectivity, the spike is processed further:

1. BRAM mask identifies which groups get the spike
2. For each group, HBM is read to get the full synapse list
3. Synapse list specifies exact target neurons and weights

**Example:**

```
Axon 5 fires, BRAM row 5 contains:
  Group 0 mask: 0x000F (neurons 0-3 in group 0)
  Group 1 mask: 0x0000 (no neurons in group 1)
  Group 2 mask: 0x8000 (neuron 15 in group 2)
  Groups 3-15:  0x0000

This means axon 5 spike should be delivered to:
  - Neurons 0, 1, 2, 3 in group 0
  - Neuron 15 in group 2
```

**Python Encoding:**

```python
def encode_bram_mask(group_masks):
    """
    group_masks: list of 16 integers (16-bit masks for each group)
    Returns: 256-bit value
    """
    mask = 0
    for i, group_mask in enumerate(group_masks):
        mask |= (group_mask & 0xFFFF) << (i * 16)
    return mask


def decode_bram_mask(mask_256bit):
    """
    mask_256bit: 256-bit value
    Returns: list of 16 group masks
    """
    group_masks = []
    for i in range(16):
        group_mask = (mask_256bit >> (i * 16)) & 0xFFFF
        group_masks.append(group_mask)
    return group_masks
```

---

## PCIe Layer

All communication between host and FPGA travels over PCIe using Transaction Layer Packets (TLPs).

### **PCIe TLP Format**

hs_bridge and the FPGA do NOT directly create PCIe TLPs; the PCIe hardware handles this automatically. However, understanding the format is useful for debugging.
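When inspecting captured traffic, the first header DWord is enough to tell a write from a read or a completion. The sketch below decodes it using the standard PCIe field positions (Fmt in bits [31:29], Type in [28:24], Length in [9:0] of the first DWord). The function name `decode_tlp_dw0` is hypothetical, and the input is assumed to be the header bytes in wire order, which depends on how the capture tool presents them.

```python
def decode_tlp_dw0(header_bytes):
    """Decode Fmt, Type and Length from the first DWord of a TLP header.

    header_bytes: at least the first 4 header bytes, byte 0 first
    (assumed wire order; adjust if the capture tool swaps bytes).
    """
    if len(header_bytes) < 4:
        raise ValueError("need at least 4 header bytes")
    dw0 = int.from_bytes(header_bytes[:4], "big")
    fmt = (dw0 >> 29) & 0x7
    tlp_type = (dw0 >> 24) & 0x1F
    length = dw0 & 0x3FF
    return {
        "fmt": fmt,
        "type": tlp_type,
        "has_data": bool(fmt & 0b010),        # Fmt bit 1: payload present
        "four_dw_header": bool(fmt & 0b001),  # Fmt bit 0: 64-bit addressing
        "length_dw": length if length else 1024,  # 0 encodes 1024 DWords
    }


# Memory Write, 3-DWord header, 16 DWords of payload (one 512-bit packet)
info = decode_tlp_dw0(bytes([0x40, 0x00, 0x00, 0x10]))
print(info["fmt"], info["type"], info["length_dw"])  # 2 0 16
```

A 512-bit command or spike packet is 16 DWords, so a Length of 16 on a write or completion is a quick sanity check that a capture shows whole packets. The two flag fields are only meaningful for the Fmt values 000 to 011; TLP prefixes (Fmt 100) are out of scope for this sketch.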
Bit positions below are given per header DWord (DW0 is the first DWord on the wire).

**Memory Write TLP (Host → FPGA MMIO):**

```
┌─────────────────────────────────────────────────────────────┐
│                    PCIe Memory Write TLP                    │
├────────────────────┬────────────────────────────────────────┤
│ Header (3 DWords)  │                                        │
│   DW0 [31:29]      │ Fmt = 010 (3-DW header, with data)     │
│   DW0 [28:24]      │ Type = 00000 (memory request)          │
│   DW0 [9:0]        │ Length (10 bits) - DWords to transfer  │
│   DW2 [31:2]       │ Address (32-bit) - FPGA MMIO address   │
├────────────────────┼────────────────────────────────────────┤
│ Data (N DWords)    │ Payload data (up to 4096 bytes)        │
└────────────────────┴────────────────────────────────────────┘
```

**Memory Read TLP (FPGA → Host Memory via DMA):**

```
┌─────────────────────────────────────────────────────────────┐
│                    PCIe Memory Read TLP                     │
├────────────────────┬────────────────────────────────────────┤
│ Header (4 DWords)  │                                        │
│   DW0 [31:29]      │ Fmt = 001 (4-DW header, no data)       │
│   DW0 [28:24]      │ Type = 00000 (memory request)          │
│   DW0 [9:0]        │ Length (10 bits) - DWords requested    │
│   DW2 [31:0]       │ Address [63:32] - host DDR4 address    │
│   DW3 [31:2]       │ Address [31:2]                         │
└────────────────────┴────────────────────────────────────────┘
```

**Completion TLP (Host → FPGA, returning DMA data):**

```
┌─────────────────────────────────────────────────────────────┐
│                     PCIe Completion TLP                     │
├────────────────────┬────────────────────────────────────────┤
│ Header (3 DWords)  │                                        │
│   DW0 [31:29]      │ Fmt = 010 (3-DW header, with data)     │
│   DW0 [28:24]      │ Type = 01010 (completion)              │
│   DW1 [11:0]       │ Byte count (12 bits)                   │
├────────────────────┼────────────────────────────────────────┤
│ Data (N DWords)    │ Requested data from host memory        │
└────────────────────┴────────────────────────────────────────┘
```

**Key Points:**

- **DWord:** 32-bit (4-byte) word
- **Addressing:** 32-bit (3-DWord header) or 64-bit (4-DWord header), selected by the Fmt field
- **Maximum payload:** 4096 bytes (4 KB) per TLP by specification; the Max_Payload_Size negotiated for the link is usually smaller
- **Ordering:** Memory writes are posted (no response), reads require completions

**hs_bridge's Role:**

- hs_bridge does NOT create TLPs directly
- When hs_bridge writes to an MMIO address, the OS kernel driver and PCIe hardware create the TLP
- When the FPGA does DMA, the FPGA's PCIe hard block creates Memory Read TLPs automatically

---

## Summary: Packet Flow

### **Host to FPGA Flow:**

```
1. Python (hs_bridge):
   packet = create_512bit_command(opcode=0x01, ...)

2. Write to system memory (DDR4):
   dma_buffer[0] = packet

3. Tell FPGA via MMIO (creates PCIe Memory Write TLP):
   fpga.write_register(DMA_ADDR_REG, physical_address)

4. FPGA reads via DMA (creates PCIe Memory Read TLP):
   FPGA → PCIe: "Send me data from address X"

5. Host responds (PCIe Completion TLP):
   Host → FPGA: "Here's the 512-bit packet"

6. FPGA decodes:
   Extracts opcode, routes to appropriate module
```

### **FPGA to Host Flow:**

```
1. Neuron spikes:
   URAM threshold check → spike detected

2. Spike collection:
   Spike FIFO gathers spikes from all neuron groups

3. Packet assembly:
   spike_fifo_controller creates 512-bit spike packet

4. Write to output FIFO:
   Buffered in FPGA FIFO

5. DMA to host (creates PCIe Memory Write TLP):
   FPGA → Host memory: Write spike packet to DMA buffer

6. Host retrieves:
   fpga_controller.flush_spikes() reads from DMA buffer
```

---

## Quick Reference Tables

### **Command Opcodes**

| Code | Name | Payload |
|------|------|---------|
| 0x00 | INPUT_SPIKES | `[495:480]=axon_id, [479:464]=spike_time` |
| 0x01 | EXECUTE | `[495:480]=num_timesteps` |
| 0x02 | HBM_WRITE | `[495:464]=addr, [463:432]=len, [431:176]=data` |
| 0x04 | URAM_WRITE | `[495:480]=neuron_id, [479:444]=voltage` |
| 0x06 | CONFIG_WRITE | `[495:480]=reg_addr, [479:416]=value` |

### **Synapse OpCodes**

| Code | Binary | Meaning |
|------|--------|---------|
| 0 | `000` | Regular synapse |
| 4 | `100` | Output spike (send to host) |
| 5 | `101` | Recurrent connection |

### **Memory Regions**

| Region | Base Address | Size | Contents |
|--------|--------------|------|----------|
| Axon Ptrs | 0x00000000 | 16 KB | Axon → synapse pointers |
| Neuron Ptrs | 0x00004000 | 512 KB | Neuron → synapse pointers |
| Synapses | 0x00008000 | ~2 GB | Synapse lists |

---

This reference should provide all the information needed to encode/decode packets and data structures used throughout the hs_bridge and FPGA implementation.
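As a closing illustration, the command layouts listed under Command Opcodes can be driven from a single field table instead of hand-written shifts. This is a minimal sketch, not the encoder used by `fpga_compiler.py`: the names `COMMAND_FIELDS` and `build_command` are hypothetical, the bit ranges are copied from the payload formats in this document, and HBM_WRITE is left out because its data field needs separate packing.

```python
# (msb, lsb) bit positions copied from the payload formats in this document
COMMAND_FIELDS = {
    0x00: {"axon_id": (495, 480), "spike_time": (479, 464)},  # INPUT_SPIKES
    0x01: {"num_timesteps": (495, 480)},                      # EXECUTE
    0x04: {"neuron_id": (495, 480), "voltage": (479, 444)},   # URAM_WRITE
    0x06: {"reg_addr": (495, 480), "value": (479, 416)},      # CONFIG_WRITE
}


def build_command(opcode, core_id=0, **fields):
    """Assemble a 512-bit command packet as a Python integer.

    Each value is masked to its field width, so negative numbers are
    stored in two's complement form and cannot overflow into
    neighbouring fields. Unknown opcodes or field names raise KeyError.
    """
    layout = COMMAND_FIELDS[opcode]
    packet = ((opcode & 0xFF) << 504) | ((core_id & 0xFF) << 496)
    for name, value in fields.items():
        msb, lsb = layout[name]
        width = msb - lsb + 1
        packet |= (value & ((1 << width) - 1)) << lsb
    return packet


# Usage: the EXECUTE example, then a URAM write with a negative voltage
execute = build_command(0x01, num_timesteps=1)
print(execute == (0x01 << 504) | (1 << 480))      # True
uram = build_command(0x04, neuron_id=100, voltage=-1)
print(hex((uram >> 444) & 0xFFFFFFFFF))           # 0xfffffffff
```

Keeping the layouts in one table makes it easier to spot a field that drifts out of sync with the Verilog side.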