hbm_processor.v#

Module Overview#

Purpose and Role in Stack#

The hbm_processor is the HBM (High Bandwidth Memory) controller and synapse data manager, responsible for fetching synaptic connectivity data from off-chip HBM. This module:

  • Implements AXI4 master interface to HBM (256-bit data width)

  • Orchestrates two-phase data retrieval:

    • Phase 1: Fetch pointer data for external inputs (BRAM) and internal neurons (URAM)

    • Phase 2: Follow pointer chains to fetch actual synapse data

  • Manages pointer FIFO for synapse chain traversal

  • Provides CI access for reading/writing HBM during configuration

  • Coordinates with spike generation by writing spike addresses directly to spike FIFOs

  • Combines 256-bit HBM reads into 512-bit packets for 16 neuron groups

In the software/hardware stack:

Command Interpreter ──► HBM read/write requests
                         │
External Events Proc ──► Triggers Phase 1 execution
Internal Events Proc ──► Receives pointer/synapse data
                         │
                         ▼
                   hbm_processor
                   (AXI4 Master)
                         │
                         ▼
                    HBM Memory
                  (Synapse Storage)
                         │
                         ▼
           Pointer FIFO ◄─ Pointer chains
                         │
                         ▼
              Spike FIFOs ◄─ Spike addresses

This module is critical for network connectivity, translating sparse synaptic connections into efficient memory accesses.


Module Architecture#

High-Level Block Diagram#

                    hbm_processor
    ┌─────────────────────────────────────────────────────────┐
    │                                                         │
    │  ┌───────────────────────────────────────────────┐     │
    │  │   TX (Transmit) State Machine                 │     │
    │  │   - Sends read/write commands to HBM          │     │
    │  │   - Manages address generation                │     │
    │  └────────────┬──────────────────────────────────┘     │
    │               │                                         │
    │  ┌────────────▼────────────────────────────────────┐   │
    │  │   Address Multiplexer                           │   │
    │  │   Phase 0: {0, tx_select, tx_addr, 4'b0}       │   │
    │  │   Phase 1: {ptr_addr, 5'b0}                    │   │
    │  │   CI mode: {ci2hbm_dout[278:256], 5'b0}        │   │
    │  └────────────┬────────────────────────────────────┘   │
    │               │                                         │
    │  ┌────────────▼────────────────────────────────────┐   │
HBM │  │   AXI4 Master Interface                         │   │
AXI4│◄─┤   - araddr, arvalid, arready (Read Address)    │   │
    │  │   - rdata, rvalid, rready (Read Data)          │   │
    │  │   - awaddr, awvalid, awready (Write Address)   │   │
    │  │   - wdata, wvalid, wready (Write Data)         │   │
    │  │   - bvalid, bready (Write Response)            │   │
    │  │   - Burst mode: INCR, size=256-bit             │   │
    │  └────────────┬────────────────────────────────────┘   │
    │               │                                         │
    │  ┌────────────▼────────────────────────────────────┐   │
    │  │   RX (Receive) State Machine                    │   │
    │  │   - Collects HBM read responses                 │   │
    │  │   - Routes data to appropriate destination      │   │
    │  └─┬──────┬──────────┬────────┬──────┬────────────┘   │
    │    │      │          │        │      │                │
    │    │      │          │        │      │                │
Pointer   │      │          │        │      │ Spikes (Phase 1)
FIFO      │      │          │        │      └──► spk0-7_wren
    ◄─────┘      │          │        │
                 │          │        │
Command          │          │        └──► hbm2ci (CI responses)
Interpreter      │          │
                 │          │
                 │          └──► exec_hbm_rdata (512-bit)
Internal/        │                  [511:0] = {upper256, lower256}
External         │                  Phase 1: Pointer data
Events Procs     │                  Phase 2: Synapse data
                 │
                 ▼
          ┌────────────────────────────────┐
          │   256→512 bit Converter         │
          │   - hbm_count toggles           │
          │   - Combines 2 × 256-bit reads  │
          │   - Outputs on 2nd read         │
          └────────────────────────────────┘
    │                                                         │
    │  ┌───────────────────────────────────────────────┐     │
    │  │   Pointer Chain Management                    │     │
    │  │   - ptrFIFO_dout[31:23]: Length (9 bits)      │     │
    │  │   - ptrFIFO_dout[22:0]:  Address (23 bits)    │     │
    │  │   - ptr_burst: Dynamic burst calculation      │     │
    │  │   - ptr_ctr: Tracks progress through chain   │     │
    │  └───────────────────────────────────────────────┘     │
    │                                                         │
    └─────────────────────────────────────────────────────────┘

Interface Specification#

Module Parameters#

Parameter       Default  Description
--------------  -------  ------------------------------------
HBM_ADDR_BITS   33       HBM address width (8 GB addressable)
HBM_DATA_WIDTH  256      HBM data bus width
HBM_BYTE_COUNT  32       Bytes per transaction (256/8)

Clock and Reset#

Signal  Direction  Width  Description
------  ---------  -----  ----------------------------
clk     Input      1      225 MHz system clock
resetn  Input      1      Active-low synchronous reset

Network Configuration#

Signal       Direction  Width  Description
-----------  ---------  -----  --------------------------------------------
num_inputs   Input      17     Number of input axons (external events)
num_outputs  Input      17     Number of output neurons (internal events)
core_number  Input      5      Core identifier (0-31) for multi-core systems

Execution Control#

Signal                   Direction      Width  Description
-----------------------  -------------  -----  -------------------------------------------
exec_run                 Input          1      Start new timestep execution
exec_bram_phase1_ready   Input          1      External events processor pipeline filled
exec_uram_phase1_ready   Input          1      Internal events processor pipeline filled
exec_hbm_rvalidready     Output (wire)  1      HBM data valid for IEP/EEP (every 2nd read)
exec_hbm_tx_phase1_done  Output (wire)  1      TX completed Phase 1 command sending
exec_hbm_tx_phase2_done  Output (wire)  1      TX completed Phase 2 command sending
exec_hbm_rx_phase1_done  Output (wire)  1      RX completed Phase 1 data collection
exec_hbm_rx_phase2_done  Output (wire)  1      RX completed Phase 2 data collection

HBM Data Output#

Signal          Direction      Width  Description
--------------  -------------  -----  --------------------------------------
exec_hbm_rdata  Output (wire)  512    Combined HBM data for 16 neuron groups
hbmFIFO_full    Input          1      Backpressure from downstream FIFO

Data Format (exec_hbm_rdata[511:0]):

[511:256] = Upper 256-bit read (the current, second read, taken directly from hbm_rdata)
[255:0]   = Lower 256-bit read (the first read, latched in hbm_rdata_lower)

Each 256-bit word contains data for 8 neuron groups (32 bits each)
Two 256-bit reads → 16 neuron groups → 512-bit output
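
As a rough software model of this layout (for example, when checking captured waveforms), the 512-bit word can be split into sixteen 32-bit group words. The function name `split_exec_hbm_rdata` is hypothetical, and the mapping of group k to bits [32k+31:32k] is an assumption; the text above only states that each 256-bit half carries eight 32-bit groups.

```python
def split_exec_hbm_rdata(word512: int) -> list[int]:
    """Split a 512-bit exec_hbm_rdata value into sixteen 32-bit group words.

    Assumption (not confirmed by the RTL): group k occupies bits [32k+31:32k],
    so groups 0-7 come from the lower (first) read and 8-15 from the upper one.
    """
    if not 0 <= word512 < (1 << 512):
        raise ValueError("expected a 512-bit unsigned value")
    return [(word512 >> (32 * k)) & 0xFFFFFFFF for k in range(16)]


# Build a test word the way the converter does: {upper256, lower256}.
lower = sum((0x100 + k) << (32 * k) for k in range(8))  # groups 0-7
upper = sum((0x200 + k) << (32 * k) for k in range(8))  # groups 8-15
groups = split_exec_hbm_rdata((upper << 256) | lower)
print([hex(g) for g in groups])
```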

Pointer FIFO Interface#

Signal         Direction     Width  Description
-------------  ------------  -----  ------------------------
ptrFIFO_empty  Input         1      Pointer FIFO empty flag
ptrFIFO_dout   Input         32     Pointer FIFO data output
ptrFIFO_rden   Output (reg)  1      Pointer FIFO read enable

Pointer Format (ptrFIFO_dout[31:0]):

[31:23] = Chain length (9 bits; the burst logic treats this as count minus one, so 9'd35 means 36 synapse rows and the maximum is 512)
[22:0]  = HBM address (23 bits, byte address >> 5)
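
A minimal sketch of packing and unpacking this 32-bit pointer word in host-side Python. The helper names are hypothetical and are not part of hs_bridge; the field positions come from the format above.

```python
LEN_BITS = 9    # ptrFIFO_dout[31:23]
ADDR_BITS = 23  # ptrFIFO_dout[22:0]


def pack_pointer(length_field: int, row_addr: int) -> int:
    """Build a 32-bit pointer word from the raw length field and row address."""
    if not 0 <= length_field < (1 << LEN_BITS):
        raise ValueError("length field must fit in 9 bits")
    if not 0 <= row_addr < (1 << ADDR_BITS):
        raise ValueError("row address must fit in 23 bits")
    return (length_field << ADDR_BITS) | row_addr


def unpack_pointer(word: int) -> tuple[int, int]:
    """Return (length_field, row_addr) from a 32-bit pointer word."""
    return (word >> ADDR_BITS) & ((1 << LEN_BITS) - 1), word & ((1 << ADDR_BITS) - 1)


def row_to_byte_address(row_addr: int) -> int:
    """The stored address is the byte address >> 5 (32-byte rows)."""
    return row_addr << 5


word = pack_pointer(35, 0x1234)
print(hex(word), unpack_pointer(word), hex(row_to_byte_address(0x1234)))
```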

Command Interpreter Interface#

Input (CI to HBM):

Signal        Direction     Width  Description
------------  ------------  -----  ------------------------
ci2hbm_empty  Input         1      Command FIFO empty flag
ci2hbm_dout   Input         280    Command data
ci2hbm_rden   Output (reg)  1      Command FIFO read enable

Command Format (ci2hbm_dout[279:0]):

[279]       = R/W (0=read, 1=write)
[278:256]   = HBM address (23 bits)
[255:0]     = Write data (256 bits)
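
For illustration, the 280-bit command word can be assembled and taken apart with plain integer arithmetic. This is a sketch under the field layout above; `pack_ci_command` and `unpack_ci_command` are hypothetical names, and how the word actually reaches the command FIFO is outside its scope.

```python
DATA_MASK = (1 << 256) - 1
ADDR_MASK = (1 << 23) - 1


def pack_ci_command(write: bool, row_addr: int, data: int = 0) -> int:
    """Assemble ci2hbm_dout[279:0] = {R/W, 23-bit address, 256-bit write data}."""
    if not 0 <= row_addr <= ADDR_MASK:
        raise ValueError("address must fit in 23 bits")
    if not 0 <= data <= DATA_MASK:
        raise ValueError("data must fit in 256 bits")
    return (int(write) << 279) | (row_addr << 256) | data


def unpack_ci_command(cmd: int) -> tuple[bool, int, int]:
    """Return (is_write, row_addr, data) from a 280-bit command word."""
    return bool((cmd >> 279) & 1), (cmd >> 256) & ADDR_MASK, cmd & DATA_MASK


cmd = pack_ci_command(True, 0x000640, 0xDEADBEEF)
print(cmd.bit_length(), unpack_ci_command(cmd))
```

A read command carries no payload, so `data` defaults to 0 when `write` is false.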

Output (HBM to CI):

Signal       Direction      Width  Description
-----------  -------------  -----  ---------------------------
hbm2ci_full  Input          1      Response FIFO full flag
hbm2ci_din   Output (wire)  256    Response data (= hbm_rdata)
hbm2ci_wren  Output (reg)   1      Response FIFO write enable

Spike FIFO Interface (8 FIFOs)#

Per FIFO (0-7):

Signal     Direction      Width  Description
---------  -------------  -----  -----------------------
spkN_full  Input          1      Spike FIFO full flag
spkN_din   Output (wire)  17     Spike neuron address
spkN_wren  Output (wire)  1      Spike FIFO write enable

Spike Data Extraction:

spk0_din = hbm_rdata[016:000];  // 17-bit neuron address
spk0_wren = !spk0_full & exec_hbm_rx_phase1_done &
            exec_hbm_rvalidready_2x & hbm_rdata[031];  // Spike flag

// Similar for spk1-7 from hbm_rdata[048:032] through [240:224]

Note: Spike data is embedded in the HBM pointer reads during Phase 1, so no separate fetch is needed.
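
The per-lane extraction can be modelled in a few lines of Python. This sketch mirrors the `spk0_din`/`spk0_wren` expressions above and assumes lanes 1-7 follow the same pattern at 32-bit strides; `spike_writes` is a hypothetical helper, not a function from the project.

```python
def spike_writes(hbm_rdata, spk_full, rx_phase1_done=True, rvalidready_2x=True):
    """Return [(fifo_index, neuron_address)] for one 256-bit read beat.

    Lane n is hbm_rdata[32n+31:32n]: bit 31 is the spike flag and
    bits [16:0] are the 17-bit neuron address (assumed for lanes 1-7).
    """
    writes = []
    for n in range(8):
        lane = (hbm_rdata >> (32 * n)) & 0xFFFFFFFF
        flag = (lane >> 31) & 1
        # Same gating terms as spkN_wren in the snippet above.
        if flag and not spk_full[n] and rx_phase1_done and rvalidready_2x:
            writes.append((n, lane & 0x1FFFF))
    return writes


# Lane 0 and lane 3 carry spikes; lane 7 has an address but no flag.
row = ((1 << 31) | 5) | (((1 << 31) | 0x1ABCD) << 96) | (0x42 << 224)
print(spike_writes(row, [False] * 8))
```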

HBM AXI4 Master Interface#

Read Address Channel:

Signal       Direction      Width  Description
-----------  -------------  -----  --------------------------------------
hbm_araddr   Output (reg)   33     Read address
hbm_arburst  Output (wire)  2      Burst type (2'b01 = INCR)
hbm_arid     Output (wire)  6      Transaction ID (always 6'd0)
hbm_arlen    Output (reg)   4      Burst length (beats - 1)
hbm_arready  Input          1      Address channel ready
hbm_arsize   Output (wire)  3      Beat size (3'd5 = 32 bytes = 256 bits)
hbm_arvalid  Output (reg)   1      Address valid

Read Data Channel:

Signal      Direction     Width  Description
----------  ------------  -----  -----------------------
hbm_rdata   Input         256    Read data
hbm_rid     Input         6      Transaction ID
hbm_rlast   Input         1      Last beat of burst
hbm_rready  Output (reg)  1      Data channel ready
hbm_rresp   Input         2      Read response (ignored)
hbm_rvalid  Input         1      Read data valid

Write Address Channel:

Signal       Direction      Width  Description
-----------  -------------  -----  -----------------------------------
hbm_awaddr   Output (wire)  33     Write address (from ci2hbm)
hbm_awburst  Output (wire)  2      Burst type (2'b01 = INCR)
hbm_awid     Output (wire)  6      Transaction ID (always 6'd0)
hbm_awlen    Output (wire)  4      Burst length (always 4'd0 = 1 beat)
hbm_awready  Input          1      Address channel ready
hbm_awsize   Output (wire)  3      Beat size (3'd5 = 256 bits)
hbm_awvalid  Output (reg)   1      Address valid

Write Data Channel:

Signal      Direction      Width  Description
----------  -------------  -----  ------------------------------------
hbm_wdata   Output (wire)  256    Write data (from ci2hbm)
hbm_wlast   Output (wire)  1      Last beat (always 1 for single-beat)
hbm_wready  Input          1      Data channel ready
hbm_wstrb   Output (wire)  32     Write strobes (all 1's)
hbm_wvalid  Output (reg)   1      Write data valid

Write Response Channel:

Signal      Direction     Width  Description
----------  ------------  -----  ------------------------
hbm_bid     Input         6      Transaction ID
hbm_bready  Output (reg)  1      Response channel ready
hbm_bresp   Input         2      Write response (ignored)
hbm_bvalid  Input         1      Response valid

Debug Interface#

Signal          Direction      Width  Description
--------------  -------------  -----  --------------------------------
hbm_curr_state  Output (wire)  4      TX state machine state (for VIO)


Detailed Logic Description#

TX (Transmit) State Machine#

States:

TX_STATE_RESET                          (4'd0)
TX_STATE_IDLE                           (4'd1)
TX_STATE_SEND_INPUT_READ_COMMANDS       (4'd2)  // Phase 1a
TX_STATE_SEND_OUTPUT_READ_COMMANDS      (4'd3)  // Phase 1b
TX_STATE_PHASE1_DONE                    (4'd4)
TX_STATE_POP_POINTER_FIFO               (4'd5)  // Phase 2 prep
TX_STATE_SEND_POINTER_READ_COMMANDS     (4'd6)  // Phase 2
TX_STATE_PHASE2_DONE                    (4'd7)
TX_STATE_READ_HBM_ADDR                  (4'd8)  // CI read
TX_STATE_WRITE_HBM_ADDR                 (4'd9)  // CI write address
TX_STATE_WRITE_HBM_DATA                 (4'd10) // CI write data
TX_STATE_WRITE_HBM_RESP                 (4'd11) // CI write response

State Transition Diagram:

        ┌──────────────┐
        │ TX_RESET     │
        └──────┬───────┘
               │
               ▼
        ┌──────────────┐
    ┌──▶│ TX_IDLE      │◄─────────────────────────────┐
    │   └──┬───────┬───┘                              │
    │      │       │                                  │
    │ exec │       │ !ci2hbm_empty                    │
    │ _run │       ├─ R/W=0 ──> READ_HBM_ADDR ────────┤
    │      │       │                                  │
    │      │       └─ R/W=1 ──> WRITE_HBM_ADDR ──>   │
    │      │                      WRITE_HBM_DATA ──>  │
    │      │                      WRITE_HBM_RESP ─────┘
    │      │
    │      ▼
    │ SEND_INPUT_READ_COMMANDS
    │  (Phase 1a: External inputs)
    │  tx_addr: 0 → INPUT_ADDR_LIMIT
    │      │
    │      ▼
    │ SEND_OUTPUT_READ_COMMANDS
    │  (Phase 1b: Internal neurons)
    │  tx_addr: 0 → OUTPUT_ADDR_LIMIT
    │      │
    │      ▼
    │ PHASE1_DONE
    │  (toggle tx_phase, tx_select)
    │      │
    │      ▼
    │ POP_POINTER_FIFO
    │  (wait for ptrFIFO data)
    │  (255-cycle timeout)
    │      │
    │      ├─ !empty ──> SEND_POINTER_READ_COMMANDS
    │      │              (follow pointer chain)
    │      │                      │
    │      │              ptr_ctr reaches ptr_len
    │      │                      │
    │      │              ◄───────┘ (loop for next pointer)
    │      │
    │      └─ timeout ──> PHASE2_DONE
    │                          │
    └──────────────────────────┘

Phase 1 Addressing:

// Execution Phase 1, i.e. TX phase 0 (tx_phase = 0):
hbm_araddr = {5'd0, {8'd0, tx_select, tx_addr, 4'd0}, 5'd0};

Breakdown:
  [32:28] = 5'd0 (upper padding)
  [27:5]  = {8'd0, tx_select, tx_addr, 4'd0}
            [22:15] = 8'd0 (reserved/bank select)
            [14]    = tx_select (0=inputs/BRAM, 1=outputs/URAM)
            [13:4]  = tx_addr (10 bits)
            [3:0]   = 4'd0 (row within the burst; each tx_addr step spans 16 rows = 512 bytes, one 16-beat burst)
  [4:0]   = 5'd0 (byte offset within 32-byte row)
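
The concatenation above can be reproduced with shifts, which is handy for predicting the addresses seen on `hbm_araddr` in simulation. This is a sketch derived from the bit breakdown only; the function names are hypothetical.

```python
def phase1_araddr(tx_select: int, tx_addr: int) -> int:
    """33-bit byte address {5'd0, {8'd0, tx_select, tx_addr, 4'd0}, 5'd0}."""
    if tx_select not in (0, 1):
        raise ValueError("tx_select is a single bit")
    if not 0 <= tx_addr < (1 << 10):
        raise ValueError("tx_addr is 10 bits wide")
    row_field = (tx_select << 14) | (tx_addr << 4)  # 23-bit row address
    return row_field << 5                           # append 5-bit byte offset


def decode_phase1_araddr(araddr: int) -> tuple[int, int]:
    """Recover (tx_select, tx_addr) from a Phase 1 byte address."""
    row_field = (araddr >> 5) & 0x7FFFFF
    return (row_field >> 14) & 1, (row_field >> 4) & 0x3FF


for sel, addr in [(0, 0), (0, 1), (1, 0), (1, 500)]:
    print(sel, addr, hex(phase1_araddr(sel, addr)))
```

Consecutive `tx_addr` values land 0x200 bytes apart, which matches one 16-beat burst of 32-byte rows per command.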

Phase 2 Addressing:

// Execution Phase 2, i.e. TX phase 1 (tx_phase = 1):
hbm_araddr = {5'd0, ptr_addr, 5'd0};

Breakdown:
  [32:28] = 5'd0
  [27:5]  = ptr_addr (23 bits from ptrFIFO_dout)
  [4:0]   = 5'd0

RX (Receive) State Machine#

States:

RX_STATE_RESET                (4'd0)
RX_STATE_IDLE                 (4'd1)
RX_STATE_WAIT_BRAM_PIPELINE   (4'd2)  // Wait for EEP ready
RX_STATE_READ_INPUT_POINTERS  (4'd3)  // Collect external pointer data
RX_STATE_WAIT_URAM_PIPELINE   (4'd4)  // Wait for IEP ready
RX_STATE_READ_OUTPUT_POINTERS (4'd5)  // Collect internal pointer data
RX_STATE_PHASE1_DONE          (4'd6)
RX_STATE_READ_SYNAPSE_DATA    (4'd7)  // Collect synapse data (Phase 2)
RX_STATE_PHASE2_DONE          (4'd8)
RX_STATE_READ_HBM_RESP        (4'd9)  // CI read response

State Transition Diagram:

        ┌──────────────┐
        │ RX_RESET     │
        └──────┬───────┘
               │
               ▼
        ┌──────────────┐
    ┌──▶│ RX_IDLE      │◄────────────────────────┐
    │   └──┬───────┬───┘                         │
    │      │       │                             │
    │ exec │       │ TX → READ_HBM_ADDR          │
    │ _run │       └──> READ_HBM_RESP ───────────┘
    │      │
    │      ▼
    │ WAIT_BRAM_PIPELINE
    │  (wait exec_bram_phase1_ready)
    │      │
    │      ▼
    │ READ_INPUT_POINTERS
    │  (collect HBM reads for inputs)
    │  rx_addr: 0 → {INPUT_ADDR_LIMIT, INPUT_ADDR_MOD}
    │      │
    │      ▼
    │ WAIT_URAM_PIPELINE
    │  (wait exec_uram_phase1_ready)
    │      │
    │      ▼
    │ READ_OUTPUT_POINTERS
    │  (collect HBM reads for outputs)
    │  rx_addr: 0 → {OUTPUT_ADDR_LIMIT, OUTPUT_ADDR_MOD}
    │      │
    │      ▼
    │ PHASE1_DONE
    │      │
    │      ▼
    │ READ_SYNAPSE_DATA
    │  (collect Phase 2 reads)
    │  wait: rx_ptr_ctr == tx_ptr_ctr
    │      │
    │      ▼
    │ PHASE2_DONE
    │      │
    └──────┘

Pointer Chain Management#

Pointer FIFO Data Structure:

ptrFIFO_dout[31:0]:
  [31:23] = Length (9 bits) → stored as count minus one, so up to 512 synapse rows in a chain
  [22:0]  = Start address (23 bits) → HBM address >> 5

Burst Calculation:

// Determine burst length for AXI transaction
ptr_burst = (ptr_ctr[8:4] == ptr_len[8:4]) ?
            ptr_len[3:0] :  // Last burst (partial)
            4'hF;           // Full burst (16 beats)

// Example:
// ptr_len = 9'd35 (36 synapses)
// Burst 1: ptr_ctr=0,  burst=15 (16 synapses)
// Burst 2: ptr_ctr=16, burst=15 (16 synapses)
// Burst 3: ptr_ctr=32, burst=3  (4 synapses, total=36)

Address Increment:

// After each burst completes:
ptr_addr <= ptr_addr + ptr_burst + 1'b1;
ptr_ctr  <= ptr_ctr + ptr_burst + 1'b1;

// The burst issued while ptr_ctr[8:4] == ptr_len[8:4] is the last one;
// once it has been sent the chain is complete.
// Pop next pointer from ptrFIFO
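
The burst calculation and address increment can be replayed in software to see how a chain is cut into AXI reads. This is a behavioural sketch of the two snippets above, not the RTL itself; `chain_bursts` is a hypothetical name, and it assumes `ptr_ctr` starts at 0 for each new chain.

```python
def chain_bursts(ptr_addr: int, ptr_len: int) -> list[tuple[int, int]]:
    """Return [(row_address, arlen)] for one pointer chain.

    ptr_len is the raw 9-bit length field (beats minus one).
    arlen follows the AXI convention of beats minus one.
    """
    if not 0 <= ptr_len < (1 << 9):
        raise ValueError("ptr_len must fit in 9 bits")
    bursts = []
    ptr_ctr = 0  # assumed to restart at 0 for every chain
    while True:
        last = (ptr_ctr >> 4) == (ptr_len >> 4)       # ptr_ctr[8:4] == ptr_len[8:4]
        ptr_burst = (ptr_len & 0xF) if last else 0xF  # partial or full burst
        bursts.append((ptr_addr, ptr_burst))
        ptr_addr += ptr_burst + 1
        ptr_ctr += ptr_burst + 1
        if last:
            return bursts


print(chain_bursts(0x100, 35))
```

For `ptr_len = 35` this yields three reads of 16, 16 and 4 beats, matching the worked example.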

256 → 512 Bit Converter#

Purpose: HBM provides 256-bit data, but 16 neuron groups require 512 bits.

Logic:

reg hbm_count;  // Toggles between 0 and 1
reg [255:0] hbm_rdata_lower;

always @(posedge clk) begin
    if (hbm_rvalid && hbm_rready) begin
        hbm_count <= ~hbm_count;
        if (~hbm_count)
            hbm_rdata_lower <= hbm_rdata;  // Latch 1st read
        // else: 2nd read available on hbm_rdata
    end
end

// Output combines latched and current data
assign exec_hbm_rdata = {hbm_rdata, hbm_rdata_lower};

// Assert rvalidready only on 2nd read
assign exec_hbm_rvalidready = hbm_rvalid & hbm_rready & hbm_count & ~hbmFIFO_full;

// For spike writes (fires on every 256-bit read, not just every 2nd one)
assign exec_hbm_rvalidready_2x = hbm_rvalid & hbm_rready;

Timeline (hbm_rready assumed high; XXXX = not meaningful):

Cycle:        0       1       2       3       4       5
              │       │       │       │       │       │
hbm_rvalid    ▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁
              │       │       │       │       │       │
hbm_rdata     DATA_L  │       DATA_U  │       DATA_L' │
              │       │       │       │       │       │
hbm_count     0       1       1       0       0       1
              │       │       │       │       │       │
hbm_rdata     XXXX    DATA_L  DATA_L  DATA_L  DATA_L  DATA_L'
_lower        │       │       │       │       │       │
              │       │       │       │       │       │
exec_hbm      XXXX    XXXX    {U,L}   XXXX    XXXX    XXXX
_rdata        │       │       │       │       │       │
              │       │       │       │       │       │
exec_hbm      ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
_rvalidready  │       │       │       │       │       │
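
The same pairing behaviour can be written as a small Python model, which is useful for generating expected 512-bit words in a testbench. It is a sketch of the toggle-and-latch logic above and deliberately ignores `hbmFIFO_full` backpressure; `combine_reads` is a hypothetical name.

```python
def combine_reads(beats):
    """Pair accepted 256-bit read beats into 512-bit words {upper, lower}.

    `beats` holds the hbm_rdata values of successive rvalid & rready
    handshakes. A trailing unpaired beat produces no output.
    """
    words = []
    hbm_count = 0
    lower = 0
    for rdata in beats:
        if hbm_count == 0:
            lower = rdata                        # latch the 1st read
        else:
            words.append((rdata << 256) | lower)  # output on the 2nd read
        hbm_count ^= 1
    return words


out = combine_reads([0xAAAA, 0xBBBB, 0xCCCC])
print([hex(w) for w in out])
```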

Wait Clock Counter (Phase 2 Timeout)#

Purpose: Ensure that the pointer FIFO controller has delivered every pointer to ptrFIFO, and that each one has been consumed, before Phase 2 ends.

Implementation:

reg [7:0] wait_clks_cnt;
wire [7:0] wait_clks_limit = 8'd255;

always @(posedge clk) begin
   if ((tx_curr_state == TX_STATE_POP_POINTER_FIFO) &&
       rx_phase1_done && ptrFIFO_empty)
      wait_clks_cnt <= wait_clks_cnt + 1'b1;
   else
      wait_clks_cnt <= 8'd0;
end

// Transition to PHASE2_DONE when timeout reached
if (wait_clks_cnt == wait_clks_limit)
   tx_next_state <= TX_STATE_PHASE2_DONE;

Rationale:

  • Round-robin pointer FIFO controller may take up to 16 cycles to send last pointer

  • 255-cycle wait provides generous margin

  • Prevents premature phase completion
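
To see how the counter behaves when ptrFIFO briefly refills, the logic can be stepped cycle by cycle in Python. This is an illustrative model of the snippet above; `cycles_until_phase2_done` and the per-cycle sample tuple are hypothetical constructs, not signals from the design.

```python
def cycles_until_phase2_done(samples, limit=255):
    """Return the cycle index at which wait_clks_cnt == limit, or None.

    `samples` yields one (in_pop_state, rx_phase1_done, ptrfifo_empty)
    tuple per clock. The counter restarts whenever any term is false.
    """
    cnt = 0
    for cycle, (in_pop, rx_done, empty) in enumerate(samples):
        if cnt == limit:
            return cycle  # TX would move to PHASE2_DONE here
        if in_pop and rx_done and empty:
            cnt = (cnt + 1) & 0xFF  # 8-bit register
        else:
            cnt = 0
    return None


quiet = [(True, True, True)] * 300
print(cycles_until_phase2_done(quiet))

# A late pointer at cycle 100 makes the FIFO non-empty for one clock.
late = [(True, True, True)] * 100 + [(True, True, False)] + [(True, True, True)] * 299
print(cycles_until_phase2_done(late))
```

A single late pointer restarts the full wait, which is why Phase 2 always ends with an idle tail of `wait_clks_limit` cycles.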


Memory Map#

HBM Address Space#

Total: 8 GB (33-bit address)

Layout:

┌─────────────────────────────────────────────────────────┐
│  Address Range        │  Purpose                        │
├───────────────────────┼─────────────────────────────────┤
│  [32:28] (upper 5)    │  Padding (always 0)             │
├───────────────────────┼─────────────────────────────────┤
│  [27:5]  (23 bits)    │  Row address                     │
│                       │  - Phase 0: Structured layout    │
│                       │  - Phase 1: Pointer chain addr   │
├───────────────────────┼─────────────────────────────────┤
│  [4:0]   (lower 5)    │  Byte offset (always 0 for      │
│                       │  32-byte aligned accesses)      │
└─────────────────────────────────────────────────────────┘

Phase 0 Address Structure ([27:5] = 23 bits):

[27:20] = Reserved / Bank select (8 bits, unused)
[19]    = Input/Output select (tx_select)
          0 = External inputs (BRAM)
          1 = Internal neurons (URAM)
[18:9]  = Address within input/output space (tx_addr, 10 bits)
[8:5]   = Padding (4 bits, for pointer granularity)

Example:
  Input row 100:   {8'd0, 1'b0, 10'd100, 4'd0} = 23'h000640 → byte address 0x00C800
  Output row 500:  {8'd0, 1'b1, 10'd500, 4'd0} = 23'h005F40 → byte address 0x0BE800

Pointer Data Structure (256-bit HBM row):

Per pointer (32 bits × 8 pointers = 256 bits):
  [31:23] = Next pointer length (9 bits)
  [22:0]  = Next pointer address (23 bits)

Row contains 8 pointers, one per 32-bit lane. Address bits [8:5] select the row within a 16-row burst, not the pointer within a row.

Synapse Data Structure (256-bit HBM row):

Depends on network configuration, but typically:

Per synapse (variable size, often 16-32 bits):
  - Weight (signed, 8-16 bits)
  - Target neuron ID (13-17 bits)
  - Delay (optional)
  - Other metadata

During Phase 2, pointer chains lead to synapse data rows.


Timing Diagrams#

Phase 1a: Input Pointer Reads#

Cycle:   0      1      2      3      4      5      ...
         │      │      │      │      │      │      │
TX State IDLE   │SEND_ │SEND_ │SEND_ │SEND_ │SEND_ │
         │      │INPUT │INPUT │INPUT │INPUT │INPUT │
         │      │      │      │      │      │      │
hbm_     ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│
arvalid  │      │      │      │      │      │      │
         │      │      │      │      │      │      │
hbm_     ▔▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│
arready  │      │      │      │      │      │      │
         │      │      │      │      │      │      │
tx_addr  0      │1      │2      │3      │4      │5      │
         │      │      │      │      │      │      │
hbm_     0x0000 │0x0200 │0x0400 │0x0600 │0x0800 │0x0A00 │
araddr   │      │      │      │      │      │      │
(low23)  │      │      │      │      │      │      │
         │      │      │      │      │      │      │
hbm_     15     │15     │15     │15     │15     │15     │
arlen    │(16   │(16   │(16   │(16   │(16   │(16   │
(burst-1)│beats)│beats)│beats)│beats)│beats)│beats)│

Notes:

  • Each araddr issues a burst of 16 beats (4'hF + 1)

  • tx_addr increments on each arready handshake

  • Continues until tx_addr == INPUT_ADDR_LIMIT

Phase 1b → Phase 2 Transition#

Cycle:   N      N+1    N+2    N+3    N+4    N+5
         │      │      │      │      │      │
TX State SEND   │PHASE1│POP   │POP   │SEND  │
         OUTPUT │_DONE │_PTR  │_PTR  │_PTR  │
         │      │      │      │      │      │
tx_phase 0      │0      │1      │1      │1     │
         │      │      │      │      │      │
ptrFIFO  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔▁▁▁▁▁▁
_empty   │      │      │      │      │      │
         │      │      │      │      │      │
ptrFIFO  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁
_rden    │      │      │      │      │      │
         │      │      │      │      │      │
ptr_addr XXXX   │XXXX  │XXXX  │ADDR1 │ADDR1 │
         │      │      │      │(set) │      │
         │      │      │      │      │      │
hbm_     ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔│
arvalid  │      │      │      │      │      │

Notes:

  • PHASE1_DONE toggles tx_phase to 1

  • POP_PTR waits for ptrFIFO not empty

  • ptr_addr loaded from ptrFIFO_dout

  • SEND_PTR issues AXI read with ptr_burst length

Phase 2: Pointer Chain Traversal#

Cycle:   0      1      2      ...    16     17     18
         │      │      │      │      │      │      │
TX State SEND   │SEND  │SEND  │SEND  │POP   │SEND  │
         _PTR   │_PTR  │_PTR  │_PTR  │_PTR  │_PTR  │
         │      │      │      │      │      │      │
hbm_     ▔▔▔▔▔▔▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁
arvalid  │      │      │      │      │      │      │
         │      │      │      │      │      │      │
ptr_ctr  0      │16     │32     │36     │36     │0     │
         │      │      │      │      │      │      │
ptr_len  35     │35     │35     │35     │35     │100   │
         │      │      │      │      │      │(new) │
         │      │      │      │      │      │      │
ptr_     15     │15     │3      │15     │15     │15    │
burst    │      │      │(final)│(new   │      │      │
         │      │      │       │chain) │      │      │
         │      │      │      │      │      │      │
ptrFIFO  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁
_rden    │      │      │      │      │      │      │

Notes:

  • First chain: 36 synapses (len=35) → 3 bursts

  • Burst 1: 16 beats, Burst 2: 16 beats, Burst 3: 4 beats

  • After completion, pop next pointer and start new chain

  • Continues until ptrFIFO empty for 255 cycles

RX: Data Collection with 256→512 Conversion#

Cycle:        0       1       2       3       4       5       6
              │       │       │       │       │       │       │
hbm_rvalid    ▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔
              │       │       │       │       │       │       │
hbm_rdata     DATA0L  │       DATA0U  │       DATA1L  │       DATA1U
              │       │       │       │       │       │       │
hbm_count     0       1       1       0       0       1       1
              │       │       │       │       │       │       │
rdata_lower   XXXX    DATA0L  DATA0L  DATA0L  DATA0L  DATA1L  DATA1L
              │       │       │       │       │       │       │
exec_hbm      XXXX    XXXX    {0U,0L} XXXX    XXXX    XXXX    {1U,1L}
_rdata        │       │       │       │       │       │       │
              │       │       │       │       │       │       │
exec_hbm      ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔
_rvalidready  │       │       (2nd)   │       │       │       (2nd)

Notes:

  • Only every 2nd read triggers exec_hbm_rvalidready

  • Data latched on odd reads, combined on even reads

  • Downstream modules receive 512-bit packets


Cross-References#

Software Integration#

Python (hs_bridge):

  • compile_network.compile() → Generates HBM memory layout

  • network.load_weights() → Writes synapse data via CI to HBM

  • network.read_hbm(address) → Debug HBM contents

  • utils.create_pointer_chains() → Organizes synapses into linked lists


Key Terms and Definitions#

Term                     Definition
-----------------------  --------------------------------------------------------------
HBM                      High Bandwidth Memory - Off-chip DRAM with 400+ GB/s bandwidth
Pointer Chain            Linked-list structure storing variable-length synapse lists
Phase 0 / Phase 1        TX phases: 0=fetch pointers, 1=fetch synapses
tx_phase                 Toggles between Phase 0 and Phase 1
tx_select                In Phase 0: 0=inputs (BRAM), 1=outputs (URAM)
ptr_addr                 HBM address for synapse data (from ptrFIFO)
ptr_len                  Number of synapses in chain (from ptrFIFO)
ptr_burst                AXI burst length for current read (max 16 beats)
hbm_count                Toggles 0/1 to combine two 256-bit reads into 512-bit output
exec_hbm_rvalidready     Data valid signal (asserts every 2nd HBM read)
exec_hbm_rvalidready_2x  Data valid at HBM rate (every read)
AXI4                     ARM Advanced eXtensible Interface - High-performance protocol
Burst                    Multi-beat AXI transaction (up to 16 beats)
INCR                     Incrementing burst type (addresses increment by size)


Performance Characteristics#

Throughput#

HBM Bandwidth:

  • Interface: 256-bit @ 225 MHz = 57.6 Gb/s = 7.2 GB/s per channel

  • System Total: 32 channels (HBM2) × 7.2 GB/s = 230 GB/s theoretical

Pointer Fetch Rate (Phase 1):

  • Burst size: 16 beats × 256 bits = 4096 bits = 512 bytes

  • Rows per burst: 512 bytes / 32 bytes = 16 rows (128 pointers at 8 pointers per row)

  • Rate: 16 rows per ~20 cycles (burst + overhead) = ~180M rows/sec (~1.4G pointers/sec)
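
These figures follow from a few lines of arithmetic, sketched here so they can be re-derived for a different clock or overhead estimate. The 20-cycle burst cost is the document's own rough estimate, and the 8 pointers per 256-bit row comes from the pointer data structure in the Memory Map section; the function names are hypothetical.

```python
def channel_bandwidth_gbytes(data_bits=256, clk_hz=225e6):
    """Peak bandwidth of one AXI channel in GB/s (one beat per clock)."""
    return data_bits / 8 * clk_hz / 1e9


def phase1_row_rate(clk_hz=225e6, beats_per_burst=16, cycles_per_burst=20):
    """256-bit rows fetched per second, assuming ~20 cycles per 16-beat burst."""
    return beats_per_burst / cycles_per_burst * clk_hz


per_channel = channel_bandwidth_gbytes()
rows_per_sec = phase1_row_rate()
print(f"{per_channel:.1f} GB/s per channel, {32 * per_channel:.1f} GB/s for 32 channels")
print(f"{rows_per_sec / 1e6:.0f}M rows/s, {8 * rows_per_sec / 1e9:.2f}G pointers/s")
```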

Synapse Fetch Rate (Phase 2):

  • Variable: Depends on ptr_len (chain length)

  • Typical: 1-10 synapses per neuron

  • Rate: Limited by network connectivity, not HBM bandwidth

Latency#

Operation               Cycles    Time @ 225 MHz
----------------------  --------  --------------
AXI Address Handshake   1         4.4 ns
HBM Read Latency        ~100-200  0.4-0.9 µs
16-beat Burst Transfer  16        71 ns
Total per burst         ~120-220  0.5-1.0 µs

Phase 1 Duration:

  • Depends on num_inputs and num_outputs

  • Typical: 1000-10,000 cycles = 4-44 µs

Phase 2 Duration:

  • Depends on total synapses across all active neurons

  • Typical: 10,000-1,000,000 cycles = 44 µs - 4.4 ms
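
The cycle-to-time conversions in this section can be checked with a short helper. This is only a sketch: the additive model (handshake + read latency + beats) is an assumption that reproduces the "Total per burst" row, and the function names are hypothetical.

```python
CLK_HZ = 225e6  # system clock from the Clock and Reset section


def cycles_to_ns(cycles, clk_hz=CLK_HZ):
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles / clk_hz * 1e9


def burst_latency_cycles(read_latency, beats=16, handshake=1):
    """Assumed model: address handshake + HBM read latency + data beats."""
    return handshake + read_latency + beats


for latency in (100, 200):
    total = burst_latency_cycles(latency)
    print(f"read latency {latency}: {total} cycles = {cycles_to_ns(total) / 1000:.2f} us")
```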


Common Issues and Debugging#

Problem: Stuck in POP_POINTER_FIFO State#

Symptoms: TX never reaches PHASE2_DONE

Debug Steps:

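  1. Check ptrFIFO_empty - should eventually assert and stay asserted

  2. Check wait_clks_cnt - increments only while ptrFIFO_empty and rx_phase1_done are both high, and resets to 0 otherwise

  3. Verify pointer_fifo_controller is writing to ptrFIFO

Common Cause: rx_phase1_done never asserts (RX stalled in Phase 1), or the pointer FIFO controller misbehaves so that ptrFIFO_empty never stays high for the full 255 cycles (upstream issue)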

Problem: exec_hbm_rvalidready Never Asserts#

Symptoms: IEP/EEP waiting indefinitely for HBM data

Debug Steps:

  1. Check hbm_rvalid - should pulse from HBM

  2. Check hbm_count - should toggle 0→1→0

  3. Check hbmFIFO_full - may be blocking output

  4. Verify RX state machine in correct state

Common Cause: HBM not responding, or FIFO backpressure

Problem: Spike FIFOs Not Receiving Data#

Symptoms: No spikes generated during Phase 1

Debug Steps:

  1. Check exec_hbm_rx_phase1_done - should assert during Phase 1 reads

  2. Check hbm_rdata[31, 63, 95, ...] - spike flags should be set

  3. Check spkN_full - may be blocking writes

  4. Verify pointer data contains spike information

Common Cause: HBM pointer data doesn’t include spike flags


Safety and Edge Cases#

Reset Behavior#

When resetn is asserted (driven low) and then released:

  • All state machines → RESET → IDLE

  • Phase flags → done (ready for exec_run)

  • Counters → 0

  • Address registers → 0

Burst Length Edge Cases#

Last Burst in Chain:

  • ptr_burst calculated as ptr_len[3:0] when on final segment

  • Ensures exact number of synapses read, no over-fetch

Empty Input/Output:

  • If num_inputs=0 or num_outputs=0, respective phase skipped

  • Address limit check immediately true

AXI Protocol Compliance#

Write Transactions:

  • Single-beat only (awlen=0, wlast=1)

  • No burst writes implemented

Read Transactions:

  • Supports bursts up to 16 beats

  • No support for wrap or fixed-address bursts (only INCR)


Future Enhancement Opportunities#

  1. Prefetching: Begin Phase 2 pointer fetches before Phase 1 completes

  2. Burst Optimization: Merge adjacent pointer chains into single burst

  3. Multi-Channel HBM: Distribute addresses across HBM channels for parallelism

  4. Error Detection: Monitor hbm_rresp and hbm_bresp for errors

  5. Performance Counters: Track HBM utilization, stall cycles

  6. Adaptive Timeout: Adjust wait_clks_limit based on ptrFIFO depth

  7. Write Bursts: Support multi-beat writes for faster HBM initialization


Document Version: 1.0
Last Updated: December 2025
Module File: hbm_processor.v
Module Location: CRI_proj/cri_fpga/code/new/hyddenn2/vivado/single_core.srcs/sources_1/new/
Purpose: HBM memory controller and synapse data manager
HBM Bandwidth: 400+ GB/s (theoretical)
AXI4 Interface: 256-bit data width, 33-bit address
Clock Frequency: 225 MHz