command_interpreter.v#

Module Overview#

Purpose and Role in Stack#

The command_interpreter module serves as the central command router and execution controller for the neuromorphic FPGA core. It acts as the primary interface between the host computer (via PCIe) and all internal processing modules. This module:

  • Decodes and routes PCIe commands to appropriate processors (HBM, external events, internal events)

  • Manages network execution flow including time-step sequencing and execution counters

  • Handles axon event distribution by parsing 512-bit PCIe packets into individual axon events

  • Collects and batches spike outputs for transmission back to the host

  • Maintains execution statistics including time-step counters and FPGA cycle timers

In the software/hardware stack:

Host (Python hs_bridge) → PCIe DMA → pcie2fifos → command_interpreter → Processing Modules
                                                   ↓
                                         [External Events Processor]
                                         [HBM Processor]
                                         [Internal Events Processor]

Module Architecture#

High-Level Block Diagram#

                    ┌────────────────────────────────────────────────┐
                    │         command_interpreter                    │
                    │                                                 │
    PCIe RX FIFO    │  ┌──────────────────────────────────────┐     │
    512-bit         │  │      RX State Machine                │     │
    ────────────────┼──►  Command Decoder & Router            │     │
                    │  │  - CMD_EEP_W: Load axon events       │     │
                    │  │  - CMD_HBM_RW: R/W synapses         │     │
                    │  │  - CMD_IEP_RW: R/W neurons          │     │
                    │  │  - CMD_NTWK_PARAM_W: Set params     │     │
                    │  │  - CMD_EXEC_STEP: Run 1 timestep    │     │
                    │  │  - CMD_EXEC_CONT: Continuous run    │     │
                    │  └────┬─────────────┬──────────┬────────┘     │
                    │       │             │          │               │
                    │   ┌───▼────┐   ┌───▼────┐  ┌─▼────────┐      │
                    │   │ Axon   │   │  HBM   │  │ Internal │      │
                    │   │ Event  │   │  FIFO  │  │  Events  │      │
                    │   │ Shifter│   │ ci2hbm │  │   FIFO   │      │
    External Events │   └────┬───┘   └───┬────┘  │ ci2iep   │      │
    Processor       │        │           │       └─┬────────┘      │
    ◄───────────────┼────────┘           │         │               │
                    │                    │         │               │
    HBM Processor   │                    │         │               │
    ◄───────────────┼────────────────────┘         │               │
                    │                              │               │
    Internal Events │                              │               │
    Processor       │                              │               │
    ◄───────────────┼──────────────────────────────┘               │
                    │                                               │
                    │  ┌──────────────────────────────────────┐    │
                    │  │      TX State Machine                │    │
    PCIe TX FIFO    │  │  Spike Collection & Batching         │    │
    512-bit         │  │  - Batches 14 spikes per packet      │◄───┼── Spike FIFO
    ◄───────────────┼──┤  - Includes timestamp (execRun_ctr)  │    │   (from spike_
                    │  │  - Opcode: 0xEEEE_EEEE               │    │    fifo_controller)
                    │  └──────────────────────────────────────┘    │
                    │                                               │
                    │  ┌──────────────────────────────────────┐    │
                    │  │   Execution Control Registers        │    │
                    │  │  - execRun_ctr (time-step counter)   │    │
                    │  │  - execRun_limit (max timesteps)     │    │
                    │  │  - execRun_timer (FPGA cycle count)  │    │
                    │  │  - execRun_running / execRun_done    │    │
                    │  └──────────────────────────────────────┘    │
                    └────────────────────────────────────────────────┘

Interface Specification#

Module Parameters#

Parameter

Width

Default

Description

AXI_ADDR_BITS

-

32

AXI address width (currently unused)

AXI_DATA_WIDTH

-

32

AXI data width (currently unused)

HBM_ADDR_BITS

-

33

HBM address width (8 GB addressable)

HBM_DATA_WIDTH

-

256

HBM data width

HBM_BYTE_COUNT

-

32

HBM bytes per transaction (256 bits / 8)

Clock and Reset#

Signal

Direction

Width

Description

aclk

Input

1

225 MHz system clock

aresetn

Input

1

Active-low asynchronous reset

PCIe Interface (via pcie2fifos)#

Signal

Direction

Width

Description

RX FIFO (Host → Card)

rxFIFO_empty

Input

1

RX FIFO empty flag

rxFIFO_dout

Input

512

RX FIFO data output

rxFIFO_rden

Output

1

RX FIFO read enable

TX FIFO (Card → Host)

txFIFO_full

Input

1

TX FIFO full flag

txFIFO_din

Output

512

TX FIFO data input

txFIFO_wren

Output

1

TX FIFO write enable

PCIe Data Format:

RX FIFO (512-bit packet from host):

[511:504] = 8-bit command opcode
[503:0]   = Command-specific data payload

TX FIFO (512-bit packet to host):

Spike packet:
  [511:480] = 0xEEEE_EEEE (spike opcode)
  [479:32]  = 14 × 32-bit spike events (448 bits)
              Each spike: [31:24]=sub-timestamp, [23]=flag, [22:16]=zeros, [15:0]=neuron addr
  [31:0]    = execRun_ctr (timestep counter)

HBM read response:
  [511:496] = 0xBBBB
  [495:256] = zeros (240 bits)
  [255:0]   = HBM data

Neuron read response:
  [511:496] = 0xCCCC
  [495:53]  = zeros (443 bits)
  [52:0]    = Neuron data (36-bit value + 17-bit address)

External Events Processor Interface#

Signal

Direction

Width

Description

axonEvent_set

Output

1

Axon event valid/write enable

axonEvent_addr

Output

13

Row address in BRAM (0 to 8191)

axonEvent_data

Output

16

Data mask (1 bit per neuron group)

Address Calculation:

  • 16 neuron groups → 16 axons per row

  • num_inputs[16:4] = number of rows (ignoring lower 4 bits)

  • Each 512-bit PCIe packet contains 512/16 = 32 axon events

HBM Processor Interface#

Signal

Direction

Width

Description

Command Interface (CI → HBM)

ci2hbm_full

Input

1

Command FIFO full flag

ci2hbm_din

Output

280

Command data

ci2hbm_wren

Output

1

Command write enable

Response Interface (HBM → CI)

hbm2ci_empty

Input

1

Response FIFO empty flag

hbm2ci_dout

Input

256

Response data

hbm2ci_rden

Output

1

Response read enable

Command Format (ci2hbm_din[279:0]):

[279]     = R/W command (0=read, 1=write)
[278:256] = 23-bit row address
[255:0]   = 256-bit data (for writes)

Internal Events Processor Interface#

Signal

Direction

Width

Description

Command Interface (CI → IEP)

ci2iep_full

Input

1

Command FIFO full flag

ci2iep_din

Output

54

Command data

ci2iep_wren

Output

1

Command write enable

Response Interface (IEP → CI)

iep2ci_empty

Input

1

Response FIFO empty flag

iep2ci_dout

Input

53

Response data

iep2ci_rden

Output

1

Response read enable

Command Format (ci2iep_din[53:0]):

[53]      = R/W command (0=read, 1=write)
[52:36]   = 17-bit neuron address (128K neurons addressable)
[35:0]    = 36-bit neuron data (membrane potential or other state)

Note: Comments in code show previous 16-bit data format; upgraded to 36-bit.

Spike Event Interface#

Signal

Direction

Width

Description

spk2ciFIFO_dout

Input

17

Spiked neuron address

spk2ciFIFO_empty

Input

1

Spike FIFO empty flag

spk2ciFIFO_rden

Output

1

Spike FIFO read enable

Network Execution Control#

Signal

Direction

Width

Description

exec_iep_phase2_done

Input

1

Internal events processor phase 2 completion flag

exec_run

Output

1

Execute time-step command (1-cycle pulse)

execRun_running

Output

1

Execution in progress flag

execRun_done

Output

1

Execution completed flag

execRun_limit

Output

32

Maximum time steps (0 = single step)

execRun_ctr

Output

32

Current time-step counter

execRun_timer

Output

64

FPGA clock cycle counter during execution

Network Parameters#

Signal

Direction

Width

Description

num_inputs

Output

17

Number of input axons (131,072 max)

num_outputs

Output

17

Number of output neurons to monitor

threshold

Output

36

Neuron spike threshold (signed)

exec_neuron_model

Output

2

Neuron model selection (0-3)

Debug Interface#

Signal

Direction

Width

Description

vio_rx_curr_state

Output

3

RX state machine state (for VIO debugging)

vio_tx_curr_state

Output

2

TX state machine state (for VIO debugging)


Detailed Logic Description#

Command Opcodes#

localparam [7:0] CMD_EEP_W        = 8'd1;  // Write axon events to BRAM
localparam [7:0] CMD_HBM_RW       = 8'd2;  // Read/write HBM synapse data
localparam [7:0] CMD_IEP_RW       = 8'd3;  // Read/write neuron state
localparam [7:0] CMD_NTWK_PARAM_W = 8'd4;  // Write network parameters
localparam [7:0] CMD_EXEC_STEP    = 8'd6;  // Execute single time step
localparam [7:0] CMD_EXEC_CONT    = 8'd7;  // Execute continuous (multiple timesteps)

Command extracted from: rx_command = rxFIFO_dout[511:504]

RX State Machine#

States:

RX_STATE_RESET                   (3'd0) - Reset state
RX_STATE_IDLE                    (3'd1) - Wait for commands
RX_STATE_REGISTER_PCIE_AXON_DATA (3'd2) - Register 512-bit axon packet
RX_STATE_SET_AXON_DATA           (3'd3) - Shift and distribute axon events
RX_STATE_EXEC_STEP               (3'd4) - Execute time step
RX_STATE_WAIT_RUN                (3'd5) - Wait for execution completion
RX_STATE_EXEC_DONE               (3'd6) - Execution finished

State Transition Diagram:

                   ┌─────────────┐
                   │ RX_RESET    │
                   └──────┬──────┘
                          │
                          v
      ┌───────────────────────────────────────────┐
      │          RX_IDLE                          │
      │   Wait for rxFIFO commands                │
      └──┬──┬──┬──┬──┬──────────────────────────┬─┘
         │  │  │  │  │                          │
    EEP_W│  │  │  │  │EXEC_STEP                │
      ───┼──┼──┼──┼──┼──                        │
         │  │  │  │  │  │                       │
         │  │  │  │  v  v                       │
         │  │  │  │  RX_EXEC_STEP               │
         │  │  │  │      │                      │
         │  │  │  │      v                      │
         │  │  │  │  RX_WAIT_RUN                │
         │  │  │  │   (wait_clks_cnt)           │
         │  │  │  │      │                      │
         │  │  │  │      ├─ ctr<limit ──────────┘
         │  │  │  │      │
         │  │  │  │      └─ ctr==limit
         │  │  │  │            │
         │  │  │  │            v
         │  │  │  │      RX_EXEC_DONE
         │  │  │  │            │
         │  │  │  └────────────┴────────────┐
         │  │  │                            │
         │  │  │HBM_RW / IEP_RW / NTWK_W    │
         │  │  └───(immediate)──────────────┤
         │  │                               │
         │  │EXEC_CONT                      │
         v  v                               v
    RX_REGISTER_PCIE_AXON_DATA         RX_IDLE
         │
         v
    RX_SET_AXON_DATA
      (shift & increment)
         │
         ├─ addr[4:0]==31 ─────> loop to REGISTER
         │
         └─ addr==limit ───┬─ !running ──> RX_IDLE
                           │
                           └─  running ──> RX_EXEC_STEP

Key Logic:

  1. CMD_EEP_W (Write Axon Events):

    • Resets axonEvent_addr to 0

    • Fetches 512-bit packet into axon_data_sr

    • Shifts out 16 bits at a time, incrementing address

    • After every 32 events (one packet), fetches next packet

    • Continues until axonEvent_addr == num_inputs[16:4]

  2. CMD_HBM_RW / CMD_IEP_RW:

    • Direct passthrough to respective FIFOs

    • Waits for FIFO not full before writing

    • Returns to IDLE immediately

  3. CMD_NTWK_PARAM_W:

    • Writes network parameters from PCIe packet:

      num_inputs[16:0]        = rxFIFO_dout[16:0]
      num_outputs[16:0]       = rxFIFO_dout[33:17]
      threshold[35:0]         = rxFIFO_dout[69:34]
      exec_neuron_model[1:0]  = rxFIFO_dout[71:70]
      
  4. CMD_EXEC_STEP / CMD_EXEC_CONT:

    • Resets execution counters and timer

    • CMD_EXEC_CONT sets execRun_limit from packet [31:0]

    • Loads axon events (if EXEC_CONT) then executes

    • Waits for exec_iep_phase2_done signal

    • Additional 31-cycle wait to ensure all spikes collected

    • Repeats until execRun_ctr == execRun_limit

TX State Machine#

States:

TX_STATE_RESET           (2'd0) - Reset state
TX_STATE_IDLE            (2'd1) - Check execution status & FIFOs
TX_STATE_WAIT_FOR_SPIKES (2'd2) - Collect spikes during execution
TX_STATE_SEND_SPIKES     (2'd3) - Send batched spikes to host

State Transition Diagram:

         ┌─────────────┐
         │  TX_RESET   │
         └──────┬──────┘
                │
                v
         ┌────────────────────────────────┐
         │  TX_IDLE                       │
         │  Check execution & FIFOs       │
         └──┬──────┬──────────────────────┘
            │      │
execRun     │      │ !execRun & !txFIFO_full
  ──────────┘      └───────┬──────────────
            │              │
            v              ├─ !hbm2ci_empty ──> send HBM data ──┐
   TX_WAIT_FOR_SPIKES      │                                    │
      Collect spikes       └─ !iep2ci_empty ──> send IEP data ─┤
            │                                                   │
            ├─ spike_ctr==14 ──────────────────────────────────┤
            │                                                   │
            ├─ !spk2ciFIFO_empty ─> spike_inc (read & shift)   │
            │                                                   │
            └─ execRun_done ────┬─ spikes_sent ────────────────┤
                                │                               │
                                └─ !spikes_sent                 │
                                        │                       │
                                        v                       │
                                TX_SEND_SPIKES                  │
                                  (batch of 14)                 │
                                        │                       │
                                        └───────────────────────┘
                                                                │
                                                                v
                                                            TX_IDLE

Spike Collection Logic:

  • Spikes stored in 448-bit shift register spike_sr

  • Counter spike_ctr tracks number of spikes (max 14)

  • Each spike formatted as 32 bits:

    [31:24] = execRun_ctr[7:0]  (sub-timestamp)
    [23]    = 1'b1              (valid flag)
    [22:16] = 7'd0              (padding)
    [15:0]  = spk2ciFIFO_dout   (17-bit neuron address truncated?)
    

    Note: Bit allocation seems inconsistent (17-bit addr in 16 bits + 1 flag?)

  • Batch sent when:

    • spike_ctr == 14, OR

    • execRun_done and spikes pending

  • Final packet format (512 bits):

    [511:480] = 32'hEEEE_EEEE (opcode)
    [479:32]  = spike_sr[447:0] (14 spikes)
    [31:0]    = execRun_ctr (timestamp)
    

Axon Event Shifter#

Purpose: Convert 512-bit PCIe packets into sequential 16-bit axon events.

Registers:

  • axon_data_sr[511:0] - Shift register holding PCIe packet

  • axonEvent_addr[12:0] - Current row address (0 to 8191)

Operation:

  1. Load: axon_data_set loads rxFIFO_dout into axon_data_sr

  2. Shift & Increment: axon_addr_inc triggers:

    axonEvent_addr <= axonEvent_addr + 1'b1;
    axon_data_sr   <= {16'd0, axon_data_sr[511:16]};  // Right shift 16 bits
    
  3. Output: axonEvent_data = axon_data_sr[15:0] (LSBs)

  4. Limit: axon_addr_limit = num_inputs[16:4]

    • Divide by 16 since each row handles 16 axons

  5. Reload: After 32 shifts (one packet exhausted), fetch next packet

Timeline Example:

Cycle 0: Load packet → axon_data_sr = [511:0]
Cycle 1: Shift #1    → axonEvent_addr=0, data=sr[15:0]
Cycle 2: Shift #2    → axonEvent_addr=1, data=sr[31:16] (original)
...
Cycle 32: Shift #32  → axonEvent_addr=31, data=sr[511:496] (original)
Cycle 33: Load next packet

Execution Control Registers#

Controlled Signals:

  1. execRun_limit[31:0]:

    • Set by exec_run_set from rxFIFO_dout[31:0]

    • Reset by exec_run_rst (when not simultaneously setting)

    • Value of 0 means single time step

  2. execRun_ctr[31:0]:

    • Reset by exec_run_rst

    • Incremented by exec_run_inc

    • Represents current time step

  3. execRun_timer[63:0]:

    • Reset by exec_run_rst

    • Increments every cycle while execRun_running==1

    • Provides FPGA cycle count for performance measurement

  4. execRun_running:

    • Set by exec_run_rst

    • Cleared by exec_run_done

  5. execRun_done:

    • Set by exec_run_done

    • Cleared by exec_run_rst

Control Flow:

exec_run_rst  ──┬──> execRun_ctr = 0
                ├──> execRun_timer = 0
                ├──> execRun_running = 1
                └──> execRun_done = 0

exec_run_set  ──> execRun_limit = rxFIFO_dout[31:0]

exec_run_inc  ──> execRun_ctr++

execRun_running ──> execRun_timer++ (every cycle)

exec_run_done ──┬──> execRun_running = 0
                └──> execRun_done = 1

Wait Clock Counter#

Purpose: Ensure all intermediate spikes have been transmitted before advancing to next time step.

Register: wait_clks_cnt[4:0] Limit: 31 cycles (5’d31)

Logic:

if ((rx_curr_state==RX_STATE_WAIT_RUN) &
    exec_iep_phase2_done &
    spk2ciFIFO_empty)
   wait_clks_cnt <= wait_clks_cnt + 1'b1;
else
   wait_clks_cnt <= 5'd0;

Rationale:

  • Spike FIFO controller uses round-robin across 8 FIFOs

  • Up to 8 cycles for a spike to propagate to spk2ciFIFO

  • 31-cycle wait provides safety margin (4× worst case)

  • Only starts counting when phase 2 done AND spike FIFO empty


Memory Map#

Network Parameter Registers#

These parameters are configured via CMD_NTWK_PARAM_W command:

Register

Bits

Address in PCIe Packet

Description

num_inputs

17

[16:0]

Number of input axons (0 to 131,071)

num_outputs

17

[33:17]

Number of output neurons to monitor

threshold

36 (signed)

[69:34]

Neuron spike threshold

exec_neuron_model

2

[71:70]

Neuron model selection

Neuron Model Encoding:

2'b00 = Model 0 (e.g., Leaky Integrate-and-Fire)
2'b01 = Model 1 (e.g., Izhikevich)
2'b10 = Model 2 (e.g., Hodgkin-Huxley approximation)
2'b11 = Model 3 (reserved/custom)

Note: Actual model semantics defined in internal_events_processor


Timing Diagrams#

CMD_EEP_W: Writing Axon Events#

Cycle:    0      1      2      3      4      5      6      7      8      9
          │      │      │      │      │      │      │      │      │      │
aclk    ──┘▔▔▔▔▔▔└┐    ┌▔┐    ┌▔┐    ┌▔┐    ┌▔┐    ┌▔┐    ┌▔┐    ┌▔┐    ┌▔
          │      │▔▔▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│
State     IDLE   │REG_D │SET_D │SET_D │SET_D │SET_D │SET_D │SET_D │SET_D │
          │      │      │      │      │      │      │      │      │      │
rxFIFO    ▔▔CMD▔▔▔X▔▔PKT1▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
_dout     │      │      │      │      │      │      │      │      │      │
          │      │      │      │      │      │      │      │      │      │
rxFIFO    ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
_rden     │      │      │      │      │      │      │      │      │      │
          │      │      │      │      │      │      │      │      │      │
axon_data ▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
_set      │      │      │      │      │      │      │      │      │      │
          │      │      │      │      │      │      │      │      │      │
axon_addr ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│
_inc      │      │      │      │      │      │      │      │      │      │
          │      │      │      │      │      │      │      │      │      │
axonEvent ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│
_set      │      │      │      │      │      │      │      │      │      │
          │      │      │      │      │      │      │      │      │      │
axonEvent 0x0000 │0x0000 │D[15:0│D[31:16│D[47:32│D[63:48│D[79:64│D[95:80│
_data     │      │      │]     │]     │]     │]     │]     │]     │
          │      │      │      │      │      │      │      │      │      │
axonEvent 0      │0      │0     │1     │2     │3     │4     │5     │6     │
_addr     │      │      │      │      │      │      │      │      │      │

Notes:

  • IDLE state: Command detected (CMD_EEP_W)

  • REG_D state: Register 512-bit PCIe packet

  • SET_D state: Shift out 16 bits per cycle, increment address

  • After 32 shifts, return to REG_D to fetch next packet

CMD_EXEC_STEP: Single Time-Step Execution#

Cycle:     0      1      2      3     ...    N     N+1    N+2   ...  N+32  N+33   N+34
           │      │      │      │      │      │      │      │      │      │      │
State      IDLE   │EXEC_ │WAIT_ │WAIT_ │WAIT_ │WAIT_ │WAIT_ │WAIT_ │WAIT_ │IDLE  │
           │      │STEP  │RUN   │RUN   │RUN   │RUN   │RUN   │RUN   │RUN   │      │
           │      │      │      │      │      │      │      │      │      │      │
exec_run   ▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
           │      │      │      │      │      │      │      │      │      │      │
execRun    ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔▁▁▁▁▁▁
_running   │      │      │      │      │      │      │      │      │      │      │
           │      │      │      │      │      │      │      │      │      │      │
exec_iep   ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔▁▁▁▁▁▁
_phase2    │      │      │      │      │      │      │      │      │      │      │
_done      │      │      │      │      │      │      │      │      │      │      │
           │      │      │      │      │      │      │      │      │      │      │
spk2ci     ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔
FIFO_empty │      │      │      │      │      │      │      │      │      │      │
           │      │      │      │      │      │      │      │      │      │      │
wait_clks  0      │0      │0      │0      │0      │0      │1      │2      │31    │0
_cnt       │      │      │      │      │      │      │      │      │      │      │
           │      │      │      │      │      │      │      │      │      │      │
execRun    0      │0      │0      │0      │0      │0      │0      │0      │0      │0
_timer     │      │      │      │      │      │N     │N+1   │N+2   │N+32  │N+32  │

Notes:

  • N cycles: Time for external + internal event processing

  • Phase 2 done: Internal events processor completes neuron updates

  • 31-cycle wait: Ensures all spikes collected before next time step

  • Single time step: execRun_limit=0, no increment of execRun_ctr

Spike Batching and Transmission#

Cycle:     0      1      2      3     ...    14     15     16     17     18
           │      │      │      │      │      │      │      │      │      │
TX State   WAIT   │WAIT  │WAIT  │WAIT  │WAIT  │SEND  │WAIT  │WAIT  │WAIT  │
           _SPIKES│_SPKS │_SPKS │_SPKS │_SPKS │_SPKS │_SPIKES│_SPKS │_SPKS │
           │      │      │      │      │      │      │      │      │      │
spk2ciFIFO ▔▔▔▔▔▔▔│▁▁▁▁▁▁│▔▔▔▔▔▔│▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▔▔▔▔▔▔│▁▁▁▁▁▁│
_empty     │      │      │      │      │      │      │      │      │      │
           │      │      │      │      │      │      │      │      │      │
spk2ciFIFO ▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁
_rden      │      │      │      │      │      │      │      │      │      │
           │      │      │      │      │      │      │      │      │      │
spike_inc  ▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁
           │      │      │      │      │      │      │      │      │      │
spike_ctr  0      │1      │1      │2      │2      │14     │0      │1      │1
           │      │      │      │      │      │      │      │      │      │
spike_sr   XXXX   │SPK1  │SPK1  │SPK2  │SPK2  │SPK14 │0     │SPK1' │SPK1' │
           │      │      │      │      │      │      │      │      │      │
txFIFO     ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
_wren      │      │      │      │      │      │      │      │      │      │
           │      │      │      │      │      │      │      │      │      │
txFIFO     XXXXXX │XXXXXX│XXXXXX│XXXXXX│XXXXXX│0xEEEE│XXXXXX│XXXXXX│XXXXXX│
_din       │      │      │      │      │      │_EEEE │      │      │      │
           │      │      │      │      │      │+14SPK│      │      │      │

Notes:

  • Spikes collected as they arrive (non-empty FIFO)

  • After 14 spikes: Transition to SEND_SPIKES

  • Packet sent with opcode 0xEEEE_EEEE

  • Counter and shift register reset after send

  • Process repeats for next batch


Cross-References#

Software Integration#

Python (hs_bridge) Functions:

  • fpga_controller.write_axon_events() → Sends CMD_EEP_W commands

  • fpga_controller.read_hbm() / write_hbm() → Sends CMD_HBM_RW commands

  • fpga_controller.read_neuron() / write_neuron() → Sends CMD_IEP_RW commands

  • fpga_controller.set_network_params() → Sends CMD_NTWK_PARAM_W command

  • fpga_controller.execute_step() → Sends CMD_EXEC_STEP command

  • fpga_controller.execute_continuous() → Sends CMD_EXEC_CONT command

  • fpga_controller.read_spikes() → Reads TX FIFO for spike packets (opcode 0xEEEE_EEEE)


Key Terms and Definitions#

Term

Definition

Command Opcode

8-bit identifier in PCIe packet [511:504] specifying operation type

Axon Event

External input spike represented as row address + 16-bit mask

Time Step

One iteration of network simulation (external events → internal updates)

execRun_ctr

Time-step counter, incremented after each iteration

execRun_timer

FPGA clock cycle counter for performance profiling

Spike Batching

Grouping 14 spikes into single 512-bit PCIe packet for efficiency

Sub-timestamp

Lower 8 bits of execRun_ctr, provides intra-timestep spike ordering

Round-Robin

Fair scheduling used in spike FIFO controller (external to this module)

Wait Clock Counter

31-cycle delay ensuring all spikes transmitted before next time step

Shift Register

axon_data_sr and spike_sr used for serial data conversion

Phase 2

Internal events processor phase updating neuron states

FWFT

First-Word Fall-Through FIFO mode (used in pcie2fifos)


Design Evolution and Commented Code#

Evidence of Scaling (8 → 16 Neuron Groups)#

Axon Event Data Width:

// OLD (8 groups):  output [7:0] axonEvent_data
// NEW (16 groups): output [15:0] axonEvent_data

// OLD shift: axon_data_sr <= {8'd0, axon_data_sr[511:8]};
// NEW shift: axon_data_sr <= {16'd0, axon_data_sr[511:16]};

Address Calculation:

// OLD: wire [13:0] axon_addr_limit = num_inputs[16:3];  // 8 axons per row
// NEW: wire [12:0] axon_addr_limit = num_inputs[16:4];  // 16 axons per row

Shift Detection:

// OLD: if (axonEvent_addr[5:0]==6'd63)  // 64 x 8-bit events per packet
// NEW: if (axonEvent_addr[4:0]==5'd31)  // 32 x 16-bit events per packet

Neuron Data Width Upgrade#

Internal Events Processor Interface:

// OLD (16-bit neuron data):
// output [1+17+15:0] ci2iep_din   // 34 bits total
// input  [17+15:0]   iep2ci_dout  // 33 bits total

// NEW (36-bit neuron data):
output [1+17+35:0] ci2iep_din   // 54 bits total
input  [17+35:0]   iep2ci_dout  // 53 bits total

This suggests upgrade to higher-precision membrane potentials or additional neuron state variables.


Performance Considerations#

Throughput#

Axon Event Loading:

  • 32 events per 512-bit PCIe packet

  • 1 event per FPGA cycle (225 MHz)

  • 32 cycles to exhaust packet + 1-2 cycles fetch overhead

  • Throughput: ~6.75M axon events/second per core

Spike Output:

  • 14 spikes per 512-bit PCIe packet

  • Batching amortizes packet overhead

  • Throughput: Depends on spike rate; typically 1-10% of neurons spike per time step

Time Step Execution:

  • Variable duration based on:

    • Number of active axons (external events phase)

    • HBM access latency for synapse fetching

    • Number of neurons to update (internal events phase)

    • 31-cycle safety margin for spike collection

  • Typical: 1000-10,000 FPGA cycles per time step

Latency#

Command Response:

  • HBM read: ~100-200 ns (HBM latency + processing)

  • Neuron read: ~50-100 ns (URAM access + FIFO transfer)

Execution Completion:

  • Minimum: ~5 µs (1000 cycles @ 225 MHz)

  • Typical: ~20-50 µs depending on network activity


Debugging and Verification#

VIO Signals#

output [2:0] vio_rx_curr_state,  // Monitor RX state machine
output [1:0] vio_tx_curr_state,  // Monitor TX state machine

State Encodings for Debugging:

RX States:

3'd0 = RESET
3'd1 = IDLE
3'd2 = REGISTER_PCIE_AXON_DATA
3'd3 = SET_AXON_DATA
3'd4 = EXEC_STEP
3'd5 = WAIT_RUN
3'd6 = EXEC_DONE

TX States:

2'd0 = RESET
2'd1 = IDLE
2'd2 = WAIT_FOR_SPIKES
2'd3 = SEND_SPIKES

Common Debugging Scenarios#

Problem: Network doesn’t execute

  • Check: execRun_running should assert after CMD_EXEC_* command

  • Check: exec_iep_phase2_done should eventually assert

  • Check: wait_clks_cnt should count to 31

Problem: Spikes not received by host

  • Check: spk2ciFIFO_empty - should toggle during execution

  • Check: spike_ctr - should increment when spikes detected

  • Check: txFIFO_wren - should pulse when batches sent

Problem: Axon events not loaded

  • Check: rxFIFO_dout[511:504] == CMD_EEP_W (8’d1)

  • Check: axonEvent_addr should increment from 0 to num_inputs[16:4]

  • Check: axonEvent_set should pulse for each event


Safety and Edge Cases#

Reset Behavior#

  • All counters and state machines reset to safe states

  • Asynchronous reset (~aresetn) ensures immediate response

  • Execution flags cleared to prevent spurious runs

FIFO Full/Empty Handling#

  • RX state machine waits for !ci2hbm_full, !ci2iep_full before writing

  • TX state machine waits for !txFIFO_full before writing

  • Spike collection waits for !spk2ciFIFO_empty before reading

Execution Limit Edge Case#

  • execRun_limit == 0: Single time step execution

  • Comparison execRun_ctr == execRun_limit correctly handles both cases

Axon Address Limit#

  • Prevents writing beyond allocated BRAM rows

  • Correctly handles non-multiple-of-16 input counts (rounds down via [16:4])


Future Enhancement Opportunities#

  1. Pipelined Command Processing: Currently processes one command at a time; could overlap execution with data loading

  2. Variable Spike Batch Size: Fixed 14-spike batches may be inefficient for low spike rates; adaptive sizing could reduce latency

  3. Compression: Sparse spike patterns could benefit from run-length encoding or similar compression

  4. Multi-Core Coordination: For N_cores implementation, add inter-core communication commands

  5. Error Reporting: Add status/error codes to TX packets for invalid commands or failed operations

  6. Performance Counters: Instrument critical paths (HBM access time, execution phase durations) for profiling


Document Version: 1.0 Last Updated: December 2025 Module File: command_interpreter.v Module Location: CRI_proj/cri_fpga/code/new/hyddenn2/vivado/single_core.srcs/sources_1/new/