command_interpreter.v#
Module Overview#
Purpose and Role in Stack#
The command_interpreter module serves as the central command router and execution controller for the neuromorphic FPGA core. It acts as the primary interface between the host computer (via PCIe) and all internal processing modules. This module:
Decodes and routes PCIe commands to appropriate processors (HBM, external events, internal events)
Manages network execution flow including time-step sequencing and execution counters
Handles axon event distribution by parsing 512-bit PCIe packets into individual axon events
Collects and batches spike outputs for transmission back to the host
Maintains execution statistics including time-step counters and FPGA cycle timers
In the software/hardware stack:
Host (Python hs_bridge) → PCIe DMA → pcie2fifos → command_interpreter → Processing Modules
↓
[External Events Processor]
[HBM Processor]
[Internal Events Processor]
Module Architecture#
High-Level Block Diagram#
┌────────────────────────────────────────────────┐
│ command_interpreter │
│ │
PCIe RX FIFO │ ┌──────────────────────────────────────┐ │
512-bit │ │ RX State Machine │ │
────────────────┼──► Command Decoder & Router │ │
│ │ - CMD_EEP_W: Load axon events │ │
│ │ - CMD_HBM_RW: R/W synapses │ │
│ │ - CMD_IEP_RW: R/W neurons │ │
│ │ - CMD_NTWK_PARAM_W: Set params │ │
│ │ - CMD_EXEC_STEP: Run 1 timestep │ │
│ │ - CMD_EXEC_CONT: Continuous run │ │
│ └────┬─────────────┬──────────┬────────┘ │
│ │ │ │ │
│ ┌───▼────┐ ┌───▼────┐ ┌─▼────────┐ │
│ │ Axon │ │ HBM │ │ Internal │ │
│ │ Event │ │ FIFO │ │ Events │ │
│ │ Shifter│ │ ci2hbm │ │ FIFO │ │
External Events │ └────┬───┘ └───┬────┘ │ ci2iep │ │
Processor │ │ │ └─┬────────┘ │
◄───────────────┼────────┘ │ │ │
│ │ │ │
HBM Processor │ │ │ │
◄───────────────┼────────────────────┘ │ │
│ │ │
Internal Events │ │ │
Processor │ │ │
◄───────────────┼──────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────┐ │
│ │ TX State Machine │ │
PCIe TX FIFO │ │ Spike Collection & Batching │ │
512-bit │ │ - Batches 14 spikes per packet │◄───┼── Spike FIFO
◄───────────────┼──┤ - Includes timestamp (execRun_ctr) │ │ (from spike_
│ │ - Opcode: 0xEEEE_EEEE │ │ fifo_controller)
│ └──────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────┐ │
│ │ Execution Control Registers │ │
│ │ - execRun_ctr (time-step counter) │ │
│ │ - execRun_limit (max timesteps) │ │
│ │ - execRun_timer (FPGA cycle count) │ │
│ │ - execRun_running / execRun_done │ │
│ └──────────────────────────────────────┘ │
└────────────────────────────────────────────────┘
Interface Specification#
Module Parameters#
| Parameter | Width | Default | Description |
|---|---|---|---|
| | - | 32 | AXI address width (currently unused) |
| | - | 32 | AXI data width (currently unused) |
| | - | 33 | HBM address width (8 GB addressable) |
| | - | 256 | HBM data width |
| | - | 32 | HBM bytes per transaction (256 bits / 8) |
Clock and Reset#
| Signal | Direction | Width | Description |
|---|---|---|---|
| aclk | Input | 1 | 225 MHz system clock |
| aresetn | Input | 1 | Active-low asynchronous reset |
PCIe Interface (via pcie2fifos)#
| Signal | Direction | Width | Description |
|---|---|---|---|
| RX FIFO (Host → Card) | | | |
| | Input | 1 | RX FIFO empty flag |
| rxFIFO_dout | Input | 512 | RX FIFO data output |
| rxFIFO_rden | Output | 1 | RX FIFO read enable |
| TX FIFO (Card → Host) | | | |
| txFIFO_full | Input | 1 | TX FIFO full flag |
| txFIFO_din | Output | 512 | TX FIFO data input |
| txFIFO_wren | Output | 1 | TX FIFO write enable |
PCIe Data Format:
RX FIFO (512-bit packet from host):
[511:504] = 8-bit command opcode
[503:0] = Command-specific data payload
TX FIFO (512-bit packet to host):
Spike packet:
[511:480] = 0xEEEE_EEEE (spike opcode)
[479:32] = 14 × 32-bit spike events (448 bits)
Each spike: [31:24]=sub-timestamp, [23]=flag, [22:16]=zeros, [15:0]=neuron addr
[31:0] = execRun_ctr (timestep counter)
HBM read response:
[511:496] = 0xBBBB
[495:256] = zeros (240 bits)
[255:0] = HBM data
Neuron read response:
[511:496] = 0xCCCC
[495:53] = zeros (443 bits)
[52:0] = Neuron data (36-bit value + 17-bit address)
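The TX packet layouts above can be exercised on the host side with a small decoder. The following is a minimal sketch, not the hs_bridge implementation: the function name decode_tx_packet is hypothetical, the packet is assumed to be available as a single 512-bit Python integer, spike slot i is assumed to occupy bits [32+32i +: 32], and the neuron response is assumed to place the address in [52:36] and the value in [35:0], mirroring the ci2iep_din command layout.

```python
SPIKE_OPCODE = 0xEEEE_EEEE   # [511:480] of a spike packet
HBM_OPCODE = 0xBBBB          # [511:496] of an HBM read response
NEURON_OPCODE = 0xCCCC       # [511:496] of a neuron read response


def decode_tx_packet(packet):
    """Classify a 512-bit TX packet (as a Python int) and extract its fields."""
    if (packet >> 480) & 0xFFFF_FFFF == SPIKE_OPCODE:
        # 14 spike words live in [479:32]; slot order is an assumption.
        words = [(packet >> (32 + 32 * i)) & 0xFFFF_FFFF for i in range(14)]
        spikes = [
            {"sub_timestamp": w >> 24, "neuron_addr": w & 0xFFFF}
            for w in words
            if (w >> 23) & 1  # keep only slots whose flag bit is set
        ]
        return {"type": "spikes", "timestep": packet & 0xFFFF_FFFF, "spikes": spikes}

    top16 = (packet >> 496) & 0xFFFF
    if top16 == HBM_OPCODE:
        return {"type": "hbm", "data": packet & ((1 << 256) - 1)}
    if top16 == NEURON_OPCODE:
        # Field order inside [52:0] is assumed, not confirmed by the source.
        return {
            "type": "neuron",
            "addr": (packet >> 36) & 0x1FFFF,
            "value": packet & ((1 << 36) - 1),
        }
    return {"type": "unknown"}
```

Filtering on the flag bit means that a partially filled batch (fewer than 14 spikes) decodes to a shorter list rather than to a list padded with zero entries.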
External Events Processor Interface#
| Signal | Direction | Width | Description |
|---|---|---|---|
| axonEvent_set | Output | 1 | Axon event valid/write enable |
| axonEvent_addr | Output | 13 | Row address in BRAM (0 to 8191) |
| axonEvent_data | Output | 16 | Data mask (1 bit per neuron group) |
Address Calculation:
16 neuron groups → 16 axons per row
num_inputs[16:4] = number of rows (the lower 4 bits are ignored)
Each 512-bit PCIe packet contains 512/16 = 32 axon events
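The address calculation can be checked with a few lines of host-side arithmetic. This is a sketch under stated assumptions: axon_load_plan is a hypothetical helper, and the packet count simply rounds the row count up to whole 32-event packets.

```python
def axon_load_plan(num_inputs):
    """Return (rows, packets) needed to load num_inputs axons."""
    if not 0 <= num_inputs < (1 << 17):
        raise ValueError("num_inputs must fit in 17 bits")
    rows = num_inputs >> 4        # num_inputs[16:4]: 16 axons per row
    packets = -(-rows // 32)      # ceil(rows / 32): 32 events per packet
    return rows, packets
```

For example, 1024 inputs give 64 rows in 2 packets, while 100 inputs give only 6 rows because the lower 4 bits are dropped.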
HBM Processor Interface#
| Signal | Direction | Width | Description |
|---|---|---|---|
| Command Interface (CI → HBM) | | | |
| ci2hbm_full | Input | 1 | Command FIFO full flag |
| ci2hbm_din | Output | 280 | Command data |
| | Output | 1 | Command write enable |
| Response Interface (HBM → CI) | | | |
| hbm2ci_empty | Input | 1 | Response FIFO empty flag |
| | Input | 256 | Response data |
| | Output | 1 | Response read enable |
Command Format (ci2hbm_din[279:0]):
[279] = R/W command (0=read, 1=write)
[278:256] = 23-bit row address
[255:0] = 256-bit data (for writes)
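As an illustration of the 280-bit layout, the sketch below assembles a ci2hbm_din word from its three fields. The helper name pack_hbm_command is hypothetical, and the sketch only models the FIFO word itself; how these fields are positioned inside the 504-bit PCIe payload is not stated here and is not assumed.

```python
def pack_hbm_command(write, row_addr, data=0):
    """Build the 280-bit ci2hbm_din word: [279]=R/W, [278:256]=row, [255:0]=data."""
    if not 0 <= row_addr < (1 << 23):
        raise ValueError("row address must fit in 23 bits")
    if not 0 <= data < (1 << 256):
        raise ValueError("data must fit in 256 bits")
    return (int(bool(write)) << 279) | (row_addr << 256) | data
```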
Internal Events Processor Interface#
| Signal | Direction | Width | Description |
|---|---|---|---|
| Command Interface (CI → IEP) | | | |
| ci2iep_full | Input | 1 | Command FIFO full flag |
| ci2iep_din | Output | 54 | Command data |
| | Output | 1 | Command write enable |
| Response Interface (IEP → CI) | | | |
| iep2ci_empty | Input | 1 | Response FIFO empty flag |
| iep2ci_dout | Input | 53 | Response data |
| | Output | 1 | Response read enable |
Command Format (ci2iep_din[53:0]):
[53] = R/W command (0=read, 1=write)
[52:36] = 17-bit neuron address (128K neurons addressable)
[35:0] = 36-bit neuron data (membrane potential or other state)
Note: Comments in code show previous 16-bit data format; upgraded to 36-bit.
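A round-trip sketch of the 54-bit command word may help when checking field boundaries in simulation dumps. Both function names are hypothetical, and the data field is treated as a raw 36-bit pattern because the source does not state its encoding.

```python
def pack_iep_command(write, neuron_addr, data=0):
    """Build ci2iep_din: [53]=R/W, [52:36]=neuron address, [35:0]=neuron data."""
    if not 0 <= neuron_addr < (1 << 17):
        raise ValueError("neuron address must fit in 17 bits")
    if not 0 <= data < (1 << 36):
        raise ValueError("neuron data must fit in 36 bits")
    return (int(bool(write)) << 53) | (neuron_addr << 36) | data


def unpack_iep_command(word):
    """Split a 54-bit command word back into (write, neuron_addr, data)."""
    return bool((word >> 53) & 1), (word >> 36) & 0x1FFFF, word & ((1 << 36) - 1)
```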
Spike Event Interface#
| Signal | Direction | Width | Description |
|---|---|---|---|
| spk2ciFIFO_dout | Input | 17 | Spiked neuron address |
| spk2ciFIFO_empty | Input | 1 | Spike FIFO empty flag |
| spk2ciFIFO_rden | Output | 1 | Spike FIFO read enable |
Network Execution Control#
| Signal | Direction | Width | Description |
|---|---|---|---|
| exec_iep_phase2_done | Input | 1 | Internal events processor phase 2 completion flag |
| exec_run | Output | 1 | Execute time-step command (1-cycle pulse) |
| execRun_running | Output | 1 | Execution in progress flag |
| execRun_done | Output | 1 | Execution completed flag |
| execRun_limit | Output | 32 | Maximum time steps (0 = single step) |
| execRun_ctr | Output | 32 | Current time-step counter |
| execRun_timer | Output | 64 | FPGA clock cycle counter during execution |
Network Parameters#
| Signal | Direction | Width | Description |
|---|---|---|---|
| num_inputs | Output | 17 | Number of input axons (131,071 max) |
| num_outputs | Output | 17 | Number of output neurons to monitor |
| threshold | Output | 36 | Neuron spike threshold (signed) |
| exec_neuron_model | Output | 2 | Neuron model selection (0-3) |
Debug Interface#
| Signal | Direction | Width | Description |
|---|---|---|---|
| vio_rx_curr_state | Output | 3 | RX state machine state (for VIO debugging) |
| vio_tx_curr_state | Output | 2 | TX state machine state (for VIO debugging) |
Detailed Logic Description#
Command Opcodes#
localparam [7:0] CMD_EEP_W = 8'd1; // Write axon events to BRAM
localparam [7:0] CMD_HBM_RW = 8'd2; // Read/write HBM synapse data
localparam [7:0] CMD_IEP_RW = 8'd3; // Read/write neuron state
localparam [7:0] CMD_NTWK_PARAM_W = 8'd4; // Write network parameters
localparam [7:0] CMD_EXEC_STEP = 8'd6; // Execute single time step
localparam [7:0] CMD_EXEC_CONT = 8'd7; // Execute continuous (multiple timesteps)
Command extracted from: rx_command = rxFIFO_dout[511:504]
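On the host side, the opcode placement can be sketched as follows. The helper names build_rx_packet and rx_command are hypothetical, and the packet is modelled as a 512-bit Python integer; the byte order used when that integer is handed to the DMA engine is not specified here and is deliberately left out.

```python
CMD_EEP_W = 1
CMD_HBM_RW = 2
CMD_IEP_RW = 3
CMD_NTWK_PARAM_W = 4
CMD_EXEC_STEP = 6
CMD_EXEC_CONT = 7


def build_rx_packet(opcode, payload=0):
    """Place an 8-bit opcode in [511:504] above a 504-bit payload."""
    if not 0 <= opcode < (1 << 8):
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= payload < (1 << 504):
        raise ValueError("payload must fit in 504 bits")
    return (opcode << 504) | payload


def rx_command(packet):
    """Mirror of rx_command = rxFIFO_dout[511:504]."""
    return (packet >> 504) & 0xFF
```

For CMD_EXEC_CONT, the payload's low 32 bits carry the time-step limit, so build_rx_packet(CMD_EXEC_CONT, 100) requests a run with execRun_limit = 100.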
RX State Machine#
States:
RX_STATE_RESET (3'd0) - Reset state
RX_STATE_IDLE (3'd1) - Wait for commands
RX_STATE_REGISTER_PCIE_AXON_DATA (3'd2) - Register 512-bit axon packet
RX_STATE_SET_AXON_DATA (3'd3) - Shift and distribute axon events
RX_STATE_EXEC_STEP (3'd4) - Execute time step
RX_STATE_WAIT_RUN (3'd5) - Wait for execution completion
RX_STATE_EXEC_DONE (3'd6) - Execution finished
State Transition Diagram:
┌─────────────┐
│ RX_RESET │
└──────┬──────┘
│
v
┌───────────────────────────────────────────┐
│ RX_IDLE │
│ Wait for rxFIFO commands │
└──┬──┬──┬──┬──┬──────────────────────────┬─┘
│ │ │ │ │ │
EEP_W│ │ │ │ │EXEC_STEP │
───┼──┼──┼──┼──┼── │
│ │ │ │ │ │ │
│ │ │ │ v v │
│ │ │ │ RX_EXEC_STEP │
│ │ │ │ │ │
│ │ │ │ v │
│ │ │ │ RX_WAIT_RUN │
│ │ │ │ (wait_clks_cnt) │
│ │ │ │ │ │
│ │ │ │ ├─ ctr<limit ──────────┘
│ │ │ │ │
│ │ │ │ └─ ctr==limit
│ │ │ │ │
│ │ │ │ v
│ │ │ │ RX_EXEC_DONE
│ │ │ │ │
│ │ │ └────────────┴────────────┐
│ │ │ │
│ │ │HBM_RW / IEP_RW / NTWK_W │
│ │ └───(immediate)──────────────┤
│ │ │
│ │EXEC_CONT │
v v v
RX_REGISTER_PCIE_AXON_DATA RX_IDLE
│
v
RX_SET_AXON_DATA
(shift & increment)
│
├─ addr[4:0]==31 ─────> loop to REGISTER
│
└─ addr==limit ───┬─ !running ──> RX_IDLE
│
└─ running ──> RX_EXEC_STEP
Key Logic:
CMD_EEP_W (Write Axon Events):
Resets axonEvent_addr to 0
Fetches 512-bit packet into axon_data_sr
Shifts out 16 bits at a time, incrementing address
After every 32 events (one packet), fetches next packet
Continues until axonEvent_addr == num_inputs[16:4]
CMD_HBM_RW / CMD_IEP_RW:
Direct passthrough to respective FIFOs
Waits for FIFO not full before writing
Returns to IDLE immediately
CMD_NTWK_PARAM_W:
Writes network parameters from PCIe packet:
num_inputs[16:0] = rxFIFO_dout[16:0]
num_outputs[16:0] = rxFIFO_dout[33:17]
threshold[35:0] = rxFIFO_dout[69:34]
exec_neuron_model[1:0] = rxFIFO_dout[71:70]
CMD_EXEC_STEP / CMD_EXEC_CONT:
Resets execution counters and timer
CMD_EXEC_CONT sets execRun_limit from packet[31:0]
Loads axon events (if EXEC_CONT) then executes
Waits for exec_iep_phase2_done signal
Additional 31-cycle wait to ensure all spikes collected
Repeats until execRun_ctr == execRun_limit
TX State Machine#
States:
TX_STATE_RESET (2'd0) - Reset state
TX_STATE_IDLE (2'd1) - Check execution status & FIFOs
TX_STATE_WAIT_FOR_SPIKES (2'd2) - Collect spikes during execution
TX_STATE_SEND_SPIKES (2'd3) - Send batched spikes to host
State Transition Diagram:
┌─────────────┐
│ TX_RESET │
└──────┬──────┘
│
v
┌────────────────────────────────┐
│ TX_IDLE │
│ Check execution & FIFOs │
└──┬──────┬──────────────────────┘
│ │
execRun │ │ !execRun & !txFIFO_full
──────────┘ └───────┬──────────────
│ │
v ├─ !hbm2ci_empty ──> send HBM data ──┐
TX_WAIT_FOR_SPIKES │ │
Collect spikes └─ !iep2ci_empty ──> send IEP data ─┤
│ │
├─ spike_ctr==14 ──────────────────────────────────┤
│ │
├─ !spk2ciFIFO_empty ─> spike_inc (read & shift) │
│ │
└─ execRun_done ────┬─ spikes_sent ────────────────┤
│ │
└─ !spikes_sent │
│ │
v │
TX_SEND_SPIKES │
(batch of 14) │
│ │
└───────────────────────┘
│
v
TX_IDLE
Spike Collection Logic:
Spikes stored in 448-bit shift register spike_sr
Counter spike_ctr tracks number of spikes (max 14)
Each spike formatted as 32 bits:
[31:24] = execRun_ctr[7:0] (sub-timestamp)
[23] = 1'b1 (valid flag)
[22:16] = 7'd0 (padding)
[15:0] = spk2ciFIFO_dout (17-bit neuron address truncated?)
Note: The bit allocation is inconsistent as documented: spk2ciFIFO_dout is 17 bits wide, but only 16 address bits ([15:0]) are listed. Check the RTL to confirm whether the address is truncated or extends into bit 16.
Batch sent when:
spike_ctr == 14, OR
execRun_done and spikes pending
Final packet format (512 bits):
[511:480] = 32'hEEEE_EEEE (opcode)
[479:32] = spike_sr[447:0] (14 spikes)
[31:0] = execRun_ctr (timestamp)
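The batching rule (send at 14 spikes, or flush a partial batch when execution finishes) can be modelled in a few lines. This is a behavioural sketch only: batch_spikes and make_spike_packet are hypothetical names, all spikes are taken to belong to one time step, the address is masked to the 16 bits listed above, and the shift direction (new spikes enter at the low end) is an assumption rather than something the source states.

```python
SPIKES_PER_PACKET = 14
SR_MASK = (1 << 448) - 1  # width of spike_sr


def make_spike_packet(spike_sr, exec_run_ctr):
    """Assemble opcode, spike_sr and the time-step counter into 512 bits."""
    return (0xEEEE_EEEE << 480) | (spike_sr << 32) | (exec_run_ctr & 0xFFFF_FFFF)


def batch_spikes(neuron_addrs, exec_run_ctr):
    """Return the TX packets produced for one time step's spikes."""
    packets, spike_sr, spike_ctr = [], 0, 0
    for addr in neuron_addrs:
        word = ((exec_run_ctr & 0xFF) << 24) | (1 << 23) | (addr & 0xFFFF)
        spike_sr = ((spike_sr << 32) | word) & SR_MASK  # assumed shift direction
        spike_ctr += 1
        if spike_ctr == SPIKES_PER_PACKET:
            packets.append(make_spike_packet(spike_sr, exec_run_ctr))
            spike_sr, spike_ctr = 0, 0  # counter and register reset after send
    if spike_ctr:  # execRun_done with spikes still pending
        packets.append(make_spike_packet(spike_sr, exec_run_ctr))
    return packets
```

With 30 spikes this yields two full packets and one packet holding 2 valid slots; the remaining 12 slots of the last packet stay zero, so their flag bits are clear.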
Axon Event Shifter#
Purpose: Convert 512-bit PCIe packets into sequential 16-bit axon events.
Registers:
axon_data_sr[511:0] - Shift register holding PCIe packet
axonEvent_addr[12:0] - Current row address (0 to 8191)
Operation:
Load: axon_data_set loads rxFIFO_dout into axon_data_sr
Shift & Increment: axon_addr_inc triggers:
axonEvent_addr <= axonEvent_addr + 1'b1;
axon_data_sr <= {16'd0, axon_data_sr[511:16]}; // Right shift 16 bits
Output: axonEvent_data = axon_data_sr[15:0] (LSBs)
Limit: axon_addr_limit = num_inputs[16:4] (divide by 16 since each row handles 16 axons)
Reload: After 32 shifts (one packet exhausted), fetch next packet
Timeline Example:
Cycle 0: Load packet → axon_data_sr = [511:0]
Cycle 1: Shift #1 → axonEvent_addr=0, data=sr[15:0]
Cycle 2: Shift #2 → axonEvent_addr=1, data=sr[31:16] (original)
...
Cycle 32: Shift #32 → axonEvent_addr=31, data=sr[511:496] (original)
Cycle 33: Load next packet
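The timeline can be reproduced with a short software model of the shifter. The function name shift_out_axon_events is hypothetical; the model follows the documented behaviour of emitting axon_data_sr[15:0] and then shifting right by 16 bits.

```python
def shift_out_axon_events(packet, start_addr=0):
    """Return the 32 (row_address, 16-bit mask) pairs held in one 512-bit packet."""
    sr = packet & ((1 << 512) - 1)   # axon_data_sr after the load
    events = []
    for i in range(32):
        events.append((start_addr + i, sr & 0xFFFF))  # axonEvent_data = sr[15:0]
        sr >>= 16                                     # {16'd0, sr[511:16]}
    return events
```

The first event therefore carries the packet's original bits [15:0] and the last carries bits [511:496], matching Shift #1 and Shift #32 in the timeline.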
Execution Control Registers#
Controlled Signals:
execRun_limit[31:0]:
Set by exec_run_set from rxFIFO_dout[31:0]
Reset by exec_run_rst (when not simultaneously setting)
Value of 0 means single time step
execRun_ctr[31:0]:
Reset by exec_run_rst
Incremented by exec_run_inc
Represents current time step
execRun_timer[63:0]:
Reset by exec_run_rst
Increments every cycle while execRun_running==1
Provides FPGA cycle count for performance measurement
execRun_running:
Set by exec_run_rst
Cleared by exec_run_done
execRun_done:
Set by exec_run_done
Cleared by exec_run_rst
Control Flow:
exec_run_rst ──┬──> execRun_ctr = 0
├──> execRun_timer = 0
├──> execRun_running = 1
└──> execRun_done = 0
exec_run_set ──> execRun_limit = rxFIFO_dout[31:0]
exec_run_inc ──> execRun_ctr++
execRun_running ──> execRun_timer++ (every cycle)
exec_run_done ──┬──> execRun_running = 0
└──> execRun_done = 1
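The control flow above can be approximated by a cycle-level software model, which is useful for reasoning about what the host should read back. This is a sketch, not a transcription of the RTL: the class ExecRegs and its clock method are hypothetical, and the priority between simultaneous strobes (reset taking precedence over increment and done) is an assumption.

```python
class ExecRegs:
    """Behavioural model of the execution control registers."""

    def __init__(self):
        self.limit = self.ctr = self.timer = 0
        self.running = self.done = False

    def clock(self, rst=False, set_limit=None, inc=False, done=False):
        """Advance one clock edge with the given control strobes."""
        was_running = self.running
        if set_limit is not None:          # exec_run_set
            self.limit = set_limit & 0xFFFF_FFFF
        elif rst:                          # reset only when not setting
            self.limit = 0
        if rst:                            # exec_run_rst
            self.ctr = self.timer = 0
            self.running, self.done = True, False
            return
        if inc:                            # exec_run_inc
            self.ctr = (self.ctr + 1) & 0xFFFF_FFFF
        if was_running:                    # timer counts while running
            self.timer += 1
        if done:                           # exec_run_done
            self.running, self.done = False, True
```

A typical sequence is one clock with rst and set_limit, a number of idle or increment clocks, and a final clock with done, after which timer holds the number of cycles spent running.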
Wait Clock Counter#
Purpose: Ensure all intermediate spikes have been transmitted before advancing to next time step.
Register: wait_clks_cnt[4:0]
Limit: 31 cycles (5'd31)
Logic:
if ((rx_curr_state==RX_STATE_WAIT_RUN) &
exec_iep_phase2_done &
spk2ciFIFO_empty)
wait_clks_cnt <= wait_clks_cnt + 1'b1;
else
wait_clks_cnt <= 5'd0;
Rationale:
Spike FIFO controller uses round-robin across 8 FIFOs
Up to 8 cycles for a spike to propagate to spk2ciFIFO
31-cycle wait provides safety margin (roughly 4× worst case)
Only starts counting when phase 2 done AND spike FIFO empty
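The restart-on-activity behaviour of the counter is easy to miss, so the sketch below replays a sequence of per-cycle conditions and reports when the count reaches 31. The function name and the tuple format are hypothetical; the update rule mirrors the logic shown above.

```python
def cycle_when_wait_expires(samples):
    """samples: per-cycle (in_wait_run, phase2_done, spike_fifo_empty) tuples.

    Returns the index of the cycle after which wait_clks_cnt equals 31,
    or None if that never happens within the given samples.
    """
    cnt = 0
    for cycle, (in_wait_run, phase2_done, fifo_empty) in enumerate(samples):
        if in_wait_run and phase2_done and fifo_empty:
            cnt = (cnt + 1) & 0x1F   # 5-bit counter
        else:
            cnt = 0                  # any late spike restarts the wait
        if cnt == 31:
            return cycle
    return None
```

A single late spike at cycle 10 therefore pushes completion out to cycle 41 rather than cycle 30, because the count starts again from zero.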
Memory Map#
Network Parameter Registers#
These parameters are configured via CMD_NTWK_PARAM_W command:
| Register | Width (bits) | Bit Range in PCIe Packet | Description |
|---|---|---|---|
| num_inputs | 17 | [16:0] | Number of input axons (0 to 131,071) |
| num_outputs | 17 | [33:17] | Number of output neurons to monitor |
| threshold | 36 (signed) | [69:34] | Neuron spike threshold |
| exec_neuron_model | 2 | [71:70] | Neuron model selection |
Neuron Model Encoding:
2'b00 = Model 0 (e.g., Leaky Integrate-and-Fire)
2'b01 = Model 1 (e.g., Izhikevich)
2'b10 = Model 2 (e.g., Hodgkin-Huxley approximation)
2'b11 = Model 3 (reserved/custom)
Note: Actual model semantics defined in internal_events_processor
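A host-side packing routine for this register map might look like the sketch below. The function name pack_network_params is hypothetical, the packet is modelled as a 512-bit integer with the CMD_NTWK_PARAM_W opcode (4) in [511:504], and the signed threshold is assumed to be stored in two's complement.

```python
def pack_network_params(num_inputs, num_outputs, threshold, neuron_model):
    """Build a CMD_NTWK_PARAM_W packet from the four network parameters."""
    if not 0 <= num_inputs < (1 << 17):
        raise ValueError("num_inputs must fit in 17 bits")
    if not 0 <= num_outputs < (1 << 17):
        raise ValueError("num_outputs must fit in 17 bits")
    if not -(1 << 35) <= threshold < (1 << 35):
        raise ValueError("threshold must fit in signed 36 bits")
    if not 0 <= neuron_model < 4:
        raise ValueError("neuron_model must be 0-3")
    thr_bits = threshold & ((1 << 36) - 1)   # two's complement (assumed)
    payload = (
        num_inputs               # [16:0]
        | (num_outputs << 17)    # [33:17]
        | (thr_bits << 34)       # [69:34]
        | (neuron_model << 70)   # [71:70]
    )
    return (4 << 504) | payload
```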
Timing Diagrams#
CMD_EEP_W: Writing Axon Events#
Cycle: 0 1 2 3 4 5 6 7 8 9
│ │ │ │ │ │ │ │ │ │
aclk ──┘▔▔▔▔▔▔└┐ ┌▔┐ ┌▔┐ ┌▔┐ ┌▔┐ ┌▔┐ ┌▔┐ ┌▔┐ ┌▔
│ │▔▔▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│▔│▔▔▔▔│
State IDLE │REG_D │SET_D │SET_D │SET_D │SET_D │SET_D │SET_D │SET_D │
│ │ │ │ │ │ │ │ │ │
rxFIFO ▔▔CMD▔▔▔X▔▔PKT1▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
_dout │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
rxFIFO ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
_rden │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
axon_data ▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
_set │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
axon_addr ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│
_inc │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
axonEvent ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│
_set │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
axonEvent 0x0000 │0x0000 │D[15:0│D[31:16│D[47:32│D[63:48│D[79:64│D[95:80│
_data │ │ │] │] │] │] │] │] │
│ │ │ │ │ │ │ │ │ │
axonEvent 0 │0 │0 │1 │2 │3 │4 │5 │6 │
_addr │ │ │ │ │ │ │ │ │ │
Notes:
IDLE state: Command detected (CMD_EEP_W)
REG_D state: Register 512-bit PCIe packet
SET_D state: Shift out 16 bits per cycle, increment address
After 32 shifts, return to REG_D to fetch next packet
CMD_EXEC_STEP: Single Time-Step Execution#
Cycle: 0 1 2 3 ... N N+1 N+2 ... N+32 N+33 N+34
│ │ │ │ │ │ │ │ │ │ │
State IDLE │EXEC_ │WAIT_ │WAIT_ │WAIT_ │WAIT_ │WAIT_ │WAIT_ │WAIT_ │IDLE │
│ │STEP │RUN │RUN │RUN │RUN │RUN │RUN │RUN │ │
│ │ │ │ │ │ │ │ │ │ │
exec_run ▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
│ │ │ │ │ │ │ │ │ │ │
execRun ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔▁▁▁▁▁▁
_running │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │ │
exec_iep ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔▁▁▁▁▁▁
_phase2 │ │ │ │ │ │ │ │ │ │ │
_done │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │ │
spk2ci ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔
FIFO_empty │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │ │
wait_clks 0 │0 │0 │0 │0 │0 │1 │2 │31 │0
_cnt │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │ │
execRun 0 │0 │0 │0 │0 │0 │0 │0 │0 │0
_timer │ │ │ │ │ │N │N+1 │N+2 │N+32 │N+32 │
Notes:
N cycles: Time for external + internal event processing
Phase 2 done: Internal events processor completes neuron updates
31-cycle wait: Ensures all spikes collected before next time step
Single time step: execRun_limit=0, no increment of execRun_ctr
Spike Batching and Transmission#
Cycle: 0 1 2 3 ... 14 15 16 17 18
│ │ │ │ │ │ │ │ │ │
TX State WAIT │WAIT │WAIT │WAIT │WAIT │SEND │WAIT │WAIT │WAIT │
_SPIKES│_SPKS │_SPKS │_SPKS │_SPKS │_SPKS │_SPIKES│_SPKS │_SPKS │
│ │ │ │ │ │ │ │ │ │
spk2ciFIFO ▔▔▔▔▔▔▔│▁▁▁▁▁▁│▔▔▔▔▔▔│▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▔▔▔▔▔▔│▁▁▁▁▁▁│
_empty │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
spk2ciFIFO ▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁
_rden │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
spike_inc ▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁
│ │ │ │ │ │ │ │ │ │
spike_ctr 0 │1 │1 │2 │2 │14 │0 │1 │1
│ │ │ │ │ │ │ │ │ │
spike_sr XXXX │SPK1 │SPK1 │SPK2 │SPK2 │SPK14 │0 │SPK1' │SPK1' │
│ │ │ │ │ │ │ │ │ │
txFIFO ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
_wren │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
txFIFO XXXXXX │XXXXXX│XXXXXX│XXXXXX│XXXXXX│0xEEEE│XXXXXX│XXXXXX│XXXXXX│
_din │ │ │ │ │ │_EEEE │ │ │ │
│ │ │ │ │ │+14SPK│ │ │ │
Notes:
Spikes collected as they arrive (non-empty FIFO)
After 14 spikes: Transition to SEND_SPIKES
Packet sent with opcode 0xEEEE_EEEE
Counter and shift register reset after send
Process repeats for next batch
Cross-References#
Software Integration#
Python (hs_bridge) Functions:
fpga_controller.write_axon_events() → Sends CMD_EEP_W commands
fpga_controller.read_hbm() / write_hbm() → Sends CMD_HBM_RW commands
fpga_controller.read_neuron() / write_neuron() → Sends CMD_IEP_RW commands
fpga_controller.set_network_params() → Sends CMD_NTWK_PARAM_W command
fpga_controller.execute_step() → Sends CMD_EXEC_STEP command
fpga_controller.execute_continuous() → Sends CMD_EXEC_CONT command
fpga_controller.read_spikes() → Reads TX FIFO for spike packets (opcode 0xEEEE_EEEE)
Key Terms and Definitions#
| Term | Definition |
|---|---|
| Command Opcode | 8-bit identifier in PCIe packet (bits [511:504]) |
| Axon Event | External input spike represented as row address + 16-bit mask |
| Time Step | One iteration of network simulation (external events → internal updates) |
| execRun_ctr | Time-step counter, incremented after each iteration |
| execRun_timer | FPGA clock cycle counter for performance profiling |
| Spike Batching | Grouping 14 spikes into single 512-bit PCIe packet for efficiency |
| Sub-timestamp | Lower 8 bits of execRun_ctr, stored in each spike word |
| Round-Robin | Fair scheduling used in spike FIFO controller (external to this module) |
| Wait Clock Counter | 31-cycle delay ensuring all spikes transmitted before next time step |
| Shift Register | axon_data_sr (512-bit axon packet) and spike_sr (448-bit spike batch) |
| Phase 2 | Internal events processor phase updating neuron states |
| FWFT | First-Word Fall-Through FIFO mode (used in pcie2fifos) |
Design Evolution and Commented Code#
Evidence of Scaling (8 → 16 Neuron Groups)#
Axon Event Data Width:
// OLD (8 groups): output [7:0] axonEvent_data
// NEW (16 groups): output [15:0] axonEvent_data
// OLD shift: axon_data_sr <= {8'd0, axon_data_sr[511:8]};
// NEW shift: axon_data_sr <= {16'd0, axon_data_sr[511:16]};
Address Calculation:
// OLD: wire [13:0] axon_addr_limit = num_inputs[16:3]; // 8 axons per row
// NEW: wire [12:0] axon_addr_limit = num_inputs[16:4]; // 16 axons per row
Shift Detection:
// OLD: if (axonEvent_addr[5:0]==6'd63) // 64 x 8-bit events per packet
// NEW: if (axonEvent_addr[4:0]==5'd31) // 32 x 16-bit events per packet
Neuron Data Width Upgrade#
Internal Events Processor Interface:
// OLD (16-bit neuron data):
// output [1+17+15:0] ci2iep_din // 34 bits total
// input [17+15:0] iep2ci_dout // 33 bits total
// NEW (36-bit neuron data):
output [1+17+35:0] ci2iep_din // 54 bits total
input [17+35:0] iep2ci_dout // 53 bits total
This suggests upgrade to higher-precision membrane potentials or additional neuron state variables.
Performance Considerations#
Throughput#
Axon Event Loading:
32 events per 512-bit PCIe packet
1 event per FPGA cycle (225 MHz)
32 cycles to exhaust packet + 1-2 cycles fetch overhead
Throughput: ~6.75M packets/second per core (225 MHz / ~33 cycles per packet), i.e. roughly 216M 16-bit axon events/second
Spike Output:
14 spikes per 512-bit PCIe packet
Batching amortizes packet overhead
Throughput: Depends on spike rate; typically 1-10% of neurons spike per time step
Time Step Execution:
Variable duration based on:
Number of active axons (external events phase)
HBM access latency for synapse fetching
Number of neurons to update (internal events phase)
31-cycle safety margin for spike collection
Typical: 1000-10,000 FPGA cycles per time step
Latency#
Command Response:
HBM read: ~100-200 ns (HBM latency + processing)
Neuron read: ~50-100 ns (URAM access + FIFO transfer)
Execution Completion:
Minimum: ~4.4 µs (1000 cycles @ 225 MHz)
Typical: ~20-50 µs depending on network activity
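The cycle counts quoted in this section convert to wall-clock figures with simple arithmetic at the 225 MHz clock. The helpers below are a sketch with hypothetical names; the fetch overhead per packet is a parameter because the text gives it only as 1-2 cycles.

```python
CLOCK_HZ = 225e6  # system clock from the interface specification


def cycles_to_us(cycles):
    """Convert FPGA cycles (e.g. an execRun_timer reading) to microseconds."""
    return cycles / CLOCK_HZ * 1e6


def axon_packets_per_second(fetch_overhead_cycles):
    """Packets per second when each packet costs 32 shifts plus fetch overhead."""
    return CLOCK_HZ / (32 + fetch_overhead_cycles)


def axon_events_per_second(fetch_overhead_cycles):
    """16-bit axon events per second (32 events per packet)."""
    return 32 * axon_packets_per_second(fetch_overhead_cycles)
```

With these helpers, 1000 cycles come out at about 4.4 µs and 10,000 cycles at about 44 µs, and a one-cycle fetch overhead gives roughly 6.8 million packets per second.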
Debugging and Verification#
VIO Signals#
output [2:0] vio_rx_curr_state, // Monitor RX state machine
output [1:0] vio_tx_curr_state, // Monitor TX state machine
State Encodings for Debugging:
RX States:
3'd0 = RESET
3'd1 = IDLE
3'd2 = REGISTER_PCIE_AXON_DATA
3'd3 = SET_AXON_DATA
3'd4 = EXEC_STEP
3'd5 = WAIT_RUN
3'd6 = EXEC_DONE
TX States:
2'd0 = RESET
2'd1 = IDLE
2'd2 = WAIT_FOR_SPIKES
2'd3 = SEND_SPIKES
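When reading raw VIO values, a small lookup keeps the encodings above in one place. This is a convenience sketch with hypothetical names; the unused RX encoding 3'd7 is reported as unknown rather than raising an error.

```python
from enum import IntEnum


class RxState(IntEnum):
    RESET = 0
    IDLE = 1
    REGISTER_PCIE_AXON_DATA = 2
    SET_AXON_DATA = 3
    EXEC_STEP = 4
    WAIT_RUN = 5
    EXEC_DONE = 6


class TxState(IntEnum):
    RESET = 0
    IDLE = 1
    WAIT_FOR_SPIKES = 2
    SEND_SPIKES = 3


def _name(enum_cls, raw):
    """Return the state name, or UNKNOWN(n) for an undefined encoding."""
    try:
        return enum_cls(raw).name
    except ValueError:
        return f"UNKNOWN({raw})"


def describe_vio(rx_raw, tx_raw):
    """Format raw vio_rx_curr_state / vio_tx_curr_state values for logging."""
    return f"RX={_name(RxState, rx_raw & 0x7)} TX={_name(TxState, tx_raw & 0x3)}"
```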
Common Debugging Scenarios#
Problem: Network doesn’t execute
Check: execRun_running should assert after CMD_EXEC_* command
Check: exec_iep_phase2_done should eventually assert
Check: wait_clks_cnt should count to 31
Problem: Spikes not received by host
Check: spk2ciFIFO_empty - should toggle during execution
Check: spike_ctr - should increment when spikes detected
Check: txFIFO_wren - should pulse when batches sent
Problem: Axon events not loaded
Check: rxFIFO_dout[511:504] == CMD_EEP_W (8'd1)
Check: axonEvent_addr should increment from 0 to num_inputs[16:4]
Check: axonEvent_set should pulse for each event
Safety and Edge Cases#
Reset Behavior#
All counters and state machines reset to safe states
Asynchronous reset (~aresetn) ensures immediate response
Execution flags cleared to prevent spurious runs
FIFO Full/Empty Handling#
RX state machine waits for !ci2hbm_full, !ci2iep_full before writing
TX state machine waits for !txFIFO_full before writing
Spike collection waits for !spk2ciFIFO_empty before reading
Execution Limit Edge Case#
execRun_limit == 0: Single time step execution
Comparison execRun_ctr == execRun_limit handles both the single-step case (limit 0) and multi-step runs
Axon Address Limit#
Prevents writing beyond allocated BRAM rows
Input counts that are not a multiple of 16 are rounded down to whole rows via num_inputs[16:4]
Future Enhancement Opportunities#
Pipelined Command Processing: Currently processes one command at a time; could overlap execution with data loading
Variable Spike Batch Size: Fixed 14-spike batches may be inefficient for low spike rates; adaptive sizing could reduce latency
Compression: Sparse spike patterns could benefit from run-length encoding or similar compression
Multi-Core Coordination: For N_cores implementation, add inter-core communication commands
Error Reporting: Add status/error codes to TX packets for invalid commands or failed operations
Performance Counters: Instrument critical paths (HBM access time, execution phase durations) for profiling
Document Version: 1.0
Last Updated: December 2025
Module File: command_interpreter.v
Module Location: CRI_proj/cri_fpga/code/new/hyddenn2/vivado/single_core.srcs/sources_1/new/