input_data_handler.v#
Module Overview#
Purpose and Role in Stack#
The input_data_handler module acts as a BRAM arbiter, managing access to the shared Block RAM (BRAM) that stores axon/external event data. This module:
Arbitrates between two requesters:
Command interpreter (CI) - for host read/write access
External events processor (EEP) - for runtime axon event processing
Enforces priority: Command interpreter has higher priority than external events processor
Handles BRAM read latency: Holds each read request through three wait states to cover the 3-cycle BRAM read delay
Routes responses back to appropriate requester with address passthrough
In the software/hardware stack:
Command Interpreter ───────┐
                           ├──► input_data_handler ──► BRAM (2^15 x 256-bit)
External Events Processor ─┘        (Arbiter)                │
                                                             │
                    ┌────────────────────────────────────────┘
                    │
             Response Router
                    │
         ┌──────────┴───────────────────────┐
         ▼                                  ▼
Command Interpreter             External Events Processor
  (read response)                    (read response)
This module is essential for efficient BRAM utilization, allowing both configuration/debug access (via CI) and high-speed runtime processing (via EEP) to share the same memory resource.
Module Architecture#
High-Level Block Diagram#
input_data_handler
┌─────────────────────────────────────────────────────────────┐
│ │
│ ┌───────────────────────────────┐ │
│ │ Command Interpreter FIFO │ │
│ │ (Input: Local Read) │ │
CI→FIFO ────►│ ci2idp_dout[271:0] │ │
(local) │ [271] = R/W command │ │
empty/rden │ [270:256] = 15-bit address │ │
│ [255:0] = 256-bit data │ │
└───────────┬───────────────────┘ │
│ │
│ │ │
│ ┌───────────▼───────────────────┐ │
│ │ External Events Proc FIFO │ │
│ │ (Input: Local Read) │ │
EEP→FIFO ────►│ eep2idp_dout[14:0] │ │
(local) │ 15-bit address only │ │
empty/rden └───────────┬───────────────────┘ │
│ │ │
│ │ │
│ │ ┌───────▼─────────────────────────────┐ │
│ │ │ Priority Arbiter │ │
│ │ │ - CI has priority over EEP │ │
│ │ │ - Selects address source │ │
│ │ │ - Generates BRAM control signals │ │
│ │ └───────┬─────────────────────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌────────────────────────┐ │
│ │ │ BRAM Interface │ │
BRAM ◄───────┼───┤ addr[14:0] │ │
Interface │ │ din[255:0] (write data)│ │
(2^15 x 256) │ │ dout[255:0] (read data)│ │
│ │ wren (write enable) │ │
│ └────────┬───────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌──────────────────────────────────┐ │
│ │ │ 3-Cycle Read Pipeline │ │
│ │ │ (Compensates for BRAM latency) │ │
│ │ │ │ │
│ │ │ IDLE → WAIT_0 → WAIT_1 → │ │
│ │ │ → WAIT_2 → output │ │
│ │ │ │ │
│ │ └──────────┬───────────────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌──────────────────────────────────┐ │
│ │ │ Response Router │ │
│ │ │ - Directs read data to │ │
│ │ │ original requester │ │
│ │ │ - Includes address passthrough │ │
│ │ └──────┬─────────┬─────────────────┘ │
│ │ │ │ │
│ │ ▼ ▼ │
│ │ ┌──────────┐ ┌──────────┐ │
│ │ │ idp2ci │ │ idp2eep │ │
CI←FIFO ◄──────┤ FIFO │ │ FIFO │◄───────────EEP←FIFO │
(remote) │ (Output: │ │ (Output: │ (remote)
full/wren │ Remote) │ │ Remote) │ │
data └──────────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────┘
Interface Specification#
Clock and Reset#
| Signal | Direction | Width | Description |
|---|---|---|---|
| | Input | 1 | 225 MHz system clock |
| resetn | Input | 1 | Active-low synchronous reset |
Command Interpreter Interface#
Input FIFO (Local - CI to IDP):
| Signal | Direction | Width | Description |
|---|---|---|---|
| ci2idp_empty | Input | 1 | Input FIFO empty flag |
| ci2idp_dout | Input | 272 | Input FIFO data output |
| ci2idp_rden | Output (reg) | 1 | Input FIFO read enable |
Data Format (ci2idp_dout[271:0]):
[271] = R/W command (0=read, 1=write)
[270:256] = 15-bit BRAM address
[255:0] = 256-bit write data
Output FIFO (Remote - IDP to CI):
| Signal | Direction | Width | Description |
|---|---|---|---|
| idp2ci_full | Input | 1 | Output FIFO full flag |
| idp2ci_din | Output | 271 | Output FIFO data input |
| idp2ci_wren | Output (reg) | 1 | Output FIFO write enable |
Data Format (idp2ci_din[270:0]):
[270:256] = 15-bit BRAM address (echoed from request)
[255:0] = 256-bit read data
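The request and response word layouts above can be exercised with a small simulation-only check. This is a sketch: the module name idp_word_format_demo and the function pack_ci_request are hypothetical helpers for a testbench, not part of input_data_handler.v.

```verilog
// Simulation-only sketch; module and function names are hypothetical.
module idp_word_format_demo;
    localparam CMD_READ  = 1'b0;
    localparam CMD_WRITE = 1'b1;

    // Build a 272-bit CI request: {R/W, 15-bit address, 256-bit write data}.
    function [271:0] pack_ci_request;
        input         rw;
        input [14:0]  addr;
        input [255:0] wdata;
        begin
            pack_ci_request = {rw, addr, wdata};
        end
    endfunction

    reg [271:0] request;
    reg [270:0] response;

    initial begin
        request = pack_ci_request(CMD_WRITE, 15'h1234, {16{16'hABCD}});
        // Field positions follow the ci2idp_dout Data Format listing.
        if (request[271]     !== CMD_WRITE)      $display("FAIL: R/W bit");
        if (request[270:256] !== 15'h1234)       $display("FAIL: address");
        if (request[255:0]   !== {16{16'hABCD}}) $display("FAIL: write data");

        // A read response carries the echoed address above the read data.
        response = {15'h5678, 256'd42};
        if (response[270:256] !== 15'h5678)      $display("FAIL: echoed address");
        if (response[255:0]   !== 256'd42)       $display("FAIL: read data");

        $display("word format checks done");
        $finish;
    end
endmodule
```

Note that the request word is 272 bits while the response word is 271 bits, because the response has no R/W bit.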
External Events Processor Interface#
Input FIFO (Local - EEP to IDP):
| Signal | Direction | Width | Description |
|---|---|---|---|
| eep2idp_empty | Input | 1 | Input FIFO empty flag |
| eep2idp_dout | Input | 15 | Input FIFO data output (address only) |
| eep2idp_rden | Output (reg) | 1 | Input FIFO read enable |
Data Format (eep2idp_dout[14:0]):
[14:0] = 15-bit BRAM address (read request only)
Output FIFO (Remote - IDP to EEP):
| Signal | Direction | Width | Description |
|---|---|---|---|
| idp2eep_full | Input | 1 | Output FIFO full flag |
| idp2eep_din | Output | 271 | Output FIFO data input |
| idp2eep_wren | Output (reg) | 1 | Output FIFO write enable |
Data Format (idp2eep_din[270:0]):
[270:256] = 15-bit BRAM address (echoed from request)
[255:0] = 256-bit read data
BRAM Interface#
| Signal | Direction | Width | Description |
|---|---|---|---|
| bram_addr | Output (reg) | 15 | BRAM address (0 to 32,767) |
| bram_din | Output | 256 | BRAM write data |
| bram_wren | Output (reg) | 1 | BRAM write enable |
| bram_dout | Input | 256 | BRAM read data (3-cycle latency) |
BRAM Specifications:
Depth: 32,768 rows (2^15)
Width: 256 bits per row
Total Size: 1 MiB (32,768 × 256 bits = 8,388,608 bits = 1,048,576 bytes)
Read Latency: 3 clock cycles
Write Latency: 1 clock cycle (synchronous write)
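For simulation, the latency figures above can be mimicked with a behavioral memory. The following is a minimal sketch, not the BRAM instance used in the design: the module name bram_model_3cyc, the clock name clk, and the read-before-write ordering on a same-cycle collision are all assumptions.

```verilog
// Behavioral stand-in for the shared BRAM; simulation only.
// Module name, clk, and parameter names are hypothetical.
module bram_model_3cyc #(
    parameter ADDR_WIDTH = 15,
    parameter DATA_WIDTH = 256
) (
    input  wire                  clk,
    input  wire                  wren,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [DATA_WIDTH-1:0] din,
    output reg  [DATA_WIDTH-1:0] dout
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];
    reg [DATA_WIDTH-1:0] stage1, stage2;

    always @(posedge clk) begin
        if (wren)
            mem[addr] <= din;   // synchronous write, visible after 1 edge
        stage1 <= mem[addr];    // edge 1: array read
        stage2 <= stage1;       // edge 2: pipeline register
        dout   <= stage2;       // edge 3: data visible to the requester
    end
endmodule
```

With the address held stable from the IDLE cycle onward, dout carries the addressed row three clock edges later, which is the cycle the state machine spends in WAIT_2. A smaller ADDR_WIDTH keeps simulation memory use down.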
Detailed Logic Description#
Command Decoder#
localparam CMD_READ = 1'b0;
localparam CMD_WRITE = 1'b1;
wire command = ci2idp_dout[271]; // Extract R/W bit
State Machine#
States:
localparam [2:0] STATE_RESET = 3'd0;
localparam [2:0] STATE_IDLE = 3'd1;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_0 = 3'd2;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_1 = 3'd3;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_2 = 3'd4;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_0 = 3'd5;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_1 = 3'd6;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_2 = 3'd7;
State Transition Diagram:
┌──────────────┐
│ STATE_RESET │
└──────┬───────┘
│
▼
┌──────────────┐
┌───▶│ STATE_IDLE │◄────────────────┬─────────────────┐
│ │ (Arbitrate) │ │ │
│ └──┬───────┬───┘ │ │
│ │ │ │ │
│ !eep │ │ !ci │ │
│ empty│ │ empty │ │
│ │ │ │ │
│ │ └─ CMD_READ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ STATE_CI_WAIT_0 │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ STATE_CI_WAIT_1 │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ STATE_CI_WAIT_2 │ │
│ │ │ │ │
│ │ │!idp2ci_full │ │
│ │ └──────────────┘ │
│ │ │
│ │ CMD_WRITE │
│ └─(immediate pop)──────────────────────────────┘
│ │
│ ▼
│ STATE_EEP_WAIT_0
│ │
│ ▼
│ STATE_EEP_WAIT_1
│ │
│ ▼
│ STATE_EEP_WAIT_2
│ │
│ │!idp2eep_full
└───────┘
Priority Arbitration Logic#
IDLE State Behavior:
STATE_IDLE: begin
    if (~ci2idp_empty) begin
        // CI has pending request (higher priority, checked first)
        bram_addr = ci2idp_dout[270:256]; // Extract 15-bit address
        if (command == CMD_READ)
            next_state = STATE_CI_WAIT_BRAM_READ_0;
        else begin // CMD_WRITE
            bram_wren = 1'b1;
            ci2idp_rden = 1'b1;
            next_state = STATE_IDLE; // Write completes immediately
        end
    end else if (~eep2idp_empty) begin
        // EEP has pending request (serviced only when the CI FIFO is empty)
        bram_addr = eep2idp_dout;
        next_state = STATE_EEP_WAIT_BRAM_READ_0;
    end
end
Priority Rules:
CI Write: Highest priority, completes in 1 cycle (no wait states)
CI Read: High priority, 3-cycle wait for BRAM latency
EEP Read: Lower priority, serviced only when CI FIFO empty
Starvation: Not prevented by the arbiter itself; EEP is serviced only because CI requests are infrequent and finite
BRAM Read Pipeline (3-Cycle Latency)#
Cycle Breakdown:
Cycle 0: Request arrives in IDLE state
- bram_addr = address from FIFO
- Transition to WAIT_0
Cycle 1: STATE_WAIT_0
- BRAM internal pipeline stage 1
- bram_addr held stable
- Transition to WAIT_1
Cycle 2: STATE_WAIT_1
- BRAM internal pipeline stage 2
- bram_addr held stable
- Transition to WAIT_2
Cycle 3: STATE_WAIT_2
- bram_dout now valid
- Wait for output FIFO not full
- Write to output FIFO (wren pulse)
- Pop input FIFO (rden pulse)
- Transition to IDLE
EEP Read Example:
STATE_EEP_WAIT_BRAM_READ_0: begin
    bram_addr = eep2idp_dout; // Hold address stable
    next_state = STATE_EEP_WAIT_BRAM_READ_1;
end
STATE_EEP_WAIT_BRAM_READ_1: begin
    bram_addr = eep2idp_dout;
    next_state = STATE_EEP_WAIT_BRAM_READ_2;
end
STATE_EEP_WAIT_BRAM_READ_2: begin
    bram_addr = eep2idp_dout;
    if (~idp2eep_full) begin
        idp2eep_wren = 1'b1; // Write read data to output FIFO
        eep2idp_rden = 1'b1; // Pop request from input FIFO
        next_state = STATE_IDLE;
    end
    // else: stall until output FIFO has space
end
CI Read: Same pattern using ci2idp_dout[270:256] for address and idp2ci FIFOs.
Output Data Routing#
Assignments:
assign idp2eep_din = {bram_addr, bram_dout}; // [270:256]=addr, [255:0]=data
assign idp2ci_din = {bram_addr, bram_dout};
assign bram_din = ci2idp_dout[255:0]; // Only CI can write
Address Passthrough:
Read responses include the original address
Allows requester to correlate response with request
Critical for pipelined operation (though this module doesn’t pipeline)
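Putting the pieces of this section together, the module could look roughly like the sketch below. It is assembled from the excerpts and the interface tables, not copied from the shipped source. Assumptions: the clock port is named clk, every output defaults to 0 in the combinational block, CI is checked before EEP as the Priority Rules state, and the input FIFOs present their head word on dout before rden is asserted (first-word fall-through), which the excerpts imply by reading dout in IDLE.

```verilog
// Sketch assembled from the excerpts in this section; not the shipped source.
// Assumptions: clock port named clk, all strobes default to 0 each cycle.
module input_data_handler_sketch (
    input  wire         clk,
    input  wire         resetn,         // active-low, synchronous
    input  wire         ci2idp_empty,
    input  wire [271:0] ci2idp_dout,
    output reg          ci2idp_rden,
    input  wire         idp2ci_full,
    output wire [270:0] idp2ci_din,
    output reg          idp2ci_wren,
    input  wire         eep2idp_empty,
    input  wire [14:0]  eep2idp_dout,
    output reg          eep2idp_rden,
    input  wire         idp2eep_full,
    output wire [270:0] idp2eep_din,
    output reg          idp2eep_wren,
    output reg  [14:0]  bram_addr,
    output wire [255:0] bram_din,
    output reg          bram_wren,
    input  wire [255:0] bram_dout
);
    localparam CMD_READ = 1'b0;
    localparam [2:0] STATE_RESET = 3'd0, STATE_IDLE = 3'd1,
        STATE_EEP_WAIT_BRAM_READ_0 = 3'd2, STATE_EEP_WAIT_BRAM_READ_1 = 3'd3,
        STATE_EEP_WAIT_BRAM_READ_2 = 3'd4, STATE_CI_WAIT_BRAM_READ_0  = 3'd5,
        STATE_CI_WAIT_BRAM_READ_1  = 3'd6, STATE_CI_WAIT_BRAM_READ_2  = 3'd7;

    wire        command = ci2idp_dout[271];
    wire [14:0] ci_addr = ci2idp_dout[270:256];
    reg  [2:0]  curr_state, next_state;

    assign idp2eep_din = {bram_addr, bram_dout};
    assign idp2ci_din  = {bram_addr, bram_dout};
    assign bram_din    = ci2idp_dout[255:0];

    always @(posedge clk)
        curr_state <= resetn ? next_state : STATE_RESET;

    always @(*) begin
        // Defaults: hold state, all strobes low (also avoids latches).
        next_state   = curr_state;
        bram_addr    = 15'd0;
        bram_wren    = 1'b0;
        ci2idp_rden  = 1'b0;
        eep2idp_rden = 1'b0;
        idp2ci_wren  = 1'b0;
        idp2eep_wren = 1'b0;
        case (curr_state)
            STATE_RESET: next_state = STATE_IDLE;
            STATE_IDLE:
                if (~ci2idp_empty) begin            // CI checked first
                    bram_addr = ci_addr;
                    if (command == CMD_READ)
                        next_state = STATE_CI_WAIT_BRAM_READ_0;
                    else begin                      // CMD_WRITE: one cycle
                        bram_wren   = 1'b1;
                        ci2idp_rden = 1'b1;
                    end
                end else if (~eep2idp_empty) begin
                    bram_addr  = eep2idp_dout;
                    next_state = STATE_EEP_WAIT_BRAM_READ_0;
                end
            STATE_CI_WAIT_BRAM_READ_0: begin
                bram_addr  = ci_addr;
                next_state = STATE_CI_WAIT_BRAM_READ_1;
            end
            STATE_CI_WAIT_BRAM_READ_1: begin
                bram_addr  = ci_addr;
                next_state = STATE_CI_WAIT_BRAM_READ_2;
            end
            STATE_CI_WAIT_BRAM_READ_2: begin
                bram_addr = ci_addr;
                if (~idp2ci_full) begin
                    idp2ci_wren = 1'b1;             // push {addr, data}
                    ci2idp_rden = 1'b1;             // pop the request
                    next_state  = STATE_IDLE;
                end
            end
            STATE_EEP_WAIT_BRAM_READ_0: begin
                bram_addr  = eep2idp_dout;
                next_state = STATE_EEP_WAIT_BRAM_READ_1;
            end
            STATE_EEP_WAIT_BRAM_READ_1: begin
                bram_addr  = eep2idp_dout;
                next_state = STATE_EEP_WAIT_BRAM_READ_2;
            end
            STATE_EEP_WAIT_BRAM_READ_2: begin
                bram_addr = eep2idp_dout;
                if (~idp2eep_full) begin
                    idp2eep_wren = 1'b1;
                    eep2idp_rden = 1'b1;
                    next_state   = STATE_IDLE;
                end
            end
        endcase
    end
endmodule
```

Because bram_addr is driven from the still-unpopped FIFO head in every wait state, the address echoed in idp2ci_din and idp2eep_din is the request address without any extra address register.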
Timing Diagrams#
CI Write Transaction#
Cycle: 0 1 2
│ │ │
State IDLE │IDLE │
│ │ │
ci2idp ▁▁▁▁▁▁▁│▔▔▔▔▔▔│ (WR, Addr=0x1234, Data=0xABCD...)
_empty │ │ │
│ │ │
ci2idp ▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁
_rden │ │ │
│ │ │
bram_addr XXXX │0x1234│
│ │ │
bram_wren ▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁
│ │ │
bram_din XXXX │0xABCD│
│ │... │
Notes:
Single-cycle write operation
No wait states required
Returns to IDLE immediately
CI Read Transaction#
Cycle: 0 1 2 3 4 5
│ │ │ │ │ │
State IDLE │WAIT_0│WAIT_1│WAIT_2│IDLE │
│ │ │ │ │ │
ci2idp ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│ (RD, Addr=0x5678)
_empty │ │ │ │ │ │
│ │ │ │ │ │
ci2idp ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁
_rden │ │ │ │ │ │
│ │ │ │ │ │
bram_addr XXXX │0x5678│0x5678│0x5678│0x5678│
│ │ │ │ │ │
bram_dout XXXX │XXXX │XXXX │XXXX │DATA │
│ │ │ │ │ │
idp2ci ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁
_wren │ │ │ │ │ │
│ │ │ │ │ │
idp2ci_din XXXX │XXXX │XXXX │XXXX │{0x5678,
│ │ │ │ │ DATA}
Notes:
3-cycle wait for BRAM read latency
Address held stable during wait states
Response includes address + data
Priority Arbitration: EEP Deferred#
Cycle: 0 1 2 3 4 5 6 7 8
│ │ │ │ │ │ │ │ │
State IDLE │WAIT_0│WAIT_1│WAIT_2│IDLE │WAIT_0│WAIT_1│WAIT_2│
│ │ │ │ │ │ │ │ │
eep2idp ▔▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│ (pending request)
_empty │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │
ci2idp ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ (higher priority)
_empty │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │
Serviced - │CI │CI │CI │CI │EEP │EEP │EEP │EEP
│ │ │ │ │ │ │ │ │
Notes:
Cycle 0: Both FIFOs have requests, CI serviced first
Cycles 1-4: CI read completes (3-cycle wait)
Cycle 5: EEP request now serviced
Demonstrates priority enforcement
Cross-References#
BRAM Structure (Parent: pcie2fifos → command_interpreter)#
Data Stored in BRAM:
Axon/External Event Data
Each row: 256 bits = 16 × 16-bit masks (one per neuron group)
Row address: Axon ID / 16
Example Row at Address 0x1000:
Bits [255:240] = Mask for neuron group 15
Bits [239:224] = Mask for neuron group 14
...
Bits [31:16] = Mask for neuron group 1
Bits [15:0] = Mask for neuron group 0
Each 16-bit mask: One bit per neuron group indicating which received axon spike
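Given the row layout above, the mask for one neuron group can be picked out of a row with an indexed part-select. This is a sketch only; the module name row_mask_select and its port names are hypothetical and do not appear in the design.

```verilog
// Hypothetical helper: select one 16-bit group mask from a 256-bit BRAM row.
module row_mask_select (
    input  wire [255:0] row,    // one BRAM row, e.g. the data half of a read response
    input  wire [3:0]   group,  // neuron group index, 0..15
    output wire [15:0]  mask
);
    // Base bit is group * 16: group 0 -> row[15:0], group 15 -> row[255:240].
    assign mask = row[{group, 4'b0000} +: 16];
endmodule
```

The concatenation {group, 4'b0000} is just group multiplied by 16, written so the base index is an explicit 8-bit value.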
Key Terms and Definitions#
| Term | Definition |
|---|---|
| Arbiter | Logic that decides which requester gains access to shared resource |
| Priority | CI requests serviced before EEP when both pending |
| Read Latency | 3 clock cycles from address presentation to valid data |
| Passthrough | Address echoed back with read data for correlation |
| Local FIFO | FIFO in same clock domain as module (input side) |
| Remote FIFO | FIFO potentially in different clock domain (output side) |
| CMD_READ | Command bit value 0, triggers read transaction |
| CMD_WRITE | Command bit value 1, triggers write transaction |
| BRAM | Block RAM - On-chip synchronous memory primitive |
| FIFO Backpressure | Waiting for output FIFO not full before writing |
Performance Characteristics#
Throughput#
Best Case (No Contention):
CI Write: 1 operation per clock cycle = 225 MHz = 225M writes/sec
CI Read: 4 cycles per operation (1 IDLE + 3 WAIT) = 56.25M reads/sec
EEP Read: 4 cycles per operation = 56.25M reads/sec (when CI idle)
Worst Case (Contention):
EEP Read (with CI active): Indefinitely deferred until CI idle
CI Read (with output FIFO full): Stalled in WAIT_2 state
Realistic (Mixed Workload):
CI accesses: Infrequent (configuration, debug)
EEP accesses: Burst during Phase 1 execution
Typical: EEP dominates, achieving ~50M reads/sec effective rate
Latency#
| Operation | Latency (Cycles) | Latency (ns @ 225 MHz) | Notes |
|---|---|---|---|
| CI Write | 1 | 4.4 ns | Immediate, no wait |
| CI Read | 4 | 17.8 ns | 1 IDLE + 3 WAIT |
| EEP Read | 4 | 17.8 ns | When CI idle |
| EEP Read (deferred) | 4 + CI latency | Variable | Must wait for CI completion |
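The throughput and latency figures above follow from the 225 MHz clock and the 4-cycle read sequence. As a worked check (the two bandwidth lines are derived here from the 256-bit row width and are peak values with no output FIFO stalls; they are not quoted from the source):

```latex
\begin{align*}
T_{\text{clk}}   &= \frac{1}{225\,\text{MHz}} \approx 4.44\,\text{ns} \\
t_{\text{read}}  &= 4\,T_{\text{clk}} \approx 17.78\,\text{ns} \\
R_{\text{read}}  &= \frac{225 \times 10^{6}}{4} = 56.25 \times 10^{6}\ \text{reads/s} \\
B_{\text{read}}  &= R_{\text{read}} \times 32\,\text{bytes} = 1.8 \times 10^{9}\ \text{bytes/s} \\
B_{\text{write}} &= 225 \times 10^{6} \times 32\,\text{bytes} = 7.2 \times 10^{9}\ \text{bytes/s}
\end{align*}
```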
Stall Conditions#
Input Side Stalls:
None - FIFOs assumed to handle backpressure
Output Side Stalls:
WAIT_2 State: If output FIFO full, module holds until space available
Impact: Backpressure propagates to input FIFO (requesters must wait)
Design Considerations#
Why Priority to CI?#
Low Frequency: CI accesses are rare (host-initiated)
Latency Sensitive: Host expects fast response for debug/config
Tolerable Delay: EEP can afford to wait a few cycles while an infrequent CI request completes
Simplicity: Avoids complex round-robin or fair arbitration
Why 3-Cycle Wait?#
BRAM Primitive: Xilinx Block RAM reads are synchronous, taking 1 cycle, or 2 with the optional output register enabled
Pipeline Registers: Additional registering for timing closure
Fixed Latency: Simplifies state machine design (no variable wait)
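For comparison, a variable wait could replace the three fixed wait states with a counter. The sketch below is illustrative only: read_wait_counter, READ_LATENCY, start, ack, and data_valid are hypothetical names, and clk is an assumed clock name.

```verilog
// Hypothetical helper; not part of input_data_handler.v.
module read_wait_counter #(
    parameter READ_LATENCY = 3      // cycles from address to valid data, 1..256
) (
    input  wire clk,                // assumed clock name
    input  wire resetn,             // active-low, synchronous
    input  wire start,              // high in the cycle the address is first driven
    input  wire ack,                // high when the response has been written out
    output wire data_valid          // high once READ_LATENCY cycles have elapsed
);
    reg [7:0] remaining;
    reg       busy;

    always @(posedge clk) begin
        if (!resetn) begin
            busy      <= 1'b0;
            remaining <= 8'd0;
        end else if (!busy) begin
            if (start) begin
                busy      <= 1'b1;
                remaining <= READ_LATENCY - 1;
            end
        end else if (remaining != 8'd0) begin
            remaining <= remaining - 8'd1;
        end else if (ack) begin
            busy <= 1'b0;           // response accepted, ready for the next read
        end
    end

    assign data_valid = busy && (remaining == 8'd0);
endmodule
```

With READ_LATENCY = 3 and start asserted in cycle 0, data_valid rises in cycle 3, the same cycle the fixed design spends in WAIT_2. The cost is an extra counter and comparison in place of three explicit states.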
Alternative Designs#
Round-Robin Arbitration:
Pros: Fair access, prevents EEP starvation
Cons: More complex, CI latency increases
Pipelined Operation:
Pros: Higher throughput (overlapped requests)
Cons: Requires buffering, address tracking, out-of-order handling
Not needed: Current design adequate for workload
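To make the round-robin alternative concrete, the grant decision could be reduced to one bit of history. This is a sketch of that alternative, not something the current module implements; rr_grant_sketch, grant_taken, grant_ci, grant_eep, and last_was_ci are hypothetical names, and clk is an assumed clock name.

```verilog
// Hypothetical two-requester round-robin grant; not part of the current design.
module rr_grant_sketch (
    input  wire clk,            // assumed clock name
    input  wire resetn,         // active-low, synchronous
    input  wire ci_request,     // e.g. ~ci2idp_empty
    input  wire eep_request,    // e.g. ~eep2idp_empty
    input  wire grant_taken,    // high for one cycle when the FSM accepts the grant
    output wire grant_ci,
    output wire grant_eep
);
    reg last_was_ci;

    // When both request, the side that was not served last wins.
    assign grant_ci  = ci_request  && (!eep_request || !last_was_ci);
    assign grant_eep = eep_request && (!ci_request  ||  last_was_ci);

    always @(posedge clk) begin
        if (!resetn)
            last_was_ci <= 1'b0;                // CI wins the first tie after reset
        else if (grant_taken && (grant_ci || grant_eep))
            last_was_ci <= grant_ci;
    end
endmodule
```

The two grants are mutually exclusive by construction. Under continuous CI traffic, EEP would then wait for at most one CI transaction, at the price of CI occasionally waiting for one EEP read.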
Common Issues and Debugging#
Problem: EEP Never Gets Access#
Symptoms: EEP input FIFO fills up, no reads complete
Debug Steps:
Check ci2idp_empty - should toggle to 1 occasionally
Check state machine - should eventually reach STATE_EEP_WAIT_0
Verify CI not continuously sending requests
Common Cause: CI stuck in continuous read/write loop
Problem: Read Data Incorrect#
Symptoms: Returned data doesn’t match expected values
Debug Steps:
Check bram_addr during WAIT states - should be stable
Verify bram_dout on cycle 3 (WAIT_2 state)
Confirm write operations completed before read
Check address calculation in requester module
Common Cause: Address mismatch or read-before-write hazard
Problem: Module Stuck in WAIT_2#
Symptoms: State machine doesn’t return to IDLE
Debug Steps:
Check output FIFO full flag (idp2ci_full or idp2eep_full)
Verify downstream module consuming from output FIFO
Check for clock domain crossing issues (if FIFOs are async)
Common Cause: Output FIFO stays full because the downstream consumer has stalled
VIO/ILA Probes (Recommended)#
(*mark_debug = "true"*) reg [2:0] curr_state;
(*mark_debug = "true"*) wire command = ci2idp_dout[271];
(*mark_debug = "true"*) wire [14:0] ci_addr = ci2idp_dout[270:256];
(*mark_debug = "true"*) wire [14:0] eep_addr = eep2idp_dout;
(*mark_debug = "true"*) wire ci_request = ~ci2idp_empty;
(*mark_debug = "true"*) wire eep_request = ~eep2idp_empty;
(*mark_debug = "true"*) wire [14:0] bram_addr;
(*mark_debug = "true"*) wire bram_wren;
Safety and Edge Cases#
Reset Behavior#
On resetn deassertion:
State machine → STATE_RESET → STATE_IDLE
All output signals → 0 (no spurious FIFO operations)
BRAM address → 15'dX (don’t care)
Simultaneous Requests#
Both FIFOs have data at IDLE state:
CI serviced first (priority)
EEP serviced after CI completes
Write Followed by Read:
Write completes in 1 cycle; it cannot overlap a read because only one request is handled at a time
Subsequent read sees updated value (BRAM write latency = 1 cycle)
FIFO Full During WAIT_2#
Module stalls in WAIT_2 state
bram_addr held stable (safe to stall)
No timeout - waits indefinitely for FIFO space
Assumes downstream will eventually consume
Potential Enhancements#
Pipelined Reads: Allow new request while waiting for previous read
Requires FIFO buffering and address tracking
Could double read throughput
Write Acknowledgment: Provide write confirmation to CI
Currently fire-and-forget
Useful for verification
Round-Robin or Weighted Arbitration: Fairer access to EEP
Prevent worst-case starvation scenarios
At cost of CI latency
Variable BRAM Latency: Support configurable wait cycles
Adapt to different BRAM configurations
Requires parameterization
Performance Counters: Track utilization and contention
CI access count
EEP access count
Stall cycles
Useful for profiling
Error Detection: Detect protocol violations
Write with read-pending
Address out of range
Currently no error reporting
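As an illustration of the Performance Counters idea in the list above, the three counts could be kept in a small side module. This is a sketch under assumptions: idp_perf_counters, the stalled input, and the 32-bit counter widths are hypothetical, and clk is an assumed clock name.

```verilog
// Hypothetical side module; not part of input_data_handler.v.
module idp_perf_counters (
    input  wire        clk,           // assumed clock name
    input  wire        resetn,        // active-low, synchronous
    input  wire        ci2idp_rden,   // pulses once per completed CI request
    input  wire        eep2idp_rden,  // pulses once per completed EEP read
    input  wire        stalled,       // assumed: in a WAIT_2 state with its output FIFO full
    output reg  [31:0] ci_access_count,
    output reg  [31:0] eep_access_count,
    output reg  [31:0] stall_cycles
);
    always @(posedge clk) begin
        if (!resetn) begin
            ci_access_count  <= 32'd0;
            eep_access_count <= 32'd0;
            stall_cycles     <= 32'd0;
        end else begin
            if (ci2idp_rden)  ci_access_count  <= ci_access_count  + 32'd1;
            if (eep2idp_rden) eep_access_count <= eep_access_count + 32'd1;
            if (stalled)      stall_cycles     <= stall_cycles     + 32'd1;
        end
    end
endmodule
```

Counting on the rden pulses works because each request is popped exactly once, when it completes. The counters wrap silently at 2^32.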
Document Version: 1.0
Last Updated: December 2025
Module File: input_data_handler.v
Module Location: CRI_proj/cri_fpga/code/new/hyddenn2/vivado/single_core.srcs/sources_1/new/
Purpose: BRAM arbiter for shared axon/external event memory
BRAM Size: 1 MiB (2^15 × 256-bit)
Read Latency: 3 cycles