# input_data_handler.v

## Module Overview

### Purpose and Role in Stack

The **input_data_handler** module acts as a **BRAM arbiter**, managing access to the shared Block RAM (BRAM) that stores axon/external event data. This module:

- **Arbitrates between two requesters:**
  - Command interpreter (CI) - for host read/write access
  - External events processor (EEP) - for runtime axon event processing
- **Enforces priority:** Command interpreter has higher priority than external events processor
- **Handles BRAM read latency:** Implements 3-cycle pipeline to account for BRAM read delay
- **Routes responses** back to appropriate requester with address passthrough

In the software/hardware stack:

```
Command Interpreter ──┐
                      ├──► input_data_handler ──► BRAM (2^15 x 256-bit)
External Events       │        (Arbiter)       │
Processor ────────────┘                        │
                                               │
              ┌────────────────────────────────┘
              │
       Response Router
              │
 ┌────────────┴─────────────┐
 ▼                          ▼
Command Interpreter    External Events Processor
(read response)        (read response)
```

This module is essential for **efficient BRAM utilization**, allowing both configuration/debug access (via CI) and high-speed runtime processing (via EEP) to share the same memory resource.
---

## Module Architecture

### High-Level Block Diagram

```
                                input_data_handler
               ┌───────────────────────────────────────────────────┐
               │                                                   │
               │   ┌────────────────────────────────────┐          │
               │   │ Command Interpreter FIFO           │          │
               │   │ (Input: Local Read)                │          │
CI→FIFO ──────►│   │ ci2idp_dout[271:0]                 │          │
(local)        │   │   [271]     = R/W command          │          │
empty/rden     │   │   [270:256] = 15-bit address       │          │
               │   │   [255:0]   = 256-bit data         │          │
               │   └─────────────────┬──────────────────┘          │
               │                     │                             │
               │   ┌─────────────────▼──────────────────┐          │
               │   │ External Events Proc FIFO          │          │
               │   │ (Input: Local Read)                │          │
EEP→FIFO ─────►│   │ eep2idp_dout[14:0]                 │          │
(local)        │   │   15-bit address only              │          │
empty/rden     │   └─────────────────┬──────────────────┘          │
               │                     │                             │
               │   ┌─────────────────▼──────────────────┐          │
               │   │ Priority Arbiter                   │          │
               │   │ - CI has priority over EEP         │          │
               │   │ - Selects address source           │          │
               │   │ - Generates BRAM control signals   │          │
               │   └─────────────────┬──────────────────┘          │
               │                     │                             │
               │   ┌─────────────────▼──────────────────┐          │
               │   │ BRAM Interface                     │          │
BRAM ◄─────────┼───┤ addr[14:0]                         │          │
Interface      │   │ din[255:0]  (write data)           │          │
(2^15 x 256)   │   │ dout[255:0] (read data)            │          │
               │   │ wren        (write enable)         │          │
               │   └─────────────────┬──────────────────┘          │
               │                     │                             │
               │   ┌─────────────────▼──────────────────┐          │
               │   │ 3-Cycle Read Pipeline              │          │
               │   │ (Compensates for BRAM latency)     │          │
               │   │                                    │          │
               │   │ IDLE → WAIT_0 → WAIT_1 →           │          │
               │   │   → WAIT_2 → output                │          │
               │   └─────────────────┬──────────────────┘          │
               │                     │                             │
               │   ┌─────────────────▼──────────────────┐          │
               │   │ Response Router                    │          │
               │   │ - Directs read data to             │          │
               │   │   original requester               │          │
               │   │ - Includes address passthrough     │          │
               │   └────────┬─────────────────┬─────────┘          │
               │            │                 │                    │
               │     ┌──────▼──────┐   ┌──────▼──────┐             │
CI←FIFO ◄──────┼─────┤ idp2ci FIFO │   │ idp2eep FIFO├─────────────┼──────► EEP←FIFO
(remote)       │     │ (Output:    │   │ (Output:    │             │        (remote)
full/wren      │     │  Remote)    │   │  Remote)    │             │
data           │     └─────────────┘   └─────────────┘             │
               │                                                   │
               └───────────────────────────────────────────────────┘
```

---

## Interface Specification

### Clock and Reset

| Signal | Direction | Width | Description |
|--------|-----------|-------|-------------|
| `clk` | Input | 1 | 225 MHz system clock |
| `resetn` | Input | 1 | Active-low synchronous reset |

### Command Interpreter Interface

**Input FIFO (Local - CI to IDP):**

| Signal | Direction | Width | Description |
|--------|-----------|-------|-------------|
| `ci2idp_empty` | Input | 1 | Input FIFO empty flag |
| `ci2idp_dout` | Input | 272 | Input FIFO data output |
| `ci2idp_rden` | Output (reg) | 1 | Input FIFO read enable |

**Data Format (`ci2idp_dout[271:0]`):**

```
[271]     = R/W command (0=read, 1=write)
[270:256] = 15-bit BRAM address
[255:0]   = 256-bit write data
```

**Output FIFO (Remote - IDP to CI):**

| Signal | Direction | Width | Description |
|--------|-----------|-------|-------------|
| `idp2ci_full` | Input | 1 | Output FIFO full flag |
| `idp2ci_din` | Output | 271 | Output FIFO data input |
| `idp2ci_wren` | Output (reg) | 1 | Output FIFO write enable |

**Data Format (`idp2ci_din[270:0]`):**

```
[270:256] = 15-bit BRAM address (echoed from request)
[255:0]   = 256-bit read data
```

### External Events Processor Interface

**Input FIFO (Local - EEP to IDP):**

| Signal | Direction | Width | Description |
|--------|-----------|-------|-------------|
| `eep2idp_empty` | Input | 1 | Input FIFO empty flag |
| `eep2idp_dout` | Input | 15 | Input FIFO data output (address only) |
| `eep2idp_rden` | Output (reg) | 1 | Input FIFO read enable |

**Data Format (`eep2idp_dout[14:0]`):**

```
[14:0] = 15-bit BRAM address (read request only)
```

**Output FIFO (Remote - IDP to EEP):**

| Signal | Direction | Width | Description |
|--------|-----------|-------|-------------|
| `idp2eep_full` | Input | 1 | Output FIFO full flag |
| `idp2eep_din` | Output | 271 | Output FIFO data input |
| `idp2eep_wren` | Output (reg) | 1 | Output FIFO write enable |

**Data Format (`idp2eep_din[270:0]`):**

```
[270:256] = 15-bit BRAM address (echoed from request)
[255:0]   = 256-bit read data
```

### BRAM Interface

| Signal | Direction | Width | Description |
|--------|-----------|-------|-------------|
| `bram_addr` | Output (reg) | 15 | BRAM address (0 to 32,767) |
| `bram_din` | Output | 256 | BRAM write data |
| `bram_wren` | Output (reg) | 1 | BRAM write enable |
| `bram_dout` | Input | 256 | BRAM read data (3-cycle latency) |

**BRAM Specifications:**

- **Depth:** 32,768 rows (2^15)
- **Width:** 256 bits per row
- **Total Size:** 1 MiB (32,768 × 256 bits = 8,388,608 bits = 1,048,576 bytes)
- **Read Latency:** 3 clock cycles
- **Write Latency:** 1 clock cycle (synchronous write)

---

## Detailed Logic Description

### Command Decoder

```verilog
localparam CMD_READ  = 1'b0;
localparam CMD_WRITE = 1'b1;

wire command = ci2idp_dout[271];  // Extract R/W bit
```

### State Machine

**States:**

```verilog
localparam [2:0] STATE_RESET               = 3'd0;
localparam [2:0] STATE_IDLE                = 3'd1;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_0 = 3'd2;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_1 = 3'd3;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_2 = 3'd4;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_0  = 3'd5;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_1  = 3'd6;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_2  = 3'd7;
```

**State Transition Diagram:**

```
          ┌──────────────────────────────┐
          │         STATE_RESET          │
          └──────────────┬───────────────┘
                         │
                         ▼
          ┌──────────────────────────────┐
  ┌──────►│          STATE_IDLE          │◄───────────────┐
  │       │         (Arbitrate)          │                │
  │       └───┬─────────────────────┬────┘                │
  │           │                     │                     │
  │    !eep2idp_empty         !ci2idp_empty               │
  │    && ci2idp_empty              │                     │
  │           │                     ├── CMD_WRITE ────────┤
  │           │                     │   (immediate pop)   │
  │           │                     │ CMD_READ            │
  │           ▼                     ▼                     │
  │   STATE_EEP_WAIT_0       STATE_CI_WAIT_0              │
  │           ▼                     ▼                     │
  │   STATE_EEP_WAIT_1       STATE_CI_WAIT_1              │
  │           ▼                     ▼                     │
  │   STATE_EEP_WAIT_2       STATE_CI_WAIT_2              │
  │           │                     │                     │
  │     !idp2eep_full         !idp2ci_full                │
  └───────────┘                     └─────────────────────┘
```

### Priority Arbitration Logic

**IDLE State Behavior:**

```verilog
STATE_IDLE: begin
    if (~ci2idp_empty) begin
        // CI has pending request (higher priority, so it is tested first)
        bram_addr = ci2idp_dout[270:256];  // Extract 15-bit address
        if (command == CMD_READ)
            next_state = STATE_CI_WAIT_BRAM_READ_0;
        else begin  // CMD_WRITE
            bram_wren   = 1'b1;
            ci2idp_rden = 1'b1;
            next_state  = STATE_IDLE;      // Write completes immediately
        end
    end else if (~eep2idp_empty) begin
        // EEP has pending request (serviced only when the CI FIFO is empty)
        bram_addr  = eep2idp_dout;
        next_state = STATE_EEP_WAIT_BRAM_READ_0;
    end
end
```

**Priority Rules:**

1. **CI Write:** Highest priority, completes in 1 cycle (no wait states)
2. **CI Read:** High priority, 3-cycle wait for BRAM latency
3. **EEP Read:** Lower priority, serviced only when CI FIFO empty
4. **No Starvation in Practice:** Fixed priority gives no fairness guarantee by itself; EEP is serviced because the CI request rate is finite and the CI FIFO eventually drains

### BRAM Read Pipeline (3-Cycle Latency)

**Cycle Breakdown:**

```
Cycle 0: Request arrives in IDLE state
         - bram_addr = address from FIFO
         - Transition to WAIT_0

Cycle 1: STATE_WAIT_0
         - BRAM internal pipeline stage 1
         - bram_addr held stable
         - Transition to WAIT_1

Cycle 2: STATE_WAIT_1
         - BRAM internal pipeline stage 2
         - bram_addr held stable
         - Transition to WAIT_2

Cycle 3: STATE_WAIT_2
         - bram_dout now valid
         - Wait for output FIFO not full
         - Write to output FIFO (wren pulse)
         - Pop input FIFO (rden pulse)
         - Transition to IDLE
```

**EEP Read Example:**

```verilog
STATE_EEP_WAIT_BRAM_READ_0: begin
    bram_addr  = eep2idp_dout;  // Hold address stable
    next_state = STATE_EEP_WAIT_BRAM_READ_1;
end

STATE_EEP_WAIT_BRAM_READ_1: begin
    bram_addr  = eep2idp_dout;
    next_state = STATE_EEP_WAIT_BRAM_READ_2;
end

STATE_EEP_WAIT_BRAM_READ_2: begin
    bram_addr = eep2idp_dout;
    if (~idp2eep_full) begin
        idp2eep_wren = 1'b1;    // Write read data to output FIFO
        eep2idp_rden = 1'b1;    // Pop request from input FIFO
        next_state   = STATE_IDLE;
    end
    // else: stall until output FIFO has space
end
```

**CI Read:** Same pattern, using `ci2idp_dout[270:256]` for the address and the `ci2idp`/`idp2ci` FIFOs in place of `eep2idp`/`idp2eep`.
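
The CI read states are described above only by analogy with the EEP states. As a minimal sketch of how the arbitration and both wait-state chains could fit together, the module below assumes that the CI states mirror the EEP states exactly, that both input FIFOs are first-word-fall-through (`dout` is valid whenever `empty` is low), and that CI is tested first in IDLE, as the priority rules state. The module name `idp_arbiter_sketch` and the shortened state names are hypothetical and used for illustration only; this is not the project's source.

```verilog
// Illustrative sketch only: module and state names are assumptions.
// Assumes first-word-fall-through FIFOs (dout valid while empty is low).
module idp_arbiter_sketch (
    input  wire         clk,
    input  wire         resetn,        // active-low synchronous reset
    input  wire         ci2idp_empty,
    input  wire [271:0] ci2idp_dout,
    output reg          ci2idp_rden,
    input  wire         idp2ci_full,
    output wire [270:0] idp2ci_din,
    output reg          idp2ci_wren,
    input  wire         eep2idp_empty,
    input  wire [14:0]  eep2idp_dout,
    output reg          eep2idp_rden,
    input  wire         idp2eep_full,
    output wire [270:0] idp2eep_din,
    output reg          idp2eep_wren,
    output reg  [14:0]  bram_addr,
    output wire [255:0] bram_din,
    output reg          bram_wren,
    input  wire [255:0] bram_dout
);
    localparam [2:0] S_RESET = 3'd0, S_IDLE  = 3'd1,
                     S_EEP_0 = 3'd2, S_EEP_1 = 3'd3, S_EEP_2 = 3'd4,
                     S_CI_0  = 3'd5, S_CI_1  = 3'd6, S_CI_2  = 3'd7;

    reg  [2:0]  state, next_state;
    wire        ci_write = ci2idp_dout[271];      // 1 = write, 0 = read
    wire [14:0] ci_addr  = ci2idp_dout[270:256];

    assign bram_din    = ci2idp_dout[255:0];      // only CI can write
    assign idp2ci_din  = {bram_addr, bram_dout};  // address passthrough
    assign idp2eep_din = {bram_addr, bram_dout};

    // State register with synchronous, active-low reset.
    always @(posedge clk)
        state <= resetn ? next_state : S_RESET;

    always @(*) begin
        // Defaults: hold state, no FIFO or BRAM strobes.
        // The idle address is a don't-care; 0 is chosen here arbitrarily.
        next_state   = state;
        bram_addr    = 15'd0;
        bram_wren    = 1'b0;
        ci2idp_rden  = 1'b0;
        idp2ci_wren  = 1'b0;
        eep2idp_rden = 1'b0;
        idp2eep_wren = 1'b0;
        case (state)
            S_RESET: next_state = S_IDLE;
            S_IDLE:
                if (~ci2idp_empty) begin          // CI is tested first
                    bram_addr = ci_addr;
                    if (ci_write) begin           // single-cycle write
                        bram_wren   = 1'b1;
                        ci2idp_rden = 1'b1;
                    end else
                        next_state = S_CI_0;
                end else if (~eep2idp_empty) begin
                    bram_addr  = eep2idp_dout;
                    next_state = S_EEP_0;
                end
            S_CI_0:  begin bram_addr = ci_addr; next_state = S_CI_1; end
            S_CI_1:  begin bram_addr = ci_addr; next_state = S_CI_2; end
            S_CI_2:  begin
                bram_addr = ci_addr;
                if (~idp2ci_full) begin           // stall while response FIFO is full
                    idp2ci_wren = 1'b1;
                    ci2idp_rden = 1'b1;
                    next_state  = S_IDLE;
                end
            end
            S_EEP_0: begin bram_addr = eep2idp_dout; next_state = S_EEP_1; end
            S_EEP_1: begin bram_addr = eep2idp_dout; next_state = S_EEP_2; end
            S_EEP_2: begin
                bram_addr = eep2idp_dout;
                if (~idp2eep_full) begin
                    idp2eep_wren = 1'b1;
                    eep2idp_rden = 1'b1;
                    next_state   = S_IDLE;
                end
            end
            default: next_state = S_IDLE;
        endcase
    end
endmodule
```

Giving every output a default value at the top of the combinational block keeps the sketch free of inferred latches and makes each state list only the signals it actually drives. Because the request stays at the head of its FIFO until `WAIT_2` pops it, the address can be re-derived from `dout` in every wait state instead of being stored in a separate register.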
### Output Data Routing

**Assignments:**

```verilog
assign idp2eep_din = {bram_addr, bram_dout};  // [270:256]=addr, [255:0]=data
assign idp2ci_din  = {bram_addr, bram_dout};
assign bram_din    = ci2idp_dout[255:0];      // Only CI can write
```

**Address Passthrough:**

- Read responses include the original address
- Allows requester to correlate response with request
- Would become critical if reads were pipelined; this module handles one request at a time, so responses always return in request order

---

## Timing Diagrams

In the diagrams below, each cycle number labels the clock edge at its bar, and the segment to the right of the bar is that cycle. The rows `!ci2idp_empty` and `!eep2idp_empty` are high while a request is pending.

### CI Write Transaction

```
Cycle:                0      1      2
                      │      │      │
State          IDLE   │IDLE  │IDLE  │
!ci2idp_empty  ▁▁▁▁▁▁▁│▔▔▔▔▔▔│  (WR, Addr=0x1234, Data=0xABCD...)
ci2idp_rden    ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▁▁▁▁▁▁
bram_addr      XXXX   │0x1234│
bram_wren      ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▁▁▁▁▁▁
bram_din       XXXX   │0xABCD│  (0xABCD...)
```

**Notes:**

- Single-cycle write operation
- No wait states required
- Returns to IDLE immediately

### CI Read Transaction

```
Cycle:                0      1      2      3      4      5
                      │      │      │      │      │      │
State          IDLE   │IDLE  │WAIT_0│WAIT_1│WAIT_2│IDLE  │
!ci2idp_empty  ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│  (RD, Addr=0x5678)
ci2idp_rden    ▁▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▔▔▔▔▔▔│▁▁▁▁▁▁│
bram_addr      XXXX   │0x5678│0x5678│0x5678│0x5678│
bram_dout      XXXX   │XXXX  │XXXX  │XXXX  │DATA  │
idp2ci_wren    ▁▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▔▔▔▔▔▔│▁▁▁▁▁▁│
idp2ci_din     XXXX   │XXXX  │XXXX  │XXXX  │{0x5678, DATA}
```

**Notes:**

- Address is presented in IDLE (cycle 0), followed by a 3-cycle wait for BRAM read latency
- Address held stable during wait states
- Response (written in WAIT_2, cycle 3) includes address + data

### Priority Arbitration: EEP Deferred

```
Cycle:                0      1      2      3      4      5      6      7      8
                      │      │      │      │      │      │      │      │      │
State          IDLE   │IDLE  │WAIT_0│WAIT_1│WAIT_2│IDLE  │WAIT_0│WAIT_1│WAIT_2│
!eep2idp_empty ▔▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│  (pending request)
!ci2idp_empty  ▁▁▁▁▁▁▁│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│  (higher priority)
Serviced       -      │CI    │CI    │CI    │CI    │EEP   │EEP   │EEP   │EEP   │
```

**Notes:**

- Cycle 0: Both FIFOs have requests, CI serviced first
- Cycles 0-3: CI read completes (IDLE plus 3-cycle wait)
- Cycle 4: Arbiter is back in IDLE with the CI FIFO empty, so the EEP request is now serviced (cycles 4-7)
- Demonstrates priority enforcement

---

## Cross-References

### Related Modules

| Module | Relationship | Interface |
|--------|--------------|-----------|
| **command_interpreter.v** | Upstream | Connects to `ci2idp_*` and `idp2ci_*` FIFOs |
| **external_events_processor.v** | Upstream | Connects to `eep2idp_*` and `idp2eep_*` FIFOs |
| **BRAM (Xilinx IP)** | Downstream | `bram_*` signals control Block RAM |

### BRAM Structure (Parent: pcie2fifos → command_interpreter)

**Data Stored in BRAM:**

- **Axon/External Event Data**
  - Each row: 256 bits = 16 × 16-bit masks (one per neuron group)
  - Row address: Axon ID / 16

**Example Row at Address 0x1000:**

```
Bits [255:240] = Mask for neuron group 15
Bits [239:224] = Mask for neuron group 14
...
Bits [31:16]   = Mask for neuron group 1
Bits [15:0]    = Mask for neuron group 0

Each 16-bit mask: One bit per neuron in the group, indicating which neurons received the axon spike
```

---

## Key Terms and Definitions

| Term | Definition |
|------|------------|
| **Arbiter** | Logic that decides which requester gains access to shared resource |
| **Priority** | CI requests serviced before EEP when both pending |
| **Read Latency** | 3 clock cycles from address presentation to valid data |
| **Passthrough** | Address echoed back with read data for correlation |
| **Local FIFO** | FIFO in same clock domain as module (input side) |
| **Remote FIFO** | FIFO potentially in different clock domain (output side) |
| **CMD_READ** | Command bit value 0, triggers read transaction |
| **CMD_WRITE** | Command bit value 1, triggers write transaction |
| **BRAM** | Block RAM - On-chip synchronous memory primitive |
| **FIFO Backpressure** | Waiting for output FIFO not full before writing |

---

## Performance Characteristics

### Throughput

**Best Case (No Contention):**

- **CI Write:** 1 operation per clock cycle at 225 MHz = 225M writes/sec
- **CI Read:** 4 cycles per operation (1 IDLE + 3 WAIT) = 56.25M reads/sec
- **EEP Read:** 4 cycles per operation = 56.25M reads/sec (when CI idle)

**Worst Case (Contention):**

- **EEP Read (with CI active):** Deferred for as long as the CI FIFO holds requests (no upper bound is enforced by the arbiter)
- **CI Read (with output FIFO full):** Stalled in WAIT_2 state

**Realistic (Mixed Workload):**

- CI accesses: Infrequent (configuration, debug)
- EEP accesses: Burst during Phase 1 execution
- Typical: EEP dominates, achieving ~50M reads/sec effective rate

### Latency

| Operation | Latency (Cycles) | Latency (ns @ 225 MHz) | Notes |
|-----------|------------------|------------------------|-------|
| CI Write | 1 | 4.4 ns | Immediate, no wait |
| CI Read | 4 | 17.8 ns | 1 IDLE + 3 WAIT |
| EEP Read | 4 | 17.8 ns | When CI idle |
| EEP Read (deferred) | 4 + CI latency | Variable | Must wait for CI completion |

### Stall Conditions

**Input Side Stalls:**

- None - FIFOs assumed to handle backpressure

**Output Side Stalls:**

- **WAIT_2 State:** If output FIFO full, module holds until space available
- **Impact:** Backpressure propagates to input FIFO (requesters must wait)

---

## Design Considerations

### Why Priority to CI?

1. **Low Frequency:** CI accesses are rare (host-initiated)
2. **Latency Sensitive:** Host expects fast response for debug/config
3. **Tolerable EEP Delay:** EEP can afford to wait a few cycles while an occasional CI request is serviced
4. **Simplicity:** Avoids complex round-robin or fair arbitration

### Why 3-Cycle Wait?
- **BRAM Primitive:** A Xilinx Block RAM primitive has a read latency of 1 cycle, or 2 cycles with its optional output register enabled
- **Pipeline Registers:** Additional registering for timing closure accounts for the rest of the 3-cycle total
- **Fixed Latency:** Simplifies state machine design (no variable wait)

### Alternative Designs

**Round-Robin Arbitration:**

- Pros: Fair access, prevents EEP starvation
- Cons: More complex, CI latency increases

**Pipelined Operation:**

- Pros: Higher throughput (overlapped requests)
- Cons: Requires buffering, address tracking, out-of-order handling
- Not needed: Current design adequate for workload

---

## Common Issues and Debugging

### Problem: EEP Never Gets Access

**Symptoms:** EEP input FIFO fills up, no reads complete

**Debug Steps:**

1. Check `ci2idp_empty` - should toggle to 1 occasionally
2. Check state machine - should eventually reach `STATE_EEP_WAIT_BRAM_READ_0`
3. Verify CI not continuously sending requests

**Common Cause:** CI stuck in continuous read/write loop

### Problem: Read Data Incorrect

**Symptoms:** Returned data doesn't match expected values

**Debug Steps:**

1. Check `bram_addr` during WAIT states - should be stable
2. Verify `bram_dout` on cycle 3 (WAIT_2 state)
3. Confirm write operations completed before read
4. Check address calculation in requester module

**Common Cause:** Address mismatch or read-before-write hazard

### Problem: Module Stuck in WAIT_2

**Symptoms:** State machine doesn't return to IDLE

**Debug Steps:**

1. Check output FIFO full flag (`idp2ci_full` or `idp2eep_full`)
2. Verify downstream module consuming from output FIFO
3. Check for clock domain crossing issues (if FIFOs are async)

**Common Cause:** Output FIFO stays full because the downstream consumer has stalled

### VIO/ILA Probes (Recommended)

```verilog
(*mark_debug = "true"*) reg  [2:0]  curr_state;
(*mark_debug = "true"*) wire        command     = ci2idp_dout[271];
(*mark_debug = "true"*) wire [14:0] ci_addr     = ci2idp_dout[270:256];
(*mark_debug = "true"*) wire [14:0] eep_addr    = eep2idp_dout;
(*mark_debug = "true"*) wire        ci_request  = ~ci2idp_empty;
(*mark_debug = "true"*) wire        eep_request = ~eep2idp_empty;
// bram_addr and bram_wren are `output reg` ports (see BRAM Interface),
// so the attribute belongs on their existing declarations.
(*mark_debug = "true"*) reg  [14:0] bram_addr;
(*mark_debug = "true"*) reg         bram_wren;
```

---

## Safety and Edge Cases

### Reset Behavior

On `resetn` assertion (active low):

- State machine → `STATE_RESET`, then `STATE_IDLE` once `resetn` is released
- All output signals → 0 (no spurious FIFO operations)
- BRAM address → `15'dX` (don't care)

### Simultaneous Requests

**Both FIFOs have data at IDLE state:**

- CI serviced first (priority)
- EEP serviced after CI completes

**Write Followed by Read:**

- Write completes in 1 cycle
- Subsequent read sees updated value (BRAM write latency = 1 cycle)

### FIFO Full During WAIT_2

- Module stalls in WAIT_2 state
- `bram_addr` held stable (safe to stall)
- No timeout - waits indefinitely for FIFO space
- Assumes downstream will eventually consume

---

## Potential Enhancements

1. **Pipelined Reads:** Allow new request while waiting for previous read
   - Requires FIFO buffering and address tracking
   - Could raise read throughput above the current 1 read per 4 cycles

2. **Write Acknowledgment:** Provide write confirmation to CI
   - Currently fire-and-forget
   - Useful for verification

3. **Round-Robin or Weighted Arbitration:** Fairer access to EEP
   - Prevent worst-case starvation scenarios
   - At cost of CI latency

4. **Variable BRAM Latency:** Support configurable wait cycles
   - Adapt to different BRAM configurations
   - Requires parameterization

5. **Performance Counters:** Track utilization and contention
   - CI access count
   - EEP access count
   - Stall cycles
   - Useful for profiling

6. **Error Detection:** Detect protocol violations
   - Write with read-pending
   - Address out of range
   - Currently no error reporting

---

**Document Version:** 1.0
**Last Updated:** December 2025
**Module File:** `input_data_handler.v`
**Module Location:** `CRI_proj/cri_fpga/code/new/hyddenn2/vivado/single_core.srcs/sources_1/new/`
**Purpose:** BRAM arbiter for shared axon/external event memory
**BRAM Size:** 1 MiB (2^15 × 256-bit)
**Read Latency:** 3 cycles