input_data_handler.v#

Module Overview#

Purpose and Role in Stack#

The input_data_handler module acts as a BRAM arbiter, managing access to the shared Block RAM (BRAM) that stores axon/external event data. This module:

Arbitrates between two requesters:
- Command interpreter (CI) - for host read/write access
- External events processor (EEP) - for runtime axon event processing
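Enforces priority: when both requesters have pending work, the command interpreter is serviced before the external events processor
Handles BRAM read latency: steps through three wait states per read to cover the BRAM's 3-cycle read delay (one request in flight at a time)
Routes each read response back to the requester that issued it, with the request address echoed alongside the data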

In the software/hardware stack:

Command Interpreter ──┐
                      ├──► input_data_handler ──► BRAM (2^15 x 256-bit)
External Events       │         (Arbiter)              │
Processor         ────┘                                │
                                                       │
                         ┌─────────────────────────────┘
                         │
                    Response Router
                         │
            ┌────────────┴─────────────┐
            ▼                          ▼
    Command Interpreter      External Events Processor
    (read response)          (read response)

This module is essential for efficient BRAM utilization, allowing both configuration/debug access (via CI) and high-speed runtime processing (via EEP) to share the same memory resource.

Module Architecture#

High-Level Block Diagram#

        input_data_handler
    ┌─────────────────────────────────────────────────────────────┐
    │                                                             │
    │         ┌───────────────────────────────┐                  │
    │         │   Command Interpreter FIFO    │                  │
    │         │   (Input: Local Read)          │                  │
CI→FIFO  ────►│ ci2idp_dout[271:0]            │                  │
(local)       │  [271] = R/W command          │                  │
empty/rden    │  [270:256] = 15-bit address   │                  │
              │  [255:0] = 256-bit data        │                  │
              └───────────┬───────────────────┘                  │
                          │                                      │
    │                     │                                      │
    │         ┌───────────▼───────────────────┐                  │
    │         │   External Events Proc FIFO   │                  │
    │         │   (Input: Local Read)          │                  │
EEP→FIFO  ────►│ eep2idp_dout[14:0]           │                  │
(local)       │  15-bit address only          │                  │
empty/rden    └───────────┬───────────────────┘                  │
              │           │                                      │
              │           │                                      │
    │         │   ┌───────▼─────────────────────────────┐        │
    │         │   │   Priority Arbiter                  │        │
    │         │   │   - CI has priority over EEP        │        │
    │         │   │   - Selects address source          │        │
    │         │   │   - Generates BRAM control signals  │        │
    │         │   └───────┬─────────────────────────────┘        │
    │         │           │                                      │
    │         │           ▼                                      │
    │         │   ┌────────────────────────┐                    │
    │         │   │   BRAM Interface       │                    │
BRAM  ◄───────┼───┤ addr[14:0]             │                    │
Interface     │   │ din[255:0] (write data)│                    │
(2^15 x 256)  │   │ dout[255:0] (read data)│                    │
              │   │ wren (write enable)    │                    │
              │   └────────┬───────────────┘                    │
    │         │            │                                     │
    │         │            ▼                                     │
    │         │   ┌──────────────────────────────────┐          │
    │         │   │   3-Cycle Read Pipeline          │          │
    │         │   │   (Compensates for BRAM latency) │          │
    │         │   │                                  │          │
    │         │   │   IDLE → WAIT_0 → WAIT_1 →      │          │
    │         │   │         → WAIT_2 → output       │          │
    │         │   │                                  │          │
    │         │   └──────────┬───────────────────────┘          │
    │         │              │                                   │
    │         │              ▼                                   │
    │         │   ┌──────────────────────────────────┐          │
    │         │   │   Response Router                │          │
    │         │   │   - Directs read data to         │          │
    │         │   │     original requester           │          │
    │         │   │   - Includes address passthrough │          │
    │         │   └──────┬─────────┬─────────────────┘          │
    │         │          │         │                             │
    │         │          ▼         ▼                             │
    │         │   ┌──────────┐ ┌──────────┐                    │
    │         │   │ idp2ci   │ │ idp2eep  │                    │
CI←FIFO  ◄──────┤ FIFO     │ │ FIFO     │◄───────────EEP←FIFO │
(remote)        │ (Output: │ │ (Output: │                (remote)
full/wren       │  Remote) │ │  Remote) │                        │
data            └──────────┘ └──────────┘                        │
                │                                                 │
                └─────────────────────────────────────────────────┘

Interface Specification#

Clock and Reset#

Signal	Direction	Width	Description
`clk`	Input	1	225 MHz system clock
`resetn`	Input	1	Active-low synchronous reset

Command Interpreter Interface#

Input FIFO (Local - CI to IDP):

Signal	Direction	Width	Description
`ci2idp_empty`	Input	1	Input FIFO empty flag
`ci2idp_dout`	Input	272	Input FIFO data output
`ci2idp_rden`	Output (reg)	1	Input FIFO read enable

Data Format (ci2idp_dout[271:0]):

[271]       = R/W command (0=read, 1=write)
[270:256]   = 15-bit BRAM address
[255:0]     = 256-bit write data
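
The field boundaries above map directly onto part-selects. The following is a minimal sketch rather than code from input_data_handler.v: the module name `ci_request_unpack` and the output names `is_write`, `addr`, and `wdata` are assumptions made for illustration.

```verilog
// Hypothetical helper (not part of input_data_handler.v): splits one
// 272-bit CI request word into the three fields listed above.
module ci_request_unpack (
    input  wire [271:0] ci2idp_dout,  // word at the head of the CI input FIFO
    output wire         is_write,     // [271]     0 = read, 1 = write
    output wire [14:0]  addr,         // [270:256] 15-bit BRAM address
    output wire [255:0] wdata         // [255:0]   256-bit write data
);
    assign is_write = ci2idp_dout[271];
    assign addr     = ci2idp_dout[270:256];
    assign wdata    = ci2idp_dout[255:0];
endmodule
```

The three widths add up to the full word: 1 + 15 + 256 = 272 bits.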

Output FIFO (Remote - IDP to CI):

Signal	Direction	Width	Description
`idp2ci_full`	Input	1	Output FIFO full flag
`idp2ci_din`	Output	271	Output FIFO data input
`idp2ci_wren`	Output (reg)	1	Output FIFO write enable

Data Format (idp2ci_din[270:0]):

[270:256]   = 15-bit BRAM address (echoed from request)
[255:0]     = 256-bit read data

External Events Processor Interface#

Input FIFO (Local - EEP to IDP):

Signal	Direction	Width	Description
`eep2idp_empty`	Input	1	Input FIFO empty flag
`eep2idp_dout`	Input	15	Input FIFO data output (address only)
`eep2idp_rden`	Output (reg)	1	Input FIFO read enable

Data Format (eep2idp_dout[14:0]):

[14:0] = 15-bit BRAM address (read request only)

Output FIFO (Remote - IDP to EEP):

Signal	Direction	Width	Description
`idp2eep_full`	Input	1	Output FIFO full flag
`idp2eep_din`	Output	271	Output FIFO data input
`idp2eep_wren`	Output (reg)	1	Output FIFO write enable

Data Format (idp2eep_din[270:0]):

[270:256]   = 15-bit BRAM address (echoed from request)
[255:0]     = 256-bit read data

BRAM Interface#

Signal	Direction	Width	Description
`bram_addr`	Output (reg)	15	BRAM address (0 to 32,767)
`bram_din`	Output	256	BRAM write data
`bram_wren`	Output (reg)	1	BRAM write enable
`bram_dout`	Input	256	BRAM read data (3-cycle latency)

BRAM Specifications:

Depth: 32,768 rows (2^15)
Width: 256 bits per row
Total Size: 1 MB (32,768 × 256 bits = 8,388,608 bits = 1,048,576 bytes)
Read Latency: 3 clock cycles
Write Latency: 1 clock cycle (synchronous write)

Detailed Logic Description#

Command Decoder#

localparam CMD_READ  = 1'b0;
localparam CMD_WRITE = 1'b1;

wire command = ci2idp_dout[271];  // Extract R/W bit

State Machine#

States:

localparam [2:0] STATE_RESET                = 3'd0;
localparam [2:0] STATE_IDLE                 = 3'd1;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_0 = 3'd2;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_1 = 3'd3;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_2 = 3'd4;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_0  = 3'd5;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_1  = 3'd6;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_2  = 3'd7;

State Transition Diagram:

                   ┌──────────────┐
                   │ STATE_RESET  │
                   └──────┬───────┘
                          │
                          ▼
                   ┌──────────────┐
              ┌───▶│ STATE_IDLE   │◄────────────────┬─────────────────┐
              │    │ (Arbitrate)  │                 │                 │
              │    └──┬───────┬───┘                 │                 │
              │       │       │                     │                 │
              │  !eep │       │ !ci                 │                 │
              │  empty│       │ empty               │                 │
              │       │       │                     │                 │
              │       │       └─ CMD_READ           │                 │
              │       │              │              │                 │
              │       │              ▼              │                 │
              │       │       STATE_CI_WAIT_0       │                 │
              │       │              │              │                 │
              │       │              ▼              │                 │
              │       │       STATE_CI_WAIT_1       │                 │
              │       │              │              │                 │
              │       │              ▼              │                 │
              │       │       STATE_CI_WAIT_2       │                 │
              │       │              │              │                 │
              │       │              │!idp2ci_full  │                 │
              │       │              └──────────────┘                 │
              │       │                                               │
              │       │ CMD_WRITE                                     │
              │       └─(immediate pop)──────────────────────────────┘
              │       │
              │       ▼
              │    STATE_EEP_WAIT_0
              │       │
              │       ▼
              │    STATE_EEP_WAIT_1
              │       │
              │       ▼
              │    STATE_EEP_WAIT_2
              │       │
              │       │!idp2eep_full
              └───────┘

Priority Arbitration Logic#

IDLE State Behavior:

STATE_IDLE: begin
    if (~ci2idp_empty) begin
        // CI has a pending request (higher priority, so it is checked first)
        bram_addr = ci2idp_dout[270:256];  // Extract 15-bit address

        if (command == CMD_READ)
            next_state = STATE_CI_WAIT_BRAM_READ_0;
        else begin  // CMD_WRITE
            bram_wren   = 1'b1;
            ci2idp_rden = 1'b1;
            next_state  = STATE_IDLE;  // Write completes immediately
        end

    end else if (~eep2idp_empty) begin
        // EEP has a pending request (serviced only when the CI FIFO is empty)
        bram_addr  = eep2idp_dout;
        next_state = STATE_EEP_WAIT_BRAM_READ_0;
    end
end

Priority Rules:

CI Write: Highest priority, completes in 1 cycle (no wait states)
CI Read: High priority, 3-cycle wait for BRAM latency
EEP Read: Lower priority, serviced only when CI FIFO empty
No Hardware Starvation Guard: EEP is serviced only once the CI FIFO drains, so its forward progress relies on CI traffic being finite and infrequent

BRAM Read Pipeline (3-Cycle Latency)#

Cycle Breakdown:

Cycle 0: Request arrives in IDLE state
         - bram_addr = address from FIFO
         - Transition to WAIT_0

Cycle 1: STATE_WAIT_0
         - BRAM internal pipeline stage 1
         - bram_addr held stable
         - Transition to WAIT_1

Cycle 2: STATE_WAIT_1
         - BRAM internal pipeline stage 2
         - bram_addr held stable
         - Transition to WAIT_2

Cycle 3: STATE_WAIT_2
         - bram_dout now valid
         - Wait for output FIFO not full
         - Write to output FIFO (wren pulse)
         - Pop input FIFO (rden pulse)
         - Transition to IDLE

EEP Read Example:

STATE_EEP_WAIT_BRAM_READ_0: begin
    bram_addr  = eep2idp_dout;  // Hold address stable
    next_state = STATE_EEP_WAIT_BRAM_READ_1;
end

STATE_EEP_WAIT_BRAM_READ_1: begin
    bram_addr  = eep2idp_dout;
    next_state = STATE_EEP_WAIT_BRAM_READ_2;
end

STATE_EEP_WAIT_BRAM_READ_2: begin
    bram_addr = eep2idp_dout;
    if (~idp2eep_full) begin
        idp2eep_wren = 1'b1;  // Write read data to output FIFO
        eep2idp_rden = 1'b1;  // Pop request from input FIFO
        next_state = STATE_IDLE;
    end
    // else: stall until output FIFO has space
end

CI Read: Same pattern using ci2idp_dout[270:256] for address and idp2ci FIFOs.
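
The CI read path can be sketched as a small standalone state machine that follows the same pattern as the EEP example. This is a reduced illustration under stated assumptions, not the original source: the module name `ci_read_path_sketch` and the shortened state names are hypothetical, write handling and EEP arbitration are omitted, and the input FIFO is assumed to present its head word on `ci2idp_dout` whenever it is not empty.

```verilog
// Hypothetical, reduced sketch: only the CI read path of the arbiter.
// CI writes and EEP requests are deliberately left out.
module ci_read_path_sketch (
    input  wire         clk,
    input  wire         resetn,        // active-low synchronous reset
    input  wire         ci2idp_empty,
    input  wire [271:0] ci2idp_dout,   // assumed valid whenever !ci2idp_empty
    output reg          ci2idp_rden,
    input  wire         idp2ci_full,
    output reg          idp2ci_wren,
    output reg  [14:0]  bram_addr
);
    localparam       CMD_READ = 1'b0;
    localparam [1:0] IDLE = 2'd0, WAIT_0 = 2'd1, WAIT_1 = 2'd2, WAIT_2 = 2'd3;

    reg [1:0] curr_state, next_state;

    // State register
    always @(posedge clk)
        curr_state <= resetn ? next_state : IDLE;

    // Next-state and output logic
    always @(*) begin
        // Defaults: no FIFO strobes, stay in the current state
        ci2idp_rden = 1'b0;
        idp2ci_wren = 1'b0;
        next_state  = curr_state;
        // The request stays at the FIFO head until popped, so the
        // address is held stable through all wait states
        bram_addr   = ci2idp_dout[270:256];

        case (curr_state)
            IDLE:   if (~ci2idp_empty && ci2idp_dout[271] == CMD_READ)
                        next_state = WAIT_0;
            WAIT_0: next_state = WAIT_1;
            WAIT_1: next_state = WAIT_2;
            WAIT_2: if (~idp2ci_full) begin
                        idp2ci_wren = 1'b1;  // Write {address, data} to output FIFO
                        ci2idp_rden = 1'b1;  // Pop the completed request
                        next_state  = IDLE;
                    end
                    // else: stall until the output FIFO has space
            default: next_state = IDLE;
        endcase
    end
endmodule
```

Because the request is popped only in WAIT_2, the address never needs a separate holding register; the FIFO head itself serves as the address source for the whole transaction.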

Output Data Routing#

Assignments:

assign idp2eep_din = {bram_addr, bram_dout};  // [270:256]=addr, [255:0]=data
assign idp2ci_din  = {bram_addr, bram_dout};
assign bram_din    = ci2idp_dout[255:0];      // Only CI can write

Address Passthrough:

Read responses include the original address
Allows requester to correlate response with request
Essential for pipelined operation, where several reads are in flight; this module issues one read at a time, so responses already return in request order
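
On the requester side, the echoed address can be split off and compared against the address that was requested. The sketch below is illustrative only: the module `read_response_check` and its port names are assumptions, not part of the original design.

```verilog
// Hypothetical requester-side helper (not in the original source):
// unpacks a 271-bit response word and checks the echoed address.
module read_response_check (
    input  wire [270:0] rsp,            // word read from the idp2ci or idp2eep FIFO
    input  wire [14:0]  expected_addr,  // address the requester is waiting on
    output wire [14:0]  rsp_addr,       // [270:256] echoed BRAM address
    output wire [255:0] rsp_data,       // [255:0]   read data
    output wire         addr_match      // 1 when the response answers the expected request
);
    assign rsp_addr   = rsp[270:256];
    assign rsp_data   = rsp[255:0];
    assign addr_match = (rsp_addr == expected_addr);
endmodule
```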

Timing Diagrams#

CI Write Transaction#

Cycle:     0      1      2
           │      │      │
State      IDLE   │IDLE  │
           │      │      │
~ci2idp    ▁▁▁▁▁▁▁│▔▔▔▔▔▔│  (WR, Addr=0x1234, Data=0xABCD...)
_empty     │      │      │
           │      │      │
ci2idp     ▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁
_rden      │      │      │
           │      │      │
bram_addr  XXXX   │0x1234│
           │      │      │
bram_wren  ▁▁▁▁▁▁▁│▔▔▔▔▔▔▁▁
           │      │      │
bram_din   XXXX   │0xABCD│
           │      │...   │

Notes:

Single-cycle write operation
No wait states required
Returns to IDLE immediately

CI Read Transaction#

Cycle:     0      1      2      3      4
           │      │      │      │      │
State      IDLE   │WAIT_0│WAIT_1│WAIT_2│IDLE
           │      │      │      │      │
~ci2idp    ▔▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▁▁▁▁▁▁  (RD, Addr=0x5678)
_empty     │      │      │      │      │
           │      │      │      │      │
ci2idp     ▁▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▔▔▔▔▔▔│▁▁▁▁▁▁
_rden      │      │      │      │      │
           │      │      │      │      │
bram_addr  0x5678 │0x5678│0x5678│0x5678│
           │      │      │      │      │
bram_dout  XXXX   │XXXX  │XXXX  │DATA  │
           │      │      │      │      │
idp2ci     ▁▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▔▔▔▔▔▔│▁▁▁▁▁▁
_wren      │      │      │      │      │
           │      │      │      │      │
idp2ci_din XXXX   │XXXX  │XXXX  │{0x5678, DATA}

Notes:

3-cycle wait for BRAM read latency
Address held stable during wait states
Response includes address + data

Priority Arbitration: EEP Deferred#

Cycle:     0      1      2      3      4      5      6      7
           │      │      │      │      │      │      │      │
State      IDLE   │WAIT_0│WAIT_1│WAIT_2│IDLE  │WAIT_0│WAIT_1│WAIT_2
           │      │      │      │      │      │      │      │
~eep2idp   ▔▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔  (pending request)
_empty     │      │      │      │      │      │      │      │
           │      │      │      │      │      │      │      │
~ci2idp    ▔▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▔▔▔▔▔▔│▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁│▁▁▁▁▁▁  (higher priority)
_empty     │      │      │      │      │      │      │      │
           │      │      │      │      │      │      │      │
Serviced   CI     │CI    │CI    │CI    │EEP   │EEP   │EEP   │EEP
           │      │      │      │      │      │      │      │

Notes:

Cycle 0: Both FIFOs have requests; CI is selected first
Cycles 0-3: CI read runs (IDLE + 3 wait states); the CI response is written in cycle 3
Cycle 4: Back in IDLE with the CI FIFO empty, so the EEP request is selected
Cycles 5-7: EEP wait states; the EEP response is written in cycle 7
Demonstrates priority enforcement

Cross-References#

Related Modules#

Module	Relationship	Interface
command_interpreter.v	Upstream	Connects to `ci2idp_` and `idp2ci_` FIFOs
external_events_processor.v	Upstream	Connects to `eep2idp_` and `idp2eep_` FIFOs
BRAM (Xilinx IP)	Downstream	`bram_*` signals control Block RAM

BRAM Structure (Parent: pcie2fifos → command_interpreter)#

Data Stored in BRAM:

Axon/External Event Data
Each row: 256 bits = 16 × 16-bit masks (one per neuron group)
Row address: Axon ID / 16

Example Row at Address 0x1000:

Bits [255:240] = Mask for neuron group 15
Bits [239:224] = Mask for neuron group 14
...
Bits [31:16]   = Mask for neuron group 1
Bits [15:0]    = Mask for neuron group 0

Each 16-bit mask: One bit per neuron group indicating which received axon spike
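
The row layout above means the mask for neuron group g occupies bits [16g+15 : 16g]. A minimal sketch of that selection, using a Verilog-2001 indexed part-select, follows; the module name `row_mask_select` and its ports are hypothetical and do not come from the original source.

```verilog
// Hypothetical helper: selects one 16-bit mask out of a 256-bit BRAM row.
module row_mask_select (
    input  wire [255:0] row,    // one BRAM row, e.g. bram_dout
    input  wire [3:0]   group,  // neuron group index, 0 to 15
    output wire [15:0]  mask    // mask stored for that group
);
    // Indexed part-select: 16 bits starting at bit 16*group,
    // so group 0 -> [15:0] and group 15 -> [255:240]
    assign mask = row[group*16 +: 16];
endmodule
```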

Key Terms and Definitions#

Term	Definition
Arbiter	Logic that decides which requester gains access to shared resource
Priority	CI requests serviced before EEP when both pending
Read Latency	3 clock cycles from address presentation to valid data
Passthrough	Address echoed back with read data for correlation
Local FIFO	FIFO in same clock domain as module (input side)
Remote FIFO	FIFO potentially in different clock domain (output side)
CMD_READ	Command bit value 0, triggers read transaction
CMD_WRITE	Command bit value 1, triggers write transaction
BRAM	Block RAM - On-chip synchronous memory primitive
FIFO Backpressure	Waiting for output FIFO not full before writing

Performance Characteristics#

Throughput#

Best Case (No Contention):

CI Write: 1 operation per clock cycle at 225 MHz = 225M writes/sec
CI Read: 4 cycles per operation (1 IDLE + 3 WAIT) = 56.25M reads/sec
EEP Read: 4 cycles per operation = 56.25M reads/sec (when CI idle)

Worst Case (Contention):

EEP Read (with CI active): Deferred for as long as the CI FIFO holds requests (no upper bound)
CI Read (with output FIFO full): Stalled in WAIT_2 state

Realistic (Mixed Workload):

CI accesses: Infrequent (configuration, debug)
EEP accesses: Burst during Phase 1 execution
Typical: EEP dominates, achieving ~50M reads/sec effective rate

Latency#

Operation	Latency (Cycles)	Latency (ns @ 225 MHz)	Notes
CI Write	1	4.4 ns	Immediate, no wait
CI Read	4	17.8 ns	1 IDLE + 3 WAIT
EEP Read	4	17.8 ns	When CI idle
EEP Read (deferred)	4 + CI latency	Variable	Must wait for CI completion
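
The figures in the throughput list and latency table follow from the 225 MHz clock and the 4-cycle read sequence. As a worked check (the symbols $T_{\text{clk}}$, $t_{\text{read}}$, and $R_{\text{read}}$ are introduced here for illustration only):

```latex
\begin{align*}
T_{\text{clk}}  &= \frac{1}{225\ \text{MHz}} \approx 4.44\ \text{ns} \\
t_{\text{read}} &= 4\,T_{\text{clk}} \approx 17.78\ \text{ns} \\
R_{\text{read}} &= \frac{225 \times 10^{6}\ \text{cycles/s}}{4\ \text{cycles/read}}
                 = 56.25 \times 10^{6}\ \text{reads/s}
\end{align*}
```

The table rounds these values to 4.4 ns and 17.8 ns.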

Stall Conditions#

Input Side Stalls:

None - FIFOs assumed to handle backpressure

Output Side Stalls:

WAIT_2 State: If output FIFO full, module holds until space available
Impact: Backpressure propagates to input FIFO (requesters must wait)

Design Considerations#

Why Priority to CI?#

Low Frequency: CI accesses are rare (host-initiated)
Latency Sensitive: Host expects fast response for debug/config
Tolerable Delay: EEP can afford to wait a few cycles while a CI request completes
Simplicity: Avoids complex round-robin or fair arbitration

Why 3-Cycle Wait?#

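BRAM Primitive: A Xilinx Block RAM primitive reads in 1 cycle, or 2 cycles with its optional output register enabled
Pipeline Registers: Additional output registering for timing closure on a memory this large brings the total to 3 cycles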
Fixed Latency: Simplifies state machine design (no variable wait)

Alternative Designs#

Round-Robin Arbitration:

Pros: Fair access, prevents EEP starvation
Cons: More complex, CI latency increases

Pipelined Operation:

Pros: Higher throughput (overlapped requests)
Cons: Requires buffering, address tracking, out-of-order handling
Not needed: Current design adequate for workload

Common Issues and Debugging#

Problem: EEP Never Gets Access#

Symptoms: EEP input FIFO fills up, no reads complete

Debug Steps:

Check ci2idp_empty - should toggle to 1 occasionally
Check state machine - should eventually reach STATE_EEP_WAIT_0
Verify CI not continuously sending requests

Common Cause: CI stuck in continuous read/write loop

Problem: Read Data Incorrect#

Symptoms: Returned data doesn’t match expected values

Debug Steps:

Check bram_addr during WAIT states - should be stable
Verify bram_dout on cycle 3 (WAIT_2 state)
Confirm write operations completed before read
Check address calculation in requester module

Common Cause: Address mismatch or read-before-write hazard

Problem: Module Stuck in WAIT_2#

Symptoms: State machine doesn’t return to IDLE

Debug Steps:

Check output FIFO full flag (idp2ci_full or idp2eep_full)
Verify downstream module consuming from output FIFO
Check for clock domain crossing issues (if FIFOs are async)

Common Cause: Output FIFO stays full because the downstream consumer has stalled

VIO/ILA Probes (Recommended)#

(* mark_debug = "true" *) reg  [2:0]  curr_state;
(* mark_debug = "true" *) wire        command     = ci2idp_dout[271];
(* mark_debug = "true" *) wire [14:0] ci_addr     = ci2idp_dout[270:256];
(* mark_debug = "true" *) wire [14:0] eep_addr    = eep2idp_dout;
(* mark_debug = "true" *) wire        ci_request  = ~ci2idp_empty;
(* mark_debug = "true" *) wire        eep_request = ~eep2idp_empty;
(* mark_debug = "true" *) reg  [14:0] bram_addr;   // Output (reg) per the BRAM interface table
(* mark_debug = "true" *) reg         bram_wren;   // Output (reg) per the BRAM interface table

Safety and Edge Cases#

Reset Behavior#

While resetn is held low, and after it is released:

State machine → STATE_RESET → STATE_IDLE
All output signals → 0 (no spurious FIFO operations)
BRAM address → 15'dX (don’t care)
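
The reset sequence above can be sketched as a state register with an active-low synchronous reset. This is an assumption about how the register is written, not a copy of the original code; the module name `state_reg_sketch` is hypothetical, and only `STATE_RESET` is reproduced from the state list.

```verilog
// Hypothetical sketch of the state register only (not the original code).
module state_reg_sketch (
    input  wire       clk,
    input  wire       resetn,      // active-low, synchronous
    input  wire [2:0] next_state,  // from the combinational next-state logic
    output reg  [2:0] curr_state
);
    localparam [2:0] STATE_RESET = 3'd0;

    always @(posedge clk) begin
        if (!resetn)
            curr_state <= STATE_RESET;  // held here while reset is asserted
        else
            curr_state <= next_state;   // STATE_RESET then advances to STATE_IDLE
    end
endmodule
```

Because the reset is synchronous, the state changes only on a rising clock edge, so resetn must be low for at least one clock edge to take effect.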

Simultaneous Requests#

Both FIFOs have data at IDLE state:

CI serviced first (priority)
EEP serviced after CI completes

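Write Followed by Read:

The state machine handles one request at a time, so a write never overlaps a read's wait states
Write completes in 1 cycle
Subsequent read sees updated value (BRAM write latency = 1 cycle)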

FIFO Full During WAIT_2#

Module stalls in WAIT_2 state
bram_addr held stable (safe to stall)
No timeout - waits indefinitely for FIFO space
Assumes downstream will eventually consume

Potential Enhancements#

Pipelined Reads: Allow new request while waiting for previous read
- Requires FIFO buffering and address tracking
- Could double read throughput
Write Acknowledgment: Provide write confirmation to CI
- Currently fire-and-forget
- Useful for verification
Round-Robin or Weighted Arbitration: Fairer access to EEP
- Prevent worst-case starvation scenarios
- At cost of CI latency
Variable BRAM Latency: Support configurable wait cycles
- Adapt to different BRAM configurations
- Requires parameterization
Performance Counters: Track utilization and contention
- CI access count
- EEP access count
- Stall cycles
- Useful for profiling
Error Detection: Detect protocol violations
- Write with read-pending
- Address out of range
- Currently no error reporting
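
As an illustration of the performance counter idea, the sketch below counts completed requests and stall cycles. It is a hypothetical add-on, not present in the original module: the module name `idh_perf_counters`, the 32-bit counter widths, and the `stalled` input (assumed to be high while the state machine sits in a WAIT_2 state with its output FIFO full) are all assumptions.

```verilog
// Hypothetical add-on (not in the original design): counts completed
// CI and EEP requests and the number of cycles spent stalled.
module idh_perf_counters (
    input  wire        clk,
    input  wire        resetn,        // active-low synchronous reset
    input  wire        ci2idp_rden,   // pulses once per completed CI request
    input  wire        eep2idp_rden,  // pulses once per completed EEP read
    input  wire        stalled,       // assumed: in WAIT_2 with output FIFO full
    output reg  [31:0] ci_count,
    output reg  [31:0] eep_count,
    output reg  [31:0] stall_cycles
);
    always @(posedge clk) begin
        if (!resetn) begin
            ci_count     <= 32'd0;
            eep_count    <= 32'd0;
            stall_cycles <= 32'd0;
        end else begin
            // Each strobe is 0 or 1, so adding it increments on active cycles only
            ci_count     <= ci_count     + ci2idp_rden;
            eep_count    <= eep_count    + eep2idp_rden;
            stall_cycles <= stall_cycles + stalled;
        end
    end
endmodule
```

Counting on the input FIFO read enables works because the module pops each request exactly once, in the cycle the request completes.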

Document Version: 1.0
Last Updated: December 2025
Module File: input_data_handler.v
Module Location: CRI_proj/cri_fpga/code/new/hyddenn2/vivado/single_core.srcs/sources_1/new/
Purpose: BRAM arbiter for shared axon/external event memory
BRAM Size: 1 MB (2^15 × 256-bit)
Read Latency: 3 cycles
