input_data_handler.v

Module Overview

Purpose and Role in Stack

The input_data_handler module acts as a BRAM arbiter, managing access to the shared Block RAM (BRAM) that stores axon/external event data. This module:

  • Arbitrates between two requesters:

    • Command interpreter (CI) - for host read/write access

    • External events processor (EEP) - for runtime axon event processing

  • Enforces priority: Command interpreter has higher priority than external events processor

  • Handles BRAM read latency: Inserts three wait states so read data is sampled 3 cycles after the address is driven

  • Routes responses back to appropriate requester with address passthrough

In the software/hardware stack:

Command Interpreter ──┐
                      ├──► input_data_handler ──► BRAM (2^15 x 256-bit)
External Events       │         (Arbiter)              │
Processor         ────┘                                │
                                                       │
                         ┌─────────────────────────────┘
                         │
                    Response Router
                         │
            ┌────────────┴─────────────┐
            ▼                          ▼
    Command Interpreter      External Events Processor
    (read response)          (read response)

This module is essential for efficient BRAM utilization, allowing both configuration/debug access (via CI) and high-speed runtime processing (via EEP) to share the same memory resource.


Module Architecture

High-Level Block Diagram

input_data_handler (internal structure)

CI input FIFO (local)                 EEP input FIFO (local)
ci2idp_empty, ci2idp_rden             eep2idp_empty, eep2idp_rden
ci2idp_dout[271:0]                    eep2idp_dout[14:0]
  [271]     = R/W command               [14:0] = 15-bit address
  [270:256] = 15-bit address            (read requests only)
  [255:0]   = 256-bit write data
          │                                     │
          └──────────────────┬──────────────────┘
                             ▼
             ┌───────────────────────────────┐
             │ Priority arbiter              │
             │  - CI has priority over EEP   │
             │  - Selects address source     │
             │  - Drives BRAM controls       │
             └───────────────┬───────────────┘
                             ▼
             ┌───────────────────────────────┐      BRAM (2^15 x 256-bit)
             │ BRAM interface                ├────► addr[14:0], din[255:0], wren
             │                               │◄──── dout[255:0]
             └───────────────┬───────────────┘
                             ▼
             ┌───────────────────────────────┐
             │ 3-cycle read wait             │
             │  IDLE → WAIT_0 → WAIT_1 →     │
             │  WAIT_2 → write response      │
             └───────────────┬───────────────┘
                             ▼
             ┌───────────────────────────────┐
             │ Response router               │
             │  - {addr, data} sent back to  │
             │    the original requester     │
             └───────┬───────────────┬───────┘
                     ▼               ▼
             CI output FIFO (remote) EEP output FIFO (remote)
             idp2ci_din[270:0]       idp2eep_din[270:0]
             (full, wren)            (full, wren)

Interface Specification

Clock and Reset

Signal   Direction   Width   Description
clk      Input       1       225 MHz system clock
resetn   Input       1       Active-low synchronous reset

Command Interpreter Interface

Input FIFO (Local - CI to IDP):

Signal         Direction      Width   Description
ci2idp_empty   Input          1       Input FIFO empty flag
ci2idp_dout    Input          272     Input FIFO data output
ci2idp_rden    Output (reg)   1       Input FIFO read enable

Data Format (ci2idp_dout[271:0]):

[271]       = R/W command (0=read, 1=write)
[270:256]   = 15-bit BRAM address
[255:0]     = 256-bit write data
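To make the bit layout concrete, here is a small Python sketch that packs and unpacks a 272-bit CI request word as an integer. The helper names (pack_ci_request, unpack_ci_request) are hypothetical and exist only for illustration. They are not part of the RTL or of any host software.

```python
# Hypothetical helpers mirroring the 272-bit ci2idp_dout layout:
#   [271] = R/W command, [270:256] = 15-bit address, [255:0] = write data
CMD_READ, CMD_WRITE = 0, 1
ADDR_BITS, DATA_BITS = 15, 256


def pack_ci_request(cmd: int, addr: int, data: int = 0) -> int:
    """Build a 272-bit CI request word as a Python int."""
    if cmd not in (CMD_READ, CMD_WRITE):
        raise ValueError("cmd must be CMD_READ or CMD_WRITE")
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address must fit in 15 bits")
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 256 bits")
    return (cmd << 271) | (addr << 256) | data


def unpack_ci_request(word: int) -> tuple:
    """Split a CI request word into (command, address, write data)."""
    cmd = (word >> 271) & 0x1
    addr = (word >> 256) & ((1 << ADDR_BITS) - 1)
    data = word & ((1 << DATA_BITS) - 1)
    return cmd, addr, data


word = pack_ci_request(CMD_WRITE, 0x1234, 0xABCD)
print(hex(word >> 256))          # 0x9234: R/W bit set, address 0x1234
print(unpack_ci_request(word))   # (1, 4660, 43981)
```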

Output FIFO (Remote - IDP to CI):

Signal         Direction      Width   Description
idp2ci_full    Input          1       Output FIFO full flag
idp2ci_din     Output         271     Output FIFO data input
idp2ci_wren    Output (reg)   1       Output FIFO write enable

Data Format (idp2ci_din[270:0]):

[270:256]   = 15-bit BRAM address (echoed from request)
[255:0]     = 256-bit read data

External Events Processor Interface

Input FIFO (Local - EEP to IDP):

Signal         Direction      Width   Description
eep2idp_empty  Input          1       Input FIFO empty flag
eep2idp_dout   Input          15      Input FIFO data output (address only)
eep2idp_rden   Output (reg)   1       Input FIFO read enable

Data Format (eep2idp_dout[14:0]):

[14:0] = 15-bit BRAM address (read request only)

Output FIFO (Remote - IDP to EEP):

Signal         Direction      Width   Description
idp2eep_full   Input          1       Output FIFO full flag
idp2eep_din    Output         271     Output FIFO data input
idp2eep_wren   Output (reg)   1       Output FIFO write enable

Data Format (idp2eep_din[270:0]):

[270:256]   = 15-bit BRAM address (echoed from request)
[255:0]     = 256-bit read data

BRAM Interface

Signal         Direction      Width   Description
bram_addr      Output (reg)   15      BRAM address (0 to 32,767)
bram_din       Output         256     BRAM write data
bram_wren      Output (reg)   1       BRAM write enable
bram_dout      Input          256     BRAM read data (3-cycle latency)

BRAM Specifications:

  • Depth: 32,768 rows (2^15)

  • Width: 256 bits per row

  • Total Size: 1 MB (32,768 × 256 bits = 8,388,608 bits = 1,048,576 bytes, i.e. 1 MiB)

  • Read Latency: 3 clock cycles

  • Write Latency: 1 clock cycle (synchronous write)
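The latency rules can be written as a minimal Python sketch. It assumes that the row addressed in cycle t appears on bram_dout in cycle t + 3, and that a write in cycle t is visible to reads addressed from cycle t + 1 on. Within a single cycle, the sketch samples the read before applying the write (read-first). That ordering is also an assumption, and it does not affect this module, because the FSM never uses read data from a write cycle. BramModel is a hypothetical name.

```python
from collections import deque


class BramModel:
    """Hypothetical cycle-level model of the 2^15 x 256-bit BRAM.

    Assumptions: bram_dout in cycle t is the row addressed in cycle t - 3,
    and a write in cycle t is visible to reads addressed in cycle t + 1 or later.
    """

    DEPTH, WIDTH, READ_LATENCY = 1 << 15, 256, 3

    def __init__(self) -> None:
        self.mem = [0] * self.DEPTH
        # One slot per cycle of read latency; None stands for "X" (not yet valid).
        self.pipe = deque([None] * self.READ_LATENCY, maxlen=self.READ_LATENCY)

    def step(self, addr: int, wren: bool = False, din: int = 0):
        """Simulate one clock cycle and return bram_dout for that cycle."""
        if not 0 <= addr < self.DEPTH:
            raise ValueError("address out of range")
        dout = self.pipe[0]                 # row addressed three cycles ago
        self.pipe.append(self.mem[addr])    # sample this cycle's read (read-first)
        if wren:
            self.mem[addr] = din & ((1 << self.WIDTH) - 1)
        return dout


total_bits = BramModel.DEPTH * BramModel.WIDTH
print(total_bits, total_bits // 8)            # 8388608 1048576

bram = BramModel()
bram.step(0x0042, wren=True, din=0xBEEF)      # cycle 0: write
outs = [bram.step(0x0042) for _ in range(5)]  # cycles 1-5: read the same row
# Cycle 3 returns the row as sampled in cycle 0 (before the write);
# the read addressed in cycle 1 returns the new value in cycle 4.
print(outs)                                   # [None, None, 0, 48879, 48879]
```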


Detailed Logic Description

Command Decoder

localparam CMD_READ  = 1'b0;
localparam CMD_WRITE = 1'b1;

wire command = ci2idp_dout[271];  // Extract R/W bit

State Machine

States:

localparam [2:0] STATE_RESET                = 3'd0;
localparam [2:0] STATE_IDLE                 = 3'd1;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_0 = 3'd2;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_1 = 3'd3;
localparam [2:0] STATE_EEP_WAIT_BRAM_READ_2 = 3'd4;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_0  = 3'd5;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_1  = 3'd6;
localparam [2:0] STATE_CI_WAIT_BRAM_READ_2  = 3'd7;

State Transition Diagram:

                    ┌─────────────┐
                    │ STATE_RESET │
                    └──────┬──────┘
                           ▼
             ┌───────────────────────────┐
    ┌───────►│        STATE_IDLE         │◄───────┐
    │        │        (arbitrate)        │        │
    │        └──┬─────────────────────┬──┘        │
    │           │                     │           │
    │  EEP read │                     │ CI read   │
    │           ▼                     ▼           │
    │   STATE_EEP_WAIT_0       STATE_CI_WAIT_0    │
    │           │                     │           │
    │           ▼                     ▼           │
    │   STATE_EEP_WAIT_1       STATE_CI_WAIT_1    │
    │           │                     │           │
    │           ▼                     ▼           │
    │   STATE_EEP_WAIT_2       STATE_CI_WAIT_2    │
    │           │                     │           │
    └───────────┘                     └───────────┘

Notes:

  • STATE_EEP_WAIT_n and STATE_CI_WAIT_n abbreviate STATE_EEP_WAIT_BRAM_READ_n and STATE_CI_WAIT_BRAM_READ_n

  • A CI write (CMD_WRITE) never leaves STATE_IDLE: the BRAM write and the CI FIFO pop happen in the same cycle

  • Each WAIT_2 state returns to STATE_IDLE only when its output FIFO is not full (!idp2eep_full or !idp2ci_full); otherwise it stays in WAIT_2

  • When both input FIFOs hold requests, the CI request is taken first

Priority Arbitration Logic

IDLE State Behavior:

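STATE_IDLE: begin
    // Defaults (bram_wren = 0, ci2idp_rden = 0, next_state = state, ...)
    // are assumed to be assigned at the top of the combinational block.
    if (~ci2idp_empty) begin
        // CI has a pending request (checked first: higher priority)
        bram_addr = ci2idp_dout[270:256];  // Extract 15-bit address

        if (command == CMD_READ)
            next_state = STATE_CI_WAIT_BRAM_READ_0;
        else begin  // CMD_WRITE
            bram_wren   = 1'b1;        // bram_din = ci2idp_dout[255:0]
            ci2idp_rden = 1'b1;        // Pop the write request
            next_state  = STATE_IDLE;  // Write completes immediately
        end

    end else if (~eep2idp_empty) begin
        // EEP has a pending read (serviced only when the CI FIFO is empty)
        bram_addr  = eep2idp_dout;
        next_state = STATE_EEP_WAIT_BRAM_READ_0;
    end
end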

Priority Rules:

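  1. CI Write: Highest priority (shared with CI reads, which arrive through the same FIFO and are served in order); completes in 1 cycle (no wait states)

  2. CI Read: High priority, 3-cycle wait for BRAM latency

  3. EEP Read: Lower priority, serviced only when CI FIFO empty

  4. Starvation: Not prevented by design. Under strict priority, EEP waits as long as the CI FIFO is non-empty, so EEP progress relies on the CI request rate staying low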
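These rules amount to a fixed-priority choice in STATE_IDLE. The Python function below is a hypothetical helper that follows the CI-first order stated in this section. It shows that choice, and it shows why rule 4 needs qualifying: if the CI FIFO never drains, EEP is never selected.

```python
from typing import Optional


def arbitrate(ci_empty: bool, eep_empty: bool) -> Optional[str]:
    """Fixed-priority choice in STATE_IDLE: CI first, EEP only if CI is empty."""
    if not ci_empty:
        return "CI"
    if not eep_empty:
        return "EEP"
    return None          # both FIFOs empty: stay in IDLE


print(arbitrate(ci_empty=False, eep_empty=False))   # CI
print(arbitrate(ci_empty=True, eep_empty=False))    # EEP

# If the CI FIFO never drains, EEP is never granted.
grants = [arbitrate(ci_empty=False, eep_empty=False) for _ in range(1000)]
print(grants.count("EEP"))                          # 0
```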

BRAM Read Pipeline (3-Cycle Latency)

Cycle Breakdown:

Cycle 0: Request arrives in IDLE state
         - bram_addr = address from FIFO
         - Transition to WAIT_0

Cycle 1: STATE_WAIT_0
         - BRAM internal pipeline stage 1
         - bram_addr held stable
         - Transition to WAIT_1

Cycle 2: STATE_WAIT_1
         - BRAM internal pipeline stage 2
         - bram_addr held stable
         - Transition to WAIT_2

Cycle 3: STATE_WAIT_2
         - bram_dout now valid
         - Wait for output FIFO not full
         - Write to output FIFO (wren pulse)
         - Pop input FIFO (rden pulse)
         - Transition to IDLE

EEP Read Example:

STATE_EEP_WAIT_BRAM_READ_0: begin
    bram_addr  = eep2idp_dout;  // Hold address stable
    next_state = STATE_EEP_WAIT_BRAM_READ_1;
end

STATE_EEP_WAIT_BRAM_READ_1: begin
    bram_addr  = eep2idp_dout;
    next_state = STATE_EEP_WAIT_BRAM_READ_2;
end

STATE_EEP_WAIT_BRAM_READ_2: begin
    bram_addr = eep2idp_dout;
    if (~idp2eep_full) begin
        idp2eep_wren = 1'b1;  // Write read data to output FIFO
        eep2idp_rden = 1'b1;  // Pop request from input FIFO
        next_state = STATE_IDLE;
    end
    // else: stall until output FIFO has space
end

CI Read: Same pattern using ci2idp_dout[270:256] for address and idp2ci FIFOs.
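Arbitration, wait states, and output backpressure can be combined in a cycle-level Python model. This is an illustrative sketch, not a translation of the RTL. It rests on the following assumptions:

  • Both input FIFOs are first-word-fall-through: the head entry is visible on *_dout before *_rden pops it, as the code above implies.

  • BRAM dout in cycle t is the row addressed in cycle t - 3.

  • CI requests are modeled as (cmd, addr, data) tuples.

  • The output-FIFO capacity is arbitrary.

The class and method names are hypothetical.

```python
from collections import deque

CMD_READ, CMD_WRITE = 0, 1
IDLE, WAIT_0, WAIT_1, WAIT_2 = "IDLE", "WAIT_0", "WAIT_1", "WAIT_2"


class InputDataHandlerModel:
    """Illustrative cycle-level model of the arbiter FSM (not the RTL).

    Assumptions: FWFT input FIFOs, CI requests as (cmd, addr, data) tuples,
    EEP requests as bare addresses, BRAM dout in cycle t = row addressed in
    cycle t - 3, and at most out_capacity entries per output FIFO.
    """

    def __init__(self, out_capacity: int = 4) -> None:
        self.mem = [0] * (1 << 15)                    # 2^15 rows of 256 bits
        self.rd_pipe = deque([None] * 3, maxlen=3)    # 3-cycle read latency
        self.ci_in, self.eep_in = deque(), deque()    # ci2idp / eep2idp FIFOs
        self.ci_out, self.eep_out = deque(), deque()  # idp2ci / idp2eep FIFOs
        self.out_capacity = out_capacity
        self.state, self.owner = IDLE, None           # owner: "CI" or "EEP"
        self.trace = []                               # (state, serviced) per cycle

    def step(self) -> None:
        """Simulate one clock cycle."""
        state, served = self.state, self.owner
        addr, wren, din = 0, False, 0
        if state == IDLE:
            if self.ci_in:                            # CI FIFO is checked first
                cmd, addr, din = self.ci_in[0]
                served = "CI"
                if cmd == CMD_WRITE:                  # write completes this cycle
                    wren = True
                    self.ci_in.popleft()              # ci2idp_rden pulse
                else:
                    self.owner, self.state = "CI", WAIT_0
            elif self.eep_in:                         # only when CI FIFO is empty
                addr = self.eep_in[0]
                served = self.owner = "EEP"
                self.state = WAIT_0
        else:
            # Hold bram_addr stable: the request is still at its FIFO head.
            addr = self.ci_in[0][1] if self.owner == "CI" else self.eep_in[0]

        dout = self.rd_pipe[0]                        # row addressed 3 cycles ago
        self.rd_pipe.append(self.mem[addr])
        if wren:
            self.mem[addr] = din                      # bram_wren / bram_din

        if state == WAIT_0:
            self.state = WAIT_1
        elif state == WAIT_1:
            self.state = WAIT_2
        elif state == WAIT_2:
            is_ci = self.owner == "CI"
            out_q = self.ci_out if is_ci else self.eep_out
            if len(out_q) < self.out_capacity:        # output FIFO not full
                out_q.append((addr, dout))            # {bram_addr, bram_dout}
                (self.ci_in if is_ci else self.eep_in).popleft()  # rden pulse
                self.state, self.owner = IDLE, None
            # else: stall in WAIT_2 with the address held
        self.trace.append((state, served))


# Both FIFOs hold a read request at cycle 0.
m = InputDataHandlerModel()
m.mem[0x5678], m.mem[0x0010] = 0xABCD, 0x1111
m.ci_in.append((CMD_READ, 0x5678, 0))
m.eep_in.append(0x0010)
for _ in range(8):
    m.step()
print([s for s, _ in m.trace])
# ['IDLE', 'WAIT_0', 'WAIT_1', 'WAIT_2', 'IDLE', 'WAIT_0', 'WAIT_1', 'WAIT_2']
print([who for _, who in m.trace])
# ['CI', 'CI', 'CI', 'CI', 'EEP', 'EEP', 'EEP', 'EEP']

# Backpressure: the EEP output FIFO (capacity 1) is already full.
s = InputDataHandlerModel(out_capacity=1)
s.eep_out.append((0, 0))
s.eep_in.append(0x0003)
for _ in range(6):
    s.step()
print(s.state, len(s.eep_in))   # WAIT_2 1: stalled, request not popped
```

In the first trace, CI occupies cycles 0-3 and EEP occupies cycles 4-7. In the second, the model stays in WAIT_2 with the request still queued until the output FIFO drains.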

Output Data Routing

Assignments:

assign idp2eep_din = {bram_addr, bram_dout};  // [270:256]=addr, [255:0]=data
assign idp2ci_din  = {bram_addr, bram_dout};
assign bram_din    = ci2idp_dout[255:0];      // Only CI can write

Address Passthrough:

  • Read responses include the original address

  • Allows requester to correlate response with request

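  • Useful when a requester has several reads queued: responses return in request order, and the echoed address identifies each one without extra bookkeeping (this module itself handles one request at a time)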
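The Python sketch below shows one way a requester might use the echoed address. It packs {bram_addr, bram_dout} the same way as the assign statements above, then files each response under its address. pack_response, unpack_response, and the dictionary bookkeeping are hypothetical, and the data values are stand-ins.

```python
ADDR_BITS, DATA_BITS = 15, 256
DATA_MASK = (1 << DATA_BITS) - 1


def pack_response(addr: int, data: int) -> int:
    """{bram_addr, bram_dout}: address in [270:256], data in [255:0]."""
    return ((addr & ((1 << ADDR_BITS) - 1)) << DATA_BITS) | (data & DATA_MASK)


def unpack_response(word: int) -> tuple:
    """Return (address, data) from a 271-bit response word."""
    return (word >> DATA_BITS) & ((1 << ADDR_BITS) - 1), word & DATA_MASK


# Hypothetical requester: several reads queued, responses filed by address.
requested = [0x0100, 0x0101, 0x7FFF]
responses = [pack_response(a, 3 * a) for a in requested]   # stand-in row data
rows = {}
for word in responses:
    addr, data = unpack_response(word)
    rows[addr] = data            # the echoed address identifies the row
print(sorted(rows) == sorted(requested))   # True
```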


Timing Diagrams

CI Write Transaction

Cycle:        0        1        2
State         IDLE     IDLE     IDLE
ci2idp_empty  ▔▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▔   (WR, Addr=0x1234, Data=0xABCD...)
ci2idp_rden   ▁▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁
bram_addr     XXXX     0x1234   XXXX
bram_wren     ▁▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁
bram_din      XXXX     0xABCD.. XXXX

Notes:

  • Single-cycle write operation

  • No wait states required

  • Never leaves STATE_IDLE (the write completes in the cycle it is selected)

CI Read Transaction

Cycle:        0        1        2        3        4
State         IDLE     WAIT_0   WAIT_1   WAIT_2   IDLE
ci2idp_empty  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▔   (RD, Addr=0x5678)
ci2idp_rden   ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁
bram_addr     0x5678   0x5678   0x5678   0x5678   XXXX
bram_dout     XXXX     XXXX     XXXX     DATA     XXXX
idp2ci_wren   ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▔▁▁▁▁▁▁▁▁▁   (idp2ci_full = 0)
idp2ci_din    XXXX     XXXX     XXXX     {0x5678, DATA}

Notes:

  • 3-cycle wait for BRAM read latency

  • Address held stable during wait states

  • Response includes address + data

Priority Arbitration: EEP Deferred

Cycle:         0        1        2        3        4        5        6        7
State          IDLE     WAIT_0   WAIT_1   WAIT_2   IDLE     WAIT_0   WAIT_1   WAIT_2
eep2idp_empty  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁   (pending request)
ci2idp_empty   ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔   (higher priority)
Serviced       CI       CI       CI       CI       EEP      EEP      EEP      EEP

Notes:

  • Cycle 0: Both FIFOs have requests; the arbiter selects CI

  • Cycles 1-3: CI read waits for the BRAM; the response is written and the CI request popped in cycle 3 (WAIT_2)

  • Cycle 4: Back in IDLE with the CI FIFO empty, the EEP request is selected

  • Cycles 5-7: EEP read waits for the BRAM; its response is written in cycle 7

  • Demonstrates priority enforcement


Cross-References

BRAM Structure (Parent: pcie2fifos → command_interpreter)

Data Stored in BRAM:

  • Axon/External Event Data

  • Each row: 256 bits = 16 × 16-bit masks (one per neuron group)

  • Row address: Axon ID / 16

Example Row at Address 0x1000:

Bits [255:240] = Mask for neuron group 15
Bits [239:224] = Mask for neuron group 14
...
Bits [31:16]   = Mask for neuron group 1
Bits [15:0]    = Mask for neuron group 0

Each 16-bit mask: One bit per neuron group indicating which received axon spike
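The row addressing and mask layout can be written out as a short Python sketch. It encodes only what is stated here: row = Axon ID / 16 with integer division, and group g in bits [16g+15:16g]. It does not interpret individual mask bits, the function names are hypothetical, and the example row values are made up.

```python
def row_address(axon_id: int) -> int:
    """BRAM row for an axon, per 'Row address = Axon ID / 16' (integer division)."""
    return axon_id // 16


def group_mask(row: int, group: int) -> int:
    """16-bit mask for neuron group `group` (0-15): bits [16*g+15 : 16*g]."""
    if not 0 <= group < 16:
        raise ValueError("group must be 0-15")
    return (row >> (16 * group)) & 0xFFFF


# Example row: mask 0xAAAA for group 15 and 0x0001 for group 0 (made-up values).
row = (0xAAAA << 240) | 0x0001
print(hex(row_address(0x10005)))                          # 0x1000
print(hex(group_mask(row, 15)), hex(group_mask(row, 0)))  # 0xaaaa 0x1
```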

Key Terms and Definitions

Term                 Definition
Arbiter              Logic that decides which requester gains access to a shared resource
Priority             CI requests are serviced before EEP requests when both are pending
Read Latency         3 clock cycles from address presentation to valid data
Passthrough          Address echoed back with the read data for correlation
Local FIFO           FIFO in the same clock domain as the module (input side)
Remote FIFO          FIFO potentially in a different clock domain (output side)
CMD_READ             Command bit value 0; triggers a read transaction
CMD_WRITE            Command bit value 1; triggers a write transaction
BRAM                 Block RAM: on-chip synchronous memory primitive
FIFO Backpressure    Waiting until the output FIFO is not full before writing


Performance Characteristics

Throughput

Best Case (No Contention):

  • CI Write: 1 operation per clock cycle = 225 MHz = 225M writes/sec

  • CI Read: 4 cycles per operation (1 IDLE + 3 WAIT) = 56.25M reads/sec

  • EEP Read: 4 cycles per operation = 56.25M reads/sec (when CI idle)

Worst Case (Contention):

  • EEP Read (with CI active): Indefinitely deferred until CI idle

  • CI Read (with output FIFO full): Stalled in WAIT_2 state

Realistic (Mixed Workload):

  • CI accesses: Infrequent (configuration, debug)

  • EEP accesses: Burst during Phase 1 execution

  • Typical: EEP dominates, achieving ~50M reads/sec effective rate

Latency

Operation             Latency (cycles)  Latency (ns @ 225 MHz)   Notes
CI Write              1                 4.4 ns                   Immediate, no wait
CI Read               4                 17.8 ns                  1 IDLE + 3 WAIT
EEP Read              4                 17.8 ns                  When CI idle
EEP Read (deferred)   4 + CI latency    Variable                 Must wait for CI completion
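The throughput and latency figures follow directly from the 225 MHz clock and the cycle counts above, and this short Python calculation reproduces them. For the deferred case, it assumes an EEP read that arrives together with a single CI read, as in the EEP Deferred timing diagram. That EEP read completes 8 cycles after arbitration starts (4 CI cycles + 4 EEP cycles).

```python
F_CLK_HZ = 225e6                  # system clock
PERIOD_NS = 1e9 / F_CLK_HZ        # ~4.44 ns per cycle


def latency_ns(cycles: int) -> float:
    """Time spent in `cycles` clock cycles."""
    return cycles * PERIOD_NS


def ops_per_sec(cycles_per_op: int) -> float:
    """Back-to-back rate when every operation takes cycles_per_op cycles."""
    return F_CLK_HZ / cycles_per_op


for name, cycles in [("CI write", 1), ("CI read", 4), ("EEP read", 4)]:
    print(f"{name:8s} {cycles} cycle(s) {latency_ns(cycles):5.1f} ns "
          f"{ops_per_sec(cycles) / 1e6:6.2f} M ops/s")

# EEP read queued behind one CI read: 4 CI cycles + 4 EEP cycles.
print(f"deferred EEP read: {latency_ns(4 + 4):.1f} ns")   # 35.6 ns
```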

Stall Conditions

Input Side Stalls:

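  • None: if an input FIFO is empty, the module simply waits in IDLE (the FIFOs are assumed to handle upstream backpressure)

Output Side Stalls:

  • WAIT_2 State: If the output FIFO is full, the module holds until space is available

  • Impact: The pending request is not popped, so backpressure propagates to the input FIFO (requesters must wait). Because one FSM serves both requesters, a stall on either output also blocks the other requester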


Design Considerations

Why Priority to CI?

  1. Low Frequency: CI accesses are rare (host-initiated)

  2. Latency Sensitive: Host expects fast response for debug/config

  3. Tolerable Delay: EEP can afford to wait a few cycles per CI request (1 for a write, 4 for a read), provided CI traffic stays infrequent

  4. Simplicity: Avoids complex round-robin or fair arbitration

Why 3-Cycle Wait?

  • BRAM Primitive: A Xilinx Block RAM read takes 1 cycle, or 2 with the primitive's optional output register enabled

  • Pipeline Registers: Additional output registering for timing closure brings the total to 3 cycles

  • Fixed Latency: Simplifies state machine design (no variable wait)

Alternative Designs

Round-Robin Arbitration:

  • Pros: Fair access, prevents EEP starvation

  • Cons: More complex, CI latency increases

Pipelined Operation:

  • Pros: Higher throughput (overlapped requests)

  • Cons: Requires buffering, address tracking, out-of-order handling

  • Not needed: Current design adequate for workload
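For comparison with the current fixed priority, a two-requester round-robin grant could look like the Python sketch below. This is the generic round-robin scheme named above, not something the existing RTL implements. The function name and the `last` state are hypothetical.

```python
from typing import Optional


def round_robin(ci_req: bool, eep_req: bool, last: Optional[str]) -> Optional[str]:
    """Two-way round-robin: on contention, grant the requester not served last."""
    if ci_req and eep_req:
        return "EEP" if last == "CI" else "CI"
    if ci_req:
        return "CI"
    if eep_req:
        return "EEP"
    return None


# Both requesters always pending: grants alternate instead of starving EEP.
last, grants = None, []
for _ in range(6):
    last = round_robin(True, True, last)
    grants.append(last)
print(grants)   # ['CI', 'EEP', 'CI', 'EEP', 'CI', 'EEP']
```

Under continuous contention the grants alternate, so EEP cannot starve. The cost is that a CI request may now wait behind one EEP read (4 cycles), which is the CI latency increase listed above.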


Common Issues and Debugging

Problem: EEP Never Gets Access

Symptoms: EEP input FIFO fills up, no reads complete

Debug Steps:

  1. Check ci2idp_empty - should toggle to 1 occasionally

  2. Check state machine - should eventually reach STATE_EEP_WAIT_BRAM_READ_0

  3. Verify CI not continuously sending requests

Common Cause: CI stuck in continuous read/write loop

Problem: Read Data Incorrect

Symptoms: Returned data doesn’t match expected values

Debug Steps:

  1. Check bram_addr during WAIT states - should be stable

  2. Verify bram_dout on cycle 3 (WAIT_2 state)

  3. Confirm write operations completed before read

  4. Check address calculation in requester module

Common Cause: Address mismatch or read-before-write hazard

Problem: Module Stuck in WAIT_2

Symptoms: State machine doesn’t return to IDLE

Debug Steps:

  1. Check output FIFO full flag (idp2ci_full or idp2eep_full)

  2. Verify downstream module consuming from output FIFO

  3. Check for clock domain crossing issues (if FIFOs are async)

Common Cause: Output FIFO full because the downstream consumer has stalled (the module never writes into a full FIFO, so this is not an overflow)


Safety and Edge Cases

Reset Behavior

On resetn deassertion:

  • State machine: STATE_RESET → STATE_IDLE

  • All FIFO and BRAM enables (rden, wren) → 0 (no spurious FIFO or BRAM operations)

  • BRAM address → 15'dX (don’t care)

Simultaneous Requests

Both FIFOs have data at IDLE state:

  • CI serviced first (priority)

  • EEP serviced after CI completes

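Write Followed by Read:

  • The write completes in 1 cycle; because requests are handled one at a time, a read never overlaps a write

  • A subsequent read of the same address sees the updated value (BRAM write latency = 1 cycle)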

FIFO Full During WAIT_2

  • Module stalls in WAIT_2 state

  • bram_addr held stable (safe to stall)

  • No timeout - waits indefinitely for FIFO space

  • Assumes downstream will eventually consume


Potential Enhancements

  1. Pipelined Reads: Allow new request while waiting for previous read

    • Requires FIFO buffering and address tracking

    • Could raise peak read throughput by up to 4× (one read per cycle instead of one every 4 cycles)

  2. Write Acknowledgment: Provide write confirmation to CI

    • Currently fire-and-forget

    • Useful for verification

  3. Round-Robin or Weighted Arbitration: Fairer access to EEP

    • Prevent worst-case starvation scenarios

    • At cost of CI latency

  4. Variable BRAM Latency: Support configurable wait cycles

    • Adapt to different BRAM configurations

    • Requires parameterization

  5. Performance Counters: Track utilization and contention

    • CI access count

    • EEP access count

    • Stall cycles

    • Useful for profiling

  6. Error Detection: Detect protocol violations

    • Write with read-pending

    • Address out of range

    • Currently no error reporting


Document Version: 1.0
Last Updated: December 2025
Module File: input_data_handler.v
Module Location: CRI_proj/cri_fpga/code/new/hyddenn2/vivado/single_core.srcs/sources_1/new/
Purpose: BRAM arbiter for shared axon/external event memory
BRAM Size: 1 MB (2^15 × 256-bit)
Read Latency: 3 cycles