# Spike FIFO Controller Module

## Overview

The **Spike FIFO Controller** is a simple yet critical arbitration component that aggregates output spikes from 8 parallel Spike FIFOs into a single unified stream. These spikes represent post-synaptic neuron activations generated during Phase 2 (synaptic weight processing) and are fed to the Internal Events Processor for the next time step.

### Role in the Software/Hardware Stack

```
                    Phase 2: Synaptic Processing
                       (Weight Application)
                              |
    ┌─────────────────────────┼─────────────────────────┐
    |                         v                         |
    |              [HBM Processor]                      |
    |                         |                         |
    |          Fetch synaptic weights from HBM          |
    |                         |                         |
    |          Apply weights to post-synaptic neurons   |
    |                         |                         |
    |                  spk0_wren ... spk7_wren          |
    |                         |                         |
    |         ┌───────────────┼───────────────┐         |
    |         |               v               |         |
    |         | ┌─────┐ ┌─────┐ ... ┌─────┐  |         |
    |         | │spk0 │ │spk1 │     │spk7 │  |         |
    |         | │FIFO │ │FIFO │     │FIFO │  |         |
    |         | │17b  │ │17b  │     │17b  │  |         |
    |         | └──┬──┘ └──┬──┘     └──┬──┘  |         |
    |         │    |       |            |     │         |
    |         │    v       v            v     │         |
    |         │   ┌────────────────────────┐  │         |
    |         │   │  Round-Robin Arbiter   │  │         |
    |         │   │  (3-bit counter 0-7)   │  │         |
    |         │   └──────────┬─────────────┘  │         |
    |         │              |                 │         |
    |         │              v                 │         |
    |         │      ┌────────────────┐        │         |
    |         │      │  spk2ciFIFO    │        │         |
    |         │      │   (17-bit)     │        │         |
    |         │      └───────┬────────┘        │         |
    |         └──────────────┼─────────────────┘         |
    |                        |                           |
    |                        v                           |
    |           [Internal Events Processor]              |
    |                        |                           |
    |              Next time step (Phase 1b)             |
    └───────────────────────────────────────────────────┘
```

**Function**:
- **Aggregate Spikes**: Collect spike events from 8 parallel FIFOs
- **Fair Arbitration**: Round-robin scheduler ensures all spike sources get equal service
- **Unified Output**: Present consolidated spike stream to downstream processor

**Key Innovation**: By using round-robin arbitration, the module ensures fairness - no single spike source can monopolize the output, preventing starvation even under heavy load.

---

## Module Architecture

```
                     8 Spike FIFOs
                    (Parallel Inputs)
                          |
    ┌─────────────────────┼─────────────────────┐
    |                     v                     |
    | ┌─────┐ ┌─────┐ ┌─────┐ ... ┌─────┐     |
    | │spk0 │ │spk1 │ │spk2 │     │spk7 │     |
    | │empty│ │empty│ │empty│     │empty│     |
    | │dout │ │dout │ │dout │     │dout │     |
    | │[16:0││ │[16:0││ │[16:0│    │[16:0│    |
    | └──┬──┘ └──┬──┘ └──┬──┘     └──┬──┘     |
    |    |       |       |            |        |
    |    └───────┴───────┴────────────┘        |
    |                    |                     |
    |                    v                     |
    |         ┌──────────────────────┐         |
    |         │  Round-Robin Counter │         |
    |         │    addr[2:0]         │         |
    |         │    0 → 1 → ... → 7   │         |
    |         └──────────┬───────────┘         |
    |                    |                     |
    |                    v                     |
    |         ┌──────────────────────┐         |
    |         │   8:1 Multiplexer    │         |
    |         │   (Select based on   │         |
    |         │    addr & !empty &   │         |
    |         │    !spk2ciFIFO_full) │         |
    |         └──────────┬───────────┘         |
    |                    |                     |
    |    ┌───────────────┼───────────────┐     |
    |    |               v               |     |
    | spk*_rden   spk2ciFIFO_din[16:0]  |     |
    |    |         spk2ciFIFO_wren       |     |
    |    v               |               v     |
    |   FIFO            FIFO           FIFO    |
    |  Advance         Write           Enable  |
    └───────────────────┼─────────────────────┘
                        |
                        v
              ┌──────────────────┐
              │  spk2ciFIFO      │
              │  (Output FIFO)   │
              │  To Internal     │
              │  Events Proc.    │
              └──────────────────┘
```

### Data Flow

**Phase 2: Spike Generation (Concurrent with Phase 1)**
```
1. HBM Processor applies synaptic weights
2. For each weight applied:
   - Calculate post-synaptic neuron potential update
   - If neuron crosses threshold → generate spike
   - Write spike to appropriate spike FIFO (spk0-7)
3. Spike data format: [16:0] = {1'b valid, 16'b neuron_address}
4. Spikes accumulate in parallel FIFOs
```

**Phase 3: Spike Drain (After Phase 2 Complete)**
```
1. Round-robin arbiter cycles addr 0→1→...→7→0
2. Every cycle:
   - Check if spk[addr]_empty==0 and spk2ciFIFO_full==0
   - If true: read from spk[addr], write to spk2ciFIFO
   - If false: skip (no-op), continue to next address
3. Continue until all spike FIFOs empty
4. Spikes now ready for Internal Events Processor (next time step)
```

---

## Interface Specification

### Clock and Reset
| Port | Direction | Width | Description |
|------|-----------|-------|-------------|
| `clk` | Input | 1 | System clock (225 MHz typical) |
| `resetn` | Input | 1 | Active-low asynchronous reset |

### Spike FIFO Interfaces (8 instances: spk0-spk7)

Each spike FIFO has identical read-only interface (example for spk0):

| Port | Direction | Width | Description |
|------|-----------|-------|-------------|
| `spk0_empty` | Input | 1 | FIFO empty flag |
| `spk0_dout` | Input | 17 | Spike data (likely: valid + neuron address) |
| `spk0_rden` | Output | 1 | Read enable (from arbiter) |

**Spike FIFOs**: spk1, spk2, ..., spk7 (identical interfaces)

**Note**: The module has commented-out ports for spk8-spk15 (lines 40-72, 104-111, 172-229, 239-246), suggesting the design originally supported 16 FIFOs but was reduced to 8 for the current single-core implementation.

### Output Interface (Aggregated Spike FIFO)
| Port | Direction | Width | Description |
|------|-----------|-------|-------------|
| `spk2ciFIFO_full` | Input | 1 | Output FIFO full flag (backpressure) |
| `spk2ciFIFO_din` | Output | 17 | Spike data to output FIFO |
| `spk2ciFIFO_wren` | Output | 1 | Write enable (from arbiter) |

**Name Interpretation**: "spk2ciFIFO" likely means "Spike to Command Interpreter FIFO" or "Spike to Core Internal FIFO", though it actually feeds the Internal Events Processor.

---

## Detailed Logic Description

### Round-Robin Arbiter

A 3-bit counter cycles through FIFOs 0-7, servicing one per cycle:

```verilog
reg [2:0] addr;  // 3 bits for 8 FIFOs (0-7)

always @(posedge clk) begin
    if (!resetn)
        addr <= 3'd0;
    else
        addr <= addr + 1'b1;  // Wraps 7→0 automatically
end
```

**Arbitration Cycle**:
```
Cycle 0:  addr=0  → Check spk0
Cycle 1:  addr=1  → Check spk1
Cycle 2:  addr=2  → Check spk2
...
Cycle 7:  addr=7  → Check spk7
Cycle 8:  addr=0  → Back to spk0
...
```

**Arbitration Period**: 8 cycles (half the period of pointer_fifo_controller's 16 cycles)

### Arbitration Logic (Combinational)

```verilog
always @(*) begin
    // Default: No reads, no writes
    spk0_rden = 1'b0;
    spk1_rden = 1'b0;
    // ... (all spk*_rden = 0)
    spk2ciFIFO_din = 32'dX;  // Note: Typo - should be 17'dX
    spk2ciFIFO_wren = 1'b0;

    case (addr)
        3'd0: begin
            if (!spk0_empty & !spk2ciFIFO_full) begin
                spk0_rden       = 1'b1;
                spk2ciFIFO_din  = spk0_dout;
                spk2ciFIFO_wren = 1'b1;
            end
        end
        3'd1: begin
            if (!spk1_empty & !spk2ciFIFO_full) begin
                spk1_rden       = 1'b1;
                spk2ciFIFO_din  = spk1_dout;
                spk2ciFIFO_wren = 1'b1;
            end
        end
        // ... (pattern repeats for 3'd2 through 3'd7)

        default: begin
            // All outputs stay at default (0 or X)
        end
    endcase
end
```

**Logic Breakdown**:
```
For each cycle:
  IF addr==N AND spk[N]_empty==0 AND spk2ciFIFO_full==0
    THEN:
      spk[N]_rden = 1      (read from spike FIFO N)
      spk2ciFIFO_din = spk[N]_dout  (forward data)
      spk2ciFIFO_wren = 1  (write to output FIFO)
  ELSE:
    (all outputs stay 0, no action)
```

**Identical to pointer_fifo_controller**: Same arbitration pattern, just fewer FIFOs (8 vs 16).

### Spike Data Format (17 bits)

While the exact format isn't documented in the code, typical interpretations:

**Option 1: Valid + Address**
```
Bit [16]:    Spike valid (1=spike, 0=no spike / padding)
Bits [15:0]: Post-synaptic neuron address (0-65535)
```

**Option 2: MSB Address + LSB Metadata**
```
Bit [16]:    Bank select or overflow flag
Bits [15:0]: Neuron address within bank
```

**Option 3: Signed Weight + Address**
```
Bit [16]:    Sign bit (excitatory/inhibitory)
Bits [15:0]: Neuron address
```

**Most Likely**: Option 1 (valid + address), as this is common in sparse event systems.

**Example**:
```
spk0_dout = 17'h10ABC  → Spike valid, neuron address 0x0ABC (2748)
spk1_dout = 17'h00000  → No spike (padding entry)
spk2_dout = 17'h1FFFF  → Spike valid, neuron address 0xFFFF (65535)
```

---

## Timing Diagrams

### Round-Robin Arbiter Operation

```
Cycle    0    1    2    3    4    5    6    7    8
         ────┬────┬────┬────┬────┬────┬────┬────┬────
addr         0    1    2    3    4    5    6    7    0

spk0_empty   ────┐                             ┌─────
             ────└─────────────────────────────┘
             (has data cycles 0-7, empty at 8)

spk1_empty   ───────────────────────────────────────
             (empty throughout)

spk2_empty   ──────────┐                   ┌────────
             ──────────└───────────────────┘
             (has data cycles 2-6)

spk5_empty   ───────────────────────┐   ┌───────────
             ───────────────────────└───┘
             (has data cycles 5-6)

spk2ciFIFO_full ────────────────────────────────────
                (never full)

spk0_rden    ───┐                             ┌─────
             ───└─────────────────────────────┘

spk2_rden    ──────────┐
             ──────────└───────────────────────────

spk5_rden    ───────────────────────┐
             ───────────────────────└───────────────

spk2ciFIFO_wren ──┐    ┌───────────┐───────────┐
                ──└────┘           └───────────┘

spk2ciFIFO_din     S0   S2         S5          X

Explanation:
  Cycle 0 (addr=0): spk0 not empty → read spk0, write spk2ciFIFO
  Cycle 1 (addr=1): spk1 empty → skip
  Cycle 2 (addr=2): spk2 not empty → read spk2, write spk2ciFIFO
  Cycle 3-4: All FIFOs empty → skip
  Cycle 5 (addr=5): spk5 not empty → read spk5, write spk2ciFIFO
  Cycle 6-7: Empty → skip
  Cycle 8 (addr=0): Back to spk0
```

### Backpressure Handling

```
Cycle        0    1    2    3    4
             ────┬────┬────┬────┬────
addr             0    1    2    3    4

spk0_empty   ────────────────────────
             (has data)

spk2ciFIFO_full ───────┐         ┌───
                ───────└─────────┘
                (becomes full at cycle 1)

spk0_rden    ───┐         ┌─────┐
             ───└─────────┘     └────

spk2ciFIFO_wren ┐         ┌─────┐
             ───└─────────┘     └────

Explanation:
  Cycle 0: spk0 has data, output FIFO not full → read and write
  Cycle 1: Output FIFO becomes full → blocked (no read/write)
  Cycle 2: Output FIFO still full → still blocked
  Cycle 3: Output FIFO has space again → resume read/write
  Cycle 4: Continue normal operation

  During cycles 1-2, addr continues incrementing (1→2→3),
  but no operations occur. When spk2ciFIFO has space again,
  arbiter is at addr=3, not addr=0 (missed spk0's turn).
  spk0 will be serviced again when addr wraps back to 0.
```

---

## Resource Usage

### Logic Complexity

**Arbiter**:
- **Counter**: 3-bit register (3 FFs)
- **8:1 Mux**: ~24 LUTs (17 bits × 8-way = ~1.5 LUTs per bit)
- **Control Logic**: ~16 LUTs (empty checks, full checks, case statement)
- **Total**: ~40 LUTs, ~3 FFs

**No FIFOs Instantiated**: This module only arbitrates; FIFOs instantiated elsewhere.

### Comparison to Pointer FIFO Controller

| Metric | Pointer FIFO Controller | Spike FIFO Controller |
|--------|------------------------|----------------------|
| **Input FIFOs** | 16 (ptr0-15) | 8 (spk0-7) |
| **Data Width** | 32 bits (pointer) | 17 bits (spike) |
| **Address Bits** | 4 bits (16 FIFOs) | 3 bits (8 FIFOs) |
| **Arbiter Period** | 16 cycles | 8 cycles |
| **Logic** | ~150 LUTs, ~50 FFs | ~40 LUTs, ~3 FFs |
| **Writes Input FIFOs** | Yes (demux from HBM) | No (read-only) |
| **Complexity** | High (demux + arbiter) | Low (arbiter only) |

**Spike FIFO Controller is simpler**: No demux logic, fewer FIFOs, narrower data.

---

## Cross-References

### Upstream Modules

- **hbm_processor.v** (`hbm_processor.md`):
  - Writes spike data to spk0-7 FIFOs during Phase 2
  - Generates spikes based on synaptic weight application
  - Each spike FIFO corresponds to a subset of HBM output channels

### Downstream Modules

- **internal_events_processor.v** (`internal_events_processor.md`):
  - Receives aggregated spikes from spk2ciFIFO
  - Uses spikes to update URAM neuron potentials in next time step
  - Coordinates Phase 1b (internal event processing)

### Peer Modules

- **pointer_fifo_controller.v** (`pointer_fifo_controller.md`):
  - Similar architecture (round-robin arbiter)
  - Handles pointer data instead of spikes
  - Both operate concurrently during Phase 2

---

## Common Issues and Debugging

### Issue 1: Spikes Lost (FIFO Overflow)

**Symptoms:**
- Neurons don't receive expected updates
- Spike counts lower than expected
- spk*_full flags assert frequently (if monitored)

**Root Cause:**
- HBM processor generates spikes faster than arbiter drains
- Spike FIFO depth too small

**Debug:**
```verilog
// Add probes for FIFO occupancy (requires FIFO IP configuration)
(* mark_debug = "true" *) wire [9:0] spk0_count;
(* mark_debug = "true" *) wire       spk0_overflow;

// Monitor overflow
always @(posedge clk) begin
    if (spk0_full & spk0_wren)  // Assumes spk0_full and spk0_wren exist
        spk0_overflow <= 1'b1;
end
```

**Solution:**
- Increase spike FIFO depth (e.g., 512 → 1024)
- Optimize arbiter (see Enhancements)
- Reduce spike generation rate (network-level optimization)

### Issue 2: Unfair Arbitration

**Symptoms:**
- Some spike sources take much longer to drain
- Uneven latency across different neuron groups

**Root Cause:**
- Round-robin treats all FIFOs equally
- spk0 with 100 spikes gets same service as spk7 with 1 spike

**Debug:**
```verilog
// Track arbitration wins
(* mark_debug = "true" *) reg [15:0] arb_wins [7:0];

always @(posedge clk) begin
    if (spk0_rden) arb_wins[0] <= arb_wins[0] + 1;
    if (spk1_rden) arb_wins[1] <= arb_wins[1] + 1;
    // ... (repeat for all)
end
```

**Solution:**
- Implement weighted round-robin
- Priority arbitration based on FIFO occupancy
- Skip-empty optimization (see Enhancement #2)

### Issue 3: Counter Wrapping Error

**Symptoms:**
- Some FIFOs never serviced
- Arbiter stuck on certain addresses

**Root Cause:**
- 3-bit counter not wrapping correctly (should wrap 7→0)

**Debug:**
```verilog
(* mark_debug = "true" *) reg [2:0] addr;

// Assertion
always @(posedge clk) begin
    assert ((addr == (prev_addr + 1'b1) % 8) || (!resetn));
end
```

**Solution:**
- Explicit wrap (though automatic wrap should work):
```verilog
always @(posedge clk) begin
    if (!resetn)
        addr <= 3'd0;
    else if (addr == 3'd7)
        addr <= 3'd0;
    else
        addr <= addr + 1'b1;
end
```

### Issue 4: Data Width Mismatch

**Symptoms:**
- Compilation warnings about width mismatch
- spk2ciFIFO_din shows unexpected bit patterns

**Root Cause:**
- Line 112: `spk2ciFIFO_din <= 32'dX;` should be `17'dX`
- Typo from when module had 32-bit interface

**Debug:**
```verilog
// Check for synthesis warnings:
// WARNING: Truncating 32-bit value to 17 bits
```

**Solution:**
```verilog
// Fix line 112 and 247:
spk2ciFIFO_din <= 17'dX;  // Changed from 32'dX
```

### Issue 5: Output FIFO Never Drains

**Symptoms:**
- spk2ciFIFO_full asserts and stays high
- Spike processing stalls

**Root Cause:**
- Downstream consumer (internal events processor) not reading
- Deadlock or timing issue

**Debug:**
```verilog
// Monitor output FIFO state
(* mark_debug = "true" *) wire spk2ciFIFO_full;
(* mark_debug = "true" *) wire spk2ciFIFO_rden;  // From downstream
(* mark_debug = "true" *) wire spk2ciFIFO_empty;

// Track stall duration
reg [15:0] stall_counter;
always @(posedge clk) begin
    if (spk2ciFIFO_full & !spk2ciFIFO_rden)
        stall_counter <= stall_counter + 1;
    else
        stall_counter <= 0;
end
```

**Solution:**
- Verify downstream module (internal events processor) is enabled
- Check for deadlock conditions
- Ensure proper handshaking between modules

---

## Performance Characteristics

### Throughput Analysis

**Arbiter Throughput**:
- **Max**: 1 spike per cycle @ 225 MHz = 225 million spikes/s
- **Typical** (50% FIFO occupancy): ~112 million spikes/s
- **Effective** (accounting for empty FIFOs): Variable

**Arbiter Service Rate per FIFO**:
- **Period**: 8 cycles
- **Rate**: 225 MHz / 8 = 28.125 million spikes/s per FIFO
- **Latency** (best case): 0-7 cycles to service (avg 3.5 cycles)

**Example Scenario** (10% neurons spike):
```
Total neurons: 131,072
Neurons spiking: 13,107 (10%)
Spikes distributed across 8 FIFOs: ~1,638 per FIFO

Drain time per FIFO:
  1,638 spikes / (28.125 M spikes/s) = 58.2 µs

Total drain time (concurrent):
  All FIFOs drain in parallel (interleaved by arbiter)
  Total time ≈ 1,638 spikes × 8 FIFOs / 225 MHz = 58.2 µs

(Assuming continuous draining without stalls)
```

### Latency Analysis

**Spike Latency** (from FIFO write to output):
- **Best Case** (FIFO non-empty, arbiter on correct address, output not full):
  - 1 cycle (immediate)

- **Worst Case** (FIFO just filled, arbiter just passed, output full):
  - Wait for arbiter: 7 cycles (worst case, just missed)
  - Wait for output FIFO space: N cycles (depends on drain rate)
  - **Total**: ~8+ cycles @ 225 MHz = ~35+ ns

- **Average Case**:
  - ~4 cycles @ 225 MHz = ~18 ns

**Comparison to Pointer FIFO Controller**:
- Spike controller: 8-cycle period → average 4-cycle wait
- Pointer controller: 16-cycle period → average 8-cycle wait
- **Spike controller has 2× better average latency**

---

## Safety and Edge Cases

### Edge Case 1: All Spike FIFOs Full Simultaneously

**Scenario**: Every spike FIFO is full, new spikes arriving.

**Behavior**:
- HBM processor tries to write spikes, but FIFOs full
- Writes are **lost** (assuming standard FIFO behavior)
- No indication to upstream (unless full flags monitored)

**Safety**:
- ⚠️ Silent data loss (spikes dropped)
- ❌ No backpressure to HBM processor (design limitation)

**Required**: Ensure FIFOs sized to handle worst-case burst.

### Edge Case 2: No Spikes Generated (Quiescent Network)

**Scenario**: No neurons spike during entire Phase 2.

**Behavior**:
```
All spk*_empty = 1  (all FIFOs empty)

Arbiter cycles addr 0→1→...→7→0, but:
  All cases have condition: if (!spk*_empty & ...)
  Condition always false → no reads, no writes

spk2ciFIFO receives no data (correct behavior)
```

**Safety**:
- ✅ Correct - no spurious spikes generated
- ✅ Arbiter idles without consuming resources
- ✅ Downstream sees empty spk2ciFIFO (correct state)

### Edge Case 3: Single Spike in Single FIFO

**Scenario**: Only spk3 has one spike, all others empty.

**Behavior**:
```
Cycle 0-2 (addr=0-2): All empty, no action
Cycle 3 (addr=3): spk3 not empty → read and write
Cycle 4-7 (addr=4-7): All empty, no action
Cycle 8 (addr=0): Back to start, spk3 now empty
```

**Safety**:
- ✅ Correct - single spike processed
- ✅ Minimal overhead (7 idle cycles, 1 active)
- ⚠️ Inefficient for sparse spikes (see Enhancement #2)

### Edge Case 4: Output FIFO Full (Downstream Backpressure)

**Scenario**: spk2ciFIFO full, upstream FIFOs have data.

**Behavior**:
```
spk2ciFIFO_full = 1

For all cases:
  if (!spk*_empty & !spk2ciFIFO_full)  → Condition false
    spk*_rden = 0
    spk2ciFIFO_wren = 0

Result: No spikes drained, all FIFOs stall
```

**Safety**:
- ✅ Proper backpressure (stops draining)
- ⚠️ Upstream FIFOs may overflow if HBM processor continues writing
- 🔒 Deadlock possible if spk2ciFIFO never drains

**Required**: Ensure spk2ciFIFO downstream consumer always active.

### Safety Check: One-Hot Read Enables

**Assertion**: Verify only one FIFO read per cycle
```verilog
wire [7:0] rdens = {spk7_rden, spk6_rden, ..., spk0_rden};

property one_hot_rdens;
    @(posedge clk) disable iff (~resetn)
    $onehot0(rdens);  // At most one bit set
endproperty
assert_rdens: assert property (one_hot_rdens);
```

### Safety Check: No Spurious Writes

**Assertion**: Ensure write only when read occurs
```verilog
property write_implies_read;
    @(posedge clk) disable iff (~resetn)
    spk2ciFIFO_wren |-> |rdens;  // Write implies at least one read
endproperty
assert_write: assert property (write_implies_read);
```

---

## Future Enhancement Opportunities

### 1. Priority Arbiter (Occupancy-Based)

Favor FIFOs with more data:

```verilog
wire [9:0] spk_counts [7:0];  // Assume FIFO IP provides rd_data_count

reg [2:0] priority_addr;
always @(*) begin
    // Find fullest FIFO
    priority_addr = 0;
    for (int i = 1; i < 8; i++) begin
        if (spk_counts[i] > spk_counts[priority_addr])
            priority_addr = i;
    end
end

// Use priority_addr instead of round-robin addr (when FIFO above threshold)
```

**Benefit**: Reduces overflow risk by draining fuller FIFOs first.

### 2. Skip-Empty Optimization

Jump to next non-empty FIFO:

```verilog
wire [7:0] spks_empty = {spk7_empty, ..., spk0_empty};

reg [2:0] next_addr;
always @(*) begin
    next_addr = addr;
    for (int i = 1; i <= 8; i++) begin
        if (!spks_empty[(addr + i) % 8]) begin
            next_addr = (addr + i) % 8;
            break;
        end
    end
end

always @(posedge clk) begin
    if (!resetn)
        addr <= 3'd0;
    else
        addr <= next_addr;  // Jump to next non-empty
end
```

**Benefit**: ~4× faster draining when many FIFOs empty (worst case 8 cycles → 2 cycles avg).

### 3. Multi-Port Arbiter

Read multiple FIFOs per cycle:

```verilog
// Dual-port: service 2 FIFOs per cycle
reg [2:0] addr_a, addr_b;

always @(posedge clk) begin
    addr_a <= (addr_a + 2) % 8;  // Even addresses
    addr_b <= (addr_b + 2) % 8;  // Odd addresses
end

// Dual mux, dual write to spk2ciFIFO (requires wider interface or double-pump)
```

**Benefit**: 2× throughput (if downstream supports burst writes).

### 4. Configurable FIFO Count

Parameterize for flexibility:

```verilog
module spike_fifo_controller #(
    parameter NUM_FIFOS = 8
)(
    // Generate FIFO ports and arbiter logic
);

// Use generate blocks for scalability
```

**Benefit**: Easy to switch between 8 and 16 FIFOs (uncomment lines 40-72).

### 5. Burst Mode Output

Write multiple spikes per cycle:

```verilog
// Wider output: 4 spikes per cycle
assign spk2ciFIFO_din[67:0] = {spk[addr+3]_dout, spk[addr+2]_dout,
                                spk[addr+1]_dout, spk[addr]_dout};
```

**Benefit**: 4× throughput (requires downstream support).

### 6. Adaptive Arbitration

Switch between round-robin and priority based on load:

```verilog
wire high_load = (spk_counts[0] + spk_counts[1] + ... > THRESHOLD);

assign arb_addr = high_load ? priority_addr : round_robin_addr;
```

**Benefit**: Fair when lightly loaded, efficient when heavily loaded.

### 7. Fix Data Width Typo

Minor bug fix:

```verilog
// Line 112, 247: Change 32'dX to 17'dX
spk2ciFIFO_din <= 17'dX;  // Match actual port width
```

---

## Key Terms and Definitions

| Term | Definition |
|------|------------|
| **Spike FIFO** | Buffer storing spike events (neuron address + metadata) |
| **Round-Robin** | Fair arbitration scheme servicing each FIFO in cyclic order |
| **Post-Synaptic** | Neuron receiving input from synaptic connection (target neuron) |
| **spk2ciFIFO** | Output FIFO aggregating spikes from all spike FIFOs |
| **Arbiter** | Logic deciding which FIFO gets access to shared output |
| **Phase 2** | Synaptic weight application phase (HBM processor generates spikes) |
| **Phase 3** | Spike drain phase (this module aggregates spikes for next time step) |
| **Backpressure** | Flow control where full output FIFO blocks upstream reads |
| **FWFT (assumed)** | First-Word Fall-Through mode (data immediately available) |
| **Starvation** | Condition where some FIFOs never serviced (not possible in round-robin) |
| **17-bit Spike** | Data format: likely {valid, neuron_address} |
| **Arbiter Period** | Number of cycles to service all FIFOs once (8 cycles) |
| **Fairness** | Equal service time for all FIFOs regardless of occupancy |
| **Service Rate** | Frequency at which each FIFO gets arbitration turn (28.125 MHz per FIFO) |

---

## Conclusion

The **Spike FIFO Controller** is an elegantly simple component that performs critical aggregation:

**Design Strengths**:
- **Minimal Complexity**: Pure round-robin arbiter, no demux logic
- **Fair Service**: All spike sources get equal treatment
- **Proven Architecture**: Identical pattern to pointer_fifo_controller
- **Low Resource Usage**: ~40 LUTs, ~3 FFs (negligible)

**Design Limitations**:
- **No Backpressure to Upstream**: HBM processor can overflow spike FIFOs
- **Inefficient for Sparse Spikes**: Wastes cycles checking empty FIFOs
- **Fixed Arbitration**: No priority for fuller FIFOs
- **Minor Bug**: Data width typo (32'dX vs 17'dX)

**Optimization Opportunities**:
- Skip-empty optimization (2-4× faster for sparse spikes)
- Priority arbitration (prevent overflow)
- Multi-port arbiter (2× throughput)
- Burst mode output (4× throughput)

**Critical Parameters**:
- Spike FIFO depth must handle worst-case burst
- Arbiter period (8 cycles) limits drain rate to 225 MHz / 8 = 28.125 M spikes/s per FIFO
- Output FIFO (spk2ciFIFO) must drain faster than aggregate fill rate

For complete system understanding, see cross-referenced modules: `hbm_processor.md` (upstream spike generation), `internal_events_processor.md` (downstream spike consumption), and `pointer_fifo_controller.md` (peer arbiter).