# Pointer FIFO Controller Module

## Overview

The **Pointer FIFO Controller** is a critical datapath component that manages the flow of synaptic pointer data during the two-phase neuromorphic execution cycle. It demultiplexes 512-bit HBM pointer data into 16 parallel FIFOs (one per neuron group), then arbitrates between these FIFOs to feed the HBM processor during Phase 2 (synaptic weight fetch).

### Role in the Software/Hardware Stack

```
                    Phase 1: External/Internal Events
                           (Spike Detection)
                                  |
    ┌─────────────────────────────┼─────────────────────────────┐
    |                             v                             |
    |              [External Events Processor]                  |
    |                      |                                    |
    |              exec_bram_spiked[15:0]                       |
    |                      |                                    |
    |              [Internal Events Processor]                  |
    |                      |                                    |
    |              exec_uram_spiked[15:0]                       |
    |                      |                                    |
    |                      v                                    |
    |         ┌────────────────────────────┐                    |
    |         │ Pointer FIFO Controller    │                    |
    |         │                            │                    |
    | HBM ───>│ 512b → 16×32b demux       │                    |
    |  Data   │ 16 Pointer FIFOs (ptr0-15)│                    |
    |         │ Round-robin arbiter        │                    |
    |         └────────────┬───────────────┘                    |
    |                      |                                    |
    |                ptrFIFO (32b)                              |
    |                      |                                    |
    |                      v                                    |
    |              [HBM Processor]                              |
    |                      |                                    |
    |              Synaptic Weights                             |
    |                      |                                    |
    |                      v                                    |
    |              [Spike FIFOs] ──> Phase 2 Synaptic Updates   |
    └───────────────────────────────────────────────────────────┘
```

**Function**:
- **Demultiplex HBM Pointer Data**: Split 512-bit HBM read into 16×32-bit pointer records
- **Spike-Gated Buffering**: Only store pointers for neurons that spiked (sparse event handling)
- **Fair Arbitration**: Round-robin scheduler ensures all neuron groups get equal service
- **Phase Coordination**: Handle both external (BRAM) and internal (URAM) spike events

**Key Innovation**: By buffering pointers in 16 parallel FIFOs, the module decouples HBM read bandwidth from pointer processing, allowing efficient handling of sparse neural activity.

---

## Module Architecture

```
                           HBM Data Path (512 bits)
                                   |
                                   v
                    ┌──────────────────────────────┐
                    │  exec_hbm_rdata[511:0]       │
                    │  exec_hbm_rvalidready        │
                    └──────────────┬───────────────┘
                                   |
         ┌─────────────────────────┼─────────────────────────┐
         |                         v                         |
         |              Demux to 16 Groups                   |
         |    [31:0]  [63:32]  [95:64] ... [511:480]        |
         |       |       |        |            |             |
         |       v       v        v            v             |
         |   ┌─────┐ ┌─────┐  ┌─────┐      ┌─────┐         |
         |   │FIFO0│ │FIFO1│  │FIFO2│ ...  │FIFO15│         |
         |   │32b  │ │32b  │  │32b  │      │32b   │         |
         |   │FWFT │ │FWFT │  │FWFT │      │FWFT  │         |
         |   └──┬──┘ └──┬──┘  └──┬──┘      └──┬───┘         |
         |      ^        ^        ^             ^            |
         |      |        |        |             |            |
         |   wren0    wren1    wren2         wren15         |
         |      |        |        |             |            |
         |      └────────┴────────┴─────────────┘            |
         |                      |                            |
         |         Spike-Gated Write Enable Logic            |
         |    (bram_spiked[i] | uram_spiked[i]) & !full     |
         |                      ^                            |
         |      ┌───────────────┴───────────────┐            |
         |      |                               |            |
         |  exec_bram_spiked[15:0]   exec_uram_spiked[15:0] |
         |      |                               |            |
         |  ┌───┴────┐                  ┌───────┴──────┐    |
         |  │External│                  │  Internal    │    |
         |  │Events  │                  │  Events      │    |
         |  │Proc.   │                  │  Proc.       │    |
         |  └────────┘                  └──────────────┘    |
         └───────────────────────────────────────────────────┘

                      Round-Robin Arbiter
                             |
                   addr[3:0] counter (0→15)
                             |
         ┌───────────────────┼───────────────────┐
         |                   v                   |
         |        16:1 Multiplexer               |
         |    (Select ptr_dout[addr])            |
         |                   |                   |
         |                   v                   |
         |            ┌─────────────┐            |
         |            │  ptrFIFO    │            |
         |            │  (32-bit)   │            |
         |            │  To HBM Proc│            |
         |            └─────────────┘            |
         └───────────────────────────────────────┘
```

### Two-Phase Operation

**Phase 1a: External Events (BRAM Reading)**
```
1. bram_reading = 1 (set on exec_run)
2. For each HBM read (exec_hbm_rvalidready):
   - Split 512b data into ptr0_din...ptr15_din
   - Write to FIFO[i] if exec_bram_spiked[i]==1 and !ptr_full[i]
3. Continue until exec_bram_phase1_done
4. Transition to Phase 1b
```

**Phase 1b: Internal Events (URAM Reading)**
```
1. uram_reading = 1 (set on exec_bram_phase1_done)
2. For each HBM read (exec_hbm_rvalidready):
   - Split 512b data into ptr0_din...ptr15_din
   - Write to FIFO[i] if exec_uram_spiked[i]==1 and !ptr_full[i]
3. Continue until exec_uram_phase1_done
4. End of Phase 1
```

**Phase 2: Pointer Drain (Concurrent with Phase 1)**
```
1. Round-robin arbiter cycles addr 0→1→2→...→15→0
2. Every cycle:
   - If ptr[addr]_empty==0 and ptrFIFO_full==0:
     * Read from ptr[addr] (rden=1)
     * Write to ptrFIFO (wren=1, din=ptr_dout)
3. HBM processor consumes pointers, fetches synapses
4. Continues until all FIFOs empty
```

---

## Interface Specification

### Clock and Reset
| Port | Direction | Width | Description |
|------|-----------|-------|-------------|
| `clk` | Input | 1 | System clock (225 MHz typical) |
| `resetn` | Input | 1 | Active-low reset (sampled synchronously on `clk` in the logic shown in this document) |

### Execution Control
| Port | Direction | Width | Description |
|------|-----------|-------|-------------|
| `exec_run` | Input | 1 | Start new time step (sets bram_reading=1) |

### External Events Processor Interface
| Port | Direction | Width | Description |
|------|-----------|-------|-------------|
| `exec_bram_spiked` | Input | 16 | Spike mask from external events (16 neuron groups) |
| `exec_bram_phase1_done` | Input | 1 | External events complete, transition to internal |

### Internal Events Processor Interface
| Port | Direction | Width | Description |
|------|-----------|-------|-------------|
| `exec_uram_spiked` | Input | 16 | Spike mask from internal events (16 neuron groups) |
| `exec_uram_phase1_done` | Input | 1 | Internal events complete, end Phase 1 |

### HBM Processor Input Interface
| Port | Direction | Width | Description |
|------|-----------|-------|-------------|
| `exec_hbm_rvalidready` | Input | 1 | HBM read data valid and ready |
| `exec_hbm_rdata` | Input | 512 | HBM read data (16 pointers × 32 bits) |
| `hbm2pfc_rden` | Output | 1 | FIFO read enable (FWFT mode) |

**Note**: Comments in the source indicate that `hbm2pfc_dout` and `hbm2pfc_empty` are wired at the top-level wrapper.

### Pointer FIFO Interfaces (16 instances: ptr0-ptr15)

Each pointer FIFO has an identical interface (example for ptr0):

| Port | Direction | Width | Description |
|------|-----------|-------|-------------|
| `ptr0_full` | Input | 1 | FIFO full flag (backpressure) |
| `ptr0_din` | Output | 32 | Data input to FIFO (pointer record) |
| `ptr0_wren` | Output | 1 | Write enable (gated by spike and full) |
| `ptr0_empty` | Input | 1 | FIFO empty flag |
| `ptr0_dout` | Input | 32 | Data output from FIFO |
| `ptr0_rden` | Output | 1 | Read enable (from arbiter) |

**Pointer FIFOs**: ptr1, ptr2, ..., ptr15 (identical interfaces)

### HBM Processor Output Interface (Aggregated Pointer FIFO)
| Port | Direction | Width | Description |
|------|-----------|-------|-------------|
| `ptrFIFO_full` | Input | 1 | Aggregated FIFO full flag |
| `ptrFIFO_din` | Output | 32 | Pointer data to HBM processor |
| `ptrFIFO_wren` | Output | 1 | Write enable (from arbiter) |

---

## Detailed Logic Description

### Phase Tracking State Machine

The module uses two registers to track execution phase:

```verilog
reg bram_reading;  // Phase 1a: External events
reg uram_reading;  // Phase 1b: Internal events

always @(posedge clk) begin
    if (!resetn) begin
        bram_reading <= 1'b0;
        uram_reading <= 1'b0;
    end else if (exec_run) begin
        // Start of new time step: begin external event processing
        bram_reading <= 1'b1;
    end else if (exec_bram_phase1_done & !uram_reading) begin
        // Transition from external to internal event processing
        bram_reading <= 1'b0;
        uram_reading <= 1'b1;
    end else if (exec_uram_phase1_done) begin
        // End of Phase 1
        uram_reading <= 1'b0;
    end
end
```

**State Transitions:**
```
IDLE (both=0)
    |
    | exec_run
    v
BRAM_READING (bram=1, uram=0)
    |
    | exec_bram_phase1_done
    v
URAM_READING (bram=0, uram=1)
    |
    | exec_uram_phase1_done
    v
IDLE (both=0)
```

**Note**: During idle, the round-robin arbiter continues draining pointer FIFOs (Phase 2).

### HBM Data Demultiplexing

The 512-bit HBM data is split into 16 groups of 32 bits:

```verilog
// Direct bit-slice assignments
assign ptr0_din  = exec_hbm_rdata[031:000];  // Bits 0-31
assign ptr1_din  = exec_hbm_rdata[063:032];  // Bits 32-63
assign ptr2_din  = exec_hbm_rdata[095:064];  // Bits 64-95
assign ptr3_din  = exec_hbm_rdata[127:096];  // Bits 96-127
// ... (pattern continues)
assign ptr15_din = exec_hbm_rdata[511:480];  // Bits 480-511
```

**Data Layout** (each 32-bit pointer):
```
Bits [31:23] = Length (9 bits, max 511 synapses)
Bits [22:0]  = Start address in HBM (23 bits, byte address)
```

**Example**:
```
exec_hbm_rdata = 512'h..._AB12_3456_CD78_9ABC

ptr0_din  = 32'hCD78_9ABC  → Length=0x19A, Addr=0x789ABC
ptr1_din  = 32'hAB12_3456  → Length=0x156, Addr=0x123456
...
```
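The field layout can be expressed as two bit-slices. The following is a minimal sketch and is not taken from the module source; the module name `ptr_decode_sketch` and the port names `length` and `start_addr` are hypothetical.

```verilog
// Sketch only: splits one 32-bit pointer record into its two fields.
// Module and port names are illustrative, not part of the original design.
module ptr_decode_sketch (
    input  wire [31:0] ptr,         // one pointer record (e.g. ptr0_dout)
    output wire [8:0]  length,      // bits [31:23]: synapse count, 0..511
    output wire [22:0] start_addr   // bits [22:0]:  start address in HBM
);
    assign length     = ptr[31:23];
    assign start_addr = ptr[22:0];
endmodule
```

For instance, driving `ptr = {9'd3, 23'h000040}` yields `length = 3` and `start_addr = 23'h000040`.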

### Spike-Gated Write Enable Logic

Each pointer FIFO write is conditional on:
1. HBM data valid and ready
2. Corresponding spike bit asserted
3. FIFO not full

```verilog
assign ptr0_wren = !ptr0_full & exec_hbm_rvalidready &
                   ((bram_reading & exec_bram_spiked[0]) |
                    (uram_reading & exec_uram_spiked[0]));

assign ptr1_wren = !ptr1_full & exec_hbm_rvalidready &
                   ((bram_reading & exec_bram_spiked[1]) |
                    (uram_reading & exec_uram_spiked[1]));

// ... (pattern repeats for ptr2-ptr15)
```

**Logic Breakdown**:
```
ptr_wren[i] = !ptr_full[i]           // FIFO has space
            & exec_hbm_rvalidready   // HBM data available
            & (
                (bram_reading & exec_bram_spiked[i])  // External spike
                |
                (uram_reading & exec_uram_spiked[i])  // Internal spike
              )
```
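Because the sixteen expressions differ only in their index, the same logic can be written once with a `generate` loop. This is a sketch under the assumption that the per-FIFO flags are packed into vectors: `ptr_full[15:0]` and `ptr_wren[15:0]` are hypothetical packed equivalents of the `ptr0_full` ... `ptr15_full` and `ptr0_wren` ... `ptr15_wren` ports, and `ptr_wren_sketch` is an illustrative module name.

```verilog
// Sketch only: vectorized form of the spike-gated write enables.
// ptr_full / ptr_wren are hypothetical packed versions of the per-FIFO ports.
module ptr_wren_sketch (
    input  wire        exec_hbm_rvalidready,
    input  wire        bram_reading,
    input  wire        uram_reading,
    input  wire [15:0] exec_bram_spiked,
    input  wire [15:0] exec_uram_spiked,
    input  wire [15:0] ptr_full,
    output wire [15:0] ptr_wren
);
    genvar i;
    generate
        for (i = 0; i < 16; i = i + 1) begin : g_wren
            // Same expression as ptr0_wren, indexed by neuron group i
            assign ptr_wren[i] = !ptr_full[i] & exec_hbm_rvalidready &
                                 ((bram_reading & exec_bram_spiked[i]) |
                                  (uram_reading & exec_uram_spiked[i]));
        end
    endgenerate
endmodule
```

Whether the loop form is preferable depends on how the top-level wrapper names its FIFO ports; the unrolled form maps directly onto individually named ports.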

**Example Scenarios**:

**Scenario 1: External spike on neuron group 5**
```
Cycle N:
  bram_reading = 1
  exec_bram_spiked = 16'b0000_0000_0010_0000  (bit 5 set)
  exec_hbm_rvalidready = 1
  ptr5_full = 0

Result:
  ptr5_wren = 1  → Write exec_hbm_rdata[191:160] to ptr5 FIFO
  ptr0-4,6-15_wren = 0  → No write to other FIFOs
```

**Scenario 2: Multiple spikes (groups 0, 3, 7)**
```
Cycle N:
  uram_reading = 1
  exec_uram_spiked = 16'b0000_0000_1000_1001  (bits 0,3,7 set)
  exec_hbm_rvalidready = 1
  ptr0_full = 0, ptr3_full = 0, ptr7_full = 1  (ptr7 full!)

Result:
  ptr0_wren = 1  → Write to ptr0
  ptr3_wren = 1  → Write to ptr3
  ptr7_wren = 0  → Blocked by full (data lost!)
  Others = 0
```

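**Backpressure Handling**: If a FIFO is full when HBM data for its spiking group arrives, `ptr*_wren` stays low while `hbm2pfc_rden` still advances the HBM data FIFO, so that pointer is **lost**. The system must ensure the FIFOs drain fast enough (or are deep enough) to prevent this.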

### Round-Robin Arbiter

A 4-bit counter cycles through FIFOs 0-15, servicing one per cycle:

```verilog
reg [3:0] addr;  // 4 bits for 16 FIFOs (0-15)

always @(posedge clk) begin
    if (~resetn)
        addr <= 4'd0;
    else
        addr <= addr + 1'b1;  // Wraps 15→0 automatically
end
```

**Arbitration Cycle**:
```
Cycle 0:  addr=0  → Check ptr0
Cycle 1:  addr=1  → Check ptr1
Cycle 2:  addr=2  → Check ptr2
...
Cycle 15: addr=15 → Check ptr15
Cycle 16: addr=0  → Back to ptr0
...
```

**Arbitration Logic** (combinational):

```verilog
always @(*) begin
    // Default: No reads, no writes
    ptr0_rden = 1'b0;
    ptr1_rden = 1'b0;
    // ... (all ptr*_rden = 0)
    ptrFIFO_din = 32'dX;
    ptrFIFO_wren = 1'b0;

    case (addr)
        4'd0: begin
            if (~ptr0_empty & ~ptrFIFO_full) begin
                ptr0_rden    = 1'b1;
                ptrFIFO_din  = ptr0_dout;
                ptrFIFO_wren = 1'b1;
            end
        end
        4'd1: begin
            if (~ptr1_empty & ~ptrFIFO_full) begin
                ptr1_rden    = 1'b1;
                ptrFIFO_din  = ptr1_dout;
                ptrFIFO_wren = 1'b1;
            end
        end
        // ... (pattern repeats for 4'd2 through 4'd15)

        default: begin
            // All outputs stay at default (0 or X)
        end
    endcase
end
```

**Arbitration Example**:

```
Cycle  | addr | ptr0_empty | ptr1_empty | ptr5_empty | ptrFIFO_full | Action
-------|------|------------|------------|------------|--------------|---------------------
   0   |  0   |      0     |      0     |      0     |      0       | Read ptr0
   1   |  1   |      0     |      0     |      0     |      0       | Read ptr1
   2   |  2   |      1     |      0     |      0     |      0       | Skip (ptr2 empty)
   3   |  3   |      1     |      1     |      0     |      0       | Skip (ptr3 empty)
   4   |  4   |      1     |      1     |      0     |      0       | Skip (ptr4 empty)
   5   |  5   |      1     |      1     |      0     |      0       | Read ptr5
   6   |  6   |      1     |      1     |      1     |      0       | Skip (ptr6 empty)
   7   |  7   |      1     |      1     |      1     |      0       | Skip (ptr7 empty)
  ...  | ...  |     ...    |     ...    |     ...    |     ...      | ...
  15   | 15   |      1     |      1     |      1     |      0       | Skip (ptr15 empty)
  16   |  0   |      0     |      1     |      1     |      0       | Read ptr0 again

**Fairness**: Each FIFO gets equal opportunity (once per 16 cycles), regardless of occupancy.

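**Starvation**: A FIFO that is always non-empty cannot monopolize the arbiter, because it is read at most once per 16 cycles; the other FIFOs continue to be serviced in their own slots. No single pointer FIFO can block the others.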
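The counter and the case statement can also be folded into one indexed form. The sketch below assumes the per-FIFO signals are packed into buses; `ptr_empty`, `ptr_dout_flat`, `ptr_rden`, and the module name `ptr_arbiter_sketch` are hypothetical and do not appear in the original module. One behavioral difference is deliberate: `ptrFIFO_din` carries the selected FIFO's output even when no write occurs, instead of `X`.

```verilog
// Sketch only: vectorized equivalent of the per-FIFO case statement.
// Packed buses are hypothetical; the original module uses ptr0_* ... ptr15_*.
module ptr_arbiter_sketch (
    input  wire         clk,
    input  wire         resetn,
    input  wire [15:0]  ptr_empty,      // bit i = ptr<i>_empty
    input  wire [511:0] ptr_dout_flat,  // bits [32*i +: 32] = ptr<i>_dout
    input  wire         ptrFIFO_full,
    output wire [15:0]  ptr_rden,       // one-hot, or all zero when skipping
    output wire [31:0]  ptrFIFO_din,
    output wire         ptrFIFO_wren
);
    reg [3:0] addr;

    // Free-running 4-bit counter; wraps 15 -> 0 by overflow
    always @(posedge clk) begin
        if (~resetn) addr <= 4'd0;
        else         addr <= addr + 1'b1;
    end

    // Grant only when the selected FIFO has data and ptrFIFO has room
    wire grant = ~ptr_empty[addr] & ~ptrFIFO_full;

    assign ptr_rden     = grant ? (16'd1 << addr) : 16'd0;
    assign ptrFIFO_din  = ptr_dout_flat[addr*32 +: 32];  // valid only when wren=1
    assign ptrFIFO_wren = grant;
endmodule
```

Because `ptr_rden` is derived from a single `grant` and a single `addr`, at most one read enable can be high in any cycle.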

### FWFT (First-Word Fall-Through) Mode

The FIFOs operate in FWFT mode, meaning data appears on `dout` immediately when `empty` deasserts:

```
Traditional FIFO:
  Cycle N:   rden=1  (issue read)
  Cycle N+1: dout valid  (1 cycle latency)

FWFT FIFO:
  Cycle N:   empty=0, dout already valid
  Cycle N:   rden=1  (consume word, advance to next)
  Cycle N+1: dout shows next word (if available)
```

**Why FWFT?**: It reduces latency: the arbiter can read and forward a pointer in a single cycle.

**HBM FIFO Read Enable**:

```verilog
assign hbm2pfc_rden = exec_hbm_rvalidready;
```

Every time HBM data is consumed (`exec_hbm_rvalidready=1`), the HBM data FIFO advances to present the next 512-bit word. This assumes FWFT mode on the HBM data FIFO.

---

## Timing Diagrams

### Phase Transition: BRAM → URAM

```
Cycle    0    1    2    3    4    5    6    7    8    9
         ────┬────┬────┬────┬────┬────┬────┬────┬────┬────
exec_run ───┐    ┌─────────────────────────────────────────
         ───└────┘

bram_reading ────┐                        ┌────────────────
         ────────└────────────────────────┘

uram_reading ─────────────────────────┐              ┌─────
         ─────────────────────────────└──────────────┘

exec_bram_phase1_done ────────────┐    ┌─────────────────
                      ────────────└────┘

exec_uram_phase1_done ─────────────────────────┐    ┌─────
                      ─────────────────────────└────┘

Phase        IDLE   BRAM  BRAM  BRAM  BRAM URAM URAM URAM IDLE
```

### Pointer FIFO Write (Spike-Gated)

```
Cycle        0    1    2    3    4    5    6
             ────┬────┬────┬────┬────┬────┬────
bram_reading ───────────────────────────────────
             ───┐
                └───────────────────────────────

exec_hbm_rvalidready ──┐    ┌───┐    ┌───┐    ┌
                   ────└────┘   └────┘   └────┘

exec_bram_spiked    0x0005   0x0003   0x0000
                    (bits 0,2)(bits 0,1) (none)

ptr0_full       ───────────────────────────────  (always room)

ptr0_wren       ───┐         ┌───┐
                ───└─────────┘   └─────────────  (spike bit 0)

ptr1_wren       ────────────────┐
                ────────────────└───────────────  (spike bit 1)

ptr2_wren       ───┐
                ───└───────────────────────────  (spike bit 2)

ptr0_din            P0        P0'
                    ↓         ↓
ptr0 FIFO       [empty] → [P0] → [P0,P0']

Explanation:
  Cycle 1: exec_bram_spiked=0x0005 (bits 0 and 2)
           → ptr0_wren=1, ptr2_wren=1
           → Write to ptr0 and ptr2 FIFOs

  Cycle 3: exec_bram_spiked=0x0003 (bits 0 and 1)
           → ptr0_wren=1, ptr1_wren=1
           → Write to ptr0 (again) and ptr1 FIFOs

  Cycle 5: exec_bram_spiked=0x0000 (no spikes)
           → All ptr*_wren=0
           → No writes (HBM data ignored)
```

### Round-Robin Arbiter Operation

```
Cycle    0    1    2    3    4    5    6    7    8
         ────┬────┬────┬────┬────┬────┬────┬────┬────
addr         0    1    2    3    4    5    6    7    8

ptr0_empty   ────┐                             ┌─────
             ────└─────────────────────────────┘
             (has data cycles 0-7, empty at 8)

ptr1_empty   ───────────────────────────────────────
             (empty throughout)

ptr2_empty   ──────────┐                   ┌────────
             ──────────└───────────────────┘
             (has data cycles 2-6)

ptrFIFO_full ───────────────────────────────────────
             (never full)

ptr0_rden    ───┐                             ┌─────
             ───└─────────────────────────────┘

ptr2_rden    ──────────┐
             ──────────└───────────────────────────

ptrFIFO_wren ───┐       ┌─────────────────────┐
             ───└───────┘                     └─────

ptrFIFO_din      D0      D2                    X

Explanation:
  Cycle 0 (addr=0): ptr0 not empty → read ptr0, write ptrFIFO
  Cycle 1 (addr=1): ptr1 empty → skip
  Cycle 2 (addr=2): ptr2 not empty → read ptr2, write ptrFIFO
  Cycles 3-7 (addr=3..7): ptr3-ptr7 empty → skip
  Cycle 8 (addr=8): Round-robin continues (addr wraps to 0 after 15)
```

### FIFO Full Backpressure

```
Cycle        0    1    2    3    4    5
             ────┬────┬────┬────┬────┬────
exec_hbm_rvalidready ┐    ┌───┐    ┌───┐
                 ────└────┘   └────┘   └

exec_bram_spiked  0x0001 0x0001 0x0001
                  (bit 0)(bit 0)(bit 0)

ptr0_full     ────────────┐         ┌────
              ────────────└─────────┘
              (becomes full at cycle 2)

ptr0_wren     ───┐    ┌───┐         ┌────
              ───└────┘   └─────────┘

ptr0_din          D0   D1   X     D2

ptr0 contents [..,D0] [..,D0] [..,D0] [..,D0,D2]

Explanation:
  Cycle 1: Write D0 to ptr0 (wren=1)
  Cycle 2: ptr0 becomes full
  Cycle 3: D1 presented, but ptr0_full=1 → wren=0 → D1 LOST!
  Cycle 4: ptr0 not full again (the arbiter drained one entry)
  Cycle 5: D2 written (wren=1)

  Result: D1 was never stored; it was lost due to the FIFO full condition!
```

**Prevention**: Ensure arbiter drains FIFOs faster than they fill, or increase FIFO depth.

---

## Memory and Resource Usage

### FIFO Depth Considerations

**Minimum FIFO Depth** (to avoid loss):

Assume:
- Max neurons per group: 8192 (131,072 / 16)
- Worst case: All neurons in one group spike
- Arbiter services each FIFO once per 16 cycles

**Fill Rate** (during bram_reading or uram_reading):
- 1 pointer per HBM read (exec_hbm_rvalidready)
- Max rate: 1 per cycle (if HBM always ready)

**Drain Rate**:
- 1 pointer per 16 cycles (round-robin)

**Net Accumulation**:
- Fill: +1 per cycle (worst case)
- Drain: -1 per 16 cycles
- Net: +15 pointers per 16 cycles

**Depth Calculation**:
```
Time to process 8192 neurons @ 225 MHz:
  8192 / 16 (axons per HBM read) = 512 HBM reads
  512 cycles @ 225 MHz = 2.28 µs

Pointers accumulated in one FIFO (worst case):
  All 8192 neurons in one group spike
  = 8192 / 16 = 512 pointers
  (Each HBM read provides 1 pointer for that group)

Pointers drained during 512 cycles:
  512 / 16 = 32 pointers

Net FIFO occupancy:
  512 - 32 = 480 pointers

Required FIFO depth: ~512 (power of 2 for FPGA FIFOs)
```
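The same arithmetic can be kept next to the FIFO instantiation as elaboration-time constants, so the depth follows any change in the assumptions. This is a sketch only: the module name and all parameter names below are hypothetical, and the model is the simplified one used above (one write per cycle into a single FIFO, one read per 16 cycles).

```verilog
// Sketch only: elaboration-time version of the depth estimate.
// All names are illustrative; values mirror the worked example above.
module ptr_fifo_depth_sketch;
    localparam integer NUM_FIFOS      = 16;                           // round-robin slots
    localparam integer FILL_CYCLES    = 512;                          // HBM reads in the burst
    localparam integer PTRS_WRITTEN   = FILL_CYCLES;                  // 1 pointer per cycle
    localparam integer PTRS_DRAINED   = FILL_CYCLES / NUM_FIFOS;      // 32
    localparam integer PEAK_OCCUPANCY = PTRS_WRITTEN - PTRS_DRAINED;  // 480
    localparam integer FIFO_DEPTH     = 1 << $clog2(PEAK_OCCUPANCY);  // next power of 2: 512

    initial
        $display("peak=%0d depth=%0d", PEAK_OCCUPANCY, FIFO_DEPTH);
endmodule
```

With the values shown, simulation prints `peak=480 depth=512`, matching the hand calculation.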

**Typical FIFO Configuration**:
- **Depth**: 512 or 1024 entries
- **Width**: 32 bits
- **Type**: Distributed RAM (for small depth) or Block RAM
- **Mode**: FWFT (First-Word Fall-Through)

### Resource Estimates

**Per Pointer FIFO** (16 instances):
- **Depth 512 × 32b** = 16 Kb = 0.89 BRAM18K (use 1 BRAM18K)
- **FWFT logic**: ~50 LUTs, ~30 FFs

**Total for 16 FIFOs**:
- **BRAM18K**: 16 (one per FIFO)
- **LUTs**: ~800 (FIFOs) + ~200 (arbiter) = ~1000
- **FFs**: ~500 (FIFOs) + ~50 (arbiter/control) = ~550

**Controller Logic**:
- **Demux**: 16 × 32-bit slices (wiring only, ~0 LUTs)
- **Write Enable**: 16 × (4-input AND + OR) = ~96 LUTs
- **Arbiter**: 16-way mux + control = ~150 LUTs
- **Phase Control**: ~20 LUTs, ~3 FFs

---

## Cross-References

### Upstream Modules

- **external_events_processor.v** (`external_events_processor.md`):
  - Provides `exec_bram_spiked[15:0]` (external spike mask)
  - Asserts `exec_bram_phase1_done` to signal phase transition

- **internal_events_processor.v** (`internal_events_processor.md`):
  - Provides `exec_uram_spiked[15:0]` (internal spike mask)
  - Asserts `exec_uram_phase1_done` to signal phase 1 complete

- **hbm_processor.v** (`hbm_processor.md`):
  - Provides `exec_hbm_rdata[511:0]` (pointer data from HBM)
  - Provides `exec_hbm_rvalidready` (data valid signal)
  - Receives `ptrFIFO_din`, `ptrFIFO_wren` (aggregated pointers for Phase 2)

### Downstream Modules

- **hbm_processor.v** (`hbm_processor.md`):
  - Consumes pointers from `ptrFIFO`
  - Uses pointers to fetch synaptic weights during Phase 2
  - Sends fetched synapses to spike FIFOs

### Peer Modules

- **spike_fifo_controller.v** (`spike_fifo_controller.md`):
  - Similar architecture (demux + arbiter)
  - Handles synaptic weight data instead of pointers
  - Works in Phase 2 alongside this module's pointer drain

---

## Common Issues and Debugging

### Issue 1: Pointers Lost (FIFO Overflow)

**Symptoms:**
- Neurons don't receive expected synaptic updates
- FIFO full flags assert frequently
- Spike counts don't match expected connectivity

**Root Cause:**
- Arbiter can't drain FIFOs fast enough
- FIFO depth too small for burst activity

**Debug:**
```verilog
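// Add probes for FIFO occupancy
(* mark_debug = "true" *) wire [9:0] ptr0_count;     // Assuming 512-deep FIFO with a data-count output
(* mark_debug = "true" *) reg        ptr0_overflow;  // Sticky flag, so it must be a reg

// A dropped write is a spike-gated write request that arrives while the FIFO
// is full. (ptr0_full & ptr0_wren is never true: ptr0_wren is gated by !ptr0_full.)
wire ptr0_wr_req = exec_hbm_rvalidready &
                   ((bram_reading & exec_bram_spiked[0]) |
                    (uram_reading & exec_uram_spiked[0]));

// Monitor overflow events
always @(posedge clk) begin
    if (!resetn)
        ptr0_overflow <= 1'b0;
    else if (ptr0_full & ptr0_wr_req)
        ptr0_overflow <= 1'b1;  // Overflow detected!
end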
```

**Solution:**
- Increase FIFO depth (512 → 1024 or 2048)
- Optimize arbiter (see Enhancement #1 below)
- Add priority arbitration for fuller FIFOs

### Issue 2: Unfair Arbitration (Starvation)

**Symptoms:**
- Some neuron groups process much slower than others
- Uneven latency across different spike patterns

**Root Cause:**
- Round-robin gives equal slots, but some FIFOs have more data
- FIFO[0] with 100 entries gets same service as FIFO[15] with 1 entry

**Debug:**
```verilog
// Track arbitration wins per FIFO
(* mark_debug = "true" *) reg [15:0] arb_wins [15:0];

always @(posedge clk) begin
    if (ptr0_rden) arb_wins[0] <= arb_wins[0] + 1;
    if (ptr1_rden) arb_wins[1] <= arb_wins[1] + 1;
    // ... (repeat for all FIFOs)
end
```

**Solution:**
- Implement weighted round-robin (award more slots to fuller FIFOs)
- Use priority encoder favoring non-empty FIFOs
- Skip empty FIFOs faster (see Enhancement #2)

### Issue 3: Phase Transition Glitch

**Symptoms:**
- Pointers written with wrong spike mask during phase boundary
- Corruption at transition from BRAM to URAM reading

**Root Cause:**
- Race condition between `exec_bram_phase1_done` and last HBM read
- Write enable uses old phase flags

**Debug:**
```verilog
// Monitor phase transition timing
(* mark_debug = "true" *) reg phase_transition;

always @(posedge clk) begin
    if (exec_bram_phase1_done & !uram_reading)
        phase_transition <= 1'b1;
    else
        phase_transition <= 1'b0;
end

// Check if any writes occur during transition
assert property (@(posedge clk)
    phase_transition |-> (|{ptr0_wren, ptr1_wren, ..., ptr15_wren} == 0)
);
```

**Solution:**
- Pipeline phase flags by one cycle
- Add guard time between phases (no writes for 1 cycle)
- Use registered versions of bram_reading/uram_reading for write enables

### Issue 4: HBM FIFO Not Advancing

**Symptoms:**
- Same HBM data appears multiple times
- Pointer FIFOs fill with duplicate entries

**Root Cause:**
- `hbm2pfc_rden` not properly connected or not asserting
- FWFT mode misconfigured on HBM FIFO

**Debug:**
```verilog
// Verify read enable toggles
(* mark_debug = "true" *) wire hbm2pfc_rden;
(* mark_debug = "true" *) wire exec_hbm_rvalidready;
(* mark_debug = "true" *) wire [511:0] exec_hbm_rdata;

// Check for stuck data
reg [511:0] prev_hbm_rdata;
always @(posedge clk) begin
    if (exec_hbm_rvalidready)
        prev_hbm_rdata <= exec_hbm_rdata;
end

// Assert: consecutive reads should have different data (usually)
// (unless network connectivity happens to repeat, rare)
```

**Solution:**
- Verify FWFT mode enabled on HBM FIFO IP
- Check that `hbm2pfc_rden` is wired to FIFO's read enable
- Confirm FIFO has data (not empty)

### Issue 5: Address Counter Wrapping Incorrectly

**Symptoms:**
- Some FIFOs never serviced
- Arbiter stuck on certain addresses

**Root Cause:**
- 4-bit counter not wrapping correctly (should wrap 15→0)
- Synthesis optimization error

**Debug:**
```verilog
// Monitor counter progression
(* mark_debug = "true" *) reg [3:0] addr;
(* mark_debug = "true" *) reg [3:0] prev_addr;

always @(posedge clk) begin
    prev_addr <= addr;
    // Check for proper increment (with wrap)
    assert ((addr == (prev_addr + 1'b1)) || (!resetn));
end
```

**Solution:**
- Explicitly handle wrap:
```verilog
always @(posedge clk) begin
    if (~resetn)
        addr <= 4'd0;
    else if (addr == 4'd15)
        addr <= 4'd0;  // Explicit wrap
    else
        addr <= addr + 1'b1;
end
```

---

## Performance Characteristics

### Throughput Analysis

**HBM Read Bandwidth**:
- **Peak**: 512 bits per cycle @ 225 MHz = 14.4 GB/s
- **Typical**: Limited by HBM latency and contention (~50% efficiency) = 7.2 GB/s
- **Pointers per Second**: (7.2 GB/s) / (32 bits) = 1.8 billion pointers/s

**Arbiter Throughput**:
- **Max**: 1 pointer per cycle @ 225 MHz = 225 million pointers/s
- **Typical** (50% FIFO occupancy): ~112 million pointers/s
- **Bottleneck**: The arbiter, not HBM, limits pointer throughput: the HBM fill rate far exceeds the drain rate during Phase 1, so the pointer FIFOs must absorb the difference

**Phase 1 Duration** (example: 131,072 neurons):
```
External Events:
  Input axons: 16,384 (assuming 16 per HBM read)
  HBM reads: 16,384 / 16 = 1,024 reads
  Time @ 225 MHz: 1,024 cycles = 4.55 µs

Internal Events:
  URAM neurons: 131,072
  URAM rows: 131,072 / 2 = 65,536 (2 neurons per row)
  URAM banks: 16
  Rows per bank: 65,536 / 16 = 4,096
  HBM reads per bank: 4,096 / 16 = 256 (if 16 neurons spike per read)
  Total HBM reads: ~16,384 (worst case, all banks active)
  Time @ 225 MHz: 16,384 cycles = 72.8 µs

Total Phase 1: ~77 µs
```

**Phase 2 Duration** (pointer drain):
```
Assume 10% neurons spike (13,107 neurons):
  Pointers to process: 13,107
  Arbiter rate: 1 per 16 cycles (worst case: all pointers queued in one FIFO)
  Effective drain: 225 MHz / 16 = 14.06 million pointers/s

  Time: 13,107 pointers / 14.06M/s = 0.93 ms

But Phase 2 overlaps with next Phase 1!
  Phase 1 and 2 pipeline, so overall latency = max(Phase1, Phase2)
  Typical: Phase 2 >> Phase 1, so Phase 2 dominates
```

**Latency** (pointer from HBM to ptrFIFO):
- **Best Case** (FIFO empty, arbiter on correct address):
  - FWFT mode: 0 cycles (immediate)
  - Write to ptrFIFO: 1 cycle
  - **Total**: 1 cycle @ 225 MHz = 4.4 ns

- **Worst Case** (FIFO full, arbiter just passed):
  - Wait for FIFO space: N cycles (depends on drain rate)
  - Wait for arbiter: 15 cycles (worst case, just missed)
  - **Total**: ~16 cycles @ 225 MHz = 71 ns (ignoring FIFO drain time)

### Resource Utilization Summary

| Resource | Usage | Notes |
|----------|-------|-------|
| LUTs | ~1,200 | Demux, arbiter, control, FIFO logic |
| FFs | ~550 | Phase control, arbiter, FIFO pointers |
| BRAM18K | 16 | One per pointer FIFO (512×32b each) |
| DSPs | 0 | No arithmetic operations |

**Percentage of Typical FPGA** (e.g., Xilinx UltraScale+ VU9P):
- LUTs: 1,200 / 1,182,240 = 0.1%
- FFs: 550 / 2,364,480 = 0.02%
- BRAM18K: 16 / 4,320 = 0.37% (the VU9P has 2,160 36 Kb block RAMs, each usable as two BRAM18K)

**Conclusion**: Very lightweight module, dominated by FIFO storage.

---

## Safety and Edge Cases

### Edge Case 1: All Neurons Spike Simultaneously

**Scenario**: Every neuron in every group spikes in the same cycle.

**Behavior**:
```
exec_bram_spiked = 16'hFFFF  (all bits set)
All 16 pointer FIFOs receive write:
  ptr0_wren = 1, ptr1_wren = 1, ..., ptr15_wren = 1

Each FIFO receives 1 pointer per HBM read.
```

**Safety**:
- ✅ All writes occur in parallel (16 separate FIFOs)
- ✅ No conflicts (each FIFO independent)
- ⚠️ FIFO depth must handle burst (512+ pointers)
- ⚠️ Arbiter drain rate becomes critical (1 per 16 cycles)

**Result**: System handles correctly if FIFO depth adequate.

### Edge Case 2: No Neurons Spike (Quiescent Network)

**Scenario**: No spikes in entire time step.

**Behavior**:
```
exec_bram_spiked = 16'h0000  (all bits clear)
exec_uram_spiked = 16'h0000

All ptr*_wren = 0  (no writes to any FIFO)
HBM reads still occur, but data discarded.
```

**Safety**:
- ✅ No FIFO writes (correct behavior)
- ✅ Arbiter continues cycling (no-op, all FIFOs empty)
- ✅ Phase transitions occur normally
- ⚠️ HBM bandwidth wasted (reading data that's discarded)

**Optimization Opportunity**: Gate HBM reads based on spike mask (see Enhancements).

### Edge Case 3: Single Bit Spike (Minimal Activity)

**Scenario**: Only one neuron in one group spikes.

**Behavior**:
```
exec_bram_spiked = 16'h0001  (only bit 0 set)

Only ptr0_wren = 1  (one FIFO active)
Other 15 FIFOs idle.
```

**Safety**:
- ✅ Correct - only relevant FIFO updated
- ✅ Arbiter cycles through all, only reads from ptr0
- ✅ Minimal resource usage

**Result**: Efficient sparse event handling.

### Edge Case 4: ptrFIFO Full (Downstream Backpressure)

**Scenario**: HBM processor can't consume pointers fast enough.

**Behavior**:
```
ptrFIFO_full = 1

Arbiter logic:
  if (~ptr[addr]_empty & ~ptrFIFO_full)  → Condition false!
    ptr[addr]_rden = 0  (no read)
    ptrFIFO_wren = 0    (no write)
```

**Safety**:
- ✅ Arbiter stalls (doesn't read from any pointer FIFO)
- ✅ Upstream pointer FIFOs continue to fill
- ⚠️ If pointer FIFOs also fill, writes are lost (see Issue 1)

**Required**: System must ensure ptrFIFO drains faster than it fills.
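
The stall behavior described above can be checked in simulation. The following is a minimal sketch of a SystemVerilog checker, not part of the existing design. It assumes the per-FIFO flags are available as 16-bit vectors and that `ptrFIFO_wren` is asserted in the same cycle as the selected read enable, as the arbiter pseudocode suggests. The module name `pfc_backpressure_checker` and the vectors `ptrs_wren` and `ptrs_full` are hypothetical.

```verilog
// Hypothetical simulation-only checker (SystemVerilog assertions).
// Instantiate or bind it next to the Pointer FIFO Controller.
module pfc_backpressure_checker (
    input logic        clk,
    input logic        resetn,
    input logic [15:0] rdens,       // bit i = ptr<i>_rden
    input logic [15:0] ptrs_empty,  // bit i = ptr<i>_empty
    input logic [15:0] ptrs_wren,   // bit i = ptr<i>_wren
    input logic [15:0] ptrs_full,   // bit i = ptr<i>_full (assumed FIFO IP output)
    input logic        ptrFIFO_wren,
    input logic        ptrFIFO_full
);
    // A stalled arbiter must not pop any pointer FIFO or push ptrFIFO.
    assert property (@(posedge clk) disable iff (~resetn)
        ptrFIFO_full |-> (rdens == 16'h0000 && !ptrFIFO_wren));

    // Every pop is forwarded: ptrFIFO is written exactly when a FIFO is read.
    assert property (@(posedge clk) disable iff (~resetn)
        ptrFIFO_wren == (|rdens));

    // An empty pointer FIFO is never read.
    assert property (@(posedge clk) disable iff (~resetn)
        (rdens & ptrs_empty) == 16'h0000);

    // Overflow detector: a write to a full pointer FIFO loses the pointer.
    assert property (@(posedge clk) disable iff (~resetn)
        (ptrs_wren & ptrs_full) == 16'h0000)
        else $error("pointer FIFO overflow, mask=%h", ptrs_wren & ptrs_full);
endmodule
```

The last assertion turns the silent pointer loss into a visible simulation error, which makes it easier to size the FIFOs against realistic spike traffic.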

### Safety Check: Read Enable Conflicts

**Assertion**: Verify that the arbiter issues at most one read per cycle
```verilog
wire [15:0] rdens = {ptr15_rden, ptr14_rden, ..., ptr0_rden};

property one_hot_rdens;
    @(posedge clk) disable iff (~resetn)
    $onehot0(rdens);  // At most one bit set
endproperty
assert_rdens: assert property (one_hot_rdens);
```

### Safety Check: Phase Mutual Exclusion

**Assertion**: Ensure `bram_reading` and `uram_reading` are never asserted together
```verilog
property phases_mutex;
    @(posedge clk) disable iff (~resetn)
    !(bram_reading & uram_reading);
endproperty
assert_phases: assert property (phases_mutex);
```

---

## Future Enhancement Opportunities

### 1. Priority Arbiter

Replace round-robin with priority-based arbitration:

```verilog
// Occupancy of each FIFO (requires rd_data_count from the FIFO IP)
wire [9:0] ptr0_count, ptr1_count, ..., ptr15_count;

// Select the first FIFO whose occupancy exceeds `threshold`
// (fixed-priority encoder; `threshold` is a design-time constant to be chosen)
reg [3:0] priority_addr;
always @(*) begin
    if      (ptr0_count > threshold) priority_addr = 4'd0;
    else if (ptr1_count > threshold) priority_addr = 4'd1;
    // ... (priority order 0→1→2→...→15)
    else priority_addr = addr;  // Fall back to round-robin
end

// Use priority_addr instead of addr in arbiter mux
```

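**Benefit**: Reduces the risk of FIFO overflow by draining FIFOs above the threshold first. Note that a fixed priority order can starve higher-numbered FIFOs while a lower-numbered FIFO stays above the threshold.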
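
A variant that selects the fullest FIFO, rather than the first one above a threshold, could compare occupancies directly. The sketch below is one possible way to do this in plain Verilog-2001. The module name `fullest_fifo_select`, the flattened `counts_flat` bus, and all port names are hypothetical, and the occupancy inputs are assumed to come from the FIFO IP's `rd_data_count` outputs.

```verilog
// Hypothetical helper: returns the index and occupancy of the fullest FIFO.
module fullest_fifo_select #(
    parameter NUM_FIFOS = 16,
    parameter COUNT_W   = 10,
    parameter ADDR_W    = 4
)(
    // Occupancy of FIFO i sits in bits [i*COUNT_W +: COUNT_W].
    input  wire [NUM_FIFOS*COUNT_W-1:0] counts_flat,
    output reg  [ADDR_W-1:0]            fullest_addr,
    output reg  [COUNT_W-1:0]           fullest_count
);
    integer i;
    always @(*) begin
        // Start with FIFO 0 as the running maximum.
        fullest_addr  = {ADDR_W{1'b0}};
        fullest_count = counts_flat[0 +: COUNT_W];
        for (i = 1; i < NUM_FIFOS; i = i + 1) begin
            // Strict '>' keeps the lowest index when occupancies tie.
            if (counts_flat[i*COUNT_W +: COUNT_W] > fullest_count) begin
                fullest_addr  = i[ADDR_W-1:0];
                fullest_count = counts_flat[i*COUNT_W +: COUNT_W];
            end
        end
    end
endmodule
```

This loop describes a chain of 15 comparators, so a real implementation would likely need a comparator tree or a pipeline stage to meet timing. The arbiter could use `fullest_addr` only when `fullest_count` exceeds a threshold and keep round-robin otherwise.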

### 2. Skip-Empty Optimization

Current arbiter wastes cycles checking empty FIFOs:

```verilog
// Add empty flag aggregation
wire [15:0] ptrs_empty = {ptr15_empty, ..., ptr0_empty};

// Fast-forward to next non-empty FIFO (SystemVerilog: uses `int` and `break`)
reg [3:0] next_addr;
always @(*) begin
    next_addr = addr;
    for (int i = 1; i <= 16; i++) begin
        if (!ptrs_empty[(addr + i) & 4'hF]) begin
            next_addr = (addr + i) & 4'hF;
            break;
        end
    end
end

always @(posedge clk) begin
    if (~resetn)
        addr <= 4'd0;
    else
        addr <= next_addr;  // Jump to next non-empty
end
```

**Benefit**: Reduces drain latency when many FIFOs are empty (estimated at ~50%; the actual gain depends on how many FIFOs are occupied).
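
The `for`/`break` loop above relies on SystemVerilog. If the design must stay in plain Verilog-2001, the same selection can be written without `break` by scanning the offsets in reverse, so that the nearest non-empty FIFO is assigned last and wins. This is a sketch under that assumption; the module name `next_nonempty_select` and the `any_nonempty` output are hypothetical additions.

```verilog
// Hypothetical Verilog-2001 version of the skip-empty selection.
module next_nonempty_select (
    input  wire [3:0]  addr,         // current round-robin position
    input  wire [15:0] ptrs_empty,   // bit i = 1 when pointer FIFO i is empty
    output reg  [3:0]  next_addr,    // nearest non-empty FIFO after addr
    output reg         any_nonempty  // 0 when all 16 FIFOs are empty
);
    integer   i;
    reg [3:0] cand;
    always @(*) begin
        next_addr    = addr;   // hold position when everything is empty
        any_nonempty = 1'b0;
        // Scan offsets 16 down to 1: the smallest matching offset is
        // assigned last, so it overrides all larger ones.
        for (i = 16; i >= 1; i = i - 1) begin
            cand = addr + i[3:0];   // 4-bit add wraps modulo 16 (offset 16 = addr itself)
            if (!ptrs_empty[cand]) begin
                next_addr    = cand;
                any_nonempty = 1'b1;
            end
        end
    end
endmodule
```

With only one FIFO occupied, `next_addr` stays on that FIFO, so it can be serviced every cycle instead of once per 16-cycle rotation. The 16-way combinational scan may need pipelining at high clock rates.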

### 3. Gated HBM Reads

Skip HBM reads when there are no spikes:

```verilog
// Compute OR of spike mask
wire any_spikes = |(exec_bram_spiked | exec_uram_spiked);

// Gate HBM read enable
assign hbm2pfc_rden = exec_hbm_rvalidready & any_spikes;
```

**Benefit**: Saves HBM bandwidth during quiescent periods.

### 4. Configurable FIFO Count

Parameterize number of FIFOs:

```verilog
module pointer_fifo_controller #(
    parameter NUM_FIFOS = 16,
    parameter FIFO_DEPTH = 512
)(
    input [NUM_FIFOS-1:0] exec_bram_spiked,
    // ... (generate FIFO instances and arbiter)
);

// Use generate blocks for FIFO instantiation
genvar i;
generate
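    for (i = 0; i < NUM_FIFOS; i = i + 1) begin : fifo_gen
        // Note: fifo_32x512 has a fixed depth; honoring FIFO_DEPTH
        // requires a FIFO primitive with a configurable depth.
        fifo_32x512 ptr_fifo (
            .din(exec_hbm_rdata[i*32 +: 32]),  // 32-bit slice for FIFO i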
            .wr_en(ptr_wren[i]),
            // ...
        );
    end
endgenerate
```

**Benefit**: Flexible configuration for different numbers of neuron groups.
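
The demultiplexing and spike-gated write enables can be parameterized in the same way. The following is a minimal sketch, not the existing implementation. The module name `ptr_demux` and the ports `hbm_rdata`, `hbm_rvalid`, `spike_mask`, and `ptr_din` are hypothetical; `spike_mask` stands for whichever of `exec_bram_spiked` or `exec_uram_spiked` is active in the current phase.

```verilog
// Hypothetical parameterized demux: one data lane and one write enable per FIFO.
module ptr_demux #(
    parameter NUM_FIFOS = 16,
    parameter PTR_W     = 32
)(
    input  wire [NUM_FIFOS*PTR_W-1:0] hbm_rdata,   // 512 bits with the defaults
    input  wire                       hbm_rvalid,  // HBM data valid this cycle
    input  wire [NUM_FIFOS-1:0]       spike_mask,  // spike mask for the active phase
    output wire [NUM_FIFOS*PTR_W-1:0] ptr_din,     // lane g feeds pointer FIFO g
    output wire [NUM_FIFOS-1:0]       ptr_wren     // write enable for pointer FIFO g
);
    genvar g;
    generate
        for (g = 0; g < NUM_FIFOS; g = g + 1) begin : lane_gen
            // Each FIFO sees its own PTR_W-bit slice of the HBM word.
            assign ptr_din[g*PTR_W +: PTR_W] = hbm_rdata[g*PTR_W +: PTR_W];
            // Write only when the data is valid and group g spiked.
            assign ptr_wren[g] = hbm_rvalid & spike_mask[g];
        end
    endgenerate
endmodule
```

Because the HBM word width is `NUM_FIFOS * PTR_W`, changing `NUM_FIFOS` also changes the expected read width. With a fixed 512-bit HBM interface and 32-bit pointers, the FIFO count remains tied to 16 unless the pointer format changes.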

### 5. Multi-Port Arbiter

Read from multiple FIFOs per cycle:

```verilog
// Dual-port arbiter (2 pointers per cycle)
reg [3:0] addr_a, addr_b;

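always @(posedge clk) begin
    if (~resetn) begin
        addr_a <= 4'd0;  // Even addresses: 0, 2, ..., 14
        addr_b <= 4'd1;  // Odd addresses: 1, 3, ..., 15
    end else begin
        addr_a <= addr_a + 4'd2;
        addr_b <= addr_b + 4'd2;
    end
end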

// Mux for addr_a and addr_b, write to ptrFIFO twice per cycle
```

**Benefit**: Up to 2× drain rate, which can reduce the required FIFO depth.

**Trade-off**: Requires wider ptrFIFO or double-pumped downstream.

### 6. Adaptive FIFO Depth

Dynamically adjust FIFO depth based on activity:

```verilog
// Use distributed RAM for shallow portion, spill to BRAM when full
// Requires custom FIFO controller with dual-tier storage
```

**Benefit**: Saves BRAM when network activity is sparse.

### 7. Burst Write to ptrFIFO

Instead of one pointer per cycle, burst multiple:

```verilog
// Pseudocode: if ptrFIFO has space, write up to 4 pointers per cycle
// Requires ptrFIFO to accept burst writes (wider interface)

assign ptrFIFO_din[127:0] = {ptr[addr+3]_dout, ptr[addr+2]_dout,
                             ptr[addr+1]_dout, ptr[addr]_dout};
assign ptrFIFO_wren = burst_valid;
```

**Benefit**: Up to 4× drain rate, if the downstream logic supports it.

---

## Key Terms and Definitions

| Term | Definition |
|------|------------|
| **Pointer FIFO** | Buffer storing 32-bit pointer records (length + address) for synaptic lists |
| **Round-Robin** | Arbitration scheme giving equal service time to each FIFO in cyclic order |
| **Spike-Gated** | Write enable conditional on neuron spike (sparse event handling) |
| **Demultiplexing** | Splitting wide HBM data (512b) into narrow pointer streams (16×32b) |
| **FWFT (First-Word Fall-Through)** | FIFO mode where data appears immediately on `dout` when not empty |
| **Phase 1a** | External event processing (BRAM reading, external axon spikes) |
| **Phase 1b** | Internal event processing (URAM reading, neuron-to-neuron spikes) |
| **Phase 2** | Synaptic weight fetch (pointer drain, HBM synaptic reads) |
| **Neuron Group** | Set of 16 neurons mapped to one pointer FIFO |
| **Backpressure** | Flow control mechanism where a full FIFO blocks upstream writes |
| **Arbiter** | Logic deciding which FIFO gets access to shared resource (ptrFIFO) |
| **ptrFIFO** | Aggregated pointer FIFO feeding HBM processor for Phase 2 |
| **Starvation** | Condition where some FIFOs are never serviced (not possible with round-robin) |
| **Overflow** | Condition where a pointer write is lost because the target FIFO is full |
| **Pointer Record** | 32-bit datum: [31:23]=length (9b), [22:0]=start address (23b) |
| **HBM rvalidready** | Signal indicating HBM read data valid and consumer ready |
| **exec_run** | Control pulse starting new time step, initiating Phase 1a |
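
The pointer record layout from the table can be unpacked with two bit slices. This is a small illustrative sketch: the module name `ptr_record_unpack` and the port names `syn_length` and `syn_start` are hypothetical, and the units of the length and address fields are not specified here.

```verilog
// Hypothetical helper: splits a 32-bit pointer record into its two fields.
module ptr_record_unpack (
    input  wire [31:0] ptr_record,
    output wire [8:0]  syn_length,  // [31:23] = length (9 bits)
    output wire [22:0] syn_start    // [22:0]  = start address (23 bits)
);
    assign syn_length = ptr_record[31:23];
    assign syn_start  = ptr_record[22:0];
endmodule
```

For example, the record `{9'd5, 23'h000123}` unpacks to a length of 5 and a start address of `23'h000123`.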

---

## Conclusion

The **Pointer FIFO Controller** is a well-designed datapath component that efficiently manages sparse neural spike events through:

1. **Parallel Buffering**: 16 independent FIFOs decouple HBM read from pointer consumption
2. **Spike-Gated Writes**: Only buffer pointers for neurons that actually spiked (sparse efficiency)
3. **Fair Arbitration**: Round-robin ensures no FIFO monopolizes downstream bandwidth
4. **Two-Phase Coordination**: Seamlessly handles both external and internal event sources

**Design Strengths**:
- Simple, proven architecture (demux + FIFOs + arbiter)
- Minimal logic (mostly wiring and control)
- FWFT mode reduces latency
- Phase control cleanly separates external and internal events

**Potential Improvements**:
- Priority arbitration to prevent overflow
- Skip-empty optimization to reduce latency
- Gated HBM reads to save bandwidth
- Multi-port arbiter for higher drain rate

**Critical Parameters**:
- FIFO depth must accommodate worst-case burst (512-1024 entries)
- Arbiter must drain faster than fill rate (or FIFOs overflow)
- Round-robin period (16 cycles) limits the per-FIFO drain rate to one pointer every 16 cycles

For complete understanding, see cross-referenced modules: `external_events_processor.md`, `internal_events_processor.md`, `hbm_processor.md`, and `spike_fifo_controller.md`.