# 4.3 Hardware Implementation of R-STDP

This chapter explains how R-STDP is implemented in the FPGA hardware by extending the existing framework from Chapters 1-2. The key insight is **reusing existing logic**: eligibility traces are updated like neuron membrane potentials (but without spiking), and weight updates reuse the existing HBM write infrastructure.

---

## Memory Architecture: Adding HBM Region 4 for Eligibility Traces

Recall from Chapter 1 that the existing system uses three HBM regions:

- **Region 1**: Axon pointers (where each axon's synapses are located)
- **Region 2**: Neuron pointers (where each neuron's output synapses are located)
- **Region 3**: Synapse data (OpCode, Target address, Weight)

For R-STDP, we add a fourth region:

- **Region 4**: Eligibility traces (one value per synapse, stored like membrane potentials)

### Address Mapping: Region 3 ↔ Region 4

Each synapse in Region 3 has a corresponding eligibility trace in Region 4. The mapping uses simple address arithmetic:

```
Synapse address (Region 3)  →  Eligibility trace address (Region 4)
──────────────────────────────────────────────────────────────────
0x8000                      →  0x8000 + REGION4_OFFSET
0x8001                      →  0x8001 + REGION4_OFFSET
0x8002                      →  0x8002 + REGION4_OFFSET
```

**REGION4_OFFSET** = 1000 (decimal; a placeholder sized at roughly half the number of rows in Region 3). This only works while Region 3 is sparsely populated: the offset must be at least as large as the span of rows Region 3 actually occupies, otherwise the two regions overlap (see "Address Mapping Collision" under Potential Issues).

**Example:**

- Synapse at HBM row `0x8001` in Region 3
- Its eligibility trace at row `0x8001 + 1000 = 0x83E9` in Region 4

**Why this simple mapping?**

- Easy to compute in hardware (just add/subtract offset)
- Leverages existing HBM read/write logic
- Reversible: `Region_3_addr = Region_4_addr - REGION4_OFFSET`

### Data Format

Each eligibility trace is stored as a **36-bit signed fixed-point value**, identical in format to neuron membrane potentials:

```
[35:0] = Eligibility trace value
  - Starts at 0
  - Increases when STDP event occurs (coincidence detection)
  - Decays over time (like membrane potential leak)
  - NO threshold comparison (doesn't spike)
```

This
reuse of the membrane potential format means we can use the **same hardware logic** that updates neurons to also update eligibility traces.

---

## The Reward Register: Dopamine Signal

A new 1-bit register stores the reward signal (dopamine modulation):

**Reward Register:**

- **Width**: 1 bit (binary on/off)
- **Set by**: New command opcode (CMD_SET_REWARD = 0x0A)
- **Broadcast to**: internal_events_processor during Phase 2
- **Purpose**: Gates whether weight updates occur

**Simplified for initial implementation:**

- Binary signal: 1 = reward on, 0 = reward off
- Can be toggled per timestep or set once for entire simulation
- Future enhancement: Multi-bit for graded reward levels

---

## Phase 2 Flow with R-STDP: Step-by-Step

We extend the existing Phase 2 (Synaptic Processing) to include eligibility trace handling and weight updates.

### Step 1: Pointer Processing with Dual Address Computation

Recall from Chapter 2 that Phase 2 begins with `pointer_fifo_controller` reading synapse pointers from the 16 ptrFIFOs and sending them to `hbm_processor`.

**Extension:** When `hbm_processor` receives a synapse pointer, it now computes **two addresses**:

1. **Region 3 address** (existing): Where the synapse data is stored
2. 
   **Region 4 address** (new): Where the eligibility trace is stored
   - Computed as: `Region_3_addr + REGION4_OFFSET`

**New FIFO introduced: etFIFO (Eligibility Trace FIFO)**

| Property | Value |
|----------|-------|
| **Width** | 23 bits (HBM address) |
| **Depth** | 512 |
| **Written by** | hbm_processor (when processing synapse pointers) |
| **Read by** | internal_events_processor (during neuron updates) |
| **Contains** | Region 4 addresses (eligibility trace locations) |

**What happens:**

- `hbm_processor` pops a synapse pointer from ptrFIFO
- Extracts Region 3 address: `0x8001`
- Computes Region 4 address: `0x8001 + 1000 = 0x83E9`
- Pushes `0x83E9` to **etFIFO**
- Issues HBM read for synapse data at Region 3 address `0x8001` (existing behavior)

**Result:** The etFIFO now holds eligibility trace addresses in the **same order** as synapses are being processed. This synchronization is critical for the next step.

---

### Step 2: Neuron Update with Coincidence Detection

As synapses are read from HBM Region 3, they flow to `internal_events_processor` for neuron updates (existing behavior from Chapter 2.2).

**Extension:** While processing each synapse, the internal_events_processor now also:

1. **Pops the corresponding eligibility trace address from etFIFO**
   - Because addresses were pushed in the same order, this stays synchronized
2. **Performs the neuron update** (existing):
   - Read current membrane potential from URAM
   - Add synaptic weight: `V_new = V_old + weight`
   - Check threshold: `spike = (V_new >= threshold)`
   - Write updated potential to URAM
3. **Performs coincidence detection** (new):
   - Check: Did the post-synaptic neuron spike?
   - If YES: We have an STDP event (pre fired → caused input → post fired)
   - This is **coincidence detection**: pre and post activity temporally correlated
4. **Checks reward signal** (new):
   - Read the reward register (broadcast to all processing units)
   - Check: Is dopamine ON?
5. 
   **Conditional weight update trigger** (new):
   - If (coincidence detected) AND (reward signal = 1):
     - Push the Region 4 address to **et2FIFO**

**New FIFO introduced: et2FIFO (Eligibility Trace to Weight Update FIFO)**

| Property | Value |
|----------|-------|
| **Width** | 23 bits (HBM Region 4 address) |
| **Depth** | 512 |
| **Written by** | internal_events_processor (when coincidence + reward detected) |
| **Read by** | hbm_processor weight update logic |
| **Contains** | Region 4 addresses of synapses that should have weights updated |

**What this achieves:**

- Only synapses with **recent STDP events** (recorded in eligibility traces) that also receive **reward** will trigger weight updates
- The et2FIFO acts as a queue of "synapses to update"

---

### Step 3: Weight Updates via Eligibility Traces

While Phase 2 is running (or immediately after), `hbm_processor` monitors the **et2FIFO**. When entries appear, it performs weight updates.

**Process for each entry:**

1. **Pop Region 4 address from et2FIFO**
   - Example: `0x83E9` (eligibility trace address)
2. **Read eligibility trace value from HBM Region 4**
   - Issue HBM read to address `0x83E9`
   - Receive 36-bit eligibility trace value: `c(t)`
3. **Compute corresponding synapse address**
   - Reverse the mapping: `Region_3_addr = 0x83E9 - 1000 = 0x8001`
4. **Read current synapse weight from HBM Region 3**
   - Issue HBM read to address `0x8001`
   - Receive 32-bit synapse data
   - Extract weight: `current_weight = synapse_data[15:0]`
5. **Compute new weight using R-STDP rule**
   - R-STDP: `Δw = R(t) × c(t)`
   - Since we're in this state, `R(t) = 1` (reward is on)
   - Therefore: `Δw = c(t)` (the eligibility trace value)
   - New weight: `w_new = w_old + c(t)`
   - Apply clamping to prevent overflow: `w_new ∈ [-32768, 32767]`
6. 
   **Write updated weight to HBM Region 3**
   - Reconstruct synapse data: `{OpCode, Target, w_new}`
   - Issue HBM write to address `0x8001`
   - **This reuses the same HBM write logic** that's called when the host updates weights via `write_synapse` commands

**Key insight:** The weight update path reuses existing infrastructure. We just need to:

- Cut into the write_synapse function at the point where it has the Region 3 address and new weight value
- Provide those values from our R-STDP computation instead of from the host

---

### Step 4: Eligibility Trace Maintenance

Eligibility traces need to be updated to:

1. **Decay over time** (like membrane potential leak)
2. **Increase when STDP events occur** (coincidence detection)

**Mathematical update:**

$$\dot{c}(t) = -\frac{c(t)}{\tau_c} + \delta_{\text{STDP event}}$$

**Implementation approach: Reuse neuron membrane potential logic**

The key insight is that eligibility trace updates are **nearly identical** to neuron updates:

- Both are 36-bit values stored in memory
- Both decay exponentially (leak)
- Both accumulate inputs
- **Difference:** Eligibility traces don't spike (no threshold comparison)

**Method:**

- Read eligibility trace from Region 4: `c_old`
- Apply decay: `c_new = c_old - (c_old >> leak_shift)`
  - This is the same right-shift leak used for neurons
  - `leak_shift` determines τ_c: each step scales the trace by `1 - 2^(-leak_shift)`, so τ_c ≈ `2^leak_shift` timesteps (longer decay than neurons)
- If STDP event detected for this synapse: `c_new += STDP_INCREMENT`
- Write back to Region 4: `c_new`

**When to perform these updates:**

- **Option A**: During Phase 2 (parallel with neuron updates)
- **Option B**: Separate Phase 3 after neuron updates complete
- **Option C**: Lazy update, only when accessed for weight updates
  - More efficient: don't read/write all eligibility traces every timestep
  - Only update the ones being used

---

## Complete Phase 2 Data Flow Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                     PHASE 2 WITH R-STDP                     │
└─────────────────────────────────────────────────────────────┘

Step 1: Pointer Processing
──────────────────────────
pointer_fifo_controller
    │
    └──> ptrFIFO (pop synapse pointer)
           │
           └──> hbm_processor
                  │
                  ├──> Compute Region 3 address (synapse)
                  ├──> Compute Region 4 address (eligibility trace)
                  ├──> Push Region 4 addr to etFIFO ───┐
                  └──> Read synapse from HBM Region 3  │
                                                       │
Step 2: Neuron Update + Coincidence Detection          │
─────────────────────────────────────────────          │
hbm_processor forwards synapse data                    │
    │                                                  │
    └──> internal_events_processor                     │
           │                                           │
           ├──> Pop Region 4 addr from etFIFO <────────┘
           ├──> Update neuron (URAM):
           │      V_new = V_old + weight
           ├──> Check threshold: spike?
           ├──> Coincidence detection: post fired?
           ├──> Check reward register: DA on?
           │
           └──> If (coincidence AND reward):
                  Push Region 4 addr to et2FIFO ───┐
                                                   │
Step 3: Weight Updates                             │
──────────────────────                             │
hbm_processor (parallel state machine)             │
    │                                              │
    ├──> Pop Region 4 addr from et2FIFO <──────────┘
    ├──> Read eligibility trace from Region 4
    ├──> Compute Region 3 addr (subtract offset)
    ├──> Read current weight from Region 3
    ├──> Compute: w_new = w_old + et_value
    └──> Write w_new to Region 3 (reuse write_synapse logic)

Step 4: Eligibility Trace Maintenance (parallel or separate)
────────────────────────────────────────────────────────────
For each synapse's eligibility trace:
    ├──> Read from Region 4
    ├──> Apply decay: c_new = c_old - (c_old >> leak_shift)
    ├──> If STDP event: c_new += INCREMENT
    └──> Write back to Region 4
```

---

## Summary of New Components

### Memory Regions

- **HBM Region 4**: Eligibility trace storage (one 36-bit value per synapse)

### FIFOs (introduced as needed in the flow)

- **etFIFO**: Queues eligibility trace addresses during pointer processing
- **et2FIFO**: Queues eligibility traces that should trigger weight updates

### Control Signals

- **Reward Register**: 1-bit dopamine signal (set by new opcode CMD_SET_REWARD)
- **exec_reward**: Broadcast signal from command_interpreter to
  internal_events_processor

### Reused Logic

- **Neuron update pipeline** → Eligibility trace updates (disable threshold check)
- **HBM write infrastructure** → Weight updates
- **Phase 2 synapse processing** → Coincidence detection point

---

## Key Design Decisions

1. **Simple address mapping**: Region4 = Region3 + constant offset
   - Makes address computation trivial (one addition)
   - Easily reversible for going from eligibility trace back to synapse
2. **Binary reward signal**: On/off only (for now)
   - Simplifies initial implementation
   - Future: Could be multi-bit for graded reward
3. **Coincidence-based STDP**: Post fires in same timestep as pre input
   - Simplified from asymmetric STDP window
   - Good enough for initial learning demonstrations
4. **Reuse existing hardware**: Eligibility traces use neuron logic
   - Minimal new hardware required
   - Proven, tested infrastructure
5. **FIFO-based pipeline**: Maintains synchronization
   - etFIFO keeps addresses aligned with synapse processing
   - et2FIFO queues weight updates for parallel processing

---

## Testing the Implementation

### Simple Test Case

**Network:**

- Neuron A (pre-synaptic) → Neuron B (post-synaptic)
- Single synapse with initial weight `W = 500`
- Threshold = 2000

**Experiment sequence:**

1. **Timestep 0**: Fire A repeatedly without B firing
   - No coincidence → eligibility trace stays at 0
2. **Timestep 5**: Fire A enough to make B spike
   - B receives input, crosses threshold, spikes
   - Coincidence detected: pre (A) was active, post (B) fired
   - Eligibility trace increases (but reward is still OFF)
3. **Timesteps 6-10**: No activity
   - Eligibility trace decays: `c(t) × exp(-Δt/τ_c)`
4. **Timestep 11**: Turn on reward (set reward register = 1)
5. **Timestep 12**: Fire A, B spikes again
   - Coincidence detected
   - Reward is ON
   - → Weight update triggered!
   - Read eligibility trace from Region 4: `c(t) = 100` (example)
   - Compute: `W_new = 500 + 100 = 600`
   - Write updated weight to Region 3

**Validation:**

- Read back synapse weight from HBM → should be 600
- On next input from A, B should spike with less input (stronger synapse)

---

## Potential Issues and Mitigations

### 1. FIFO Overflow

**Issue**: etFIFO or et2FIFO fills up if too many synapses are processed

**Mitigation**:

- Size FIFOs same as ptrFIFO (512 entries proven sufficient)
- Add overflow detection flags
- Backpressure: stall pointer processing if etFIFO full

### 2. HBM Bandwidth

**Issue**: Additional reads/writes for Region 4 and weight updates

**Impact analysis**:

- Existing: ~200-300 HBM transactions per timestep
- New: +1 read per synapse (eligibility trace), +2 reads + 1 write per weight update
- For a network with 1000 active synapses and 10% coincidence: 100 weight updates, i.e. about +300 transactions (2 reads + 1 write each), in addition to any per-synapse trace reads

**Mitigation**:

- Weight updates run in parallel with ongoing Phase 2
- Burst mode for sequential eligibility trace reads
- Lazy decay (only update accessed traces)

### 3. Address Mapping Collision

**Issue**: Region 4 might overlap with Region 3 if the offset is too small

**Solution**:

- Choose REGION4_OFFSET ≥ maximum Region 3 size
- Example: If Region 3 uses rows 0x8000-0xFFFF (32K rows)
  - Set REGION4_OFFSET = 0x8000 (32768 in decimal)
  - Region 4 spans: 0x10000-0x17FFF
  - No overlap ✓

### 4. Coincidence Detection Granularity

**Issue**: Current design: "post fired this timestep" = coincidence

**Limitation**: Doesn't distinguish whether post fired 1ms or 10ms after pre

**Future enhancement**:

- Track spike times with sub-timestep resolution
- Implement asymmetric STDP window (separate τ+ and τ-)
- Different eligibility trace increments for potentiation vs depression

### 5. Eligibility Trace Decay Timing

**Issue**: When to apply decay?
Every timestep for all traces is expensive

**Recommended approach**: Lazy decay

- Store `last_update_timestamp` with each trace
- When reading for weight update:
  - Calculate time elapsed: `Δt = current_time - last_update_timestamp`
  - Apply accumulated decay: `c = c_stored × exp(-Δt/τ_c)`
- Only write back when updating (not every read)

---

## Python API Usage
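
Before driving the hardware from Python, it can help to have a small host-side reference model to compare read-back values against. The sketch below is not part of the project's Python API: every function name in it (`trace_addr`, `synapse_addr`, `shift_decay`, `lazy_decay`, `rstdp_weight`) is hypothetical, and `REGION4_OFFSET = 0x8000` is an assumption taken from the collision-free example in "Address Mapping Collision" rather than the placeholder value 1000. It only mirrors the arithmetic described in this chapter: the Region 3 ↔ Region 4 mapping, the right-shift leak, lazy exponential decay, and the clamped R-STDP weight update.

```python
import math

# Assumption: collision-free offset from the "Address Mapping Collision" example.
REGION4_OFFSET = 0x8000
# Signed 16-bit weight range used for clamping in Step 3.
W_MIN, W_MAX = -32768, 32767


def trace_addr(synapse_row: int) -> int:
    """Region 3 row -> Region 4 row (a single addition)."""
    return synapse_row + REGION4_OFFSET


def synapse_addr(trace_row: int) -> int:
    """Region 4 row -> Region 3 row (the reverse mapping)."""
    return trace_row - REGION4_OFFSET


def shift_decay(c: int, leak_shift: int) -> int:
    """One timestep of right-shift leak: c - (c >> leak_shift)."""
    return c - (c >> leak_shift)


def lazy_decay(c_stored: float, dt: int, tau_c: float) -> float:
    """Accumulated decay over dt timesteps: c_stored * exp(-dt / tau_c)."""
    return c_stored * math.exp(-dt / tau_c)


def rstdp_weight(w_old: int, c: int, reward: int) -> int:
    """w_new = w_old + R * c with binary R, clamped to the weight range."""
    if reward == 0:
        return w_old  # reward off: no weight update is triggered
    return max(W_MIN, min(W_MAX, w_old + c))


# Last step of the simple test case: W = 500, c(t) = 100, reward on.
expected_weight = rstdp_weight(500, 100, reward=1)
print(hex(trace_addr(0x8001)), expected_weight)  # prints: 0x10001 600
```

A value read back from HBM after the test sequence can then be compared against `expected_weight`; a mismatch points at either the address mapping or the clamping path rather than at the host code.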