4.3 Hardware Implementation of R-STDP
This chapter explains how R-STDP is implemented in the FPGA hardware by extending the existing framework from Chapters 1-2. The key insight is reusing existing logic: eligibility traces are updated like neuron membrane potentials (but without spiking), and weight updates reuse the existing HBM write infrastructure.
Memory Architecture: Adding HBM Region 4 for Eligibility Traces
Recall from Chapter 1 that the existing system uses three HBM regions:
Region 1: Axon pointers (where each axon’s synapses are located)
Region 2: Neuron pointers (where each neuron’s output synapses are located)
Region 3: Synapse data (OpCode, Target address, Weight)
For R-STDP, we add a fourth region:
Region 4: Eligibility traces (one value per synapse, stored like membrane potentials)
Address Mapping: Region 3 ↔ Region 4
Each synapse in Region 3 has a corresponding eligibility trace in Region 4. The mapping uses simple address arithmetic:
Synapse address (Region 3) → Eligibility trace address (Region 4)
──────────────────────────────────────────────────────────────────
0x8000 → 0x8000 + REGION4_OFFSET
0x8001 → 0x8001 + REGION4_OFFSET
0x8002 → 0x8002 + REGION4_OFFSET
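REGION4_OFFSET = 1000 (decimal) in these examples, or approximately half the number of rows in Region 3. Whatever value is chosen must be at least the number of Region 3 rows actually occupied by synapses, otherwise the two regions overlap (see Address Mapping Collision under Potential Issues).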
Example:
Synapse at HBM row 0x8001 in Region 3
Its eligibility trace at row 0x8001 + 1000 = 0x83E9 in Region 4
Why this simple mapping?
Easy to compute in hardware (just add/subtract offset)
Leverages existing HBM read/write logic
Reversible:
Region_3_addr = Region_4_addr - REGION4_OFFSET
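The mapping can be sketched as two one-line helpers. This is a minimal software model, not the hardware implementation: the function names are hypothetical, and the offset is the illustrative decimal value 1000 used in the examples.

```python
REGION4_OFFSET = 1000  # decimal rows; illustrative value from the examples


def trace_addr(synapse_row: int, offset: int = REGION4_OFFSET) -> int:
    """Region 3 synapse row -> Region 4 eligibility-trace row."""
    return synapse_row + offset


def synapse_addr(trace_row: int, offset: int = REGION4_OFFSET) -> int:
    """Region 4 eligibility-trace row -> Region 3 synapse row."""
    return trace_row - offset


for row in (0x8000, 0x8001, 0x8002):
    mapped = trace_addr(row)
    assert synapse_addr(mapped) == row  # the mapping is reversible
    print(f"{row:#06x} -> {mapped:#06x}")
```

Because the offset is decimal while the rows are written in hex, it is easy to misread the sum: 0x8001 + 1000 is 0x83E9.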
Data Format
Each eligibility trace is stored as a 36-bit signed fixed-point value, identical in format to neuron membrane potentials:
[35:0] = Eligibility trace value
- Starts at 0
- Increases when STDP event occurs (coincidence detection)
- Decays over time (like membrane potential leak)
- NO threshold comparison (doesn't spike)
This reuse of the membrane potential format means we can use the same hardware logic that updates neurons to also update eligibility traces.
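To make the 36-bit signed format concrete, the sketch below converts between Python integers and raw 36-bit words. It assumes two's-complement encoding, which the text does not state explicitly, and it says nothing about where the binary point sits. The helper names are hypothetical.

```python
TRACE_BITS = 36
TRACE_MASK = (1 << TRACE_BITS) - 1
TRACE_MAX = (1 << (TRACE_BITS - 1)) - 1
TRACE_MIN = -(1 << (TRACE_BITS - 1))


def to_raw36(value: int) -> int:
    """Encode a signed integer as a 36-bit two's-complement word (assumed encoding)."""
    if not TRACE_MIN <= value <= TRACE_MAX:
        raise ValueError("value does not fit in 36 signed bits")
    return value & TRACE_MASK


def from_raw36(raw: int) -> int:
    """Decode a 36-bit two's-complement word back to a signed integer."""
    raw &= TRACE_MASK
    # If the sign bit [35] is set, subtract 2**36 to recover the negative value.
    return raw - (1 << TRACE_BITS) if raw >> (TRACE_BITS - 1) else raw
```

A freshly initialised trace is simply the raw word 0, matching the "starts at 0" rule above.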
The Reward Register: Dopamine Signal
A new 1-bit register stores the reward signal (dopamine modulation):
Reward Register:
Width: 1 bit (binary on/off)
Set by: New command opcode (CMD_SET_REWARD = 0x0A)
Broadcast to: internal_events_processor during Phase 2
Purpose: Gates whether weight updates occur
Simplified for initial implementation:
Binary signal: 1 = reward on, 0 = reward off
Can be toggled per timestep or set once for entire simulation
Future enhancement: Multi-bit for graded reward levels
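A behavioural sketch of the register follows. The opcode value comes from the text; everything else is an assumption made for illustration, in particular that the reward bit travels in the least significant bit of a command payload. The class and method names are hypothetical.

```python
CMD_SET_REWARD = 0x0A  # opcode given in the text


class RewardRegister:
    """1-bit reward (dopamine) register; powers up with reward off."""

    def __init__(self) -> None:
        self.value = 0

    def execute(self, opcode: int, payload: int) -> bool:
        """Apply a command. Returns True only if it was CMD_SET_REWARD."""
        if opcode != CMD_SET_REWARD:
            return False
        # Assumption: bit 0 of the payload carries the reward level.
        self.value = payload & 0x1
        return True


reward = RewardRegister()
reward.execute(CMD_SET_REWARD, 1)  # reward on
reward.execute(CMD_SET_REWARD, 0)  # reward off
```

Widening `value` and the payload mask would be the natural place to add the graded reward levels mentioned as a future enhancement.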
Phase 2 Flow with R-STDP: Step-by-Step
We extend the existing Phase 2 (Synaptic Processing) to include eligibility trace handling and weight updates.
Step 1: Pointer Processing with Dual Address Computation
Recall from Chapter 2 that Phase 2 begins with pointer_fifo_controller reading synapse pointers from the 16 ptrFIFOs and sending them to hbm_processor.
Extension: When hbm_processor receives a synapse pointer, it now computes two addresses:
Region 3 address (existing): Where the synapse data is stored
Region 4 address (new): Where the eligibility trace is stored
Computed as:
Region_3_addr + REGION4_OFFSET
New FIFO introduced: etFIFO (Eligibility Trace FIFO)
| Property | Value |
|---|---|
| Width | 23 bits (HBM address) |
| Depth | 512 |
| Written by | hbm_processor (when processing synapse pointers) |
| Read by | internal_events_processor (during neuron updates) |
| Contains | Region 4 addresses (eligibility trace locations) |
What happens:
hbm_processor pops a synapse pointer from ptrFIFO
Extracts Region 3 address: 0x8001
Computes Region 4 address: 0x8001 + 1000 = 0x83E9
Pushes 0x83E9 to etFIFO
Issues HBM read for synapse data at Region 3 address 0x8001 (existing behavior)
Result: The etFIFO now holds eligibility trace addresses in the same order as synapses are being processed. This synchronization is critical for the next step.
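The ordering guarantee can be illustrated with a small queue model. This is a sketch under simplifying assumptions: each pointer is treated as a single Region 3 row, a `deque` stands in for each FIFO, and the function name and the `hbm_read_queue` argument are hypothetical. The loop also stalls when etFIFO is full, which is one way to model backpressure.

```python
from collections import deque

REGION4_OFFSET = 1000  # illustrative offset from the examples
ET_FIFO_DEPTH = 512    # etFIFO depth given in the text


def process_pointers(ptr_fifo, et_fifo, hbm_read_queue, offset=REGION4_OFFSET):
    """Pop synapse pointers, queue the trace address, and request the synapse read.

    Stops early when etFIFO is full. Returns the number of pointers consumed.
    """
    processed = 0
    while ptr_fifo and len(et_fifo) < ET_FIFO_DEPTH:
        region3_addr = ptr_fifo.popleft()
        et_fifo.append(region3_addr + offset)  # Region 4 address, same order
        hbm_read_queue.append(region3_addr)    # existing Region 3 read
        processed += 1
    return processed


ptr_fifo = deque([0x8001, 0x8002, 0x8005])
et_fifo, hbm_read_queue = deque(), deque()
process_pointers(ptr_fifo, et_fifo, hbm_read_queue)
```

After the call, the i-th entry of `et_fifo` is always the trace address for the i-th queued synapse read, which is the property Step 2 relies on.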
Step 2: Neuron Update with Coincidence Detection
As synapses are read from HBM Region 3, they flow to internal_events_processor for neuron updates (existing behavior from Chapter 2.2).
Extension: While processing each synapse, the internal_events_processor now also:
Pops the corresponding eligibility trace address from etFIFO
Because addresses were pushed in the same order, this stays synchronized
Performs the neuron update (existing):
Read current membrane potential from URAM
Add synaptic weight: V_new = V_old + weight
Check threshold: spike = (V_new >= threshold)
Write updated potential to URAM
Performs coincidence detection (new):
Check: Did the post-synaptic neuron spike?
If YES: We have an STDP event (pre fired → caused input → post fired)
This is coincidence detection: pre and post activity temporally correlated
Checks reward signal (new):
Read the reward register (broadcast to all processing units)
Check: Is dopamine ON?
Conditional weight update trigger (new):
If (coincidence detected) AND (reward signal = 1):
Push the Region 4 address to et2FIFO
New FIFO introduced: et2FIFO (Eligibility Trace to Weight Update FIFO)
| Property | Value |
|---|---|
| Width | 23 bits (HBM Region 4 address) |
| Depth | 512 |
| Written by | internal_events_processor (when coincidence + reward detected) |
| Read by | hbm_processor weight update logic |
| Contains | Region 4 addresses of synapses that should have weights updated |
What this achieves:
Only synapses with recent STDP events (recorded in eligibility traces) that also receive reward will trigger weight updates
The et2FIFO acts as a queue of “synapses to update”
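The per-synapse logic of Step 2 can be sketched as a single function. This is a behavioural model only: a dict stands in for URAM, deques stand in for the FIFOs, the function name is hypothetical, and the post-spike reset of the membrane potential is left out because the text does not describe it here.

```python
from collections import deque


def process_synapse(v_mem, target, weight, threshold, reward, et_fifo, et2_fifo):
    """Update one target neuron and queue a weight update if warranted."""
    trace_addr = et_fifo.popleft()    # stays aligned with the synapse stream
    v_new = v_mem[target] + weight    # existing neuron update
    spiked = v_new >= threshold
    v_mem[target] = v_new             # write back (reset not modelled)
    if spiked and reward == 1:        # coincidence AND reward
        et2_fifo.append(trace_addr)
    return spiked


v_mem = {7: 1500}
et_fifo, et2_fifo = deque([0x83E9, 0x83EA]), deque()
process_synapse(v_mem, 7, 500, 2000, 0, et_fifo, et2_fifo)  # spike, reward off
process_synapse(v_mem, 7, 500, 2000, 1, et_fifo, et2_fifo)  # spike, reward on
```

Only the second call leaves an entry in `et2_fifo`, since the first spike occurs while the reward register is 0.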
Step 3: Weight Updates via Eligibility Traces
While Phase 2 is running (or immediately after), hbm_processor monitors the et2FIFO. When entries appear, it performs weight updates.
Process for each entry:
Pop Region 4 address from et2FIFO
Example: 0x83E9 (eligibility trace address)
Read eligibility trace value from HBM Region 4
Issue HBM read to address 0x83E9
Receive 36-bit eligibility trace value: c(t)
Compute corresponding synapse address
Reverse the mapping: Region_3_addr = 0x83E9 - 1000 = 0x8001
Read current synapse weight from HBM Region 3
Issue HBM read to address 0x8001
Receive 32-bit synapse data
Extract weight: current_weight = synapse_data[15:0]
Compute new weight using R-STDP rule
R-STDP: Δw = R(t) × c(t)
Since we’re in this state, R(t) = 1 (reward is on)
Therefore: Δw = c(t) (the eligibility trace value)
New weight: w_new = w_old + c(t)
Apply clamping to prevent overflow: w_new ∈ [-32768, 32767]
Write updated weight to HBM Region 3
Reconstruct synapse data: {OpCode, Target, w_new}
Issue HBM write to address 0x8001
This reuses the same HBM write logic that’s called when the host updates weights via write_synapse commands
Key insight: The weight update path reuses existing infrastructure. We just need to:
Cut into the write_synapse function at the point where it has the Region 3 address and new weight value
Provide those values from our R-STDP computation instead of from the host
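The read-modify-write sequence can be sketched as follows. The model makes several assumptions: a dict stands in for HBM, the trace is held as an already-decoded signed integer, and the upper 16 bits of the synapse word (OpCode and Target) are treated as opaque and passed through unchanged, because their exact field widths are not given here. The function names are hypothetical.

```python
W_MIN, W_MAX = -32768, 32767  # 16-bit signed weight range from the text


def signed16(raw: int) -> int:
    """Interpret the low 16 bits of a word as a signed weight."""
    raw &= 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw


def apply_rstdp(hbm: dict, trace_addr: int, offset: int, reward: int = 1) -> int:
    """Apply w_new = clamp(w_old + R * c) to the synapse paired with trace_addr."""
    c = hbm[trace_addr]                 # eligibility trace c(t)
    synapse_addr = trace_addr - offset  # reverse the Region 3 -> 4 mapping
    word = hbm[synapse_addr]            # 32-bit synapse data
    w_new = max(W_MIN, min(W_MAX, signed16(word) + reward * c))
    # Keep OpCode/Target bits, replace only the weight field [15:0].
    hbm[synapse_addr] = (word & 0xFFFF0000) | (w_new & 0xFFFF)
    return w_new


hbm = {0x8001: 0xABCD0000 | 500, 0x8001 + 1000: 100}
print(apply_rstdp(hbm, 0x8001 + 1000, 1000))  # prints 600
```

The clamp matters because a 36-bit trace can be far larger than the 16-bit weight field; without it, a large trace would wrap the weight instead of saturating it.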
Step 4: Eligibility Trace Maintenance
Eligibility traces need to be updated to:
Decay over time (like membrane potential leak)
Increase when STDP events occur (coincidence detection)
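Mathematical update (per timestep, in the shift-based form used by the method below):
c_new = c_old - (c_old >> leak_shift) + STDP_INCREMENT × [STDP event this timestep]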
Implementation approach: Reuse neuron membrane potential logic
The key insight is that eligibility trace updates are nearly identical to neuron updates:
Both are 36-bit values stored in memory
Both decay exponentially (leak)
Both accumulate inputs
Difference: Eligibility traces don’t spike (no threshold comparison)
Method:
Read eligibility trace from Region 4: c_old
Apply decay: c_new = c_old - (c_old >> leak_shift)
This is the same right-shift leak used for neurons
leak_shift determines τ_c (longer decay than neurons)
If STDP event detected for this synapse: c_new += STDP_INCREMENT
Write back to Region 4: c_new
When to perform these updates:
Option A: During Phase 2 (parallel with neuron updates)
Option B: Separate Phase 3 after neuron updates complete
Option C: Lazy update - only when accessed for weight updates
More efficient: don’t read/write all eligibility traces every timestep
Only update the ones being used
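Whichever scheduling option is chosen, the per-trace arithmetic is the same. The sketch below models one maintenance step. The parameter values in the usage lines are hypothetical, and saturation to 36 bits is omitted for brevity. Python's `>>` on integers is an arithmetic shift, which matches a signed right shift in hardware.

```python
def update_trace(c_old: int, leak_shift: int, stdp_event: bool,
                 stdp_increment: int) -> int:
    """One maintenance step: shift-based decay, then optional STDP increment."""
    c_new = c_old - (c_old >> leak_shift)  # keeps (1 - 2**-leak_shift) of c_old
    if stdp_event:
        c_new += stdp_increment
    return c_new


print(update_trace(1024, 4, False, 256))  # 1024 - 64 = 960
print(update_trace(1024, 4, True, 256))   # 960 + 256 = 1216
```

With `leak_shift = 4` the trace keeps 15/16 of its value per timestep, so a larger `leak_shift` gives the longer τ_c that eligibility traces need relative to neurons. Note that integer truncation stops the decay of small positive values (for example, 1 >> 4 is 0, so a trace of 1 never reaches 0).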
Complete Phase 2 Data Flow Diagram
┌─────────────────────────────────────────────────────────────┐
│ PHASE 2 WITH R-STDP │
└─────────────────────────────────────────────────────────────┘
Step 1: Pointer Processing
─────────────────────────
pointer_fifo_controller
│
└──> ptrFIFO (pop synapse pointer)
│
└──> hbm_processor
│
├──> Compute Region 3 address (synapse)
├──> Compute Region 4 address (eligibility trace)
├──> Push Region 4 addr to etFIFO ───┐
└──> Read synapse from HBM Region 3 │
│
Step 2: Neuron Update + Coincidence Detection │
─────────────────────────────────────────── │
hbm_processor forwards synapse data │
│ │
└──> internal_events_processor │
│ │
├──> Pop Region 4 addr from etFIFO <────────┘
├──> Update neuron (URAM):
│ V_new = V_old + weight
├──> Check threshold: spike?
├──> Coincidence detection: post fired?
├──> Check reward register: DA on?
│
└──> If (coincidence AND reward):
Push Region 4 addr to et2FIFO ───┐
│
Step 3: Weight Updates │
────────────────────── │
hbm_processor (parallel state machine) │
│ │
├──> Pop Region 4 addr from et2FIFO <──────────┘
├──> Read eligibility trace from Region 4
├──> Compute Region 3 addr (subtract offset)
├──> Read current weight from Region 3
├──> Compute: w_new = w_old + et_value
└──> Write w_new to Region 3 (reuse write_synapse logic)
Step 4: Eligibility Trace Maintenance (parallel or separate)
────────────────────────────────────────────────────────────
For each synapse's eligibility trace:
├──> Read from Region 4
├──> Apply decay: c_new = c_old - (c_old >> leak_shift)
├──> If STDP event: c_new += INCREMENT
└──> Write back to Region 4
Summary of New Components
Memory Regions
HBM Region 4: Eligibility trace storage (one 36-bit value per synapse)
FIFOs (introduced as needed in the flow)
etFIFO: Queues eligibility trace addresses during pointer processing
et2FIFO: Queues the Region 4 addresses of eligibility traces that should trigger weight updates
Control Signals
Reward Register: 1-bit dopamine signal (set by new opcode CMD_SET_REWARD)
exec_reward: Broadcast signal from command_interpreter to internal_events_processor
Reused Logic
Neuron update pipeline → Eligibility trace updates (disable threshold check)
HBM write infrastructure → Weight updates
Phase 2 synapse processing → Coincidence detection point
Key Design Decisions
Simple address mapping: Region4 = Region3 + constant offset
Makes address computation trivial (one addition)
Easily reversible for going from eligibility trace back to synapse
Binary reward signal: On/off only (for now)
Simplifies initial implementation
Future: Could be multi-bit for graded reward
Coincidence-based STDP: Post fires in same timestep as pre input
Simplified from asymmetric STDP window
Good enough for initial learning demonstrations
Reuse existing hardware: Eligibility traces use neuron logic
Minimal new hardware required
Proven, tested infrastructure
FIFO-based pipeline: Maintains synchronization
etFIFO keeps addresses aligned with synapse processing
et2FIFO queues weight updates for parallel processing
Testing the Implementation
Simple Test Case
Network:
Neuron A (pre-synaptic) → Neuron B (post-synaptic)
Single synapse with initial weight W = 500
Threshold = 2000
Experiment sequence:
Timestep 0: Fire A repeatedly without B firing
No coincidence → eligibility trace stays at 0
Timestep 5: Fire A enough to make B spike
B receives input, crosses threshold, spikes
Coincidence detected: pre (A) was active, post (B) fired
Eligibility trace increases (but reward is still OFF)
Timesteps 6-10: No activity
Eligibility trace decays:
c(t) × exp(-Δt/τ_c)
Timestep 11: Turn on reward (set reward register = 1)
Timestep 12: Fire A, B spikes again
Coincidence detected
Reward is ON
→ Weight update triggered!
Read eligibility trace from Region 4: c(t) = 100 (example)
Compute: W_new = 500 + 100 = 600
Write updated weight to Region 3
Validation:
Read back synapse weight from HBM → should be 600
On next input from A, B should spike with less input (stronger synapse)
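The experiment can be rehearsed in software before running it on hardware. The sketch below is a toy model with several assumptions that are not in the text: B's potential starts from 0 each timestep, traces decay every timestep before events are handled, the weight update reads the trace before the current event's increment is added, and `LEAK_SHIFT` and `STDP_INCREMENT` are hypothetical values. Its final weight therefore differs from the illustrative 600 above.

```python
W_MIN, W_MAX = -32768, 32767
THRESHOLD = 2000
LEAK_SHIFT = 3        # hypothetical trace decay rate
STDP_INCREMENT = 256  # hypothetical trace increment


def run(schedule, reward_on_at):
    """schedule maps timestep -> number of A spikes delivered to B that step."""
    weight, trace, reward = 500, 0, 0
    log = []
    for t in range(max(schedule) + 1):
        if t == reward_on_at:
            reward = 1
        trace -= trace >> LEAK_SHIFT      # decay every timestep
        spiked = schedule.get(t, 0) * weight >= THRESHOLD
        if spiked and reward:             # coincidence AND reward
            weight = max(W_MIN, min(W_MAX, weight + trace))
        if spiked:
            trace += STDP_INCREMENT       # record the STDP event
        log.append((t, spiked, trace, weight))
    return log


log = run({0: 3, 5: 4, 12: 4}, reward_on_at=11)
for entry in log:
    print(entry)
```

The log shows the same qualitative sequence as the experiment: no trace until B first spikes at timestep 5, decay through timesteps 6 to 11, and a weight increase only at timestep 12 when a coincidence occurs with reward on.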
Potential Issues and Mitigations
1. FIFO Overflow
Issue: etFIFO or et2FIFO fills up if too many synapses are processed
Mitigation:
Size FIFOs same as ptrFIFO (512 entries proven sufficient)
Add overflow detection flags
Backpressure: stall pointer processing if etFIFO full
2. HBM Bandwidth
Issue: Additional reads/writes for Region 4 and weight updates
Impact analysis:
Existing: ~200-300 HBM transactions per timestep
New: +1 read per synapse (eligibility trace), +2 reads + 1 write per weight update
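For a network with 1000 active synapses and 10% coincidence: about 100 weight updates per timestep. At 2 reads + 1 write each, that is up to 300 extra transactions if no accesses share an HBM row, in addition to the per-synapse eligibility trace reads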
Mitigation:
Weight updates run in parallel with ongoing Phase 2
Burst mode for sequential eligibility trace reads
Lazy decay (only update accessed traces)
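A back-of-the-envelope estimate makes the effect of lazy decay visible. The sketch below uses the per-item costs listed in the impact analysis and assumes that every access is a separate HBM transaction, that reward is on, and that eager maintenance costs one trace read per active synapse. The function name is hypothetical.

```python
def extra_hbm_transactions(active_synapses: int, coincidence_rate: float,
                           lazy_decay: bool) -> int:
    """Estimate additional HBM transactions per timestep for R-STDP."""
    updates = round(active_synapses * coincidence_rate)
    per_update = 2 + 1  # trace read + weight read, then weight write
    # Eager maintenance reads every active trace; lazy decay reads none extra.
    maintenance = 0 if lazy_decay else active_synapses
    return updates * per_update + maintenance


print(extra_hbm_transactions(1000, 0.10, lazy_decay=True))   # 300
print(extra_hbm_transactions(1000, 0.10, lazy_decay=False))  # 1300
```

Under these assumptions the weight updates themselves are the smaller cost, and per-timestep trace reads dominate, which is the argument for lazy decay.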
3. Address Mapping Collision
Issue: Region 4 might overlap with Region 3 if offset too small
Solution:
Choose REGION4_OFFSET ≥ maximum Region 3 size (in rows)
Example: If Region 3 uses rows 0x8000-0xFFFF (32K rows)
Set REGION4_OFFSET = 0x8000 (32768 in decimal)
Region 4 spans: 0x10000-0x17FFF
No overlap ✓
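The overlap condition is simple enough to check at configuration time. The sketch below is a hypothetical helper, not part of the existing framework; it takes the first and last Region 3 rows and the offset.

```python
def regions_overlap(region3_first: int, region3_last: int, offset: int) -> bool:
    """True if Region 4 (Region 3 shifted by offset) intersects Region 3."""
    region4_first = region3_first + offset
    return region4_first <= region3_last


def min_safe_offset(region3_first: int, region3_last: int) -> int:
    """Smallest offset that places Region 4 entirely after Region 3."""
    return region3_last - region3_first + 1


print(regions_overlap(0x8000, 0xFFFF, 0x8000))  # False: Region 4 starts at 0x10000
print(regions_overlap(0x8000, 0xFFFF, 1000))    # True: Region 4 starts at 0x83E8
print(hex(min_safe_offset(0x8000, 0xFFFF)))     # 0x8000
```

The second call shows that a decimal offset of 1000 is only safe when Region 3 occupies at most 1000 rows.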
4. Coincidence Detection Granularity
Issue: Current design: “post fired this timestep” = coincidence
Limitation: Doesn’t distinguish if post fired 1ms or 10ms after pre
Future enhancement:
Track spike times with sub-timestep resolution
Implement asymmetric STDP window (separate τ+ and τ-)
Different eligibility trace increments for potentiation vs depression
5. Eligibility Trace Decay Timing
Issue: When to apply decay? Every timestep for all traces is expensive
Recommended approach: Lazy decay
Store last_update_timestamp with each trace
When reading for weight update:
Calculate time elapsed: Δt = current_time - last_update_timestamp
Apply accumulated decay: c = c_stored × exp(-Δt/τ_c)
Only write back when updating (not every read)
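Lazy decay can be sketched as a small class that stores the timestamp with the value. As an assumption made for this illustration, the exponential is replaced by Δt repeated applications of the shift-based leak, so that lazy and per-timestep decay give identical integer results; how the hardware would evaluate the accumulated decay is not specified here. The class and function names are hypothetical.

```python
def decay_steps(c: int, steps: int, leak_shift: int) -> int:
    """Apply the shift-based leak `steps` times."""
    for _ in range(steps):
        c -= c >> leak_shift
    return c


class LazyTrace:
    """Eligibility trace that decays only when it is accessed."""

    def __init__(self, leak_shift: int) -> None:
        self.leak_shift = leak_shift
        self.value = 0
        self.last_update_timestamp = 0

    def read(self, now: int) -> int:
        """Return the decayed value without writing anything back."""
        elapsed = now - self.last_update_timestamp
        return decay_steps(self.value, elapsed, self.leak_shift)

    def update(self, now: int, increment: int = 0) -> None:
        """Fold in the accumulated decay, add an increment, and store both."""
        self.value = self.read(now) + increment
        self.last_update_timestamp = now


trace = LazyTrace(leak_shift=3)
trace.update(now=5, increment=256)  # STDP event at timestep 5
print(trace.read(now=12))           # value a weight update would see at timestep 12
```

Reads leave the stored value and timestamp untouched, so only `update` costs an HBM write.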