4.3 Hardware Implementation of R-STDP

This section explains how R-STDP is implemented in the FPGA hardware by extending the existing framework from Chapters 1-2. The key idea is to reuse existing logic: eligibility traces are updated like neuron membrane potentials (but without a threshold comparison, so they never spike), and weight updates reuse the existing HBM write infrastructure.

Memory Architecture: Adding HBM Region 4 for Eligibility Traces

Recall from Chapter 1 that the existing system uses three HBM regions:

Region 1: Axon pointers (where each axon’s synapses are located)
Region 2: Neuron pointers (where each neuron’s output synapses are located)
Region 3: Synapse data (OpCode, Target address, Weight)

For R-STDP, we add a fourth region:

Region 4: Eligibility traces (one value per synapse, stored like membrane potentials)

Address Mapping: Region 3 ↔ Region 4

Each synapse in Region 3 has a corresponding eligibility trace in Region 4. The mapping uses simple address arithmetic:

Synapse address (Region 3)  →  Eligibility trace address (Region 4)
──────────────────────────────────────────────────────────────────
0x8000                      →  0x8000 + REGION4_OFFSET
0x8001                      →  0x8001 + REGION4_OFFSET
0x8002                      →  0x8002 + REGION4_OFFSET

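REGION4_OFFSET = 1000 (decimal, 0x3E8) is an illustrative value used in the examples of this section. In a real design the offset must be at least the number of rows in Region 3, otherwise the two regions overlap (see Address Mapping Collision under Potential Issues and Mitigations).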

Example:

Synapse at HBM row 0x8001 in Region 3
Its eligibility trace at row 0x8001 + 1000 = 0x83E9 in Region 4 (1000 decimal = 0x3E8)

Why this simple mapping?

Easy to compute in hardware (just add/subtract offset)
Leverages existing HBM read/write logic
Reversible: Region_3_addr = Region_4_addr - REGION4_OFFSET
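The mapping and its inverse can be sketched as a small behavioral model in Python. This is not the RTL: the function names are hypothetical, and REGION4_OFFSET uses the illustrative decimal value 1000 from the text.

```python
REGION4_OFFSET = 1000  # illustrative decimal offset from the text (0x3E8)

def region4_addr(region3_row: int) -> int:
    """Map a synapse row (Region 3) to its eligibility trace row (Region 4)."""
    return region3_row + REGION4_OFFSET

def region3_addr(region4_row: int) -> int:
    """Inverse mapping: eligibility trace row back to its synapse row."""
    return region4_row - REGION4_OFFSET

et_row = region4_addr(0x8001)
print(hex(et_row))                # 0x83e9
print(hex(region3_addr(et_row)))  # 0x8001
```

Because the offset is added in decimal, the hexadecimal result is 0x8001 + 0x3E8 = 0x83E9.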

Data Format

Each eligibility trace is stored as a 36-bit signed fixed-point value, identical in format to neuron membrane potentials:

[35:0] = Eligibility trace value
  - Starts at 0
  - Increases when STDP event occurs (coincidence detection)
  - Decays over time (like membrane potential leak)
  - NO threshold comparison (doesn't spike)

This reuse of the membrane potential format means we can use the same hardware logic that updates neurons to also update eligibility traces.
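As a rough illustration of the 36-bit signed format, the helpers below convert between a raw 36-bit word and a signed integer. The helper names are hypothetical, the position of the binary point is not specified in this section (values are treated as raw integers), and saturating on overflow is an assumption rather than documented behavior.

```python
ET_WIDTH = 36                      # trace width from the text
ET_MASK = (1 << ET_WIDTH) - 1
ET_MAX = (1 << (ET_WIDTH - 1)) - 1
ET_MIN = -(1 << (ET_WIDTH - 1))

def to_signed36(raw: int) -> int:
    """Interpret a raw 36-bit word as a two's-complement signed value."""
    raw &= ET_MASK
    return raw - (1 << ET_WIDTH) if raw & (1 << (ET_WIDTH - 1)) else raw

def to_raw36(value: int) -> int:
    """Saturate a signed value to 36 bits (assumed policy) and encode it."""
    value = max(ET_MIN, min(ET_MAX, value))
    return value & ET_MASK

print(to_signed36(0xFFFFFFFFF))        # -1
print(to_signed36(to_raw36(-5)))       # -5
```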

The Reward Register: Dopamine Signal

A new 1-bit register stores the reward signal (dopamine modulation):

Reward Register:

Width: 1 bit (binary on/off)
Set by: New command opcode (CMD_SET_REWARD = 0x0A)
Broadcast to: internal_events_processor during Phase 2
Purpose: Gates whether weight updates occur

Simplified for initial implementation:

Binary signal: 1 = reward on, 0 = reward off
Can be toggled per timestep or set once for entire simulation
Future enhancement: Multi-bit for graded reward levels
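A minimal software model of the reward register might look like the following. Only the opcode value CMD_SET_REWARD = 0x0A comes from the text; the class name, the execute method, and the assumption that the reward bit travels in the least significant bit of a command payload are hypothetical.

```python
CMD_SET_REWARD = 0x0A  # opcode from the text

class RewardRegister:
    """Behavioral model of the 1-bit reward (dopamine) register."""

    def __init__(self) -> None:
        self.value = 0  # reward is off after reset (assumed)

    def execute(self, opcode: int, payload: int) -> bool:
        """Apply a command; return True if it was a reward command."""
        if opcode != CMD_SET_REWARD:
            return False            # other opcodes leave the register untouched
        self.value = payload & 0x1  # assumed: reward bit is the payload LSB
        return True

reg = RewardRegister()
reg.execute(CMD_SET_REWARD, 1)
print(reg.value)  # 1
```

Keeping the register to a single masked bit mirrors the binary on/off design. A graded reward would widen the mask.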

Phase 2 Flow with R-STDP: Step-by-Step

We extend the existing Phase 2 (Synaptic Processing) to include eligibility trace handling and weight updates.

Step 1: Pointer Processing with Dual Address Computation

Recall from Chapter 2 that Phase 2 begins with pointer_fifo_controller reading synapse pointers from the 16 ptrFIFOs and sending them to hbm_processor.

Extension: When hbm_processor receives a synapse pointer, it now computes two addresses:

Region 3 address (existing): Where the synapse data is stored
Region 4 address (new): Where the eligibility trace is stored
- Computed as: Region_3_addr + REGION4_OFFSET

New FIFO introduced: etFIFO (Eligibility Trace FIFO)

Property	Value
Width	23 bits (HBM address)
Depth	512
Written by	hbm_processor (when processing synapse pointers)
Read by	internal_events_processor (during neuron updates)
Contains	Region 4 addresses (eligibility trace locations)

What happens:

hbm_processor pops a synapse pointer from ptrFIFO
Extracts Region 3 address: 0x8001
Computes Region 4 address: 0x8001 + 1000 = 0x83E9
Pushes 0x83E9 to etFIFO
Issues HBM read for synapse data at Region 3 address 0x8001 (existing behavior)

Result: The etFIFO now holds eligibility trace addresses in the same order as synapses are being processed. This synchronization is critical for the next step.
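The ordering guarantee of Step 1 can be illustrated with a short Python model. It is a sketch under simplifying assumptions: each pointer is treated as a single Region 3 row address, the function name is hypothetical, and the offset is the illustrative value 1000.

```python
from collections import deque

REGION4_OFFSET = 1000  # illustrative offset from the text
ET_FIFO_DEPTH = 512    # etFIFO depth from the table above

def process_pointers(ptr_fifo: deque, et_fifo: deque) -> list:
    """Pop synapse pointers, queue trace addresses, return Region 3 reads in issue order."""
    hbm_reads = []
    while ptr_fifo and len(et_fifo) < ET_FIFO_DEPTH:
        region3_row = ptr_fifo.popleft()
        et_fifo.append(region3_row + REGION4_OFFSET)  # push Region 4 address
        hbm_reads.append(region3_row)                 # issue synapse read
    return hbm_reads

ptr_fifo = deque([0x8001, 0x8000, 0x8002])
et_fifo = deque()
reads = process_pointers(ptr_fifo, et_fifo)
print([hex(a) for a in reads])    # ['0x8001', '0x8000', '0x8002']
print([hex(a) for a in et_fifo])  # ['0x83e9', '0x83e8', '0x83ea']
```

Because both queues are filled in the same loop iteration, the n-th entry of etFIFO always belongs to the n-th synapse read.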

Step 2: Neuron Update with Coincidence Detection

As synapses are read from HBM Region 3, they flow to internal_events_processor for neuron updates (existing behavior from Chapter 2.2).

Extension: While processing each synapse, the internal_events_processor now also:

Pops the corresponding eligibility trace address from etFIFO
- Because addresses were pushed in the same order, this stays synchronized
Performs the neuron update (existing):
- Read current membrane potential from URAM
- Add synaptic weight: V_new = V_old + weight
- Check threshold: spike = (V_new >= threshold)
- Write updated potential to URAM
Performs coincidence detection (new):
- Check: Did the post-synaptic neuron spike?
- If YES: We have an STDP event (pre fired → caused input → post fired)
- This is coincidence detection: pre and post activity temporally correlated
Checks reward signal (new):
- Read the reward register (broadcast to all processing units)
- Check: Is dopamine ON?
Conditional weight update trigger (new):
- If (coincidence detected) AND (reward signal = 1):
  - Push the Region 4 address to et2FIFO

New FIFO introduced: et2FIFO (Eligibility Trace to Weight Update FIFO)

Property	Value
Width	23 bits (HBM Region 4 address)
Depth	512
Written by	internal_events_processor (when coincidence + reward detected)
Read by	hbm_processor weight update logic
Contains	Region 4 addresses of synapses that should have weights updated

What this achieves:

Only synapses whose post-synaptic neuron spikes while the reward signal is on trigger a weight update; the size of that update comes from the eligibility trace, which records recent STDP events
The et2FIFO acts as a queue of “synapses to update”
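The per-synapse logic of Step 2 can be sketched as one Python function. The function name and the dictionary standing in for URAM are hypothetical, and the membrane reset after a spike is omitted because this section does not describe it.

```python
from collections import deque

def process_synapse(v_mem: dict, target: int, weight: int, threshold: int,
                    et_fifo: deque, et2_fifo: deque, reward: int) -> bool:
    """Update one target neuron and queue a weight update if warranted."""
    et_addr = et_fifo.popleft()           # stays aligned with the synapse stream
    v_new = v_mem.get(target, 0) + weight  # V_new = V_old + weight
    spiked = v_new >= threshold            # threshold check
    v_mem[target] = v_new                  # write back to the URAM model
    if spiked and reward == 1:             # coincidence AND reward
        et2_fifo.append(et_addr)
    return spiked

v_mem = {7: 1800}
et_fifo, et2_fifo = deque([0x83E9]), deque()
print(process_synapse(v_mem, 7, 500, 2000, et_fifo, et2_fifo, reward=1))  # True
print([hex(a) for a in et2_fifo])                                         # ['0x83e9']
```

Note that the etFIFO entry is popped for every synapse, whether or not it is forwarded, which is what keeps the two streams synchronized.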

Step 3: Weight Updates via Eligibility Traces

While Phase 2 is running (or immediately after), hbm_processor monitors the et2FIFO. When entries appear, it performs weight updates.

Process for each entry:

Pop Region 4 address from et2FIFO
- Example: 0x83E9 (eligibility trace address)
Read eligibility trace value from HBM Region 4
- Issue HBM read to address 0x83E9
- Receive 36-bit eligibility trace value: c(t)
Compute corresponding synapse address
- Reverse the mapping: Region_3_addr = 0x83E9 - 1000 = 0x8001
Read current synapse weight from HBM Region 3
- Issue HBM read to address 0x8001
- Receive 32-bit synapse data
- Extract weight: current_weight = synapse_data[15:0]
Compute new weight using R-STDP rule
- R-STDP: Δw = R(t) × c(t)
- Since we’re in this state, R(t) = 1 (reward is on)
- Therefore: Δw = c(t) (the eligibility trace value)
- New weight: w_new = w_old + c(t)
- Apply clamping to prevent overflow: w_new ∈ [-32768, 32767]
Write updated weight to HBM Region 3
- Reconstruct synapse data: {OpCode, Target, w_new}
- Issue HBM write to address 0x8001
- This reuses the same HBM write logic that’s called when the host updates weights via write_synapse commands

Key insight: The weight update path reuses existing infrastructure. We just need to:

Cut into the write_synapse function at the point where it has the Region 3 address and new weight value
Provide those values from our R-STDP computation instead of from the host
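The read-modify-write sequence of Step 3 can be sketched in Python as follows. The dictionary standing in for HBM and the function names are hypothetical. The text only states that the weight occupies bits [15:0] of the 32-bit synapse word, so the sketch assumes OpCode and Target live in the upper 16 bits and passes them through unchanged. Traces are stored as already-decoded signed integers.

```python
REGION4_OFFSET = 1000          # illustrative offset from the text
W_MIN, W_MAX = -32768, 32767   # signed 16-bit weight range from the text

def to_signed16(raw: int) -> int:
    """Decode a 16-bit two's-complement field."""
    raw &= 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw

def update_weight(hbm: dict, et_addr: int) -> int:
    """Apply w_new = clamp(w_old + c) for the synapse paired with et_addr."""
    c = hbm[et_addr]                       # eligibility trace c(t)
    syn_addr = et_addr - REGION4_OFFSET    # reverse the address mapping
    word = hbm[syn_addr]                   # 32-bit synapse data
    upper = word & 0xFFFF0000              # assumed OpCode/Target bits
    w_old = to_signed16(word)
    w_new = max(W_MIN, min(W_MAX, w_old + c))  # R(t) = 1, so dw = c(t)
    hbm[syn_addr] = upper | (w_new & 0xFFFF)   # write back, upper bits intact
    return w_new

hbm = {0x8001: 0x12340000 | 500, 0x83E9: 100}
print(update_weight(hbm, 0x83E9))  # 600
print(hex(hbm[0x8001]))            # 0x12340258
```

Clamping before the write matters because the trace is 36 bits wide while the weight field is only 16 bits.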

Step 4: Eligibility Trace Maintenance

Eligibility traces need to be updated to:

Decay over time (like membrane potential leak)
Increase when STDP events occur (coincidence detection)

Mathematical update:

\[\dot{c}(t) = -\frac{c}{\tau_c} + \delta_{\text{STDP event}}\]

Implementation approach: Reuse neuron membrane potential logic

The key insight is that eligibility trace updates are nearly identical to neuron updates:

Both are 36-bit values stored in memory
Both decay exponentially (leak)
Both accumulate inputs
Difference: Eligibility traces don’t spike (no threshold comparison)

Method:

Read eligibility trace from Region 4: c_old
Apply decay: c_new = c_old - (c_old >> leak_shift)
- This is the same right-shift leak used for neurons
- leak_shift sets the time constant: τ_c ≈ 2^leak_shift timesteps, so use a larger shift than for neurons to get a slower decay
If STDP event detected for this synapse: c_new += STDP_INCREMENT
Write back to Region 4: c_new
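The maintenance step above maps directly onto a few lines of Python. The function names are hypothetical and the parameter values in the usage lines are illustrative. The second helper computes the time constant implied by a given shift, based on the per-step decay factor (1 - 2^-leak_shift).

```python
import math

def update_trace(c_old: int, leak_shift: int,
                 stdp_event: bool = False, stdp_increment: int = 0) -> int:
    """One maintenance step: shift-based leak, then optional STDP increment."""
    c_new = c_old - (c_old >> leak_shift)  # arithmetic right shift, as for neurons
    if stdp_event:
        c_new += stdp_increment
    return c_new

def effective_tau(leak_shift: int) -> float:
    """Time constant in timesteps for the factor (1 - 2**-leak_shift)."""
    return -1.0 / math.log(1.0 - 2.0 ** -leak_shift)

print(update_trace(1024, 4))             # 960
print(update_trace(1024, 4, True, 100))  # 1060
print(update_trace(10, 4))               # 10
print(round(effective_tau(4), 1))        # 15.5, close to 2**4
```

The third call shows a side effect of the truncating shift: a positive trace smaller than 2^leak_shift stops decaying, so it never reaches exactly zero on its own.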

When to perform these updates:

Option A: During Phase 2 (parallel with neuron updates)
Option B: Separate Phase 3 after neuron updates complete
Option C: Lazy update - only when accessed for weight updates
- More efficient: don’t read/write all eligibility traces every timestep
- Only update the ones being used

Complete Phase 2 Data Flow Diagram

┌─────────────────────────────────────────────────────────────┐
│                    PHASE 2 WITH R-STDP                      │
└─────────────────────────────────────────────────────────────┘

Step 1: Pointer Processing
─────────────────────────
pointer_fifo_controller
    │
    └──> ptrFIFO (pop synapse pointer)
           │
           └──> hbm_processor
                  │
                  ├──> Compute Region 3 address (synapse)
                  ├──> Compute Region 4 address (eligibility trace)
                  ├──> Push Region 4 addr to etFIFO ───┐
                  └──> Read synapse from HBM Region 3  │
                                                        │
Step 2: Neuron Update + Coincidence Detection          │
───────────────────────────────────────────            │
hbm_processor forwards synapse data                    │
    │                                                   │
    └──> internal_events_processor                     │
           │                                            │
           ├──> Pop Region 4 addr from etFIFO <────────┘
           ├──> Update neuron (URAM):
           │      V_new = V_old + weight
           ├──> Check threshold: spike?
           ├──> Coincidence detection: post fired?
           ├──> Check reward register: DA on?
           │
           └──> If (coincidence AND reward):
                  Push Region 4 addr to et2FIFO ───┐
                                                    │
Step 3: Weight Updates                              │
──────────────────────                              │
hbm_processor (parallel state machine)              │
    │                                                │
    ├──> Pop Region 4 addr from et2FIFO <──────────┘
    ├──> Read eligibility trace from Region 4
    ├──> Compute Region 3 addr (subtract offset)
    ├──> Read current weight from Region 3
    ├──> Compute: w_new = w_old + et_value
    └──> Write w_new to Region 3 (reuse write_synapse logic)

Step 4: Eligibility Trace Maintenance (parallel or separate)
────────────────────────────────────────────────────────────
For each synapse's eligibility trace:
    ├──> Read from Region 4
    ├──> Apply decay: c_new = c_old - (c_old >> leak_shift)
    ├──> If STDP event: c_new += INCREMENT
    └──> Write back to Region 4

Summary of New Components

Memory Regions

HBM Region 4: Eligibility trace storage (one 36-bit value per synapse)

FIFOs (introduced as needed in the flow)

etFIFO: Queues eligibility trace addresses during pointer processing
et2FIFO: Queues Region 4 addresses of synapses whose weights should be updated

Control Signals

Reward Register: 1-bit dopamine signal (set by new opcode CMD_SET_REWARD)
exec_reward: Broadcast signal from command_interpreter to internal_events_processor

Reused Logic

Neuron update pipeline → Eligibility trace updates (disable threshold check)
HBM write infrastructure → Weight updates
Phase 2 synapse processing → Coincidence detection point

Key Design Decisions

Simple address mapping: Region4 = Region3 + constant offset
- Makes address computation trivial (one addition)
- Easily reversible for going from eligibility trace back to synapse
Binary reward signal: On/off only (for now)
- Simplifies initial implementation
- Future: Could be multi-bit for graded reward
Coincidence-based STDP: Post fires in same timestep as pre input
- Simplified from asymmetric STDP window
- Good enough for initial learning demonstrations
Reuse existing hardware: Eligibility traces use neuron logic
- Minimal new hardware required
- Proven, tested infrastructure
FIFO-based pipeline: Maintains synchronization
- etFIFO keeps addresses aligned with synapse processing
- et2FIFO queues weight updates for parallel processing

Testing the Implementation

Simple Test Case

Network:

Neuron A (pre-synaptic) → Neuron B (post-synaptic)
Single synapse with initial weight W = 500
Threshold = 2000

Experiment sequence:

Timestep 0: Fire A repeatedly without B firing
- No coincidence → eligibility trace stays at 0
Timestep 5: Fire A enough to make B spike
- B receives input, crosses threshold, spikes
- Coincidence detected: pre (A) was active, post (B) fired
- Eligibility trace increases (but reward is still OFF)
Timesteps 6-10: No activity
- Eligibility trace decays: c(t + Δt) = c(t) × exp(-Δt/τ_c)
Timestep 11: Turn on reward (set reward register = 1)
Timestep 12: Fire A, B spikes again
- Coincidence detected
- Reward is ON
- → Weight update triggered!
- Read eligibility trace from Region 4: c(t) = 100 (example)
- Compute: W_new = 500 + 100 = 600
- Write updated weight to Region 3

Validation:

Read back synapse weight from HBM → should be 600
On next input from A, B should spike with less input (stronger synapse)
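The experiment can be rehearsed in software before running it on hardware. The sketch below is a behavioral model with several assumptions that are not in the text: LEAK_SHIFT = 3 and STDP_INCREMENT = 256 are made-up parameters, the membrane potential has no leak and resets to 0 after a spike, the trace decays once per timestep, and the weight update uses the trace value before the current coincidence is added. With these choices the final weight is 603 rather than the illustrative 600.

```python
W_INIT, THRESHOLD = 500, 2000        # values from the test case
LEAK_SHIFT, STDP_INCREMENT = 3, 256  # assumed parameters, not from the text

def run(schedule: dict, reward_on_from: int, n_steps: int) -> tuple:
    """Simulate one A->B synapse; schedule maps timestep -> number of A spikes."""
    w, v, c = W_INIT, 0, 0
    for t in range(n_steps):
        reward = 1 if t >= reward_on_from else 0
        c -= c >> LEAK_SHIFT              # trace decay (eager, every timestep)
        pre = schedule.get(t, 0)
        v += w * pre                      # each A spike adds the weight to B
        coincidence = pre > 0 and v >= THRESHOLD
        if coincidence:
            v = 0                         # assumed reset after B spikes
            if reward:
                w = max(-32768, min(32767, w + c))  # w_new = w_old + c(t)
            c += STDP_INCREMENT           # record the STDP event
    return w, c

# A fires 3x at t=0 (no spike), once at t=5 (B spikes), 4x at t=12 (B spikes).
print(run({0: 3, 5: 1, 12: 4}, reward_on_from=11, n_steps=13))  # (603, 359)
```

Running the same schedule with the reward never switched on leaves the weight at 500, which is a useful negative control for the hardware test.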

Potential Issues and Mitigations

1. FIFO Overflow

Issue: etFIFO or et2FIFO fills up if synapses are pushed faster than the consumer drains them

Mitigation:

Size the FIFOs the same as ptrFIFO (512 entries, which has proven sufficient there)
Add overflow detection flags
Backpressure: stall pointer processing if etFIFO full
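The three mitigations can be modeled together in a small bounded FIFO. The class and attribute names are hypothetical; the depth of 512 comes from the text.

```python
from collections import deque

class BoundedFifo:
    """Fixed-depth FIFO with a full flag (for backpressure) and a sticky overflow flag."""

    def __init__(self, depth: int = 512) -> None:
        self.depth = depth
        self.items = deque()
        self.overflow = False  # sticky: set once a push is rejected

    @property
    def full(self) -> bool:
        return len(self.items) >= self.depth

    def push(self, item: int) -> bool:
        """Return False (and flag overflow) instead of dropping data silently."""
        if self.full:
            self.overflow = True
            return False
        self.items.append(item)
        return True

    def pop(self) -> int:
        return self.items.popleft()

fifo = BoundedFifo(depth=2)
print(fifo.push(0x83E8), fifo.push(0x83E9))  # True True
print(fifo.full)                             # True: producer should stall here
print(fifo.push(0x83EA), fifo.overflow)      # False True
```

A producer that checks full before pushing never sets the overflow flag, so a set flag in testing points to a missing stall condition.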

2. HBM Bandwidth

Issue: Additional reads/writes for Region 4 and weight updates

Impact analysis:

Existing: ~200-300 HBM transactions per timestep
New: +1 read per synapse (eligibility trace), +2 reads + 1 write per weight update
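For a network with 1000 active synapses and 10% coincidence with reward on: 100 weight updates, i.e. about +300 transactions (200 reads + 100 writes), not counting per-synapse trace reads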

Mitigation:

Weight updates run in parallel with ongoing Phase 2
Burst mode for sequential eligibility trace reads
Lazy decay (only update accessed traces)
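The weight-update part of the bandwidth cost is simple arithmetic and can be checked with a helper like the one below. The function name is hypothetical; the defaults of 2 reads and 1 write per update come from the impact analysis. Per-synapse trace reads and trace maintenance traffic are deliberately left out.

```python
def extra_transactions(active_synapses: int, coincidence_percent: int,
                       reads_per_update: int = 2, writes_per_update: int = 1) -> dict:
    """Count extra HBM transactions caused by weight updates in one timestep."""
    updates = active_synapses * coincidence_percent // 100
    reads = updates * reads_per_update    # trace read + synapse read
    writes = updates * writes_per_update  # synapse write-back
    return {"updates": updates, "reads": reads,
            "writes": writes, "total": reads + writes}

print(extra_transactions(1000, 10))
# {'updates': 100, 'reads': 200, 'writes': 100, 'total': 300}
```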

3. Address Mapping Collision

Issue: Region 4 might overlap with Region 3 if offset too small

Solution:

Choose REGION4_OFFSET ≥ the number of rows Region 3 can occupy
Example: If Region 3 uses rows 0x8000-0xFFFF (32K rows)
- Set REGION4_OFFSET = 0x8000 (32768 in decimal)
- Region 4 spans: 0x10000-0x17FFF
- No overlap ✓
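The overlap condition can be checked programmatically, for example at configuration time. Both function names are hypothetical. The check assumes Region 3 occupies one contiguous, inclusive row range.

```python
def regions_overlap(r3_first: int, r3_last: int, offset: int) -> bool:
    """True if Region 4 (Region 3 shifted by offset) intersects Region 3."""
    r4_first = r3_first + offset
    return r4_first <= r3_last

def min_offset(r3_first: int, r3_last: int) -> int:
    """Smallest offset that places Region 4 directly after Region 3."""
    return r3_last - r3_first + 1

print(regions_overlap(0x8000, 0xFFFF, 0x8000))  # False
print(regions_overlap(0x8000, 0xFFFF, 1000))    # True
print(hex(min_offset(0x8000, 0xFFFF)))          # 0x8000
```

The second call shows that the illustrative offset of 1000 is only safe when Region 3 holds at most 1000 rows.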

4. Coincidence Detection Granularity

Issue: The current design treats “post fired this timestep” as a coincidence

Limitation: It does not distinguish whether the post-synaptic neuron fired 1 ms or 10 ms after the pre-synaptic one

Future enhancement:

Track spike times with sub-timestep resolution
Implement asymmetric STDP window (separate τ+ and τ-)
Different eligibility trace increments for potentiation vs depression

5. Eligibility Trace Decay Timing

Issue: Applying decay to every trace on every timestep is expensive, so the question is when to apply it

Recommended approach: Lazy decay

Store last_update_timestamp with each trace
When reading for weight update:
- Calculate time elapsed: Δt = current_time - last_update_timestamp
- Apply accumulated decay: c = c_stored × exp(-Δt/τ_c)
Only write back when updating (not every read)
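One way to realize lazy decay is to replay the missed shift-based leak steps on access, which gives the same result as decaying eagerly every timestep. The exponential form above is the continuous-time approximation of this. The sketch below is a software model with a hypothetical function name. A hardware version would more likely bound the loop or use a lookup table.

```python
def lazy_decay(c_stored: int, last_update_timestamp: int,
               current_time: int, leak_shift: int) -> int:
    """Catch up on missed decay steps for a trace that was not touched since last update."""
    c = c_stored
    for _ in range(current_time - last_update_timestamp):
        leak = c >> leak_shift
        if leak == 0:   # trace no longer changes; remaining steps are no-ops
            break
        c -= leak
    return c

print(lazy_decay(256, 5, 11, 3))  # 117 after 6 missed timesteps
print(lazy_decay(5, 0, 100, 3))   # 5: below 2**leak_shift, nothing to decay
```

The early exit keeps the catch-up cost bounded even when a trace has been idle for a long time.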