4.3 Hardware Implementation of R-STDP

This chapter explains how R-STDP is implemented in the FPGA hardware by extending the existing framework from Chapters 1-2. The key insight is reusing existing logic: eligibility traces are updated like neuron membrane potentials (but without spiking), and weight updates reuse the existing HBM write infrastructure.


Memory Architecture: Adding HBM Region 4 for Eligibility Traces

Recall from Chapter 1 that the existing system uses three HBM regions:

  • Region 1: Axon pointers (where each axon’s synapses are located)

  • Region 2: Neuron pointers (where each neuron’s output synapses are located)

  • Region 3: Synapse data (OpCode, Target address, Weight)

For R-STDP, we add a fourth region:

  • Region 4: Eligibility traces (one value per synapse, stored like membrane potentials)

Address Mapping: Region 3 ↔ Region 4

Each synapse in Region 3 has a corresponding eligibility trace in Region 4. The mapping uses simple address arithmetic:

Synapse address (Region 3)  →  Eligibility trace address (Region 4)
──────────────────────────────────────────────────────────────────
0x8000                      →  0x8000 + REGION4_OFFSET
0x8001                      →  0x8001 + REGION4_OFFSET
0x8002                      →  0x8002 + REGION4_OFFSET

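REGION4_OFFSET = 1000 (decimal) in the examples that follow. In a real design the offset must be at least the number of rows in Region 3, otherwise the two regions overlap (see Address Mapping Collision).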

Example:

  • Synapse at HBM row 0x8001 in Region 3

  • Its eligibility trace at row 0x8001 + 1000 = 0x83E9 in Region 4 (the offset 1000 is decimal)

Why this simple mapping?

  • Easy to compute in hardware (just add/subtract offset)

  • Leverages existing HBM read/write logic

  • Reversible: Region_3_addr = Region_4_addr - REGION4_OFFSET
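The mapping can be sketched in a few lines of Python. This is only a behavioral illustration, not the hardware logic: the function names are hypothetical, and REGION4_OFFSET uses the decimal example value 1000 from the text.

```python
# Behavioral sketch of the Region 3 <-> Region 4 address mapping.
# REGION4_OFFSET is the example value from the text (decimal 1000);
# the function names are illustrative and not part of the existing framework.
REGION4_OFFSET = 1000


def trace_addr(synapse_row: int) -> int:
    """Region 3 synapse row -> Region 4 eligibility trace row."""
    return synapse_row + REGION4_OFFSET


def synapse_addr(trace_row: int) -> int:
    """Region 4 eligibility trace row -> Region 3 synapse row."""
    return trace_row - REGION4_OFFSET


row = trace_addr(0x8001)
print(hex(row))                # 0x83e9
print(hex(synapse_addr(row)))  # 0x8001
```

Because the mapping is a single addition, the reverse direction is a single subtraction and no lookup table is needed.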

Data Format

Each eligibility trace is stored as a 36-bit signed fixed-point value, identical in format to neuron membrane potentials:

[35:0] = Eligibility trace value
  - Starts at 0
  - Increases when STDP event occurs (coincidence detection)
  - Decays over time (like membrane potential leak)
  - NO threshold comparison (doesn't spike)

This reuse of the membrane potential format means we can use the same hardware logic that updates neurons to also update eligibility traces.
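For host-side checks it can help to model the 36-bit signed format explicitly. The sketch below assumes two's complement encoding and saturation at the range limits; the text does not say whether the hardware saturates or wraps, so treat the saturation as an assumption. All names are hypothetical.

```python
# Model of a 36-bit signed value as stored in one trace slot.
# Assumptions (not confirmed by the text): two's complement encoding,
# saturation instead of wrap-around on overflow.
TRACE_BITS = 36
TRACE_MASK = (1 << TRACE_BITS) - 1
TRACE_MAX = (1 << (TRACE_BITS - 1)) - 1
TRACE_MIN = -(1 << (TRACE_BITS - 1))


def saturate36(value: int) -> int:
    """Clamp an arbitrary integer into the signed 36-bit range."""
    return max(TRACE_MIN, min(TRACE_MAX, value))


def to_raw36(value: int) -> int:
    """Encode a signed value as a raw 36-bit two's complement field."""
    return saturate36(value) & TRACE_MASK


def from_raw36(raw: int) -> int:
    """Decode a raw 36-bit field back into a signed Python integer."""
    raw &= TRACE_MASK
    return raw - (1 << TRACE_BITS) if raw & (1 << (TRACE_BITS - 1)) else raw


print(hex(to_raw36(-1)))         # 0xfffffffff
print(from_raw36(to_raw36(-5)))  # -5
```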


The Reward Register: Dopamine Signal

A new 1-bit register stores the reward signal (dopamine modulation):

Reward Register:

  • Width: 1 bit (binary on/off)

  • Set by: New command opcode (CMD_SET_REWARD = 0x0A)

  • Broadcast to: internal_events_processor during Phase 2

  • Purpose: Gates whether weight updates occur

Simplified for initial implementation:

  • Binary signal: 1 = reward on, 0 = reward off

  • Can be toggled per timestep or set once for entire simulation

  • Future enhancement: Multi-bit for graded reward levels
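A minimal behavioral model of the register is sketched here. The opcode value comes from the text; the payload layout (reward bit in the least significant bit) and the class and method names are assumptions made for illustration.

```python
# Behavioral model of the 1-bit reward register.
# CMD_SET_REWARD = 0x0A is taken from the text. The payload layout
# (reward bit in bit 0) is an assumption for this sketch.
CMD_SET_REWARD = 0x0A


class RewardRegister:
    def __init__(self) -> None:
        self.value = 0  # reward off after reset

    def execute(self, opcode: int, payload: int) -> bool:
        """Apply a command; return True if it targeted this register."""
        if opcode != CMD_SET_REWARD:
            return False
        self.value = payload & 0x1  # keep only the single reward bit
        return True


reg = RewardRegister()
reg.execute(CMD_SET_REWARD, 1)
print(reg.value)  # 1
```

Moving to a graded reward later would only require widening the mask applied to the payload.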


Phase 2 Flow with R-STDP: Step-by-Step

We extend the existing Phase 2 (Synaptic Processing) to include eligibility trace handling and weight updates.

Step 1: Pointer Processing with Dual Address Computation

Recall from Chapter 2 that Phase 2 begins with pointer_fifo_controller reading synapse pointers from the 16 ptrFIFOs and sending them to hbm_processor.

Extension: When hbm_processor receives a synapse pointer, it now computes two addresses:

  1. Region 3 address (existing): Where the synapse data is stored

  2. Region 4 address (new): Where the eligibility trace is stored

    • Computed as: Region_3_addr + REGION4_OFFSET

New FIFO introduced: etFIFO (Eligibility Trace FIFO)

Property     Value
──────────────────────────────────────────────────────────────────
Width        23 bits (HBM address)
Depth        512
Written by   hbm_processor (when processing synapse pointers)
Read by      internal_events_processor (during neuron updates)
Contains     Region 4 addresses (eligibility trace locations)

What happens:

  • hbm_processor pops a synapse pointer from ptrFIFO

  • Extracts Region 3 address: 0x8001

  • Computes Region 4 address: 0x8001 + 1000 = 0x83E9

  • Pushes 0x83E9 to etFIFO

  • Issues HBM read for synapse data at Region 3 address 0x8001 (existing behavior)

Result: The etFIFO now holds eligibility trace addresses in the same order as synapses are being processed. This synchronization is critical for the next step.
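The ordering guarantee can be illustrated with a small queue model. This sketch simplifies each pointer to a single Region 3 row (real pointers may describe more than one row), and the function and queue names are hypothetical.

```python
from collections import deque

REGION4_OFFSET = 1000  # example value from the text (decimal)


def process_pointers(ptr_fifo: deque, et_fifo: deque, hbm_reads: deque) -> None:
    """Drain ptr_fifo, queueing one trace address and one HBM read per synapse.

    Simplification: each pointer is treated as exactly one Region 3 row.
    """
    while ptr_fifo:
        region3_row = ptr_fifo.popleft()
        et_fifo.append(region3_row + REGION4_OFFSET)  # new: trace address
        hbm_reads.append(region3_row)                 # existing: synapse read


ptr_fifo = deque([0x8001, 0x8002, 0x8005])
et_fifo, hbm_reads = deque(), deque()
process_pointers(ptr_fifo, et_fifo, hbm_reads)

# Both queues hold entries for the same synapses in the same order.
print([hex(a) for a in hbm_reads])
print([hex(a) for a in et_fifo])
```

Since both queues are filled in the same loop iteration, the n-th entry of etFIFO always belongs to the n-th synapse read.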


Step 2: Neuron Update with Coincidence Detection

As synapses are read from HBM Region 3, they flow to internal_events_processor for neuron updates (existing behavior from Chapter 2.2).

Extension: While processing each synapse, the internal_events_processor now also:

  1. Pops the corresponding eligibility trace address from etFIFO

    • Because addresses were pushed in the same order, this stays synchronized

  2. Performs the neuron update (existing):

    • Read current membrane potential from URAM

    • Add synaptic weight: V_new = V_old + weight

    • Check threshold: spike = (V_new >= threshold)

    • Write updated potential to URAM

  3. Performs coincidence detection (new):

    • Check: Did the post-synaptic neuron spike?

    • If YES: We have an STDP event (pre fired → caused input → post fired)

    • This is coincidence detection: pre and post activity temporally correlated

  4. Checks reward signal (new):

    • Read the reward register (broadcast to all processing units)

    • Check: Is dopamine ON?

  5. Conditional weight update trigger (new):

    • If (coincidence detected) AND (reward signal = 1):

      • Push the Region 4 address to et2FIFO

New FIFO introduced: et2FIFO (Eligibility Trace to Weight Update FIFO)

Property     Value
──────────────────────────────────────────────────────────────────────────────
Width        23 bits (HBM Region 4 address)
Depth        512
Written by   internal_events_processor (when coincidence + reward detected)
Read by      hbm_processor weight update logic
Contains     Region 4 addresses of synapses that should have weights updated

What this achieves:

  • Only synapses with recent STDP events (recorded in eligibility traces) that also receive reward will trigger weight updates

  • The et2FIFO acts as a queue of “synapses to update”
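The five sub-steps above can be condensed into one function. This is a software sketch only: membrane potentials live in a plain dictionary instead of URAM, post-spike reset is not modeled because the text does not describe it here, and all names are hypothetical.

```python
from collections import deque


def process_synapse(weight, target, potentials, threshold,
                    et_fifo, et2_fifo, reward):
    """Process one synapse event and return True if the target neuron spiked.

    potentials stands in for URAM. Post-spike reset is intentionally
    left out of this sketch.
    """
    trace_row = et_fifo.popleft()        # 1. matching Region 4 address
    v_new = potentials[target] + weight  # 2. neuron update
    spiked = v_new >= threshold
    potentials[target] = v_new
    if spiked and reward == 1:           # 3-5. coincidence AND reward
        et2_fifo.append(trace_row)       # queue this synapse for a weight update
    return spiked


potentials = {7: 1600}
et_fifo, et2_fifo = deque([0x83E9]), deque()
fired = process_synapse(500, 7, potentials, 2000, et_fifo, et2_fifo, reward=1)
print(fired, [hex(a) for a in et2_fifo])
```

Note that the trace address is popped on every synapse, whether or not a coincidence occurs; otherwise etFIFO would fall out of step with the synapse stream.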


Step 3: Weight Updates via Eligibility Traces

While Phase 2 is running (or immediately after), hbm_processor monitors the et2FIFO. When entries appear, it performs weight updates.

Process for each entry:

  1. Pop Region 4 address from et2FIFO

    • Example: 0x83E9 (eligibility trace address)

  2. Read eligibility trace value from HBM Region 4

    • Issue HBM read to address 0x83E9

    • Receive 36-bit eligibility trace value: c(t)

  3. Compute corresponding synapse address

    • Reverse the mapping: Region_3_addr = 0x83E9 - 1000 = 0x8001

  4. Read current synapse weight from HBM Region 3

    • Issue HBM read to address 0x8001

    • Receive 32-bit synapse data

    • Extract weight: current_weight = synapse_data[15:0]

  5. Compute new weight using R-STDP rule

    • R-STDP: Δw = R(t) × c(t)

    • Since we’re in this state, R(t) = 1 (reward is on)

    • Therefore: Δw = c(t) (the eligibility trace value)

    • New weight: w_new = w_old + c(t)

    • Apply clamping to prevent overflow: w_new ∈ [-32768, 32767] (the signed 16-bit weight range)

  6. Write updated weight to HBM Region 3

    • Reconstruct synapse data: {OpCode, Target, w_new}

    • Issue HBM write to address 0x8001

    • This reuses the same HBM write logic that’s called when the host updates weights via write_synapse commands

Key insight: The weight update path reuses existing infrastructure. We just need to:

  • Cut into the write_synapse function at the point where it has the Region 3 address and new weight value

  • Provide those values from our R-STDP computation instead of from the host
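The six steps can be sketched as a read-modify-write on a dictionary that stands in for HBM. The weight position [15:0] and the 32-bit word size come from the text; treating the weight as signed two's complement and leaving the upper 16 bits (OpCode and Target) untouched are assumptions, as are all names.

```python
REGION4_OFFSET = 1000          # example value from the text (decimal)
W_MIN, W_MAX = -32768, 32767   # clamp range from the text


def signed16(field: int) -> int:
    """Interpret the low 16 bits as a two's complement value (assumed)."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def apply_weight_update(hbm: dict, trace_row: int) -> int:
    """R-STDP update with R(t) = 1, i.e. w_new = clamp(w_old + c(t))."""
    c = hbm[trace_row]                      # step 2: read eligibility trace
    synapse_row = trace_row - REGION4_OFFSET  # step 3: reverse mapping
    word = hbm[synapse_row]                 # step 4: read 32-bit synapse word
    w_old = signed16(word)
    w_new = max(W_MIN, min(W_MAX, w_old + c))  # step 5: add and clamp
    # step 6: keep the upper bits (OpCode, Target) and replace the weight
    hbm[synapse_row] = (word & 0xFFFF0000) | (w_new & 0xFFFF)
    return w_new


hbm = {0x8001: (0xABCD << 16) | 500, 0x83E9: 100}
print(apply_weight_update(hbm, 0x83E9))  # 600
print(hex(hbm[0x8001]))                  # 0xabcd0258
```

In hardware the final assignment corresponds to the point where the existing write_synapse path already has an address and a weight in hand.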


Step 4: Eligibility Trace Maintenance

Eligibility traces need to be updated to:

  1. Decay over time (like membrane potential leak)

  2. Increase when STDP events occur (coincidence detection)

Mathematical update:

\[\dot{c}(t) = -\frac{c}{\tau_c} + \delta_{\text{STDP event}}\]

Implementation approach: Reuse neuron membrane potential logic

The key insight is that eligibility trace updates are nearly identical to neuron updates:

  • Both are 36-bit values stored in memory

  • Both decay exponentially (leak)

  • Both accumulate inputs

  • Difference: Eligibility traces don’t spike (no threshold comparison)

Method:

  • Read eligibility trace from Region 4: c_old

  • Apply decay: c_new = c_old - (c_old >> leak_shift)

    • This is the same right-shift leak used for neurons

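    • leak_shift sets τ_c: each update multiplies c by (1 - 2^-leak_shift), so τ_c ≈ 2^leak_shift timesteps; a larger shift than the one used for neurons gives the slower decay needed here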

  • If STDP event detected for this synapse: c_new += STDP_INCREMENT

  • Write back to Region 4: c_new
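One maintenance step can be written directly from the method above. STDP_INCREMENT and the default leak_shift below are placeholder values chosen for the example; the text does not specify them.

```python
# One eligibility trace maintenance step: shift-based leak plus an
# optional increment. Both constants are placeholders for illustration.
STDP_INCREMENT = 64
LEAK_SHIFT = 4


def update_trace(c_old: int, stdp_event: bool, leak_shift: int = LEAK_SHIFT) -> int:
    """Return the new trace value after decay and an optional STDP bump."""
    # Python's >> on integers is an arithmetic shift, like a signed shift in HDL.
    c_new = c_old - (c_old >> leak_shift)
    if stdp_event:
        c_new += STDP_INCREMENT
    return c_new


print(update_trace(1600, stdp_event=False))  # 1500
print(update_trace(1600, stdp_event=True))   # 1564
print(update_trace(0, stdp_event=True))      # 64
```

One property of this rule worth keeping in mind: a positive trace smaller than 2^leak_shift no longer decays, because the shifted term becomes zero.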

When to perform these updates:

  • Option A: During Phase 2 (parallel with neuron updates)

  • Option B: Separate Phase 3 after neuron updates complete

  • Option C: Lazy update - only when accessed for weight updates

    • More efficient: don’t read/write all eligibility traces every timestep

    • Only update the ones being used


Complete Phase 2 Data Flow Diagram

┌─────────────────────────────────────────────────────────────┐
│                    PHASE 2 WITH R-STDP                      │
└─────────────────────────────────────────────────────────────┘

Step 1: Pointer Processing
─────────────────────────
pointer_fifo_controller
    │
    └──> ptrFIFO (pop synapse pointer)
           │
           └──> hbm_processor
                  │
                  ├──> Compute Region 3 address (synapse)
                  ├──> Compute Region 4 address (eligibility trace)
                  ├──> Push Region 4 addr to etFIFO ───┐
                  └──> Read synapse from HBM Region 3  │
                                                        │
Step 2: Neuron Update + Coincidence Detection          │
───────────────────────────────────────────            │
hbm_processor forwards synapse data                    │
    │                                                   │
    └──> internal_events_processor                     │
           │                                            │
           ├──> Pop Region 4 addr from etFIFO <────────┘
           ├──> Update neuron (URAM):
           │      V_new = V_old + weight
           ├──> Check threshold: spike?
           ├──> Coincidence detection: post fired?
           ├──> Check reward register: DA on?
           │
           └──> If (coincidence AND reward):
                  Push Region 4 addr to et2FIFO ───┐
                                                    │
Step 3: Weight Updates                              │
──────────────────────                              │
hbm_processor (parallel state machine)              │
    │                                                │
    ├──> Pop Region 4 addr from et2FIFO <──────────┘
    ├──> Read eligibility trace from Region 4
    ├──> Compute Region 3 addr (subtract offset)
    ├──> Read current weight from Region 3
    ├──> Compute: w_new = w_old + et_value
    └──> Write w_new to Region 3 (reuse write_synapse logic)

Step 4: Eligibility Trace Maintenance (parallel or separate)
────────────────────────────────────────────────────────────
For each synapse's eligibility trace:
    ├──> Read from Region 4
    ├──> Apply decay: c_new = c_old - (c_old >> leak_shift)
    ├──> If STDP event: c_new += INCREMENT
    └──> Write back to Region 4

Summary of New Components

Memory Regions

  • HBM Region 4: Eligibility trace storage (one 36-bit value per synapse)

FIFOs (introduced as needed in the flow)

  • etFIFO: Queues eligibility trace addresses during pointer processing

  • et2FIFO: Queues the eligibility trace addresses of synapses whose weights should be updated

Control Signals

  • Reward Register: 1-bit dopamine signal (set by new opcode CMD_SET_REWARD)

  • exec_reward: Broadcast signal from command_interpreter to internal_events_processor

Reused Logic

  • Neuron update pipeline → Eligibility trace updates (disable threshold check)

  • HBM write infrastructure → Weight updates

  • Phase 2 synapse processing → Coincidence detection point


Key Design Decisions

  1. Simple address mapping: Region4 = Region3 + constant offset

    • Makes address computation trivial (one addition)

    • Easily reversible for going from eligibility trace back to synapse

  2. Binary reward signal: On/off only (for now)

    • Simplifies initial implementation

    • Future: Could be multi-bit for graded reward

  3. Coincidence-based STDP: Post fires in same timestep as pre input

    • Simplified from asymmetric STDP window

    • Good enough for initial learning demonstrations

  4. Reuse existing hardware: Eligibility traces use neuron logic

    • Minimal new hardware required

    • Proven, tested infrastructure

  5. FIFO-based pipeline: Maintains synchronization

    • etFIFO keeps addresses aligned with synapse processing

    • et2FIFO queues weight updates for parallel processing


Testing the Implementation

Simple Test Case

Network:

  • Neuron A (pre-synaptic) → Neuron B (post-synaptic)

  • Single synapse with initial weight W = 500

  • Threshold = 2000

Experiment sequence:

  1. Timestep 0: Fire A repeatedly without B firing

    • No coincidence → eligibility trace stays at 0

  2. Timestep 5: Fire A enough to make B spike

    • B receives input, crosses threshold, spikes

    • Coincidence detected: pre (A) was active, post (B) fired

    • Eligibility trace increases (but reward is still OFF)

  3. Timesteps 6-10: No activity

    • Eligibility trace decays: c(t + Δt) = c(t) × exp(-Δt/τ_c)

  4. Timestep 11: Turn on reward (set reward register = 1)

  5. Timestep 12: Fire A, B spikes again

    • Coincidence detected

    • Reward is ON

    • → Weight update triggered!

    • Read eligibility trace from Region 4: c(t) = 100 (example)

    • Compute: W_new = 500 + 100 = 600

    • Write updated weight to Region 3

Validation:

  • Read back synapse weight from HBM → should be 600

  • On next input from A, B should spike with less input (stronger synapse)
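Before running on hardware, the experiment can be rehearsed with a small software model. The sketch below makes several assumptions that the text leaves open: B resets to 0 after a spike, neurons do not leak, traces decay once per timestep, and the weight update uses the trace value after the current event's increment. The leak shift and increment are placeholders, so the final weight differs from the 600 used in the example above.

```python
class TwoNeuronModel:
    """A -> B with one plastic synapse (behavioral sketch, not the RTL)."""

    def __init__(self, weight=500, threshold=2000, leak_shift=3, increment=128):
        self.w, self.threshold = weight, threshold
        self.leak_shift, self.increment = leak_shift, increment  # placeholders
        self.v = 0       # membrane potential of B
        self.c = 0       # eligibility trace of the A -> B synapse
        self.reward = 0  # 1-bit reward register

    def step(self, a_spikes: int) -> bool:
        """Advance one timestep in which A fires a_spikes times."""
        self.c -= self.c >> self.leak_shift  # trace decay (eager, every step)
        post_spiked = False
        for _ in range(a_spikes):
            self.v += self.w
            if self.v >= self.threshold:
                post_spiked = True
                self.v = 0                   # assumed reset after a spike
        if post_spiked:                      # coincidence detected
            self.c += self.increment
            if self.reward:                  # reward gates the weight update
                self.w = min(32767, self.w + self.c)
        return post_spiked


m = TwoNeuronModel()
m.step(3)                      # t = 0: A fires, B stays below threshold
for _ in range(4):
    m.step(0)                  # t = 1..4: idle
m.step(1)                      # t = 5: B spikes, trace rises, reward off
weight_after_first_spike = m.w
for _ in range(5):
    m.step(0)                  # t = 6..10: trace decays
m.reward = 1
m.step(0)                      # t = 11: reward switched on
m.step(4)                      # t = 12: B spikes with reward on
print(weight_after_first_spike, m.c, m.w)
```

The useful checks are qualitative: the weight is unchanged after the unrewarded spike and larger than 500 after the rewarded one.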


Potential Issues and Mitigations

1. FIFO Overflow

Issue: etFIFO or et2FIFO fills up if too many synapses are processed

Mitigation:

  • Size the FIFOs the same as ptrFIFO (512 entries, proven sufficient)

  • Add overflow detection flags

  • Backpressure: stall pointer processing if etFIFO full
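The overflow flag and the backpressure rule can be prototyped as follows. The class and function names are hypothetical, and the model ignores clocking; it only shows that stalling the producer keeps the flag clear.

```python
from collections import deque


class BoundedFifo:
    """Fixed-depth FIFO with a sticky overflow flag."""

    def __init__(self, depth: int = 512) -> None:
        self.depth = depth
        self.items = deque()
        self.overflow = False

    @property
    def full(self) -> bool:
        return len(self.items) >= self.depth

    def push(self, item) -> bool:
        if self.full:
            self.overflow = True  # a write was attempted while full
            return False
        self.items.append(item)
        return True

    def pop(self):
        return self.items.popleft()


def forward_with_backpressure(ptr_fifo: deque, et_fifo: BoundedFifo, offset: int) -> int:
    """Move pointers into etFIFO until it is full; return how many moved."""
    moved = 0
    while ptr_fifo and not et_fifo.full:  # stall when etFIFO has no room
        et_fifo.push(ptr_fifo.popleft() + offset)
        moved += 1
    return moved


ptr_fifo = deque([0x8001, 0x8002, 0x8003])
et_fifo = BoundedFifo(depth=2)
print(forward_with_backpressure(ptr_fifo, et_fifo, 1000), len(ptr_fifo), et_fifo.overflow)
```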

2. HBM Bandwidth

Issue: Additional reads/writes for Region 4 and weight updates

Impact analysis:

  • Existing: ~200-300 HBM transactions per timestep

  • New: +1 read per synapse (eligibility trace), +2 reads + 1 write per weight update

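  • For a network with 1000 active synapses and 10% coincidence (reward on): 100 weight updates, i.e. about +300 transactions (2 reads + 1 write each), before counting any eligibility trace maintenance traffic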

Mitigation:

  • Weight updates run in parallel with ongoing Phase 2

  • Burst mode for sequential eligibility trace reads

  • Lazy decay (only update accessed traces)
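A rough estimate of the extra traffic can be computed as below. The model counts every read and write in Step 3 as one transaction (trace read, synapse read, synapse write) and, optionally, one read plus one write per active synapse for eager trace maintenance. It ignores bursts and any caching, so treat the result as an upper-bound sketch; the function name is hypothetical.

```python
def extra_hbm_transactions(active_synapses: int,
                           coincidence_rate: float,
                           eager_trace_maintenance: bool = False) -> int:
    """Estimate additional HBM transactions per timestep caused by R-STDP."""
    weight_updates = round(active_synapses * coincidence_rate)
    total = weight_updates * 3        # trace read + synapse read + synapse write
    if eager_trace_maintenance:
        total += active_synapses * 2  # read + write of every active trace
    return total


print(extra_hbm_transactions(1000, 0.10))                                # 300
print(extra_hbm_transactions(1000, 0.10, eager_trace_maintenance=True))  # 2300
```

The gap between the two figures is the main argument for lazy decay.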

3. Address Mapping Collision

Issue: Region 4 might overlap with Region 3 if offset too small

Solution:

  • Choose REGION4_OFFSET ≥ maximum Region 3 size (in rows)

  • Example: If Region 3 uses rows 0x8000-0xFFFF (32K rows)

    • Set REGION4_OFFSET = 0x8000 (32768 in decimal)

    • Region 4 spans: 0x10000-0x17FFF

    • No overlap ✓
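This condition is easy to check at configuration time. The helper names below are hypothetical; the numbers are the ones from the example above.

```python
def region4_span(region3_start: int, region3_rows: int, offset: int):
    """Return (first_row, last_row) of Region 4 for a given offset."""
    first = region3_start + offset
    return first, first + region3_rows - 1


def regions_overlap(region3_start: int, region3_rows: int, offset: int) -> bool:
    """True if the first Region 4 row still lies inside Region 3."""
    region3_end = region3_start + region3_rows - 1
    region4_first, _ = region4_span(region3_start, region3_rows, offset)
    return region4_first <= region3_end


first, last = region4_span(0x8000, 0x8000, 0x8000)
print(hex(first), hex(last))                    # 0x10000 0x17fff
print(regions_overlap(0x8000, 0x8000, 0x8000))  # False
print(regions_overlap(0x8000, 0x8000, 1000))    # True
```

The last line shows that an offset of 1000 rows is only safe when Region 3 holds at most 1000 rows.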

4. Coincidence Detection Granularity

Issue: The current design treats “post fired this timestep” as a coincidence

Limitation: It doesn’t distinguish whether post fired 1 ms or 10 ms after pre

Future enhancement:

  • Track spike times with sub-timestep resolution

  • Implement asymmetric STDP window (separate τ+ and τ-)

  • Different eligibility trace increments for potentiation vs depression

5. Eligibility Trace Decay Timing

Issue: When should decay be applied? Updating every trace on every timestep is expensive

Recommended approach: Lazy decay

  • Store last_update_timestamp with each trace

  • When reading for weight update:

    • Calculate time elapsed: Δt = current_time - last_update_timestamp

    • Apply accumulated decay: c = c_stored × exp(-Δt/τ_c)

  • Only write back when updating (not every read)
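A lazy update can be sketched by replaying the per-timestep shift rule Δt times when a trace is finally read. This deliberately uses the integer rule c - (c >> leak_shift) rather than the closed-form exponential so that the lazy result is bit-identical to an eager per-timestep update; a hardware version might use a lookup table or a single multiply instead. The function names are hypothetical.

```python
def decay_once(c: int, leak_shift: int) -> int:
    """One timestep of shift-based decay."""
    return c - (c >> leak_shift)


def lazy_decay(c_stored: int, last_update: int, now: int, leak_shift: int) -> int:
    """Apply all decay steps missed since last_update in one go."""
    c = c_stored
    for _ in range(now - last_update):
        c = decay_once(c, leak_shift)
    return c


# Eager reference: decay on every timestep from t = 5 to t = 10.
eager = 128
for _ in range(5):
    eager = decay_once(eager, 3)

lazy = lazy_decay(128, last_update=5, now=10, leak_shift=3)
print(eager, lazy)  # 67 67
```

Storing the timestamp next to the trace costs extra bits per synapse, which is the price paid for skipping the per-timestep reads and writes.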


Python API Usage