A neuromorphic architecture for reinforcement learning from real-valued observations

Research output: Chapter in Book / Conference Paper › Conference Paper › peer-review

Abstract

Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. However, implementing RL in hardware-efficient and bio-inspired ways remains a challenge. This paper presents a novel neuromorphic architecture for solving RL problems with real-valued observations. Building on prior work, the proposed model incorporates multi-layered event-based clustering and adds Temporal Difference (TD)-error modulation and eligibility traces. An ablation study confirms the significant impact of these components on the model's performance. A tabular actor-critic algorithm with eligibility traces and a state-of-the-art Proximal Policy Optimization (PPO) algorithm are used as benchmarks. Our network consistently outperforms the tabular approach and successfully discovers stable control policies on classic RL environments: mountain car, cart-pole, and acrobot. The proposed model offers an appealing trade-off in terms of computational and hardware implementation requirements. It requires neither an external memory buffer nor a global error-gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcast TD-error signal. Thus, this work contributes to the development of more hardware-efficient RL solutions.
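
The abstract names a tabular actor-critic with eligibility traces as one benchmark and describes updates driven by local rules and a broadcast TD-error. As a rough illustration only (this is not the paper's spiking network, nor its benchmark code), a minimal sketch of such a tabular learner on a hypothetical five-state chain task might look like the following. The function name `train_chain`, the toy environment, and every hyperparameter value are assumptions made for this example.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_chain(n_states=5, episodes=500, gamma=0.95, lam=0.8,
                alpha_v=0.1, alpha_pi=0.1, max_steps=200, seed=0):
    """Tabular actor-critic with eligibility traces on a toy chain.

    Hypothetical task: start at state 0, actions 0 = left and 1 = right,
    reward 1.0 on reaching the right-most state, which ends the episode.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states                            # critic: state values
    theta = [[0.0, 0.0] for _ in range(n_states)]   # actor: action preferences

    for _ in range(episodes):
        e_v = [0.0] * n_states                      # critic traces
        e_pi = [[0.0, 0.0] for _ in range(n_states)]  # actor traces
        s = 0
        for _ in range(max_steps):
            probs = softmax(theta[s])
            a = 0 if rng.random() < probs[0] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0

            # Scalar TD-error; the same value is used by every table entry.
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]

            # Decay all traces, then mark the state and action just used.
            for i in range(n_states):
                e_v[i] *= gamma * lam
                for b in range(2):
                    e_pi[i][b] *= gamma * lam
            e_v[s] += 1.0
            for b in range(2):
                e_pi[s][b] += (1.0 if b == a else 0.0) - probs[b]

            # Online update: each entry combines its own trace with delta.
            for i in range(n_states):
                V[i] += alpha_v * delta * e_v[i]
                for b in range(2):
                    theta[i][b] += alpha_pi * delta * e_pi[i][b]

            s = s_next
            if done:
                break
    return V, theta
```

Calling `train_chain()` returns the learned value table and action preferences. Each update uses only the entry's own trace and the single scalar `delta`, with no replay buffer and no backpropagated gradient, which mirrors in tabular form the kind of online, locally driven update the abstract describes.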
Original language: English
Title of host publication: Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN 2024), June 30th - July 5th, 2024, Yokohama, Japan
Place of Publication: U.S.
Publisher: IEEE
Number of pages: 10
ISBN (Electronic): 9798350359312
Publication status: Published - 2024
Event: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Yokohama, Japan
Duration: 30 Jun 2024 to 5 Jul 2024

Conference

Conference: 2024 International Joint Conference on Neural Networks, IJCNN 2024
Country/Territory: Japan
City: Yokohama
Period: 30/06/24 to 5/07/24

Keywords

  • FEAST
  • reinforcement learning
  • spiking neural networks
  • STDP
