Low-latency scalable hierarchical routing and partitioning of reconfigurable neuromorphic systems

  • Samalika Perera

Doctoral thesis, Western Sydney University

Abstract

This thesis serves as the foundation for an FPGA-based large-scale neural simulator that can be scaled up to brain-scale simulations by addressing the communication challenges of multi-FPGA routing architectures. Integrating multiple cores into one system is essential for large-scale neural network simulations, yet as the number of cores increases, simulation performance degrades significantly. The growing number of cores that can be integrated on a single chip has led designers to apply computer-network concepts to System-on-Chip (SoC) design, giving rise to network-on-chip (NoC) solutions for managing the increasing core count. This work primarily aims to implement a novel multi-FPGA routing architecture based on NoC hierarchical routing, minimising communication bottlenecks in the multi-FPGA design. The second goal is to develop a real-time mapping compiler that efficiently maps an input neural network onto the proposed architecture, further mitigating communication bottlenecks. We employ a hierarchical tree-based interconnection architecture in the proposed multi-FPGA system. This architecture is scalable: new branches can be added to the tree while local bandwidth remains constant, in contrast to a linear NoC, where numerous connections can cause congestion. The proposed routing architecture introduces an arbiter mechanism that applies stochastic arbitration based on the fill levels of the FIFO (First In, First Out) queues. This mechanism reduces the bottleneck caused by FIFO congestion and improves overall latency. The thesis thus introduces the novel concept of stochastic arbiters within a hierarchical tree-based architecture, together with a hierarchical partitioning method for mapping neural networks with better latency performance.
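The queue-aware stochastic arbitration described above can be illustrated with a minimal software model: the arbiter grants an input port with probability proportional to its FIFO occupancy, so congested queues drain faster. This is an illustrative sketch only, not the thesis's hardware design; the function name and interface are assumptions.

```python
import random

def stochastic_arbiter(fifo_levels, rng=random):
    """Grant one input port, weighting the choice by FIFO occupancy.

    fifo_levels: list of current queue depths, one per input port.
    Fuller queues are proportionally more likely to win the grant,
    which drains congested buffers and reduces worst-case latency.
    Returns the index of the granted port, or None if all are empty.
    """
    total = sum(fifo_levels)
    if total == 0:
        return None  # nothing to forward this cycle
    pick = rng.uniform(0, total)
    cumulative = 0
    for port, level in enumerate(fifo_levels):
        cumulative += level
        if pick < cumulative:
            return port
    return len(fifo_levels) - 1  # guard against float rounding
```

Unlike fixed-priority or plain round-robin arbitration, this scheme adapts each cycle to the observed congestion, which is the property the abstract credits for the latency improvement.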
In distributed computing with multiple cores, communication load balancing is a key strategy for improving latency. Most partitioning algorithms struggle to scale with network workloads and to find a globally optimal partition for hardware mapping efficiently. Because communication is the most energy- and time-consuming aspect of distributed processing, the partitioning framework is optimised for compute-balanced, memory-efficient parallel processing, targeting low-latency execution with minimal routing across compute cores. There is always a trade-off between the quality of a balanced partitioning configuration and the time complexity of the partitioning algorithm, a trade-off that becomes significant for large-scale neural networks. This thesis proposes a novel partitioning concept suitable for large-scale neural networks with better time complexity, introducing a process based on iterative real-time hardware simulations that maps a given neural network onto the proposed multi-FPGA system and thereby improves latency performance. These techniques apply to neuromorphic systems and are also deployable in general distributed multi-core processing, aiming for low-latency computing and enhanced scalability.
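The iterative, simulation-driven mapping process can be sketched as a simple refinement loop: start from a balanced assignment, perturb it, and keep only moves that a latency measurement confirms as improvements. In the thesis the measurement comes from real-time hardware simulation; here a callback stands in for it. The function name, the `measure_latency` callback, and the single-neuron move strategy are all assumptions for illustration.

```python
import random

def iterative_partition(num_neurons, num_cores, measure_latency,
                        iterations=100, rng=random):
    """Refine a neuron-to-core mapping using latency feedback.

    measure_latency(mapping) -> float, lower is better; it stands in
    for the real-time hardware simulation that scores each candidate.
    Returns the best mapping found (list: neuron index -> core index).
    """
    # Start from a simple balanced round-robin assignment.
    mapping = [n % num_cores for n in range(num_neurons)]
    best_cost = measure_latency(mapping)
    for _ in range(iterations):
        # Propose moving one randomly chosen neuron to a random core.
        n = rng.randrange(num_neurons)
        old_core = mapping[n]
        mapping[n] = rng.randrange(num_cores)
        cost = measure_latency(mapping)
        if cost < best_cost:
            best_cost = cost      # keep the improving move
        else:
            mapping[n] = old_core  # revert the move
    return mapping
```

Because every accepted move is validated against the measured cost, the loop never makes the mapping worse, and its per-iteration work is constant apart from the measurement itself, which is the sense in which such a scheme trades solution optimality for time complexity.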
Date of Award: 2023
Original language: English
Awarding Institution
  • Western Sydney University
Supervisor: Mark Wang

Keywords

  • Field programmable gate arrays--Design and construction
  • Neural networks (Computer science)
