Result Details

TransInferSim: Toward Fast and Accurate Evaluation of Embedded Hardware Accelerators for Transformer Networks

KLHŮFEK, J.; MARCHISIO, A.; MRÁZEK, V.; SEKANINA, L.; SHAFIQUE, M. TransInferSim: Toward Fast and Accurate Evaluation of Embedded Hardware Accelerators for Transformer Networks. IEEE Access, 2025, vol. 13, no. October, p. 177215-177226.
Type
journal article
Language
English
Authors
Klhůfek Jan, Ing., DCSY (FIT)
Marchisio Alberto
Mrázek Vojtěch, Ing., Ph.D., DCSY (FIT)
Sekanina Lukáš, prof. Ing., Ph.D., DCSY (FIT)
Shafique Muhammad
Abstract

Transformers are neural network models that have gained popularity in various advanced AI systems, including embedded/Edge-AI. Due to their architecture, hardware accelerators can leverage massive parallelism, especially when processing attention head operations. While accelerators for Transformers have been discussed in the literature, efficient scheduling of cache operations and detailed modeling of inference dynamics have not yet been addressed comprehensively. In this paper, we introduce TransInferSim, a novel tool that combines cycle-accurate simulation for performance estimation (including latency, memory usage, memory access counts, and computation counts) with a discrete-event-based scheduler that determines the execution order of compute and memory operations. By combining this tool with the Accelergy tool, the simulator enables accurate estimation of energy consumption and on-chip area, leveraging pre-characterized hardware parameters. The proposed tool allows for the accurate determination of cache misses at different levels and with different victim selection configurations. It supports different memory hierarchies and offers several strategies for scheduling operations on compute units. In addition, TransInferSim can extract the full execution plan generated during simulation, enabling its further use for behavioral Register Transfer Level validation or for deployment in real hardware implementations. This makes the tool applicable not only for high-level design space exploration, but also as a software front-end for hardware execution mapping. Finally, we can optimize the architecture for a particular network, as demonstrated through multiobjective design space exploration to adjust the size of processing arrays. In our experiments, the introduction of an on-chip memory hierarchy improved inference speed by ∼3.5× and reduced energy by ∼1.9× for the RoBERTa-Base Transformer model, while design space exploration achieved up to 10× latency reduction and 6× area savings for the ViT-Tiny vision Transformer. The tool is available online at https://github.com/ehw-fit/TransInferSim.
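
The repository linked above documents the tool's actual interface; purely as an illustration of the modeling style the abstract describes (event-ordered dispatch of compute and memory operations to processing elements, cache hit/miss accounting under a victim policy, and Accelergy-style energy estimation from pre-characterized per-action costs), the following is a minimal, self-contained Python sketch. Every name, signature, latency, and energy value below is an assumption invented for this example; none of it is TransInferSim's API.

import heapq
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with an LRU victim policy, counting hits/misses."""
    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()        # address -> None, ordered by recency
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # refresh recency on a hit
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[addr] = None
        return False

def simulate(ops, n_pes=4, cache_lines=64,
             hit_cycles=1, miss_cycles=100, compute_cycles=4):
    """Dispatch (kind, addr) operations to the earliest-free processing
    element (one greedy scheduling policy) and return latency and energy."""
    cache = LRUCache(cache_lines)
    pe_free = [0] * n_pes                 # cycle at which each PE becomes idle
    heapq.heapify(pe_free)
    finish = 0
    n_computes = 0
    for kind, addr in ops:
        start = heapq.heappop(pe_free)    # pick the PE that frees up first
        if kind == "load":
            latency = hit_cycles if cache.access(addr) else miss_cycles
        else:                             # "compute" (e.g., one MAC tile)
            latency = compute_cycles
            n_computes += 1
        end = start + latency
        finish = max(finish, end)
        heapq.heappush(pe_free, end)
    # Accelergy-style post-processing: multiply event counts by
    # pre-characterized per-action energies (the values here are made up).
    energy_pj = cache.hits * 0.5 + cache.misses * 50.0 + n_computes * 2.0
    return finish, cache, energy_pj

ops = [("load", a % 8) for a in range(32)] + [("compute", None)] * 16
cycles, cache, energy = simulate(ops, n_pes=2, cache_lines=4)
print(f"latency={cycles} cycles, hits={cache.hits}, "
      f"misses={cache.misses}, energy≈{energy:.1f} pJ")

With this particular stream, the cyclic address pattern thrashes the four-line LRU cache, so every load misses; enlarging the cache or changing the victim policy changes the reported latency and energy, which is the kind of trade-off the paper's multiobjective design space exploration evaluates automatically.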

Keywords

Transformers, hardware accelerators, modeling tools, memory subsystem, evaluation and optimizations

URL
https://ieeexplore.ieee.org/document/11202474
Published
2025
Pages
177215–177226
Journal
IEEE Access, vol. 13, no. October, ISSN 2169-3536
DOI
10.1109/ACCESS.2025.3621062
UT WoS
001596848900005
BibTeX
@article{BUT193349,
  author="Jan {Klhůfek} and  {} and Vojtěch {Mrázek} and Lukáš {Sekanina} and  {} and  {} and  {}",
  title="TransInferSim: Toward Fast and Accurate Evaluation of Embedded Hardware Accelerators for Transformer Networks",
  journal="IEEE Access",
  year="2025",
  volume="13",
  number="October",
  pages="177215--177226",
  doi="10.1109/ACCESS.2025.3621062",
  issn="2169-3536",
  url="https://ieeexplore.ieee.org/document/11202474"
}
Projects
LEDNeCo: Low Energy Deep Neurocomputing, GACR, Standard Projects, GA25-15490S, start: 2025-01-01, end: 2027-12-31, running