Forlinx FAI-ARA240-M Edge AI Accelerator with 40 eTOPS and NXP’s i.MX95 SoM Support
Compact M.2 DNPU accelerator delivers high-efficiency AI with up to 40 eTOPS and seamless integration with NXP's i.MX95 SoM for edge applications.
Forlinx Ara240 DNPU-based Edge AI accelerator for high-performance real-time inference
Forlinx has introduced the FAI-ARA240-M, a compact Edge AI accelerator designed to push generative AI and real-time inference directly to embedded systems. Built around a dedicated DNPU (Deep Neural Processing Unit), the module delivers up to 40 eTOPS while maintaining low power consumption. It targets applications that demand high-performance AI at the edge, including industrial automation, smart cities, and vision-based systems.
At the same time, Forlinx continues to expand its ecosystem around NXP’s i.MX95 SoM, combining heterogeneous computing with dedicated AI acceleration. The company showcase this integration at Embedded World 2026 in Nuremberg, where they presented real-world demos focused on industrial security and multimodal AI workloads. By pairing the i.MX95 platform with the Ara240 accelerator, Forlinx aims to deliver a scalable and efficient edge AI pipeline.
Key Features:
- DNPU-based architecture delivering up to 40 eTOPS AI performance
- Compact M.2 2280 (M-Key) form factor for easy integration
- PCIe Gen4 x4 and USB 3.2 Gen1 host interfaces
- Up to 16GB LPDDR4 memory for high-throughput workloads
- Supports Transformer models, CNNs, and multimodal AI
- Concurrent multi-model execution without context switching
- Optimized for low latency and high efficiency edge inference
- Secure Boot and hardware root-of-trust for data protection
- Designed for industrial automation, smart retail, and vision systems
The FAI-ARA240-M comes in a standard M.2 (M-Key) 2280 form factor, making integration simple for industrial developers. It supports PCIe Gen4 x4 and USB 3.2 Gen1 interfaces, ensuring high-speed data transfer. The module includes up to 16GB LPDDR4 memory to handle data-intensive AI workloads.
Ara240 AI Acceleration Card
This accelerator focuses heavily on performance-per-watt. It can run multiple AI models simultaneously without switching overhead, which directly improves efficiency in real-time systems. Performance benchmarks show the module can process Llama2-7B at around 14 tokens per second and ResNet34 at 660 inferences per second.
The platform supports a flexible software stack with an extensible compiler compatible with TensorFlow, PyTorch, and ONNX, allowing developers to optimize models using quantization and efficient dataflow techniques to reduce latency and improve throughput. It integrates closely with NXP’s i.MX95 SoM, enabling hybrid NPU acceleration where workloads distribute dynamically between the i.MX95 NPU and the Ara240 DNPU. The FAI-ARA240-M does not have a fixed retail price and is available through direct inquiry and authorized distributors like Alibaba, with pricing based on order volume and project requirements, making it a scalable AI extension for demanding edge applications.
Images used courtesy of Forlinx

