
DeepSeek-R1-Distill-Qwen-1.5B AI Model Deployed on Rockchip RK3588 SoC Using RKLLM Toolkit
DeepSeek R1, the latest AI model from China, is gaining attention for its advanced reasoning capabilities. Built on the DeepSeek-V3 base model and trained using large-scale reinforcement learning (RL), DeepSeek R1 excels in solving complex problems in math, coding, and logic, positioning itself as a competitor to AI giants like OpenAI.
To enable hardware-accelerated inference, Radxa has released instructions for running DeepSeek R1 (Qwen2 1.5B) on an NPU. Specifically, the model can leverage the 6 TOPS NPU accelerator of the Rockchip RK3588 SoC using the RKLLM toolkit.
Radxa tested various models and reported the following performance on Rockchip RK3588(S) hardware:
- TinyLlama 1.1B: 15.03 tokens/s
- Qwen 1.8B: 14.18 tokens/s
- Phi3 3.8B: 6.46 tokens/s
- ChatGLM3: 3.67 tokens/s
Radxa has provided pre-compiled RKLLM models and executable files on ModelScope, which users can download and use directly:
git clone https://www.modelscope.cn/radxa/DeepSeek-R1-Distill-Qwen-1.5B_RKLLM.git
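Alternatively, the same repository can be fetched with the ModelScope Python SDK. The snippet below is a convenience sketch, assuming the modelscope package is installed (pip install modelscope); the model ID is taken from the URL above and the files land in ModelScope's default cache directory:

# Optional alternative to git clone: download the pre-compiled model via the ModelScope SDK
from modelscope import snapshot_download

# Model ID taken from the Radxa repository URL above
local_dir = snapshot_download('radxa/DeepSeek-R1-Distill-Qwen-1.5B_RKLLM')
print('Model files downloaded to:', local_dir)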
RKLLM simplifies the deployment of LLM models on Rockchip chips, currently supporting the RK3588 and RK3576. To utilize the RKNPU, users must first run the RKLLM-Toolkit on an x86 workstation to convert the trained model into the RKLLM format. Once converted, the RKLLM C API can be used on the development board for inference.
The downloaded repository contains the following files:
- configuration.json: Configuration file
- librkllmrt.so: RKLLM library
- llm_demo: Demo program
- DeepSeek-R1-Distill-Qwen-1.5B.rkllm (1.9GB): DeepSeek R1 Qwen 1.5B compiled with RKLLM
- README.md
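With these files copied to the board, the pre-compiled demo can be run directly against the model. The commands below are a sketch based on the upstream rknn-llm demo, which is assumed here to take the model path, the maximum number of new tokens, and the maximum context length as arguments; check the bundled README.md for the exact invocation of Radxa's build:

export LD_LIBRARY_PATH=.
./llm_demo ./DeepSeek-R1-Distill-Qwen-1.5B.rkllm 2048 4096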
To convert the model yourself instead, run the model conversion script from the rknn-llm repository on an x86 PC:
cd rknn-llm/rkllm-toolkit/examples/
python3 test.py
After successful conversion, the DeepSeek-R1-Distill-Qwen-1.5B.rkllm model will be generated.
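For reference, the conversion script follows the RKLLM-Toolkit Python API pattern sketched below. Treat this as illustrative only: the model path is a placeholder, and build parameters such as the quantization type and optimization level differ between toolkit releases, so the test.py shipped in the repository remains the reference.

# Sketch of an RKLLM-Toolkit conversion script (run on an x86 PC, not on the board)
from rkllm.api import RKLLM

modelpath = './DeepSeek-R1-Distill-Qwen-1.5B'  # placeholder: local Hugging Face model directory

llm = RKLLM()

# Load the original Hugging Face model
ret = llm.load_huggingface(model=modelpath)
if ret != 0:
    raise SystemExit('Failed to load the model')

# Quantize and build the model for the RK3588 NPU
ret = llm.build(do_quantization=True, optimization_level=1,
                quantized_dtype='w8a8', target_platform='rk3588')
if ret != 0:
    raise SystemExit('Failed to build the model')

# Export the converted model in RKLLM format
ret = llm.export_rkllm('./DeepSeek-R1-Distill-Qwen-1.5B.rkllm')
if ret != 0:
    raise SystemExit('Failed to export the model')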
When we previously tested DeepSeek-R1 (Qwen 14B) on the Rockchip RK3588 SoC, we only achieved 1.4 tokens/s and later had to install an AMD W7700 graphics card for better performance. Testing DeepSeek R1 (Qwen2 1.5B) on the RK3588's NPU delivers almost 15 tokens/s.
DeepSeek R1 is well adapted to and runs efficiently on Arm-based systems-on-module and development boards built around the Rockchip RK3588, RK3588S, or RK3576. The demo above was tested on the Radxa ROCK 5 Model B (RK3588), and DeepSeek R1 (Qwen 1.5B) has also been run on the Banana Pi BPI-M7 board (RK3588).