How to deploy and test DeepSeek on the RK3588 development board

DeepSeek (Chinese name: 深度求索), a new star in the AI world, has recently risen to prominence with its low-cost, high-performance AI models. At its core is a powerful language model that understands natural language and generates high-quality text. DeepSeek is also freely available to developers around the world, which is accelerating the adoption of AI technology.

RK3588 performance advantages

As a high-performance AI chip, the RK3588 is built on an 8nm LP process and integrates an eight-core CPU, a quad-core GPU, and an NPU with 6 TOPS of computing power. This combination of strong performance and low power consumption makes it well suited to edge computing scenarios.

Can DeepSeek be deployed on RK3588?

There are two ways to deploy DeepSeek on the RK3588: with the Ollama tool, or as a quantized model using Rockchip's official RKLLM. Both methods are described below.



01-Deploy using the Ollama tool

Ollama is an open-source tool for serving large models. It supports the latest DeepSeek models as well as Llama 3, Phi 3, Mistral, Gemma and others. After Ollama is installed, a single command deploys the deepseek-r1 model with 1.5 billion parameters.
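For reference, the one-step deployment with Ollama is usually of the form "ollama run deepseek-r1:1.5b"; the exact tag is an assumption taken from the Ollama model library, not from this post. Once the model is running, it can also be queried from a script. The following is a minimal sketch, assuming Ollama's local HTTP service is listening on its default address and that the model tag above is the one installed; the function names are made up for illustration.

```python
import json
import urllib.request

# Assumptions: default Ollama address and the 1.5B deepseek-r1 tag.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-r1:1.5b"


def build_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the prompt to the local Ollama service and return the answer text."""
    # A small board can take a while to answer, hence the generous timeout.
    with urllib.request.urlopen(build_request(prompt), timeout=600) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling ask("Briefly introduce the RK3588.") on the board should return the model's reply as a string, provided the Ollama service is running.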

Because this model has only 1.5 billion parameters, its answers may not be very accurate. For higher accuracy you can switch to a model with more parameters, but responses will also become slower. Note also that a model deployed with Ollama runs its inference on the CPU.

During a response, the CPU load reaches 100% and the NPU is not used for acceleration.
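The CPU load can also be measured from a script while the model is answering. The sketch below assumes a Linux system on the board that exposes /proc/stat; the function names and the one-second interval are arbitrary choices for illustration.

```python
import time


def read_cpu_times(path="/proc/stat"):
    """Return (idle, total) jiffies from the aggregate 'cpu' line."""
    with open(path) as f:
        fields = f.readline().split()
    # user, nice, system, idle, iowait, irq, softirq, steal
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    return idle, sum(values)


def cpu_percent(before, after):
    """CPU utilization in percent between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def sample_cpu(interval=1.0):
    """Measure overall CPU utilization over the given interval in seconds."""
    before = read_cpu_times()
    time.sleep(interval)
    return cpu_percent(before, read_cpu_times())
```

Running sample_cpu() in a second terminal while the model generates text should show a value close to 100 if inference is CPU-bound.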

So how can the powerful NPU of the RK3588 be put to use?

This requires the second method: quantized deployment with Rockchip's official RKLLM.

02-Use RKLLM for quantization deployment

RKLLM-Toolkit is a development kit for quantizing and converting large language models on a computer. Its Python interface makes the following functions easy to carry out:

1. Model conversion: Supports converting large language models in certain formats to RKLLM models. The converted RKLLM model can be loaded and used on the Rockchip NPU platform.

2. Quantization: Supports quantizing floating-point models to fixed-point models.
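To make "floating-point to fixed-point" concrete, here is a generic symmetric int8 quantization written in plain Python. It illustrates the general idea only and is not RKLLM-Toolkit's actual algorithm; the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8 values."""
    max_abs = max(abs(w) for w in weights)
    # One quantization step; the largest magnitude maps to +/-127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]   # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most half a quantization step, which is the accuracy traded for smaller, integer-only weights that an NPU can process efficiently.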

The DeepSeek model is first converted to an RKLLM model on the computer.
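A conversion script might look roughly like the sketch below. It assumes the rkllm.api.RKLLM class and its load_huggingface, build and export_rkllm methods as they appear in Rockchip's published examples; the parameter values (w8a8 quantization, optimization level 1) and the wrapper function are assumptions for illustration and may differ between toolkit versions.

```python
def convert_to_rkllm(model_dir, out_path, llm=None):
    """Convert a Hugging Face model directory to an .rkllm file for RK3588."""
    if llm is None:
        # Imported lazily: RKLLM-Toolkit is installed on the computer,
        # not on the development board.
        from rkllm.api import RKLLM
        llm = RKLLM()

    steps = [
        ("load", lambda: llm.load_huggingface(model=model_dir)),
        ("build", lambda: llm.build(
            do_parallel=True,
            optimization_level=1,        # assumed value
            quantized_dtype="w8a8",      # assumed: 8-bit weights and activations
            target_platform="rk3588",
        )),
        ("export", lambda: llm.export_rkllm(out_path)),
    ]
    for name, step in steps:
        # Each toolkit call is assumed to return 0 on success.
        if step() != 0:
            raise RuntimeError(f"RKLLM {name} step failed")
    return out_path
```

The optional llm argument exists only so the flow can be exercised with a stand-in object on a machine where the toolkit is not installed.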

Then transfer it to the development board and run it with the corresponding executable file.

Once it is running, ask the model questions.

Check the CPU and NPU utilization while the model replies. The CPU usage has dropped, and all three NPU cores are used to accelerate inference.
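On Rockchip's Linux kernels the per-core NPU load is commonly exposed through debugfs. The sketch below assumes the path /sys/kernel/debug/rknpu/load and a text format such as "NPU load:  Core0: 45%, Core1: 44%, Core2: 43%,"; both are assumptions that may vary with the kernel version, and reading the file normally requires root.

```python
import re


def parse_npu_load(text):
    """Extract {core_index: percent} from the NPU load text."""
    pairs = re.findall(r"Core(\d+):\s*(\d+)%", text)
    return {int(core): int(pct) for core, pct in pairs}


def read_npu_load(path="/sys/kernel/debug/rknpu/load"):
    """Read and parse the current NPU load (assumed debugfs path)."""
    with open(path) as f:
        return parse_npu_load(f.read())
```

If all three cores report a non-zero load while the model is answering, the NPU is taking part in inference.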

This completes the deployment and inference test of DeepSeek on the RK3588.

 