
Blueprint Section - Cllama (llama.cpp)


Offline model

Cllama is implemented on top of llama.cpp and supports offline inference with local AI models.

Since inference runs offline, we first need to prepare the model file, for example by downloading an offline model from the HuggingFace website: Qwen1.5-1.8B-Chat-Q8_0.gguf

Place the model in a specific folder, for example, in the game project directory Content/LLAMA.

E:/UE/projects/FP_Test1/Content/LLAMA > ls
qwen1.5-1_8b-chat-q8_0.gguf*

Once we have the offline model file, we can then use Cllama for AI chatting.

Text chat

Use Cllama for text chatting.

In the blueprint, right-click to create a node Send Cllama Chat Request.

guide blueprint

Create an 'Options' node and set Stream=true, ModelPath="E:\UE\projects\FP_Test1\Content\LLAMA\qwen1.5-1_8b-chat-q8_0.gguf".

guide blueprint

guide blueprint

Create the Messages and add a System Message and a User Message.

guide blueprint

Create a Delegate to receive the model's output information and print it on the screen.

guide blueprint

guide blueprint

The complete blueprint looks like this. Run the blueprint, and the message returned by the large model will be printed on the game screen.

guide blueprint

guide blueprint
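
For reference, the same flow can also be driven from C++. The snippet below is only a rough sketch: the type and function names (FCllamaChatOptions, FCllamaChatMessage, SendCllamaChatRequest, FOnCllamaChatResponse) are illustrative assumptions rather than the plugin's actual C++ API, so check the AIChatPlus headers for the real declarations.

// Rough C++ sketch of the blueprint above; plugin-side names are illustrative assumptions.
FCllamaChatOptions Options;
Options.bStream = true;   // stream tokens back as they are generated
Options.ModelPath = TEXT("E:/UE/projects/FP_Test1/Content/LLAMA/qwen1.5-1_8b-chat-q8_0.gguf");

// Assuming FCllamaChatMessage is a simple (Role, Content) struct.
TArray<FCllamaChatMessage> Messages;
Messages.Add({ TEXT("system"), TEXT("You are a helpful assistant.") });   // System Message
Messages.Add({ TEXT("user"), TEXT("Hello, who are you?") });              // User Message

// Delegate that receives the model output and prints it on the screen.
SendCllamaChatRequest(Options, Messages,
    FOnCllamaChatResponse::CreateLambda([](const FString& Chunk)
    {
        if (GEngine)
        {
            GEngine->AddOnScreenDebugMessage(-1, 5.f, FColor::Green, Chunk);
        }
    }));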

Generate text from images (llava)

Cllama also has experimental support for the llava library, providing Vision capability.

First, prepare a multimodal offline model file, such as Moondream (moondream2-text-model-f16.gguf, moondream2-mmproj-f16.gguf) or Qwen2-VL (Qwen2-VL-7B-Instruct-Q8_0.gguf, mmproj-Qwen2-VL-7B-Instruct-f16.gguf), or any other multimodal model supported by llama.cpp.

Create the Options node and set the parameters "Model Path" and "MMProject Model Path" to the corresponding multimodal model files.

guide blueprint

Create a node to read the image file flower.png and configure the messages.

guide blueprint

guide blueprint

Finally, create a node to receive the returned information and print it on the screen. The complete blueprint looks like this:

guide blueprint

Running the blueprint will display the returned text.

guide blueprint
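
Again only as a rough C++ sketch of the vision flow, with the same caveat that FCllamaChatOptions, FCllamaChatMessage and SendCllamaChatRequest are illustrative names and not necessarily the plugin's real API; the paths are examples. FFileHelper, however, is the standard Unreal helper for reading a file into bytes.

// Rough sketch of the multimodal request; plugin-side names are illustrative assumptions.
FCllamaChatOptions Options;
Options.ModelPath          = TEXT("E:/UE/projects/FP_Test1/Content/LLAMA/moondream2-text-model-f16.gguf");
Options.MMProjectModelPath = TEXT("E:/UE/projects/FP_Test1/Content/LLAMA/moondream2-mmproj-f16.gguf");

// Read the image file (flower.png in the example above) into raw bytes.
TArray<uint8> ImageBytes;
FFileHelper::LoadFileToArray(ImageBytes, TEXT("E:/UE/projects/FP_Test1/Content/LLAMA/flower.png"));

// Attach the image to a user message asking the model to describe it.
FCllamaChatMessage UserMessage;
UserMessage.Role = TEXT("user");
UserMessage.Text = TEXT("Describe this image.");
UserMessage.Images.Add(ImageBytes);

TArray<FCllamaChatMessage> Messages;
Messages.Add(UserMessage);

SendCllamaChatRequest(Options, Messages, /* delegate that prints the returned text */);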

Using the GPU with llama.cpp

"Add parameter 'Num Gpu Layer' to 'Cllama Chat Request Options', which can set the gpu payload of llama.cpp and control the number of layers to be computed on the GPU. See the image below."

guide blueprint
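
In llama.cpp terms this corresponds to the n_gpu_layers setting. As a single illustrative line of C++ (the field name mirrors the blueprint parameter and may differ in the actual API):

Options.NumGpuLayer = 32;   // 0 keeps everything on the CPU; larger values offload more layers to the GPU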

Handling model files in the .Pak after packaging

After enabling "Pak" packaging, all project resources will be placed in a .Pak file, which naturally includes offline model gguf files.

Because llama.cpp cannot read .Pak files directly, the offline model files must be extracted from the .Pak and copied into the file system.

AIChatPlus provides a feature that can automatically copy and process model files from .Pak and place them in the Saved folder.

guide blueprint

Alternatively, you can handle the model files in the .Pak yourself. The key point is to copy the files out, because llama.cpp cannot read .Pak files correctly.
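
As a minimal sketch of doing the copy yourself with standard Unreal file utilities (the paths reuse the example model above; for multi-gigabyte models a chunked copy is preferable to loading the whole file into memory):

#include "HAL/FileManager.h"
#include "Misc/FileHelper.h"
#include "Misc/Paths.h"

FString PackedPath = FPaths::Combine(FPaths::ProjectContentDir(), TEXT("LLAMA/qwen1.5-1_8b-chat-q8_0.gguf"));
FString SavedPath  = FPaths::Combine(FPaths::ProjectSavedDir(),  TEXT("LLAMA/qwen1.5-1_8b-chat-q8_0.gguf"));

if (!FPaths::FileExists(SavedPath))
{
    // LoadFileToArray goes through the pak-aware file system, so it can read the model out of the .Pak.
    TArray<uint8> ModelBytes;
    if (FFileHelper::LoadFileToArray(ModelBytes, *PackedPath))
    {
        // Make sure the target directory exists, then write a plain copy into Saved/.
        IFileManager::Get().MakeDirectory(*FPaths::GetPath(SavedPath), true);
        FFileHelper::SaveArrayToFile(ModelBytes, *SavedPath);
    }
}

// Pass SavedPath (an ordinary on-disk file) to Cllama as the Model Path.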

Function nodes

Cllama provides several function nodes for conveniently querying the status of the current environment.

Check if "Cllama Is Valid":judges if Cllama llama.cpp is initialized correctly.

guide blueprint

Checks whether llama.cpp supports a GPU backend in the current environment.

guide blueprint

Retrieves all backends supported by the current llama.cpp.

guide blueprint

Automatically copies the model files from Pak to the file system.

guide blueprint

Original: https://wiki.disenone.site/en

This post is licensed under the CC BY-NC-SA 4.0 agreement; please attribute when reproducing.

This post was translated using ChatGPT; please point out any omissions via feedback.