In a post from earlier this year, *How to run LLMs on IBM i*, I showed how to run Large Language Models (LLMs) directly on IBM i using the open-source framework Llama.cpp. In this Part 2, I have a big update to share: there's now an official RPM for Llama.cpp on IBM i 🎉.
Below, you’ll install the RPM, download a compatible model, and run an LLM from a PASE terminal.
<aside> ⭐
The llama.cpp package is available here: https://public.dhe.ibm.com/software/ibmi/products/pase/rpms/repo-base-7.3
Run `yum upgrade` to get the latest repo metadata. If you don't see updates, try `yum clean metadata`.
</aside>
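Putting the note above into a terminal session, the metadata refresh looks like this (a minimal sketch; it assumes the IBM i RPM repo is already configured for yum on your system):

```shell
# Pick up the latest repo metadata so yum can see the new package
yum upgrade

# If llama-cpp still doesn't show up, clear the cached metadata and retry
yum clean metadata
yum upgrade

# Confirm the package is visible before installing
yum search llama-cpp
```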
From a PASE terminal, install the package:
```shell
yum install llama-cpp
```
After installation, these commands will be available:
```shell
llama-cli
llama-quantize
llama-run
llama-server
llama-simple
llama-simple-chat
```
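A quick way to confirm the install worked is to ask one of the binaries for its build info (a minimal smoke test; the model path in the commented-out line is a placeholder for wherever you save the GGUF file in the next step):

```shell
# Sanity check: prints the llama.cpp build number and compiler info
llama-cli --version

# Once you have a GGUF model downloaded (covered below), an interactive
# chat session looks like this -- the path is a placeholder, not a real file:
# llama-cli -m /home/YOURUSER/models/Llama-3.2-1B.gguf -cnv
```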
You’ll need a model in GGUF format that works with Llama.cpp on IBM i. Compatible models are published in the public Hugging Face repo models-for-i (Models for IBM i):
| Model | Provider | Parameters |
|---|---|---|
| Llama-3.2 | Meta | 1 Billion |
<aside> 🗒️
Note: only one publicly compatible model is available right now: Llama-3.2-1B.
You can prepare additional models for IBM i yourself using this repo: https://github.com/ajshedivy/models-for-i
</aside>
Before downloading, create a models directory on IBM i: