In a post from earlier this year, How to run LLMs on IBM i, I showed how to run Large Language Models (LLMs) directly on IBM i using the open source framework Llama.cpp. In this Part 2, I’ll share a big update: there’s now an official RPM for Llama.cpp on IBM i 🎉.

Below, you’ll install the RPM, download a compatible model, and run an LLM from a PASE terminal.


1) Install Llama.cpp

<aside> ⭐

The llama.cpp package is available here: https://public.dhe.ibm.com/software/ibmi/products/pase/rpms/repo-base-7.3

Run yum upgrade to pull in the latest packages from the repo. If the new package doesn't show up, refresh the repo metadata with yum clean metadata and try again.

</aside>

From a PASE terminal, install the package:

yum install llama-cpp

After installation, these commands will be available:

llama-cli
llama-quantize
llama-run
llama-server
llama-simple
llama-simple-chat
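To confirm the install worked, you can run a quick sanity check from the same PASE terminal. This sketch assumes the RPM placed the binaries in the standard IBM i RPM location (/QOpenSys/pkgs/bin) and that it is on your PATH:

```shell
# List the installed llama.cpp binaries
# (standard bin directory for IBM i RPMs)
ls /QOpenSys/pkgs/bin/llama-*

# Confirm llama-cli is resolvable on your PATH and runs
which llama-cli
llama-cli --version
```

If which comes back empty, add /QOpenSys/pkgs/bin to your PATH before continuing.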

2) Download a compatible model

You’ll need a model in GGUF format that works with Llama.cpp on IBM i. Use the public Hugging Face repo: models-for-i (Models for IBM i).

Available models:

| Model     | Provider | Parameters |
| --------- | -------- | ---------- |
| Llama-3.2 | Meta     | 1 Billion  |

<aside> 🗒️

Note: We currently have only one publicly compatible model, Llama-3.2-1B.

You can manually prepare models for IBM i using this repo: https://github.com/ajshedivy/models-for-i

</aside>

Prerequisites

Before downloading, create a models directory on IBM i: