<aside> ⚠️

Note: This is experimental work.

This guide is a work in progress and is intended for experimentation and prototyping.

Expect rough edges, manual steps, and a few patches along the way.

Happy building! 👷🛠️✨

</aside>

In this post, I’ll walk you through how I successfully built llama.cpp on IBM i, step-by-step—from configuring the environment and compiling the source to running a local LLM right on the system. You’ll also learn how to prep a compatible model and test it using the built-in CLI, HTTP server, and web UI.

🤖 What is Llama.cpp?

llama.cpp is a lightweight, blazing-fast C++ implementation for running Large Language Models (LLMs) locally on a wide range of hardware—no internet, no cloud, no external dependencies. It’s the engine powering many popular tools like Ollama, LM Studio, and other open-source AI apps. And now—for the first time—you can run it directly on IBM i 🧵🎉

🧪 Why run LLMs locally on IBM i?

If you’ve been curious about integrating LLMs into your applications, this is the perfect way to get started. Running LLMs directly on IBM i enables quick, cost-free prototyping of AI-powered features, all within your existing environment.

🛠️ Build Llama.cpp on IBM i

First, make sure you have the open source environment installed on your IBM i system. To configure your environment, you can follow the directions here: https://ibmi-oss-docs.readthedocs.io/en/latest/yum/README.html#installation
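As a quick sanity check from an SSH session, you can confirm that yum is available before moving on. This sketch assumes the standard RPM-based install location under `/QOpenSys/pkgs/bin` described in the documentation linked above:

```bash
# Verify the open source environment is installed (standard IBM i RPM location)
/QOpenSys/pkgs/bin/yum --version

# Optionally put the open source tools on your PATH for this session
export PATH=/QOpenSys/pkgs/bin:$PATH
```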

1. Install open source packages

Install the required open source packages:
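As an illustration, a typical install of the build toolchain might look like the sketch below. The package names shown (compiler, CMake, GNU make, git) are my assumptions about common prerequisites for building llama.cpp; adjust them to the exact list used in this guide, and use `yum search` if a name differs in your repository:

```bash
# Example only: install common build prerequisites for compiling llama.cpp.
# Package names may differ in your IBM i repositories.
/QOpenSys/pkgs/bin/yum install gcc gcc-cplusplus cmake make-gnu git
```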