This guide explores how to set up a powerful local code assistance solution by combining Modular's MAX Serve with the Continue VS Code extension.
We'll walk through the process of deploying a local language model using MAX Serve and integrating it with Continue, creating a seamless development experience that operates entirely on your local machine. This setup provides the benefits of AI-powered code assistance while maintaining full control over your development environment.
MAX Serve is a Python-based inference server that provides an OpenAI-compatible REST endpoint for both local and cloud environments. At its core, MAX Serve uses MAX Engine, which features a high-performance graph compiler and runtime for model acceleration.
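Because the endpoint speaks the OpenAI REST API, any OpenAI-compatible client can talk to it. As a quick sanity check, here is a sketch of querying the chat completions route with curl; it assumes MAX Serve's default local address of `http://localhost:8000` and uses a placeholder model name, so substitute the model you actually deploy:

```sh
# Hypothetical smoke test against a running MAX Serve instance.
# The address and model name below are assumptions -- adjust them
# to match your deployment.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
        "messages": [
          {"role": "user", "content": "Write a Python function that reverses a string."}
        ]
      }'
```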
The open-source code assistant Continue can connect directly to MAX Serve's OpenAI-compatible REST endpoint.
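To illustrate what that connection looks like, here is a hedged sketch of a Continue model entry. Continue's configuration format has changed across releases; this follows the older JSON style found at `~/.continue/config.json`, and the endpoint address and model name are assumptions carried over from the example above:

```json
{
  "models": [
    {
      "title": "MAX Serve (local)",
      "provider": "openai",
      "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
      "apiBase": "http://localhost:8000/v1",
      "apiKey": "EMPTY"
    }
  ]
}
```

The `"provider": "openai"` setting is what tells Continue to treat the server as a generic OpenAI-compatible endpoint. Local servers typically ignore the API key, but supplying a dummy value keeps the configuration valid.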
To get started, we'll launch the MAX Serve endpoint and then configure it in Continue.
The steps below are adapted from the MAX documentation: https://docs.modular.com/max/tutorials/deploy-pytorch-llm#deploy-to-a-local-endpoint
First, install the Hugging Face CLI: https://huggingface.co/docs/huggingface_hub/en/guides/cli

Then authenticate with your Hugging Face account:

```sh
huggingface-cli login
```
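With authentication in place, the linked tutorial walks through starting the local endpoint. As a rough sketch only (the MAX CLI has evolved between releases, so the command and flag below are assumptions; follow the tutorial for your installed version), launching a model locally looks something like this:

```sh
# Assumed invocation; the flag name may differ by MAX release --
# consult the linked tutorial for the command matching your install.
max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF
```

Once the server reports that it is ready, it exposes the OpenAI-compatible endpoint on localhost that Continue, configured as shown above, can use.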