This guide explores how to set up a powerful local code assistance solution by combining Modular's MAX Serve with the Continue VS Code extension.
We'll walk through the process of deploying a local language model using MAX Serve and integrating it with Continue, creating a seamless development experience that operates entirely on your local machine. This setup provides the benefits of AI-powered code assistance while maintaining full control over your development environment.
MAX Serve is a Python-based inference server that provides an OpenAI-compatible REST endpoint for both local and cloud environments. At its core, MAX Serve uses MAX Engine, which features a high-performance graph compiler and runtime for model acceleration.
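Because the endpoint speaks the OpenAI REST API, any OpenAI-compatible client can talk to it. As a quick sanity check, here is a sketch of querying the chat completions route with curl; it assumes MAX Serve's default local address of `http://localhost:8000` and uses a placeholder model name, so substitute the model you actually deploy:

```sh
# Hypothetical smoke test against a running MAX Serve instance.
# The address and model name below are assumptions -- adjust them
# to match your deployment.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
        "messages": [
          {"role": "user", "content": "Write a Python function that reverses a string."}
        ]
      }'
```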
The open-source code assistant Continue can connect directly to MAX Serve's OpenAI-compatible REST endpoint.
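To illustrate what that connection looks like, here is a hedged sketch of a Continue model entry. Continue's configuration format has changed across releases; this follows the older JSON style found at `~/.continue/config.json`, and the endpoint address and model name are assumptions carried over from the example above:

```json
{
  "models": [
    {
      "title": "MAX Serve (local)",
      "provider": "openai",
      "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
      "apiBase": "http://localhost:8000/v1",
      "apiKey": "EMPTY"
    }
  ]
}
```

The `"provider": "openai"` setting is what tells Continue to treat the server as a generic OpenAI-compatible endpoint. Local servers typically ignore the API key, but supplying a dummy value keeps the configuration valid.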
To get started, we'll launch the MAX Serve endpoint and then configure it in Continue.
The steps below are adapted from the MAX documentation: https://docs.modular.com/max/tutorials/deploy-pytorch-llm#deploy-to-a-local-endpoint
First, install the Hugging Face CLI: https://huggingface.co/docs/huggingface_hub/en/guides/cli

Then authenticate with your Hugging Face account:

```sh
huggingface-cli login
```
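With authentication in place, the linked tutorial walks through starting the local endpoint. As a rough sketch only (the MAX CLI has evolved between releases, so the command and flag below are assumptions; follow the tutorial for your installed version), launching a model locally looks something like this:

```sh
# Assumed invocation; the flag name may differ by MAX release --
# consult the linked tutorial for the command matching your install.
max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF
```

Once the server reports that it is ready, it exposes the OpenAI-compatible endpoint on localhost that Continue, configured as shown above, can use.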