VS Code's New Local AI Mode Changes How Developers Can Work Offline
Visual Studio Code version 1.122 introduced a feature called "Bring Your Own Key" that allows developers to use locally hosted AI models for chat, tools, and coding assistance without needing GitHub sign-in or internet connectivity. This opens the door for developers who want to keep their work private, operate in restricted environments, or avoid relying on cloud-based AI services.
What Can You Actually Do With Local AI in VS Code?
The new functionality lets you integrate locally hosted large language models (LLMs), which are AI systems trained on vast amounts of text data, directly into your development workflow. You can use these models for chat conversations, utility tasks, and Model Context Protocol servers, which are standardized ways for AI tools to interact with each other. However, there's one significant limitation: you still cannot use local models for inline code suggestions or next-edit predictions, the kind of real-time autocomplete that GitHub Copilot provides.
The practical appeal is clear for certain use cases. Developers working in air-gapped environments, where systems are intentionally disconnected from the internet for security reasons, can now leverage AI assistance without workarounds. Similarly, organizations with strict data policies can keep all AI processing on their own servers rather than sending code snippets to external APIs.
How to Set Up a Local LLM in VS Code
- Choose a Model Host: You need dedicated software to run the model on your hardware. LM Studio is one popular option that provides a graphical interface for managing and serving LLMs locally without requiring command-line expertise.
- Select an Appropriate Model: Not all AI models work well on consumer hardware. Choose a model that fits within your available graphics card memory (VRAM) along with space for processing context, the amount of text the model can consider at once. Models suited for coding work and fitting comfortably into 8GB of VRAM are available and practical for most developers.
- Configure the Endpoint in VS Code: Open the language model management panel by pressing Ctrl-Shift-P and typing "Manage Language Models." Add a custom endpoint by specifying the model's name, the server URL (typically something like http://127.0.0.1:1234/v1), and any API key if you've configured one.
- Test the Integration: Once configured, launch VS Code's chat window and verify that your local model responds correctly. The model should appear in your language model list and be selectable for chat and utility tasks.
Why Local AI Matters for Design and Development Work
Beyond coding assistance, developers are discovering that local models can handle specialized tasks. A recent test of local LLMs for UI design work found that some models can generate functional design layouts when paired with the right tools. One developer tested three different local models for creating a café website design and found that Google's Gemma model, despite being slower on consumer hardware with 8GB of VRAM, produced the most coherent design output with properly structured event cards, a two-column menu layout with aligned pricing, and appropriate typography choices.
This suggests that the quality of local model outputs depends heavily on the specific model chosen and the task at hand. The same developer noted that stronger coding models tend to produce better design results overall, since layout sense, hierarchy, and typography choices seem to follow naturally when the underlying code generation is solid.
The Current Limitations and Workarounds
While VS Code's local AI integration is a significant step forward, it's not a complete replacement for cloud-based AI services yet. The inability to use local models for inline code completions remains a notable gap. Developers who want that functionality can use third-party tools like Continue, which bridges this gap by enabling local models to provide real-time suggestions alongside VS Code's native features.
Additionally, compatibility between local models and certain AI-powered design tools can be inconsistent. Some models may struggle with specific prompt formats or context lengths, requiring manual adjustments or workarounds like rendering HTML output through alternative tools rather than relying on integrated preview panes.
The hardware requirements also matter significantly. Running larger models on systems with limited VRAM can be slow or unstable. One developer working with a 12-billion-parameter model on 8GB of VRAM experienced timeouts and generation failures, though smaller models in the 7-billion-parameter range performed more reliably.
What This Means for the Future of Developer Tools
Microsoft's decision to enable local model support in VS Code reflects a broader shift in how developers want to interact with AI. The company has been heavily invested in GitHub Copilot as its primary AI offering, but the new "Bring Your Own Key" feature acknowledges that not all developers want or can use cloud-based solutions. Whether Microsoft will eventually extend local model support to inline code completions remains unclear, as GitHub Copilot's deep integration is a significant part of how the service reaches its audience.
For now, developers have a practical path forward: they can use local models for a substantial portion of their AI-assisted development work through VS Code's chat and utility features, and they can fill the gaps with third-party tools if needed. As local models continue to improve and hardware becomes more capable, the appeal of keeping AI processing entirely on-device will likely grow stronger.