ToolboxKit

Local AI Chat

Chat with an AI model running entirely in your browser using WebGPU. No server, no API keys, completely private and free.


About Local AI Chat

Run an AI chatbot directly in your browser with zero cloud costs. This tool uses WebLLM and WebGPU to load a small language model onto your GPU, so every message is processed locally on your device. Nothing is sent to a server.
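To make the flow concrete, here is a minimal sketch of how a page built on WebLLM might load a model and send one message. `CreateMLCEngine` and the OpenAI-style `chat.completions.create` call follow the `@mlc-ai/web-llm` package's public API, but the exact model id string is illustrative, and the package is loaded lazily so the sketch stays self-contained:

```typescript
// Sketch of loading a model and sending one message with WebLLM.
// The model id below is illustrative; real ids come from WebLLM's
// prebuilt model list.
const WEBLLM_PKG = "@mlc-ai/web-llm";
const MODEL_ID = "SmolLM2-360M-Instruct-q4f16_1-MLC"; // illustrative id

async function chatLocally(userText: string): Promise<string> {
  // Lazy import: the package is only needed once a chat actually starts.
  const { CreateMLCEngine } = await import(WEBLLM_PKG);

  // Downloads the weights (or reads them from the browser cache) and
  // compiles them for WebGPU; progress text can drive a loading bar.
  const engine = await CreateMLCEngine(MODEL_ID, {
    initProgressCallback: (p: { text: string }) => console.log(p.text),
  });

  // Same request shape as the OpenAI chat completions API.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: userText }],
  });
  return reply.choices[0].message.content ?? "";
}
```

Because everything above runs in the page, the only network traffic is the weight download itself.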

Choose the right model

Three models are available. SmolLM2 360M is the smallest at roughly 200MB and loads quickly on most hardware. SmolLM2 1.7B offers better response quality at the cost of roughly a 1GB download. Phi-3.5 Mini is the most capable option at around 2GB but requires a stronger GPU. Use the AI model size calculator to estimate whether a given model fits your setup.
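As a rough sanity check before downloading, you can compare a model's size against your GPU memory. The sketch below assumes runtime memory use of about 1.5x the download size to cover the KV cache and working buffers; that multiplier is a heuristic of ours, not a measured figure for these models:

```typescript
// Approximate download sizes from the model list above, in MB.
const MODELS_MB: Record<string, number> = {
  "SmolLM2 360M": 200,   // smallest, quick to load
  "SmolLM2 1.7B": 1024,  // better quality, ~1GB download
  "Phi-3.5 Mini": 2048,  // most capable, ~2GB download
};

// Heuristic: runtime VRAM use ~ 1.5x the download size (assumed, to
// account for KV cache and activation buffers, not a measured value).
function fitsOnGpu(downloadMB: number, gpuMemoryMB: number): boolean {
  const estimatedRuntimeMB = downloadMB * 1.5;
  return estimatedRuntimeMB <= gpuMemoryMB;
}
```

For example, `fitsOnGpu(MODELS_MB["Phi-3.5 Mini"], 4096)` suggests the largest model should fit on a 4GB GPU, while a 2GB GPU would be a tight squeeze.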

Completely private conversations

Because the model runs on your own GPU, your conversations stay on your machine. There are no API keys, no accounts, and no usage limits. The only network request is the initial model download, which your browser caches for future sessions.

Streaming responses and stats

Responses stream in token by token, similar to cloud-based chat interfaces. The tool displays tokens per second and total tokens generated so you can gauge performance. If you are building prompts for other models, try drafting them here first and then checking token usage with the AI token counter.
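The performance readout can be kept with a small accumulator like the sketch below. It mirrors the two figures the tool shows (total tokens and tokens per second); the whitespace split is a naive stand-in for the model's real tokenizer, which is what an actual implementation would use:

```typescript
// Running stats for a streamed response: token total and tokens/second.
class StreamStats {
  private startMs = 0;
  tokens = 0;

  start(nowMs: number): void {
    this.startMs = nowMs;
    this.tokens = 0;
  }

  addChunk(text: string): void {
    // Naive word split as a stand-in; real counts come from the tokenizer.
    this.tokens += text.split(/\s+/).filter(Boolean).length;
  }

  tokensPerSecond(nowMs: number): number {
    const elapsedSec = (nowMs - this.startMs) / 1000;
    return elapsedSec > 0 ? this.tokens / elapsedSec : 0;
  }
}
```

On a live stream you would call `addChunk` with each delta as it arrives (using `performance.now()` for the timestamps) and refresh the display from `tokensPerSecond`.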

All processing happens client-side, so your conversations are never stored or transmitted anywhere.

Frequently Asked Questions

Which browsers support this tool?

You need a browser with WebGPU support. Chrome 113 and later, Edge 113 and later, and Safari 18 and later all have WebGPU enabled. Firefox does not support WebGPU yet. Desktop browsers tend to have the best support and performance.
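You can feature-detect WebGPU yourself before attempting a model load. `navigator.gpu` is the standard WebGPU entry point; the sketch takes the navigator-like object as a parameter so the check is easy to exercise outside a browser:

```typescript
// Minimal shape of the part of Navigator we care about.
interface NavigatorLike {
  gpu?: unknown;
}

// True if the WebGPU API is exposed. navigator.gpu is undefined in
// browsers that have not shipped WebGPU.
function hasWebGpuApi(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined;
}
```

In a real page you would call `hasWebGpuApi(navigator)`, and on success follow up with `await navigator.gpu.requestAdapter()`, since the API can be present while no usable GPU adapter is available.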

Does this tool send my data to any server?

No. The AI model runs entirely on your device using your GPU via WebGPU. Your messages never leave your browser. The only network request is the initial model download, which is cached for future use.

Why does the first use take so long?

The model weights need to be downloaded the first time you use a particular model. The smallest model (SmolLM2 360M) is about 200MB, while the largest (Phi-3.5 Mini) is about 2GB. After the first download, your browser caches the files so subsequent loads are much faster.

How good are these models compared to ChatGPT or Claude?

These are much smaller models designed to run locally on consumer hardware. They are useful for simple tasks, quick questions, and experimentation, but they are significantly less capable than cloud-based models like GPT-4 or Claude. Think of them as a lightweight, private alternative for basic use.

What hardware do I need?

You need a device with a GPU that supports WebGPU. Most modern laptops and desktops with dedicated or integrated GPUs from the last few years will work. The smaller models run well on modest hardware, while the larger models benefit from more GPU memory.