gemma4 supports 256k, but 16GB VRAM is the limit. Flash attention plus q8 KV cache halve KV memory so 65536 fits mostly on GPU (58% layers). 128k/256k still load but offload KV into system RAM and slow down. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|---|---|---|
| .env.example | ||
| .gitignore | ||
| compose.yaml | ||
| README.md | ||
ollama-gemma4
Dockerized Ollama server with GPU passthrough, auto-pulling
gemma4:26b-a4b-it-q4_K_M on first start. Built for opencode (or any
OpenAI-compatible client) on an RTX 5060 Ti (16 GB).
Requirements
- Docker + Compose plugin
- NVIDIA driver +
nvidia-container-toolkit(nvidia runtime registered with Docker)
Usage
docker compose up -d # starts server, then pulls the model (~18 GB) once
docker compose logs -f model-pull # watch the first-time download
The model-pull service exits 0 when the pull finishes — that's expected, not a
crash. The ollama server keeps running.
docker compose ps # ollama should stay "running"/healthy
docker compose down # stop (model stays cached in the named volume)
The model cache lives in the ollama named volume, so it survives down/up.
Verify it works
curl http://localhost:11434/api/tags # model listed?
curl http://localhost:11434/api/generate -d '{
"model": "gemma4:26b-a4b-it-q4_K_M",
"prompt": "Say hi in one word.",
"stream": false
}'
Confirm it landed on the GPU (PROCESSOR column should read 100% GPU, or close):
docker compose exec ollama ollama ps
opencode
Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1.
Point opencode at it via opencode.json (project or ~/.config/opencode/):
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (local)",
"options": { "baseURL": "http://localhost:11434/v1" },
"models": {
"gemma4:26b-a4b-it-q4_K_M": { "name": "Gemma4 26B-A4B (local)" }
}
}
}
}
Then opencode and pick ollama/gemma4:26b-a4b-it-q4_K_M.
Config
Override defaults by copying .env.example to .env (e.g. swap OLLAMA_MODEL
to try a different tag).