Run Local AI on Linux: Complete Ollama + Open WebUI Setup Guide (2026)

Install Ollama on Debian/Ubuntu, configure GPU passthrough, add Open WebUI with Docker, and expose it via Tailscale. Tested on real homelab hardware.

[Image: Terminal showing Ollama pulling a Llama 3 model on Linux with GPU acceleration active and Open WebUI running]

Why Run AI Locally?

You don’t need OpenAI’s API. You don’t need a $200/month cloud bill. A single machine in your home lab can run large language models without sending a byte of data to someone else’s server.

Prerequisites

  • Debian 12 or Ubuntu 22.04+
  • 16 GB RAM minimum (32 GB recommended for 7B+ models)
  • NVIDIA GPU with 6 GB+ VRAM (optional but strongly recommended)
  • Docker + Docker Compose installed
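
A quick way to check a machine against this list is sketched below. The helper names (total_ram_gib, have) are invented for this example, and the RAM figure comes from MemTotal in /proc/meminfo, which is usually a little below the installed amount, so a 16 GB machine may report 15.

```shell
# Print total RAM in whole GiB from a meminfo-style file (default: /proc/meminfo).
total_ram_gib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeed if the named command is available on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}
```

For example, `echo "RAM: $(total_ram_gib) GiB"` prints the memory figure, and `have docker && have nvidia-smi` succeeds only when both tools are installed.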

Step 1 — Install Ollama

# One-liner install (installs to /usr/local/bin/ollama)
curl -fsSL https://ollama.com/install.sh | sh

# Verify it's running
systemctl status ollama

Ollama runs as a systemd service and listens on port 11434 (localhost only by default). It serves its native REST API under /api and an OpenAI-compatible API under /v1.
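
As a quick illustration of that API, the sketch below builds a request body for Ollama's native /api/generate endpoint and posts it with curl. The function names (generate_payload, ask_ollama) and the OLLAMA_URL variable are made up for this example. It assumes the default port and a model that has already been pulled (see Step 2).

```shell
# Assumes the default address from this guide; override OLLAMA_URL if yours differs.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body for /api/generate.
# No JSON escaping is done, so keep double quotes and backslashes out of the prompt.
generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}\n' "$1" "$2"
}

# POST the prompt to Ollama and print the raw JSON reply.
ask_ollama() {
  curl -fsS "$OLLAMA_URL/api/generate" -d "$(generate_payload "$1" "$2")"
}
```

Calling `ask_ollama llama3.2 "Why is the sky blue?"` should print a JSON object whose "response" field holds the generated text. With "stream" set to false the reply arrives as one object instead of a stream of chunks.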

Step 2 — Pull Your First Model

# Pull Llama 3.2 (3B — fast, runs on CPU)
ollama pull llama3.2

# Pull Mistral 7B (better quality, GPU strongly recommended)
ollama pull mistral

# Test it
ollama run llama3.2 "Explain Linux runlevels in 3 sentences"
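
If these commands end up in a setup script, it can help to skip models that are already present. The sketch below is one way to do that by reading the output of `ollama list`. The has_model helper is hypothetical, and it assumes the model name is the first column after a header row.

```shell
# Succeed if the given model appears in `ollama list` output supplied on stdin.
# A bare name such as "mistral" matches any tag; "mistral:latest" must match exactly.
has_model() {
  awk -v m="$1" '
    NR > 1 { split($1, p, ":"); if ($1 == m || p[1] == m) found = 1 }
    END { exit !found }
  '
}
```

Usage: `ollama list | has_model mistral || ollama pull mistral` pulls the model only when it is missing.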

Step 3 — Add Open WebUI with Docker

Create a docker-compose.yml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  open-webui:

Then start the container:

docker compose up -d
# Access at http://localhost:3000
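
`docker compose up -d` returns as soon as the container is started, which can be before the web interface answers requests. The sketch below polls the URL until it responds. The wait_for_url helper and its default retry values are assumptions made for this example, not part of Open WebUI or Docker.

```shell
# Poll a URL until it answers or the attempts run out.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 2).
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; output is discarded, only the status matters.
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Usage: `wait_for_url http://localhost:3000 && echo "Open WebUI is up"`.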

Step 4 — GPU Passthrough (NVIDIA)

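This step is only needed when containers themselves need the GPU, for example if you later run Ollama in Docker. The natively installed Ollama from Step 1 talks to the host NVIDIA driver directly.

# Add NVIDIA's signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Add the apt repository, signed by that key
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit and register it with Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker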

Verify GPU is accessible inside containers:

docker run --rm --gpus all ubuntu nvidia-smi
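
Because Ollama itself runs on the host in this guide, it is also worth checking that a loaded model actually landed on the GPU. `ollama ps` lists loaded models with a PROCESSOR column. The sketch below scans that output; the uses_gpu helper is hypothetical, and it treats any row mentioning GPU, including a partial CPU/GPU split, as a pass.

```shell
# Succeed if any loaded model in `ollama ps` output (stdin) mentions GPU.
# The header row is skipped; partial offload counts as using the GPU.
uses_gpu() {
  awk 'NR > 1 && /GPU/ { found = 1 } END { exit !found }'
}
```

Usage: run a prompt first so a model is loaded, then `ollama ps | uses_gpu && echo "GPU in use"`.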

Step 5 — Expose via Tailscale (Remote Access)

# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# Serve Open WebUI on your Tailnet with HTTPS (proxies to localhost:3000)
sudo tailscale serve --bg 3000

Now you can access https://your-hostname.tailnet-name.ts.net from any device on your Tailscale network — phone, laptop, anywhere.

Benchmarks

Model          Hardware              Tokens/sec
llama3.2:3b    CPU only (Ryzen 7)    18 t/s
mistral:7b     RTX 3060 12GB         62 t/s
llama3.1:8b    RTX 3060 12GB         48 t/s
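
The article does not say how these figures were measured. One way to get comparable numbers on your own hardware is `ollama run <model> --verbose`, which prints an eval rate after each reply. Alternatively, the final /api/generate response reports eval_count (tokens generated) and eval_duration (nanoseconds). The tokens_per_sec helper below is a hypothetical sketch of that arithmetic.

```shell
# Convert eval_count (tokens) and eval_duration (nanoseconds) to tokens per second.
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if (ns <= 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.1f\n", n * 1e9 / ns
  }'
}
```

For example, 124 tokens generated in 2,000,000,000 ns works out to `tokens_per_sec 124 2000000000`, which prints 62.0.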

Troubleshooting

Ollama not starting? Check journalctl -u ollama -f

GPU not detected? Run nvidia-smi on the host. If that works but Ollama doesn’t use the GPU, restart the ollama service (sudo systemctl restart ollama) after installing the NVIDIA driver.

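Open WebUI can’t reach Ollama? On Linux the host.docker.internal extra_hosts entry is required (Docker Desktop handles this automatically on Mac/Windows). If it still fails, Ollama is probably listening on 127.0.0.1 only, which containers cannot reach through the host gateway. Set OLLAMA_HOST=0.0.0.0 in the systemd service and restart it.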



Frequently Asked Questions

Can Ollama run without a GPU?
Yes. Ollama falls back to CPU inference automatically if no GPU is detected. Smaller models like llama3.2:3b run at 12-18 tokens/sec on a modern CPU, which is usable for most tasks. Larger 7B+ models are slow on CPU: plan for 2-4 tokens/sec without GPU acceleration.

What is Open WebUI?
Open WebUI is a self-hosted web interface for Ollama that works like ChatGPT, with conversation history, model switching, system prompts, and file uploads. It runs as a Docker container and connects to your local Ollama instance over its API.

Which Ollama models work best without a GPU?
llama3.2:3b is the best starting point for CPU-only setups: fast enough for interactive use and capable enough for most tasks. phi3:mini is another good CPU option. Avoid 7B+ models on CPU unless you're willing to wait 10-30 seconds per response.

Can multiple users access my Ollama instance?
Yes, if you expose it over the network. By default Ollama listens on localhost only. Set OLLAMA_HOST=0.0.0.0 in the systemd service to expose it on your LAN, then use Tailscale or WireGuard to share secure access with other users without opening firewall ports.
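
Setting OLLAMA_HOST "in the systemd service" is usually done with a drop-in file rather than by editing the unit itself. The sketch below writes such a drop-in. The write_ollama_override helper is hypothetical, and the target directory is passed as an argument so it can be tried in a scratch directory first.

```shell
# Write a systemd drop-in that makes Ollama listen on all interfaces.
# Argument: the drop-in directory, e.g. /etc/systemd/system/ollama.service.d
write_ollama_override() {
  mkdir -p "$1" || return 1
  cat > "$1/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
EOF
}
```

Run as root, `write_ollama_override /etc/systemd/system/ollama.service.d` followed by `systemctl daemon-reload` and `systemctl restart ollama` applies the change. `sudo systemctl edit ollama` reaches the same result interactively. Ollama has no built-in authentication, so keep the port off the public internet.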

Sergej Voronko
SAP Basis · Senior Operations Manager · Linux infrastructure engineer