---
title: "Ollama Environment Variables: The Complete Reference (2026)"
description: "Every Ollama environment variable in one reference: OLLAMA_HOST, OLLAMA_KEEP_ALIVE, OLLAMA_KV_CACHE_TYPE and more, with defaults and per-platform setup."
date: 2026-06-02
author: "Ben Racicot"
tags: ["Ollama", "Text Generation", "Developer", "Privacy", "macOS", "Performance"]
type: "article"
canonical: "https://modelpiper.com/blog/ollama-environment-variables"
---

# Ollama Environment Variables: The Complete Reference (2026)

> Every Ollama environment variable in one reference: OLLAMA_HOST, OLLAMA_KEEP_ALIVE, OLLAMA_KV_CACHE_TYPE and more, with defaults and per-platform setup.

## TL;DR

Ollama is configured through environment variables read by the server at startup. The most-used are OLLAMA_HOST (bind address), OLLAMA_MODELS (storage path), OLLAMA_KEEP_ALIVE (how long models stay loaded), OLLAMA_KV_CACHE_TYPE (cache quantization), and OLLAMA_FLASH_ATTENTION. Set them in your shell, with launchctl on the macOS app, or systemd on Linux, then restart Ollama.

Ollama's configuration lives in environment variables, and finding the full list is harder than it should be. The official FAQ documents a handful. The rest are scattered across GitHub issues, Reddit threads, and the source. Two of the top Google results for "ollama environment variables" are literally GitHub issues asking the maintainers to write this page down.

So here it is: every Ollama environment variable that matters, what it controls, its default, and where to set it on macOS, Linux, Docker, and Windows. If you want Ollama to use less memory, stay loaded between requests, or accept connections from another machine, the variable you need is in the table below. This reference assumes Ollama is already running. If it is not, our [install guide for Mac](/blog/install-ollama-mac) gets you there first.

## What are Ollama environment variables?

Ollama environment variables are settings the Ollama server reads at startup to control networking, model storage, memory behavior, and inference performance. The server reads them once when it launches, so changing one only takes effect after you restart Ollama.

The distinction that trips people up: these configure the background server (`ollama serve`), not the `ollama run` command. Setting one in the same shell where you type `ollama run` does nothing if the server is already running as a separate process or a macOS app. The server has its own environment, and that is the one that counts.
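The effect is easy to reproduce without Ollama at all. In this minimal sketch, `DEMO_VAR` is a made-up variable and the background `sh` process stands in for an already-running server. It captured its environment when it launched, so a later `export` in the parent shell never reaches it.

```shell
# Stand-in for a running server: a background process reads DEMO_VAR
# (a hypothetical variable) one second after it was launched.
demo_env_snapshot() {
  (
    DEMO_VAR=old sh -c 'sleep 1; echo "server sees: $DEMO_VAR"' &
    # Changing the variable now only affects processes started later.
    export DEMO_VAR=new
    wait
  )
}

demo_env_snapshot   # prints: server sees: old
```

The same thing happens when you export an Ollama variable in a terminal while the server is already running elsewhere: the new value exists, but only in a process the server never inherits from.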

## Where do you set Ollama environment variables?

Set Ollama environment variables in your shell profile for terminal use, with `launchctl setenv` for the macOS app, in the systemd unit on Linux, or with `-e` flags for Docker. Ollama reads them at server startup, so restart Ollama after any change.

The right method depends entirely on how Ollama is running, and getting this wrong is the most common reason a variable seems to be ignored.

**Terminal (you run `ollama serve` yourself):** add the export to your shell profile.

```
export OLLAMA_KV_CACHE_TYPE=q8_0
```

Put that in `~/.zshrc` (macOS default) or `~/.bashrc`, then open a new terminal and start the server.

**macOS app:** the menu-bar app does not read your shell profile. Use `launchctl`, then quit and reopen Ollama.

```
launchctl setenv OLLAMA_KV_CACHE_TYPE q8_0
```

**Linux (systemd service):** edit the unit with `sudo systemctl edit ollama.service` and add the variable under `[Service]`.

```
[Service]
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
```

Then run `sudo systemctl daemon-reload` and `sudo systemctl restart ollama`.
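If you would rather script the override than open an editor, the sketch below writes the same kind of drop-in file that `systemctl edit` creates. The function name is illustrative, and it assumes the service is named `ollama.service` and that drop-ins live in `/etc/systemd/system/ollama.service.d/`, which is the usual location.

```shell
# Write a systemd drop-in for ollama.service into the given directory.
write_ollama_dropin() {
  dropin_dir="$1"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_FLASH_ATTENTION=1"
EOF
}

# Usage on a real host:
#   write_ollama_dropin ./ollama.service.d
#   sudo cp -r ./ollama.service.d /etc/systemd/system/
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
# Confirm what systemd will hand to the service:
#   systemctl show ollama --property=Environment
```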

**Docker:** pass it with `-e` at run time.

```
docker run -e OLLAMA_KV_CACHE_TYPE=q8_0 -p 11434:11434 ollama/ollama
```
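Once you have more than two or three variables, an env file tends to be easier to maintain than a long row of `-e` flags. This is a sketch: `ollama.env` is a hypothetical file name, the helper function is illustrative, and the named volume simply keeps pulled models across container restarts.

```shell
# Write the variables to an env file (one NAME=value per line, no quotes).
write_ollama_env_file() {
  cat > "$1" <<'EOF'
OLLAMA_KEEP_ALIVE=-1
OLLAMA_KV_CACHE_TYPE=q8_0
OLLAMA_FLASH_ATTENTION=1
EOF
}

# Usage:
#   write_ollama_env_file ollama.env
#   docker run -d --env-file ollama.env -v ollama:/root/.ollama \
#     -p 11434:11434 --name ollama ollama/ollama
#   docker exec ollama env | grep '^OLLAMA_'   # confirm the container sees them
```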

**Windows:** set it in the system environment variables panel (search "edit environment variables"), then quit Ollama from the system tray and relaunch.

## The complete Ollama environment variable reference

Every variable Ollama reads, grouped by what it controls, with the default and a common override. Defaults shift between releases, so treat the [official Ollama FAQ](https://docs.ollama.com/faq) and the [source config](https://github.com/ollama/ollama/blob/main/envconfig/config.go) as the version-specific ground truth. One default that moves: `OLLAMA_CONTEXT_LENGTH` is 4096 on most machines, but Ollama 0.15.5 made it VRAM-aware, so a Mac with 24GB or more of unified memory can default to 32K tokens or higher.
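As a quick starting point, here are the variables this article discusses, collected into one commented shell fragment. The active lines use the defaults as described on this page, so sourcing it changes nothing until you edit a value. The path and origin in the commented lines are placeholders, not real defaults.

```shell
# Defaults as described in this article; verify them against your release.
export OLLAMA_HOST=127.0.0.1:11434    # bind address; 0.0.0.0:11434 exposes it to the network
export OLLAMA_KEEP_ALIVE=5m           # -1 keeps models loaded, 0 unloads after each request
export OLLAMA_KV_CACHE_TYPE=f16       # q8_0 or q4_0 shrink the cache, with flash attention on
export OLLAMA_FLASH_ATTENTION=0       # set to 1 to turn flash attention on
export OLLAMA_CONTEXT_LENGTH=4096     # may default higher on machines with more memory

# Optional overrides (placeholder values):
# export OLLAMA_MODELS=/path/to/models            # where pulled models are stored
# export OLLAMA_ORIGINS=https://app.example.com   # extra allowed browser origin
# export OLLAMA_DEBUG=1                           # verbose startup logging
```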

## How do I keep an Ollama model loaded in memory?

Set `OLLAMA_KEEP_ALIVE` to control how long a model stays in memory after its last request. The default is 5 minutes. Use `-1` to keep models loaded indefinitely, or `0` to unload immediately after each request.

This is the variable people reach for after the first time a model unloads mid-session and the next prompt takes ten seconds to reload from disk. `OLLAMA_KEEP_ALIVE=-1` trades RAM for latency: the model sits in memory until you stop the server. On a machine you use for AI all day, that trade is usually worth it. On a 16GB Mac where you also need memory for everything else, the 5-minute default exists for a reason.

## How do I reduce Ollama's memory usage?

Set `OLLAMA_KV_CACHE_TYPE=q8_0` to roughly halve the memory the context cache uses, and enable `OLLAMA_FLASH_ATTENTION=1`, which the quantized cache types require to take effect. Together they let the same model run longer contexts in less RAM with no visible quality loss.

The context (KV) cache grows with your context length and, at 32K or more tokens, can use more memory than the model weights themselves. Quantizing it from the default `f16` down to `q8_0` cuts that roughly in half for a perplexity increase that benchmarks put at 0.002 to 0.05, which nobody notices in practice. We cover the full quality and memory trade-off, including the more aggressive `q4_0`, in [Ollama KV cache quantization](/blog/ollama-kv-cache-quantization).
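To put rough numbers on that, here is a back-of-the-envelope estimate. It assumes the cache stores one key and one value per layer, KV head, head dimension, and token, and that the quantized types use llama.cpp's 32-value block formats (34 bytes per block for `q8_0`, 18 for `q4_0`). The model shape is a hypothetical 8B-class model, and real allocations will differ somewhat.

```shell
# Estimate KV cache size in MiB.
# Args: layers, KV heads, head dimension, context length, cache type.
kv_cache_mib() {
  case "$5" in
    f16)  block_bytes=64 ;;  # 32 values x 2 bytes
    q8_0) block_bytes=34 ;;  # 32 x 1 byte + 2-byte scale
    q4_0) block_bytes=18 ;;  # 32 x 4 bits + 2-byte scale
    *) echo "unknown cache type: $5" >&2; return 1 ;;
  esac
  # Keys and values (x2) for every layer, KV head, head dimension, and token.
  elements=$(( 2 * $1 * $2 * $3 * $4 ))
  echo $(( elements / 32 * block_bytes / 1048576 ))
}

# Assumed shape: 32 layers, 8 KV heads, head dimension 128, 32K context.
for t in f16 q8_0 q4_0; do
  echo "$t: $(kv_cache_mib 32 8 128 32768 "$t") MiB"
done
```

For that assumed shape the estimate comes out to 4096 MiB at `f16`, 2176 MiB at `q8_0`, and 1152 MiB at `q4_0`, which is where "roughly half" and "roughly a quarter" come from.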

## How do I let other devices connect to Ollama?

Set `OLLAMA_HOST=0.0.0.0:11434` to bind Ollama to all network interfaces, and `OLLAMA_ORIGINS` to allow cross-origin browser requests. By default Ollama binds only to `127.0.0.1`, so other machines cannot reach it, and it allows browser requests only from `localhost` origins.

The two are separate problems. `OLLAMA_HOST` controls which network interface the server binds to. `OLLAMA_ORIGINS` controls which web origins the browser is allowed to call. If you are wiring a web app to Ollama and getting a silent CORS failure, the host is fine and the origins are the issue. We walk through that exact fix in [the Ollama CORS fix on Mac](/blog/ollama-cors-fix-mac).
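Because the two fail differently, it helps to test them separately. The helpers below are a sketch with illustrative function names; the address and origin in the usage lines are placeholders for your own. The first checks plain reachability from another machine, and the second sends a browser-style `Origin` header and prints the CORS header the server returns, if any.

```shell
# Returns 0 if an Ollama server answers at host:port.
ollama_reachable() {
  curl -fsS --max-time 3 "http://$1/api/version" >/dev/null 2>&1
}

# Prints the Access-Control-Allow-Origin header returned for a given origin.
# Prints nothing if the server does not allow that origin.
ollama_cors_header() {
  curl -sS --max-time 3 -o /dev/null -D - -H "Origin: $2" \
    "http://$1/api/version" 2>/dev/null | grep -i '^access-control-allow-origin'
}

# Usage (placeholder address and origin):
#   ollama_reachable 192.168.1.20:11434 && echo "host binding is fine"
#   ollama_cors_header 192.168.1.20:11434 https://app.example.com
```

If the first check passes and the second prints nothing, the bind address is fine and `OLLAMA_ORIGINS` is the setting to change.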

## Can Ollama offload the KV cache to system RAM?

Ollama has no dedicated environment variable for KV cache offload. It offloads model layers and the KV cache to the GPU together, and the runner flag `--no-kv-offload` keeps the cache in system RAM while the layers stay on the GPU.

This comes up when a model almost fits in VRAM and the cache is what pushes it over. There is an open discussion ([ollama/ollama#9750](https://github.com/ollama/ollama/issues/9750)) about preferring to offload model layers over the KV cache when both will not fit, because keeping the cache on the GPU and spilling layers to CPU is usually faster than the reverse. For now this is runner behavior, not an environment variable, which is worth knowing before you go looking for an `OLLAMA_KV_OFFLOAD` that does not exist.

## How can I see which environment variables Ollama is using?

Start the server with `OLLAMA_DEBUG=1` and Ollama logs every variable and its resolved value at startup. On the macOS app, the same information appears in the server log at `~/.ollama/logs/server.log`.

This is the fastest way to confirm a variable actually took effect, which matters because the most common configuration bug is setting the variable in one environment and running the server in another. If `OLLAMA_DEBUG=1` shows the value you set, the setting is live. If it shows the default, you set it in the wrong place. Check the section above on where to set variables for the method that matches how your server runs.
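Reading the whole startup log for one value gets tedious, so a small filter helps. This sketch assumes the startup log contains a `server config` line that lists variables as `NAME:value` pairs separated by spaces. That matches recent builds as far as we know, but treat the format as an assumption and adjust the pattern if your log looks different. The function name is illustrative, and it only handles values without spaces.

```shell
# Print the value logged for one variable, reading log text on stdin.
ollama_logged_value() {
  grep 'server config' | grep -o "$1:[^] ]*" | tail -n 1 | cut -d: -f2-
}

# Usage:
#   macOS app:
#     ollama_logged_value OLLAMA_KV_CACHE_TYPE < ~/.ollama/logs/server.log
#   Linux systemd service:
#     journalctl -u ollama --no-pager | ollama_logged_value OLLAMA_KV_CACHE_TYPE
```

An empty result means either the variable is unset in the server's environment or the log line did not match the assumed format.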

## The case for not configuring any of this

Every variable on this page exists because Ollama ships conservative defaults and makes you opt in to the better ones. The context cache defaults to `f16` when `q8_0` would save memory for free. Flash attention is off until you turn it on. Browser requests from anything but a localhost origin are blocked until you set origins. None of these are wrong defaults; they are just cautious ones. The result is a config file's worth of variables you have to learn before Ollama runs the way you want.

[ToolPiper](https://modelpiper.com) takes the other approach. It bundles the same llama.cpp engine and runs the same GGUF models, but it launches with `q8_0` KV cache quantization and flash attention on by default, serves CORS headers natively so there is no origins variable to set, and shows per-model memory directly so you can see what a model actually costs before you load it. The good defaults are the defaults. There is nothing to put in a shell profile and no server to restart.

It also connects to your existing Ollama instance as a provider, so the models you already pulled show up alongside ToolPiper's own engine. You do not have to choose. The honest limitation: ToolPiper is macOS only. If you run Ollama on Linux or Windows, the variables in this reference are how you get there, and they work.

Download ToolPiper at [modelpiper.com](https://modelpiper.com), or use the reference above and keep tuning Ollama directly.

_Part of our series on running Ollama on Mac. See also: [Ollama KV cache quantization](/blog/ollama-kv-cache-quantization), [the Ollama CORS fix](/blog/ollama-cors-fix-mac), and [running multiple Ollama models on Mac](/blog/ollama-multi-model-mac)._

## Steps

### 1. Choose where to set the variable

Match the method to how Ollama runs. Terminal server: your shell profile. macOS menu-bar app: `launchctl`. Linux: the systemd unit. Docker: a `-e` flag. Setting it in the wrong place is the number-one reason a variable looks ignored.

### 2. Set the variable

For the macOS app, run `launchctl setenv OLLAMA_KV_CACHE_TYPE q8_0`. For a terminal server, add `export OLLAMA_KV_CACHE_TYPE=q8_0` to `~/.zshrc`. Substitute whichever variable from the reference table you need.

### 3. Restart Ollama

Ollama reads environment variables once, at server startup. Quit the app fully and reopen it, or stop and restart `ollama serve`. Reloading a client or browser tab is not enough.

### 4. Verify it took effect

Start the server with `OLLAMA_DEBUG=1` and read the startup log. If it shows your value, the setting is live. If it shows the default, you set it in an environment the server does not see, so go back to step 1.

## FAQ

### What is the OLLAMA_KV_CACHE_TYPE variable?

`OLLAMA_KV_CACHE_TYPE` sets the quantization of Ollama's context (KV) cache. The default is `f16`. Setting it to `q8_0` roughly halves the cache's memory use with negligible quality loss. Going to `q4_0` quarters it with a small, measurable quality trade-off. Both quantized types only take effect with `OLLAMA_FLASH_ATTENTION=1`. It applies globally to every model the server loads.

### Do Ollama environment variables work in Docker?

Yes. Pass each one with a `-e` flag at run time, for example `docker run -e OLLAMA_KEEP_ALIVE=-1 -e OLLAMA_KV_CACHE_TYPE=q8_0 -p 11434:11434 ollama/ollama`. They behave identically to a native install because the container runs the same `ollama serve` process.

### Why don't my Ollama environment variables take effect?

Almost always because the variable is set in a different environment than the one the server runs in. The macOS app ignores your shell profile and needs `launchctl setenv`. A systemd service ignores your shell and needs an `Environment=` line in the unit. And every method requires a full server restart, since Ollama reads the variables only at startup.

### Where are Ollama environment variables stored on Windows?

In the Windows system environment variables. Search "edit the system environment variables" from the Start menu, add the variable under your user or system variables, then quit Ollama from the system tray and relaunch it so the server picks up the change.

### How do I set Ollama environment variables permanently?

Use a method that survives reboots. `launchctl setenv` on macOS resets on logout, so for a permanent macOS setting add it to a login script or LaunchAgent. On Linux, the systemd unit override is already permanent. In a shell, an `export` line in `~/.zshrc` or `~/.bashrc` persists across sessions.
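One way to do the LaunchAgent route is a small agent that re-runs `launchctl setenv` at every login. This is a sketch, not an official Ollama mechanism: the `com.example.ollama-env` label and the function name are made up, and it only helps if Ollama starts after the agent has run.

```shell
# Write a LaunchAgent that runs `launchctl setenv NAME VALUE` at login.
# Args: target directory, variable name, value.
write_ollama_env_agent() {
  mkdir -p "$1" || return 1
  cat > "$1/com.example.ollama-env.plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.ollama-env</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/launchctl</string>
    <string>setenv</string>
    <string>$2</string>
    <string>$3</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
EOF
}

# Usage on macOS:
#   write_ollama_env_agent ~/Library/LaunchAgents OLLAMA_KV_CACHE_TYPE q8_0
#   launchctl load ~/Library/LaunchAgents/com.example.ollama-env.plist
```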