Ollama Provider

Ollama is a local model server that exposes a native chat endpoint as well as an OpenAI-compatible chat completions endpoint. The Joch adapter targets the /api/chat and /v1/chat/completions paths.
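
For reference, a minimal request body for the native /api/chat path looks like the following (Ollama's documented shape, rendered here as YAML; the model tag and prompt are illustrative):

model: llama-3.3-70b
messages:
  - role: user
    content: Summarize the incident report in two sentences.
stream: true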

Authentication

None by default. Bind to localhost and protect via process / network policy.
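
If you run Ollama in a container, one way to keep it reachable only from the host is to publish the port on the loopback interface. A minimal sketch, assuming a Docker Compose deployment (the service and volume names are illustrative):

services:
  ollama:
    image: ollama/ollama
    ports:
      - "127.0.0.1:11434:11434"       # published on loopback only
    volumes:
      - ollama-models:/root/.ollama   # persist pulled model weights
volumes:
  ollama-models: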

Configuration

apiVersion: model.joch.dev/v1alpha1
kind: Model
metadata: { name: llama-3-3-70b-local }
spec:
  provider: ollama
  model: llama-3.3-70b
  endpoint:
    type: local
    baseUrl: http://localhost:11434
  capabilities:
    text: true
    toolCalling: true
    structuredOutput: false
    streaming: true
  routing:
    regions: [on-prem]
    fallbackPolicy: on_error

Capability vector

Capabilities depend on the underlying model, not on Ollama itself. Set the capability flags per registered model to match what the served weights actually support.
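
For example, a second model registered against the same local endpoint might declare a different capability vector (the model name is illustrative; set the flags to whatever the weights actually provide):

apiVersion: model.joch.dev/v1alpha1
kind: Model
metadata: { name: phi-3-mini-local }
spec:
  provider: ollama
  model: phi-3-mini
  endpoint:
    type: local
    baseUrl: http://localhost:11434
  capabilities:
    text: true
    toolCalling: false      # these weights do not emit tool calls
    structuredOutput: false
    streaming: true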

Tool calling

Many models served through Ollama support tool calling via JSON-schema function definitions; others do not. The adapter declares tool-calling capability per model, and the router uses that flag for fall-forward.
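
For models that do support it, tool definitions passed to Ollama follow the OpenAI-style function-tool format (a sketch; the function name and parameter schema are illustrative):

tools:
  - type: function
    function:
      name: get_weather
      description: Look up current weather for a city
      parameters:
        type: object
        properties:
          city:
            type: string
            description: City name
        required: [city]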

Region / residency

By definition local. Use Ollama for on-prem and EU residency-bound workloads.

Cost reporting

Token counts are tracked but cost is reported as 0 unless you configure per-model pricing to reflect internal chargeback.
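
A sketch of what internal chargeback pricing could look like; the pricing block and its field names are hypothetical and not part of the spec excerpt above, so check the Model schema for the real names:

spec:
  provider: ollama
  model: llama-3.3-70b
  pricing:                        # hypothetical block for internal chargeback
    currency: USD
    inputPer1kTokens: 0.0004      # illustrative internal rate
    outputPer1kTokens: 0.0008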

Known limits

  • Ollama performance varies dramatically by model and host hardware.
  • Multimodal support depends on the served model.