
Configuring Your Inference Provider

Social Inference Engine supports four LLM providers: OpenAI, Anthropic, Ollama, and vLLM. All are configured via environment variables. The two-tier routing system applies regardless of provider.

Two-Tier Routing

TIER 1 — FRONTIER
churn_risk · misinformation_risk · support_escalation
Highest cost-of-error signal types. Routes to GPT-4o or Claude 3.5 Sonnet. Latency: 1.5–4 s.
TIER 2 — NON-FRONTIER
The seven remaining signal types
Routes to a fine-tuned GPT-4o mini or a local Ollama model. 70–80% cost reduction. Latency: 0.4–12 s.

The routing decision is deterministic — set by the signal type, not by sampling or probability.
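That deterministic lookup can be sketched as a plain case statement, assuming the three frontier signal-type names listed above (the function name is illustrative, not the engine's actual code):

```shell
# Sketch: the tier is a pure function of the signal type -- no
# sampling or probability involved. tier_for_signal is hypothetical.
tier_for_signal() {
  case "$1" in
    churn_risk|misinformation_risk|support_escalation) echo "frontier" ;;
    *) echo "non-frontier" ;;
  esac
}

tier_for_signal churn_risk   # prints "frontier"
```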

OpenAI

Hosted
Environment variables
OPENAI_API_KEY=sk-…                   (required)
FRONTIER_MODEL=gpt-4o                 (optional)
NON_FRONTIER_MODEL=gpt-4o-mini        (optional)
FINE_TUNED_MODEL_ID=ft:gpt-4o-mini:…  (optional)

Set FRONTIER_MODEL=gpt-4o for the three critical signal types. Set FINE_TUNED_MODEL_ID to a fine-tuned GPT-4o mini model ID to activate the non-frontier tier. If FINE_TUNED_MODEL_ID is not set, the system falls back to the base NON_FRONTIER_MODEL.
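The fallback behaviour can be pictured with shell parameter expansion — a sketch of the documented semantics, not the engine's actual implementation:

```shell
# Hypothetical illustration: when FINE_TUNED_MODEL_ID is unset or
# empty, the non-frontier tier uses the base NON_FRONTIER_MODEL.
NON_FRONTIER_MODEL="gpt-4o-mini"
FINE_TUNED_MODEL_ID=""

ACTIVE_NON_FRONTIER="${FINE_TUNED_MODEL_ID:-$NON_FRONTIER_MODEL}"
echo "$ACTIVE_NON_FRONTIER"   # prints "gpt-4o-mini"
```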

Anthropic

Hosted
Environment variables
ANTHROPIC_API_KEY=sk-ant-…                             (required)
ANTHROPIC_FRONTIER_MODEL=claude-3-5-sonnet-20241022    (optional)
ANTHROPIC_NON_FRONTIER_MODEL=claude-3-5-haiku-20241022 (optional)

Set OPENAI_API_KEY="" and ANTHROPIC_API_KEY="sk-ant-…" to route all inference to Anthropic. Two-tier routing is available: Sonnet for the frontier tier, Haiku for the non-frontier tier.
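Putting that together, a minimal .env fragment for Anthropic-only routing might look like this (model names from the table above; the key value is a placeholder):

```shell
# Route all inference to Anthropic: blank OpenAI key, set Anthropic key.
OPENAI_API_KEY=""
ANTHROPIC_API_KEY="sk-ant-…"   # your real key here
ANTHROPIC_FRONTIER_MODEL="claude-3-5-sonnet-20241022"
ANTHROPIC_NON_FRONTIER_MODEL="claude-3-5-haiku-20241022"
```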

Ollama

Local · Zero egress
Environment variables
LOCAL_LLM_URL=http://localhost:11434  (required)
LOCAL_LLM_MODEL=llama3.1:8b           (required)

With LOCAL_LLM_URL and LOCAL_LLM_MODEL set, all inference routes to Ollama. No observation text ever reaches an external network. Performance: 3–12 s per signal on Apple M-series hardware with llama3.1:8b. Run `ollama pull llama3.1:8b` before starting the API.
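For reference, a complete local-only .env fragment under these assumptions (default Ollama port, model pulled beforehand):

```shell
# Fully local inference: no API keys needed, nothing leaves the host.
# Run `ollama pull llama3.1:8b` first so the model is available.
LOCAL_LLM_URL="http://localhost:11434"
LOCAL_LLM_MODEL="llama3.1:8b"
```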

vLLM

Self-hosted
Environment variables
VLLM_ENDPOINT=http://your-vllm-host:8000/v1  (required)
VLLM_MODEL=Meta-Llama-3.1-8B-Instruct        (optional)

vLLM exposes an OpenAI-compatible API. Set VLLM_ENDPOINT to the base URL of your vLLM server. Ideal for high-throughput self-hosted deployments on A100/H100 GPUs. Latency: 0.2–0.8 s per signal at full GPU utilisation.
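Because the API is OpenAI-compatible, the chat-completions route sits directly under VLLM_ENDPOINT. A sketch with placeholder host and model (the curl call is shown commented out, since it needs a running server):

```shell
# Placeholders -- substitute your own vLLM host and served model name.
VLLM_ENDPOINT="http://your-vllm-host:8000/v1"
VLLM_MODEL="Meta-Llama-3.1-8B-Instruct"

# The OpenAI-style route lives under the base URL:
echo "POST $VLLM_ENDPOINT/chat/completions"
# e.g.:
# curl -s "$VLLM_ENDPOINT/chat/completions" \
#   -H "Content-Type: application/json" \
#   -d '{"model": "'"$VLLM_MODEL"'", "messages": [{"role": "user", "content": "ping"}]}'
```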

Fine-tuning the non-frontier tier

The training/ directory contains a full fine-tuning pipeline for GPT-4o mini. Running it produces a model ID that you set as FINE_TUNED_MODEL_ID.

# Prepare training data
python training/prepare_training_data.py

# Run fine-tuning job (requires OPENAI_API_KEY)
python training/fine_tune.py --base-model gpt-4o-mini

# The script prints: FINE_TUNED_MODEL_ID=ft:gpt-4o-mini:org:...
# Add that value to your .env
Full training guide →