
Configuring Your Inference Provider

Social Inference Engine supports four LLM providers: OpenAI, Anthropic, Ollama, and vLLM. All are configured via environment variables. The two-tier routing system applies regardless of provider.

Two-Tier Routing

TIER 1 — FRONTIER
churn_risk · misinformation_risk · support_escalation
Highest cost-of-error signal types. Routes to GPT-4o or Claude 3.5 Sonnet. Latency: 1.5–4 s.
TIER 2 — NON-FRONTIER
The seven remaining signal types
Routes to a fine-tuned GPT-4o mini or a local Ollama model. 70–80% cost reduction. Latency: 0.4–12 s.

The routing decision is deterministic — set by the signal type, not by sampling or probability.
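That deterministic lookup can be sketched as a plain case statement, assuming the three frontier signal-type names listed above (the function name is illustrative, not the engine's actual code):

```shell
# Sketch: the tier is a pure function of the signal type -- no
# sampling or probability involved. tier_for_signal is hypothetical.
tier_for_signal() {
  case "$1" in
    churn_risk|misinformation_risk|support_escalation) echo "frontier" ;;
    *) echo "non-frontier" ;;
  esac
}

tier_for_signal churn_risk   # prints "frontier"
```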

OpenAI

Hosted
Environment variables
OPENAI_API_KEY=sk-…                   (required)
FRONTIER_MODEL=gpt-4o                 (optional)
NON_FRONTIER_MODEL=gpt-4o-mini        (optional)
FINE_TUNED_MODEL_ID=ft:gpt-4o-mini:…  (optional)

Set FRONTIER_MODEL=gpt-4o for the three critical signal types. Set FINE_TUNED_MODEL_ID to a fine-tuned GPT-4o mini model ID to activate the non-frontier tier. If FINE_TUNED_MODEL_ID is not set, the system falls back to the base NON_FRONTIER_MODEL.
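The fallback behaviour can be pictured with shell parameter expansion — a sketch of the documented semantics, not the engine's actual implementation:

```shell
# Hypothetical illustration: when FINE_TUNED_MODEL_ID is unset or
# empty, the non-frontier tier uses the base NON_FRONTIER_MODEL.
NON_FRONTIER_MODEL="gpt-4o-mini"
FINE_TUNED_MODEL_ID=""

ACTIVE_NON_FRONTIER="${FINE_TUNED_MODEL_ID:-$NON_FRONTIER_MODEL}"
echo "$ACTIVE_NON_FRONTIER"   # prints "gpt-4o-mini"
```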

Anthropic

Hosted
Environment variables
ANTHROPIC_API_KEY=sk-ant-…                             (required)
ANTHROPIC_FRONTIER_MODEL=claude-3-5-sonnet-20241022    (optional)
ANTHROPIC_NON_FRONTIER_MODEL=claude-3-5-haiku-20241022 (optional)

Set OPENAI_API_KEY="" and ANTHROPIC_API_KEY="sk-ant-…" to route all inference to Anthropic. Two-tier routing is available: Sonnet for the frontier tier, Haiku for the non-frontier tier.
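Putting that together, a minimal .env fragment for Anthropic-only routing might look like this (model names from the table above; the key value is a placeholder):

```shell
# Route all inference to Anthropic: blank OpenAI key, set Anthropic key.
OPENAI_API_KEY=""
ANTHROPIC_API_KEY="sk-ant-…"   # your real key here
ANTHROPIC_FRONTIER_MODEL="claude-3-5-sonnet-20241022"
ANTHROPIC_NON_FRONTIER_MODEL="claude-3-5-haiku-20241022"
```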

Ollama

Local · Zero egress
Environment variables
LOCAL_LLM_URL=http://localhost:11434  (required)
LOCAL_LLM_MODEL=llama3.1:8b           (required)

With LOCAL_LLM_URL and LOCAL_LLM_MODEL set, all inference routes to Ollama. No observation text ever reaches an external network. Performance: 3–12 s per signal on Apple M-series hardware with llama3.1:8b. Run `ollama pull llama3.1:8b` before starting the API.
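For reference, a complete local-only .env fragment under these assumptions (default Ollama port, model pulled beforehand):

```shell
# Fully local inference: no API keys needed, nothing leaves the host.
# Run `ollama pull llama3.1:8b` first so the model is available.
LOCAL_LLM_URL="http://localhost:11434"
LOCAL_LLM_MODEL="llama3.1:8b"
```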

vLLM

Self-hosted
Environment variables
VLLM_ENDPOINT=http://your-vllm-host:8000/v1  (required)
VLLM_MODEL=Meta-Llama-3.1-8B-Instruct        (optional)

vLLM exposes an OpenAI-compatible API. Set VLLM_ENDPOINT to the base URL of your vLLM server. Ideal for high-throughput self-hosted deployments on A100/H100 GPUs. Latency: 0.2–0.8 s per signal at full GPU utilisation.
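Because the API is OpenAI-compatible, the chat-completions route sits directly under VLLM_ENDPOINT. A sketch with placeholder host and model (the curl call is shown commented out, since it needs a running server):

```shell
# Placeholders -- substitute your own vLLM host and served model name.
VLLM_ENDPOINT="http://your-vllm-host:8000/v1"
VLLM_MODEL="Meta-Llama-3.1-8B-Instruct"

# The OpenAI-style route lives under the base URL:
echo "POST $VLLM_ENDPOINT/chat/completions"
# e.g.:
# curl -s "$VLLM_ENDPOINT/chat/completions" \
#   -H "Content-Type: application/json" \
#   -d '{"model": "'"$VLLM_MODEL"'", "messages": [{"role": "user", "content": "ping"}]}'
```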

Fine-tuning the non-frontier tier

The training/ directory contains a full fine-tuning pipeline for GPT-4o mini. Running it produces a model ID that you set as FINE_TUNED_MODEL_ID.

# Prepare training data
python training/prepare_training_data.py

# Run fine-tuning job (requires OPENAI_API_KEY)
python training/fine_tune.py --base-model gpt-4o-mini

# The script prints: FINE_TUNED_MODEL_ID=ft:gpt-4o-mini:org:...
# Add that value to your .env
Full training guide →