
Calibration and Fine-Tuning

Social Inference Engine uses per-signal-type temperature scaling to correct systematic overconfidence and underconfidence. Calibration state is updated online — each analyst correction improves the next inference.

How calibration works

LLMs are systematically miscalibrated: they output high confidence on common patterns regardless of actual accuracy. A model that consistently outputs confidence = 0.92 for churn_risk does not achieve 92% accuracy on that type.

Temperature scaling multiplies the raw logits by a learned scalar T before the softmax. When T < 1.0, the output distribution flattens, dampening overconfident predictions; when T > 1.0, it sharpens, boosting underconfident ones.
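As a minimal sketch of this multiplicative convention (pure Python; the function name is illustrative, not the engine's API):

```python
import math

def scaled_softmax(logits, temperature):
    """Multiply logits by a temperature scalar, then softmax.

    With this convention, T < 1.0 flattens the distribution (dampens
    overconfidence) and T > 1.0 sharpens it.
    """
    scaled = [z * temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]
raw = scaled_softmax(logits, 1.0)      # T = 1.0 is the identity
damped = scaled_softmax(logits, 0.79)  # e.g. the churn_risk scalar
```

With T = 0.79 the top-class probability drops relative to the raw softmax, which is exactly the correction wanted for an overconfident signal type.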

The scalar for each signal type is initialised from the seed dataset and updated online via gradient descent after each analyst correction. One update step takes 6–8 µs and requires no service restart.

Training workflow

1. Prepare the seed dataset
# The seed dataset is in training/seed_examples.jsonl
# Each line is: {"text": "...", "signal_type": "lead_opportunity", "platform": "reddit"}

# Validate the format
python training/validate_dataset.py --file training/seed_examples.jsonl

The seed dataset ships with 107 examples across all 10 signal types. Add your own examples to improve calibration for your specific use case.
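A custom example can be built in the same JSONL schema before appending it to the file; a sketch with illustrative field values and a hypothetical helper name:

```python
import json

REQUIRED_KEYS = {"text", "signal_type", "platform"}  # per the documented format

def make_seed_line(text, signal_type, platform):
    """Serialize one seed example as a JSONL line in the documented schema."""
    example = {"text": text, "signal_type": signal_type, "platform": platform}
    missing = REQUIRED_KEYS - example.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return json.dumps(example)

line = make_seed_line(
    "Thinking of cancelling after the latest price change.",
    "churn_risk",
    "reddit",
)
# Append `line` to training/seed_examples.jsonl, then re-run validate_dataset.py.
```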

2. Run initial calibration
python training/calibrate.py --epochs 5

Runs temperature scaling calibration on the seed dataset. Updates training/calibration_state.json. Takes ~30 seconds on 107 examples.
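Fitting a temperature scalar by gradient descent on negative log-likelihood can be sketched as follows. This is a toy stand-in for what calibrate.py does per signal type, using the multiplicative convention; the learning rate and update schedule are assumptions, not the script's actual internals:

```python
import math

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def fit_temperature(examples, epochs=5, lr=0.05):
    """Fit one temperature scalar by gradient descent on the NLL of
    p = softmax(T * z).  For each (logits, true_class) example,
    dNLL/dT = E_p[z] - z_true.
    """
    T = 1.0
    for _ in range(epochs):
        for logits, y in examples:
            p = softmax([T * z for z in logits])
            expected_logit = sum(pi * zi for pi, zi in zip(p, logits))
            grad = expected_logit - logits[y]
            T -= lr * grad
    return T

# Overconfident toy model: large logit gaps, but the top class is wrong
# a third of the time, so the fitted T drops below 1.0 to flatten outputs.
examples = [([5.0, 0.0], 0), ([5.0, 0.0], 0), ([5.0, 0.0], 1)]
T = fit_temperature(examples)
```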

3. (Optional) Run fine-tuning for the non-frontier tier
# Prepare training data for OpenAI fine-tuning
python training/prepare_training_data.py

# Submit fine-tuning job
python training/fine_tune.py --base-model gpt-4o-mini

# Export the model ID to .env
echo "FINE_TUNED_MODEL_ID=ft:gpt-4o-mini:…" >> .env

Fine-tuning targets: Macro F1 ≥ 0.82 · ECE ≤ 0.05 · False-action rate ≤ 0.08 · Abstention rate 5–15%. The job runs on OpenAI infrastructure and takes 20–60 minutes.

4. Submit analyst feedback to trigger online calibration
# Via API
curl -X POST http://localhost:8000/api/v1/signals/{id}/feedback \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"predicted_type": "feature_request_pattern", "true_type": "churn_risk"}'

Each feedback submission triggers one gradient-descent step on the ConfidenceCalibrator. The temperature scalar for the corrected type is updated in memory immediately and flushed to disk. No restart required.
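What one feedback-triggered step might look like, as a sketch. The real ConfidenceCalibrator's internals are not documented here, so the class below is illustrative, reusing the multiplicative softmax convention and a single NLL gradient step:

```python
import json
import math

class ConfidenceCalibrator:
    """Illustrative per-signal-type calibrator, not the engine's actual code."""

    def __init__(self, lr=0.05):
        self.lr = lr
        self.temps = {}  # signal_type -> temperature scalar

    def step(self, signal_type, logits, true_index):
        """One gradient-descent step on the NLL of the corrected example,
        using p = softmax(T * z); dNLL/dT = E_p[z] - z_true."""
        T = self.temps.get(signal_type, 1.0)
        scaled = [T * z for z in logits]
        m = max(scaled)
        exps = [math.exp(z - m) for z in scaled]
        total = sum(exps)
        p = [e / total for e in exps]
        grad = sum(pi * zi for pi, zi in zip(p, logits)) - logits[true_index]
        self.temps[signal_type] = T - self.lr * grad
        return self.temps[signal_type]

    def flush(self, path):
        """Persist the in-memory scalars to disk, as the docs describe."""
        with open(path, "w") as f:
            json.dump(self.temps, f)

cal = ConfidenceCalibrator()
# Analyst says the true type was churn_risk (index 1), not the argmax (index 0),
# so the scalar for churn_risk moves below 1.0:
new_T = cal.step("churn_risk", logits=[3.0, 1.0, 0.2], true_index=1)
```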

Fine-tuning targets (non-frontier tier)

Macro F1: ≥ 0.82
ECE: ≤ 0.05
False-action rate: ≤ 0.08
Abstention rate: 5–15%
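Of these targets, ECE (Expected Calibration Error) is the calibration-specific one. A standard binned implementation looks like this; the 10 equal-width bins are a common default, assumed here rather than taken from the engine's evaluation code:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean |accuracy - avg confidence| over
    equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / total) * abs(accuracy - avg_conf)
    return ece

# Perfectly calibrated toy data: 0.8-confidence predictions, right 4 times in 5.
ece = expected_calibration_error([0.8] * 5, [True, True, True, True, False])
```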

Current temperature scalars

Calibrated on 107 seed examples · State as of 2026-03-24

Signal Type                Temperature Scalar   Calibrated   Samples
lead_opportunity           0.92                 Yes          18
competitor_weakness        0.88                 Yes          14
influencer_amplification   1.05                 Yes          9
churn_risk                 0.79                 Yes          21
misinformation_risk        0.85                 Yes          11
support_escalation         0.83                 Yes          15
product_confusion          1.08                 Yes          8
feature_request_pattern    0.97                 Yes          6
launch_moment              0.94                 Yes          3
trend_to_content           1.12                 Yes          2
A temperature of 1.0 is the identity transform (uncalibrated). Values < 1.0 dampen overconfident outputs; values > 1.0 sharpen underconfident ones. The scalars above reflect calibration on the 107-example seed dataset.