# Calibration and Fine-Tuning
Social Inference Engine uses per-signal-type temperature scaling to correct systematic overconfidence and underconfidence. Calibration state is updated online — each analyst correction improves the next inference.
## How calibration works
LLMs are systematically miscalibrated: they output high confidence on common patterns regardless of actual accuracy. A model that consistently outputs confidence = 0.92 for churn_risk does not achieve 92% accuracy on that type.
Temperature scaling multiplies the raw logits by a learned scalar T before the softmax. When T < 1.0, the distribution flattens and overconfident outputs are dampened. When T > 1.0, the distribution sharpens and underconfident outputs are boosted.
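A minimal sketch of this scaling, assuming the multiplicative convention described above (the function name and example logits are illustrative, not part of the engine's API):

```python
import math

def scaled_softmax(logits, T):
    """Multiply raw logits by a per-type scalar T, then softmax.

    T < 1.0 flattens the distribution (dampens overconfidence);
    T > 1.0 sharpens it (boosts underconfidence).
    """
    z = [l * T for l in logits]
    m = max(z)                               # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

raw = [2.0, 0.5, -1.0]
damped = scaled_softmax(raw, 0.79)   # churn_risk-style dampening: top prob drops
sharp = scaled_softmax(raw, 1.12)    # trend_to_content-style sharpening: top prob rises
```

With T below 1.0 the winning class keeps less probability mass than at T = 1.0; above 1.0 it keeps more, which is exactly the over/underconfidence correction described above.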
The scalar for each signal type is initialised from the seed dataset and updated online via gradient descent after each analyst correction. One update step takes 6–8 µs and requires no service restart.
## Training workflow
```bash
# The seed dataset is in training/seed_examples.jsonl
# Each line is: {"text": "...", "signal_type": "lead_opportunity", "platform": "reddit"}
# Validate the format
python training/validate_dataset.py --file training/seed_examples.jsonl
```

The seed dataset ships with 107 examples across all 10 signal types. Add your own examples to improve calibration for your specific use case.
```bash
python training/calibrate.py --epochs 5
```

Runs temperature-scaling calibration on the seed dataset and updates training/calibration_state.json. Takes ~30 seconds on 107 examples.
```bash
# Prepare training data for OpenAI fine-tuning
python training/prepare_training_data.py

# Submit fine-tuning job
python training/fine_tune.py --base-model gpt-4o-mini

# Export the model ID to .env
echo "FINE_TUNED_MODEL_ID=ft:gpt-4o-mini:…" >> .env
```
Fine-tuning targets (non-frontier tier): Macro F1 ≥ 0.82 · ECE ≤ 0.05 · False-action rate ≤ 0.08 · Abstention rate 5–15%. The job runs on OpenAI infrastructure and takes 20–60 minutes.
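The ECE target can be checked with the standard binned expected-calibration-error computation; a sketch (the bin count and function name are illustrative):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # clamp conf == 1.0 into top bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

A perfectly calibrated model (90% confidence, 90% accurate) scores 0.0; a model that says 0.9 but is always right scores 0.1, which would miss the ≤ 0.05 target.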
```bash
# Via API
curl -X POST http://localhost:8000/api/v1/signals/{id}/feedback \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"predicted_type": "feature_request_pattern", "true_type": "churn_risk"}'
```

Each feedback submission triggers one gradient-descent step on the ConfidenceCalibrator. The temperature scalar for the corrected type is updated in memory immediately and flushed to disk. No restart required.
## Current temperature scalars
Calibrated on 107 seed examples · State as of 2026-03-24
| Signal Type | Temperature Scalar | Calibrated | Samples |
|---|---|---|---|
| lead_opportunity | 0.92 | Yes | 18 |
| competitor_weakness | 0.88 | Yes | 14 |
| influencer_amplification | 1.05 | Yes | 9 |
| churn_risk | 0.79 | Yes | 21 |
| misinformation_risk | 0.85 | Yes | 11 |
| support_escalation | 0.83 | Yes | 15 |
| product_confusion | 1.08 | Yes | 8 |
| feature_request_pattern | 0.97 | Yes | 6 |
| launch_moment | 0.94 | Yes | 3 |
| trend_to_content | 1.12 | Yes | 2 |
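As a rough illustration of what these scalars do to a single score, assuming a two-class logit mapping (a simplification; the engine scales the full logit vector, and this helper is not part of its API):

```python
import math

def scale_confidence(p, T):
    """Scale a binary confidence through its logit: sigmoid(T * logit(p))."""
    logit = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-T * logit))

# churn_risk (T = 0.79): a raw 0.92 confidence is pulled down toward honesty
calibrated = scale_confidence(0.92, 0.79)

# trend_to_content (T = 1.12): a hesitant 0.60 is pushed up slightly
boosted = scale_confidence(0.60, 1.12)
```

Note the two least-trained rows (launch_moment, trend_to_content) rest on only 3 and 2 samples, so treat their scalars as provisional until more corrections arrive.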