Which Method Should I Use?
This guide helps you choose the right class from online-cp for your problem.
If you're new to conformal prediction, start with the tutorial notebook first.
What output do you need?
flowchart TD
A[What do you need?] --> B{Response type?}
B -->|Continuous y| C{What kind of output?}
B -->|Discrete labels| D{What kind of output?}
C -->|Prediction interval| E[**Regressors**<br>ConformalRidgeRegressor<br>ConformalNearestNeighboursRegressor<br>KernelConformalRidgeRegressor<br>ConformalLassoRegressor]
C -->|Full predictive distribution| F[**CPS**<br>RidgePredictionMachine<br>KernelRidgePredictionMachine<br>NearestNeighboursPredictionMachine]
D -->|Prediction set with coverage guarantee| G[**Classifiers**<br>ConformalNearestNeighboursClassifier<br>ConformalSupportVectorMachine]
D -->|Calibrated class probabilities| H[**Venn Predictors**<br>VennAbersPredictor<br>NearestNeighboursVennPredictor]
style E fill:#e8f5e9
style F fill:#e3f2fd
style G fill:#fff3e0
style H fill:#fce4ec
Rule of thumb
- Need a prediction set or interval with a coverage guarantee? → Regressors or Classifiers
- Need to make decisions under uncertainty? → CPS (gives you the full distribution to optimise over)
- Need calibrated probabilities for downstream scoring? → Venn predictors
Regressors
All regressors produce ConformalPredictionInterval objects with guaranteed marginal coverage.
| Class | Complexity (per step) | Best when | Notes |
|---|---|---|---|
| ConformalRidgeRegressor | \(O(p^2)\) | Linear signal, moderate \(p\) | Exact; fastest for \(p \ll n\). Supports studentised residuals. |
| KernelConformalRidgeRegressor | \(O(n^2)\) | Nonlinear signal, kernel trick | Exact; any kernel from online_cp.kernels or sklearn. |
| ConformalNearestNeighboursRegressor | \(O(n)\) | Non-parametric, local patterns | Good default for tabular data. Custom distance metrics. |
| ConformalLassoRegressor | \(O(np)\) | High-dimensional sparse signals | Homotopy-based; supports elastic net (\(\rho > 0\)). |
Mondrian variants
Wrap any regressor in MondrianConformalRegressor for group-conditional coverage when you have a known categorical covariate (e.g., site, sensor type).
Quick start
from online_cp import ConformalRidgeRegressor
# Linear, fast, good default
model = ConformalRidgeRegressor(a=1.0) # a = ridge parameter
# Nonlinear with Gaussian kernel
from online_cp import KernelConformalRidgeRegressor, GaussianKernel
model = KernelConformalRidgeRegressor(kernel=GaussianKernel(sigma=1.0), a=0.01)
# K-nearest neighbours
from online_cp import ConformalNearestNeighboursRegressor
model = ConformalNearestNeighboursRegressor(k=5)
# High-dimensional sparse
from online_cp import ConformalLassoRegressor
model = ConformalLassoRegressor(lam=0.1, autotune=True)
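To make "guaranteed marginal coverage" concrete, here is a minimal sketch using split (inductive) conformal prediction, a simpler relative of the online methods in this package. It is not the library's algorithm and calls none of its API; the function names and the toy model `y = 2x + noise` are assumptions chosen for illustration.

```python
import math
import random


def split_conformal_interval(cal_residuals, y_hat, epsilon):
    """Return (lo, hi) = y_hat -/+ q, with q a conformal quantile of |residuals|.

    cal_residuals: residuals y_i - prediction_i on held-out calibration data.
    """
    n = len(cal_residuals)
    # Rank of the order statistic that gives coverage >= 1 - epsilon.
    # The small tolerance guards against float noise in (n + 1) * (1 - epsilon).
    k = math.ceil((n + 1) * (1 - epsilon) - 1e-9)
    if k > n:
        # Too few calibration points for this epsilon: only the trivial interval is valid.
        return (-math.inf, math.inf)
    q = sorted(abs(r) for r in cal_residuals)[k - 1]
    return (y_hat - q, y_hat + q)


def marginal_coverage(n_cal=19, n_trials=2000, epsilon=0.1, seed=0):
    """Coverage averaged over fresh calibration sets AND test points.

    Toy setting (hypothetical): y = 2x + N(0, 1) noise, point predictor y_hat = 2x.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        cal_residuals = [rng.gauss(0.0, 1.0) for _ in range(n_cal)]
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 1.0)
        lo, hi = split_conformal_interval(cal_residuals, 2.0 * x, epsilon)
        hits += lo <= y <= hi
    return hits / n_trials
```

"Marginal" means the guarantee holds on average over both the calibration data and the test point, which is why the simulation redraws the calibration set on every trial. Coverage conditional on one fixed calibration set, or on a particular `x`, can be higher or lower.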
Classifiers
All classifiers produce ConformalPredictionSet objects: sets of labels guaranteed to contain the true label with probability \(\geq 1 - \varepsilon\).
| Class | Complexity (per step) | Best when | Notes |
|---|---|---|---|
| ConformalNearestNeighboursClassifier | \(O(n)\) | General purpose, any metric space | Default choice. Supports custom distance functions. |
| ConformalSupportVectorMachine | \(O(n^2)\) in worst case | Kernel-based decision boundary | Uses SMO solver; powerful but slower. |
Mondrian variants
Wrap in MondrianConformalClassifier for label-conditional coverage (validity within each class).
Quick start
from online_cp import ConformalNearestNeighboursClassifier
# Good default
model = ConformalNearestNeighboursClassifier(k=3)
# With SVM (kernel-based)
from online_cp import ConformalSupportVectorMachine, GaussianKernel
model = ConformalSupportVectorMachine(kernel=GaussianKernel(sigma=1.0), C=1.0)
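For intuition about where a prediction set comes from, the sketch below computes a conformal p-value for each candidate label and keeps the labels whose p-value exceeds \(\varepsilon\). It is a simplified illustration with fixed, made-up nonconformity scores; the function names are hypothetical, and in a full (transductive) conformal predictor the scores would be recomputed for every candidate label.

```python
def conformal_p_value(cal_scores, test_score):
    """Share of scores (calibration plus the test one) at least as large as test_score."""
    n_at_least = sum(1 for s in cal_scores if s >= test_score)
    return (n_at_least + 1) / (len(cal_scores) + 1)


def prediction_set(cal_scores, candidate_scores, epsilon):
    """Labels whose p-value is larger than epsilon.

    candidate_scores: {label: nonconformity score of the test object under that label}.
    """
    return {
        label
        for label, score in candidate_scores.items()
        if conformal_p_value(cal_scores, score) > epsilon
    }


# Made-up scores: larger means "stranger".
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
candidate_scores = {"cat": 0.25, "dog": 0.95, "bird": 0.85}

set_90 = prediction_set(cal_scores, candidate_scores, epsilon=0.1)
set_80 = prediction_set(cal_scores, candidate_scores, epsilon=0.2)
```

A smaller \(\varepsilon\) can only enlarge the set: here `set_80` is contained in `set_90`.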
Conformal Predictive Systems (CPS)
A CPS outputs a full conformal predictive distribution (CPD): a valid distribution function over \(\mathbb{R}\) that you can query at any quantile, use for decision-making, or extract prediction intervals from.
| Class | Complexity (per step) | Best when | Notes |
|---|---|---|---|
| RidgePredictionMachine | \(O(p^2)\) | Linear signal | Same as Ridge regressor but outputs CPD |
| KernelRidgePredictionMachine | \(O(n^2)\) | Nonlinear signal | Any kernel |
| NearestNeighboursPredictionMachine | \(O(n)\) | Non-parametric | Based on k-NN residuals |
| DempsterHillConformalPredictiveSystem | \(O(n)\) | No features (time series) | Label-only; Dempster-Hill construction |
CPS vs Regressor
A CPS gives you more information — you get the full distribution, from which you can derive intervals at any level. Use a CPS when you need to:
- Extract prediction sets at multiple \(\varepsilon\) levels simultaneously
- Make optimal decisions under a utility function
- Visualise the predictive distribution
- Score predictions with proper scoring rules (CRPS, log score)
If you only need an interval at a fixed \(\varepsilon\), a Regressor is simpler and equally valid.
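The "multiple \(\varepsilon\) levels" point can be sketched with a stand-in distribution. The code below uses the empirical CDF of a plain list as a hypothetical substitute for a CPD object; it does not reproduce the library's CPD class or its validity argument, and `quantile` and `central_interval` are illustrative names.

```python
import math


def quantile(sorted_values, q):
    """Smallest value v whose empirical CDF F(v) is at least q."""
    n = len(sorted_values)
    # Tolerance guards against q * n landing a hair above an integer in floating point.
    rank = math.ceil(q * n - 1e-9)
    rank = min(max(rank, 1), n)
    return sorted_values[rank - 1]


def central_interval(sorted_values, epsilon):
    """Interval between the epsilon/2 and 1 - epsilon/2 quantiles."""
    return (
        quantile(sorted_values, epsilon / 2),
        quantile(sorted_values, 1 - epsilon / 2),
    )


# Hypothetical stand-in for a predictive distribution: the empirical CDF of 1..100.
support = [float(v) for v in range(1, 101)]

# One distribution, several significance levels, no refitting.
intervals = {eps: central_interval(support, eps) for eps in (0.5, 0.2, 0.1)}
```

The intervals come out nested, (25, 75) inside (10, 90) inside (5, 95), which is the property that makes a single distribution convenient when several levels are needed at once.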
Quick start
from online_cp import RidgePredictionMachine
cps = RidgePredictionMachine(a=1.0)
cps.learn_initial_training_set(X_train, y_train)
cpd = cps.predict(x_new) # Returns a CPD object
interval = cpd.predict_set(tau=0.5, epsilon=0.1) # Extract interval
Venn Predictors
Venn predictors output calibrated multi-probability predictions: a family of probability distributions over the label space, one for each hypothesis about the true label.
| Class | Scorer | Best when | Notes |
|---|---|---|---|
| VennAbersPredictor | "ridge" | Linear scoring, fast | Default for binary/multiclass |
| VennAbersPredictor | "kernel_ridge" | Nonlinear scoring | Requires a kernel |
| VennAbersPredictor | "knn" | Non-parametric scoring | Uses k-NN distances |
| VennAbersPredictor | "svm" | Kernel SVM scoring | Powerful but slower |
| NearestNeighboursVennPredictor | n/a | Simple k-NN taxonomy | Lightweight alternative |
Venn vs Classifier
- A conformal classifier gives you a set of labels with a coverage guarantee.
- A Venn predictor gives you probabilities for each label, with a calibration guarantee.
Use Venn when you need probability estimates (e.g., for ranking, expected utility, or downstream models). Use a classifier when you need a hard set-valued prediction with coverage.
Quick start
from online_cp import VennAbersPredictor
# Default: ridge scoring, full (transductive) mode
vap = VennAbersPredictor(scorer="ridge", a=1.0)
vap.learn_initial_training_set(X_train, y_train)
pred = vap.predict(x_new) # Returns VennPrediction
print(pred.p0, pred.p1) # P(y=1) under each hypothesis (binary)
print(pred.point) # Averaged point prediction
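When a single probability is needed from the binary pair \((p_0, p_1)\), one standard choice from the Venn-Abers literature is the log-loss minimax merge \(p = p_1 / (1 - p_0 + p_1)\). The sketch below implements that rule as a standalone function; it is an assumption that this is useful here, and the library's `pred.point` may use a different averaging rule.

```python
def merge_log_loss(p0, p1):
    """Merge a Venn-Abers pair into one probability of class 1.

    p0, p1: probability of class 1 under the hypotheses y = 0 and y = 1,
    with 0 <= p0 <= p1 <= 1.
    """
    if not 0.0 <= p0 <= p1 <= 1.0:
        raise ValueError("expected 0 <= p0 <= p1 <= 1")
    # With p0 <= p1 the denominator is at least 1, so the division is always safe.
    return p1 / (1.0 - p0 + p1)


def interval_width(p0, p1):
    """Width of the probability interval; a wide interval signals an unreliable estimate."""
    return p1 - p0


merged = merge_log_loss(0.6, 0.8)   # 0.8 / 1.2
```

The merged value always lies between `p0` and `p1`, and when the two coincide it equals them.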
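Martingales & Change Detection
Conformal test martingales detect violations of exchangeability (distribution changes) online. Under exchangeability the (smoothed) conformal p-values are independent and uniform on \([0, 1]\), so the martingale tends to stay small. The martingale value \(M_n\) grows when the p-values drift away from uniformity, which is what a change in the data distribution typically causes.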
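As a concrete instance of "betting against uniform p-values", the sketch below runs a textbook power martingale, \(M_n = \prod_{i \le n} \epsilon\, p_i^{\epsilon - 1}\). It is a standalone illustration, not one of the classes listed on this page; the function names and the choice \(\epsilon = 0.5\) are assumptions.

```python
import math


def power_bet(p, eps):
    """Betting function f(p) = eps * p**(eps - 1); it integrates to 1 over [0, 1].

    Requires 0 < p <= 1 and 0 < eps < 1.
    """
    return eps * p ** (eps - 1.0)


def log_martingale_path(p_values, eps=0.5):
    """Running log10 of M_n = prod_i f(p_i), starting from M_0 = 1."""
    log_m = 0.0
    path = []
    for p in p_values:
        # Work in log space so long runs neither overflow nor underflow.
        log_m += math.log10(power_bet(p, eps))
        path.append(log_m)
    return path


typical = log_martingale_path([0.5, 0.5, 0.5])      # unremarkable p-values
suspicious = log_martingale_path([0.01, 0.01, 0.01])  # persistently small p-values
```

Each p-value of 0.5 multiplies the martingale by about 0.71, so `typical` drifts downward, while each p-value of 0.01 multiplies it by 5, so `suspicious` reaches \(M_3 = 125\) after three steps.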
Which martingale?
| Class | Best when | Key idea |
|---|---|---|
| PluginMartingale | General purpose | Wraps any betting strategy with cautious-start mixing |
| SimpleJumper | Unknown alternative | Jumps between epsilon-experts; robust to different alternatives |
| CompositeJumper | Unknown alternative | Like SimpleJumper but also mixes over jump rates |
| SleeperStayer | Sudden change | Maintains sleeping/awake copies; resets after detection |
| SleeperDrifter | Gradual drift | Like SleeperStayer but forgets old data geometrically |
| SimpleMixtureMartingale | Conservative default | Uniform mixture over strategies; hard to beat in worst case |
Which betting strategy?
Used inside PluginMartingale (or standalone):
| Strategy | Best when | Notes |
|---|---|---|
| GaussianKDE | General purpose | Default. Adaptive bandwidth, window option. |
| BetaKernel | General purpose | Better near boundaries of [0,1] |
| BetaMLE | Parametric alternative | Fits Beta distribution by MLE |
| BetaMoments | Fast parametric | Method-of-moments Beta fit |
| ParticleFilterStrategy | Non-stationary alternative | Adapts to changing distribution |
| ExpertAggregationStrategy | Want to combine experts | Exponential weights over a set of strategies |
| FixedStrategy | Known alternative | Supply your own pdf/cdf |
| PiecewiseConstantBetting | Simple bin-based | Histogram-like density on [0,1] |
Which wrapper?
Wrappers convert a raw martingale into a change-point detector with a stopping rule:
| Wrapper | Detects | Threshold | Notes |
|---|---|---|---|
| VilleWrapper | Change anywhere | \(M_n \geq \lambda\) | Classical Ville's inequality. Simple, anytime-valid. |
| CUSUMWrapper | Change after unknown time | Page's CUSUM on \(\log M\) | Lower detection delay post-change. |
| ShiryaevRobertsWrapper | Change after unknown time | SR statistic | Minimax optimal average detection delay (asymptotically). |
Default recommendation
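Start with PluginMartingale(GaussianKDE) wrapped in VilleWrapper(threshold=20). This is robust and simple. By Ville's inequality, a test martingale started at 1 ever reaches 20 with probability at most \(1/20 = 0.05\) under exchangeability, and the threshold can also be read as a Bayes factor of 20 against exchangeability.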
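The three stopping rules differ only in how they accumulate the per-step betting factors. The sketch below writes them as one-line recursions in ratio form (the CUSUM recursion is the exponential of Page's recursion on \(\log M\)). It is a standalone illustration with made-up betting factors, not the wrappers' actual implementation, and `first_alarm` is a hypothetical name.

```python
def first_alarm(bets, threshold, rule):
    """1-based index of the first step where the statistic reaches threshold, else None.

    bets: per-step betting factors f(p_n) >= 0, so that M_n = bets[0] * ... * bets[n-1].
    rule: "ville", "cusum" or "sr".
    """
    stat = 1.0 if rule == "ville" else 0.0
    for n, b in enumerate(bets, start=1):
        if rule == "ville":
            stat = stat * b              # M_n itself
        elif rule == "cusum":
            stat = max(stat, 1.0) * b    # max over i < n of M_n / M_i
        elif rule == "sr":
            stat = (stat + 1.0) * b      # sum over i < n of M_n / M_i
        else:
            raise ValueError(f"unknown rule: {rule}")
        if stat >= threshold:
            return n
    return None


# Made-up factors: two losing bets before the change, winning bets after it.
bets = [0.5, 0.5, 4.0, 4.0, 4.0]
alarms = {rule: first_alarm(bets, 20.0, rule) for rule in ("ville", "cusum", "sr")}
```

Here the plain martingale only climbs back to 16 and raises no alarm, CUSUM alarms at step 5, and Shiryaev-Roberts at step 4, because both forget the capital lost before the change. The same numeric threshold does not give the three rules the same false-alarm behaviour: the Ville bound of \(1/\lambda\) applies to the martingale itself, not to the CUSUM or SR statistics.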
Quick start
from online_cp import (
    PluginMartingale, GaussianKDE, SimpleJumper,
    VilleWrapper, CUSUMWrapper
)
# Default: plugin with Gaussian KDE
mart = PluginMartingale(betting_strategy=GaussianKDE, min_sample_size=50)
# Or use a jumper (no tuning needed)
mart = SimpleJumper(J=0.01)
# Wrap for detection
detector = VilleWrapper(mart, threshold=20)
# Feed p-values one at a time
for p in p_values:
    detector.update(p)
    if detector.M >= detector.threshold:
        print("Change detected!")
        break
Mondrian Conformal Prediction
Use Mondrian CP when you need validity guarantees within subgroups, not just overall.
Label-conditional (classification)
The most common case is per-class coverage: \(P(y \in \Gamma(x) \mid y = c) \geq 1 - \varepsilon\) for all \(c\) (ALRW2 §4.6.7).
from online_cp import ConformalNearestNeighboursClassifier
from online_cp.mondrian import MondrianConformalClassifier
model = MondrianConformalClassifier(
    base_model=ConformalNearestNeighboursClassifier(k=5),
    category_fn="label",
)
model.learn_initial_training_set(X, y)
Gamma = model.predict(x_new, epsilon=0.1)
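What "label-conditional" changes is the comparison group for each p-value: a candidate label is compared only with calibration examples that carry the same label. The sketch below contrasts this with the pooled p-value. It uses made-up scores and hypothetical function names, and does not call the library.

```python
def pooled_p_value(cal, test_score):
    """Standard conformal p-value: compare against every calibration score."""
    scores = [s for _, s in cal]
    n_at_least = sum(1 for s in scores if s >= test_score)
    return (n_at_least + 1) / (len(scores) + 1)


def mondrian_p_value(cal, category, test_score):
    """Mondrian p-value: compare only against scores from the same category."""
    same = [s for c, s in cal if c == category]
    n_at_least = sum(1 for s in same if s >= test_score)
    return (n_at_least + 1) / (len(same) + 1)


# Made-up (label, nonconformity score) pairs; class "b" is systematically "stranger".
cal = [("a", 0.1), ("a", 0.2), ("a", 0.3),
       ("b", 0.5), ("b", 0.6), ("b", 0.7), ("b", 0.8)]

p_pooled = pooled_p_value(cal, 0.25)
p_as_a = mondrian_p_value(cal, "a", 0.25)
p_as_b = mondrian_p_value(cal, "b", 0.25)
```

The same score of 0.25 is fairly unusual among class "a" examples (p = 0.5) but entirely ordinary among class "b" examples (p = 1.0), while pooling blurs the two (p = 0.75). Judging each class against its own examples is what yields the per-class guarantee.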
Object-conditional (regression or classification)
When you have a categorical covariate (site, sensor, group) and want group-conditional validity:
from online_cp import ConformalRidgeRegressor
from online_cp.mondrian import MondrianConformalRegressor
base = ConformalRidgeRegressor(a=1.0)
model = MondrianConformalRegressor(base, category_fn=lambda x: int(x[0] > 0))
model.learn_initial_training_set(X, y)
interval = model.predict(x_new, epsilon=0.1)
General taxonomy
For any taxonomy \(\kappa(x, y) \to \text{category}\) depending on both features and label:
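from online_cp import ConformalNearestNeighboursClassifier
from online_cp.mondrian import MondrianConformalClassifier
model = MondrianConformalClassifier(
    base_model=ConformalNearestNeighboursClassifier(k=5),
    category_fn=lambda x, y: (y, int(x[0] > 0)),  # cross label × feature group
)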
When NOT to use Mondrian
- If your categories are very small → insufficient calibration data per group
- If category membership is unknown at test time (this does not affect label-conditional taxonomies, where each candidate label is tried in turn)
- If you want marginal (overall) coverage only → standard CP is simpler and tighter
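The first point can be quantified. With the unsmoothed p-value \((\#\{\text{scores} \geq \text{test score}\} + 1)/(n_k + 1)\), a category with \(n_k\) calibration examples can never produce a p-value below \(1/(n_k + 1)\), so nothing is ever excluded at smaller \(\varepsilon\). The sketch below computes the resulting minimum category size; the function names are hypothetical, and smoothed p-values behave somewhat differently.

```python
import math


def smallest_p_value(n_category):
    """Smallest attainable unsmoothed conformal p-value with n_category calibration examples."""
    return 1.0 / (n_category + 1)


def min_category_size(epsilon):
    """Fewest calibration examples per category for which a p-value <= epsilon is attainable."""
    # Need 1 / (n + 1) <= epsilon, i.e. n >= 1 / epsilon - 1.
    # The tolerance guards against 1 / epsilon landing a hair above an integer.
    return max(0, math.ceil(1.0 / epsilon - 1e-9) - 1)


sizes = {eps: min_category_size(eps) for eps in (0.2, 0.1, 0.05, 0.01)}
```

At \(\varepsilon = 0.05\), for example, every category needs at least 19 calibration examples before its prediction sets or intervals can be anything other than trivial, and considerably more before they become tight.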
Summary Decision Table
| I want... | Use | Class |
|---|---|---|
| Prediction interval (linear) | Regressor | ConformalRidgeRegressor |
| Prediction interval (nonlinear) | Regressor | KernelConformalRidgeRegressor or ConformalNearestNeighboursRegressor |
| Prediction interval (sparse) | Regressor | ConformalLassoRegressor |
| Prediction set (classification) | Classifier | ConformalNearestNeighboursClassifier |
| Full predictive distribution | CPS | RidgePredictionMachine |
| Calibrated probabilities | Venn | VennAbersPredictor |
| Change-point detection | Martingale | PluginMartingale + VilleWrapper |
| Group-conditional coverage | Mondrian | MondrianConformalRegressor / MondrianConformalClassifier |
| Label-conditional coverage | Mondrian | MondrianConformalClassifier(category_fn="label") |