Hyperparameter Tuning
What This Is
Hyperparameter tuning is a controlled search over model settings without leaking information across the validation boundary.
The deeper point is that tuning is not about finding the most extreme settings. It is about finding the smallest change that gives a repeatable improvement.
When You Use It
- comparing a few candidate settings honestly
- tuning regularization, tree depth, or similar controls
- improving a baseline without changing the whole workflow
Tooling
`Pipeline`, `GridSearchCV`, `RandomizedSearchCV`, `validation_curve`, `ParameterGrid`, `HalvingGridSearchCV`, `HalvingRandomSearchCV`, `StandardScaler`
Library Notes
- `Pipeline` keeps preprocessing tied to the model so each fold stays honest.
- `GridSearchCV` is best when the search space is small and you want to inspect every candidate.
- `RandomizedSearchCV` is better when the space is larger or you want a fast first pass.
- `validation_curve` is useful when you want to inspect one parameter at a time instead of tuning several knobs at once.
- `ParameterGrid` helps you reason about the search space before the run starts.
- `HalvingGridSearchCV` and `HalvingRandomSearchCV` spend fewer resources on weak candidates and are useful when the full search would be too expensive. Both are still experimental and need `from sklearn.experimental import enable_halving_search_cv` before they can be imported.
- `StandardScaler` should usually live inside the pipeline for linear and distance-based models.
What To Tune First
Start with the parameters that control capacity:
- `C` for linear and margin-based models
- `max_depth`, `min_samples_leaf`, or similar controls for tree models
- regularization or shrinkage knobs before secondary preprocessing choices
If the first pass is inconclusive, add one interacting parameter only after you can explain why it belongs in the search.
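A minimal sketch of what capacity-first starting grids can look like, assuming a scaled logistic regression pipeline and a decision tree; the value lists are illustrative, not recommendations. Checking the grid keys against `get_params()` catches a misspelled parameter name before any fitting happens.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Linear model: start with the regularization strength C only (illustrative values).
linear_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
linear_grid = {"model__C": [0.01, 0.1, 1.0, 10.0]}

# Tree model: start with the two capacity controls, nothing else.
tree_model = DecisionTreeClassifier(random_state=0)
tree_grid = {"max_depth": [2, 4, 8, None], "min_samples_leaf": [1, 5, 20]}

# Any key that is not a real parameter of its estimator shows up here.
unknown = set(linear_grid) - set(linear_pipeline.get_params())
unknown |= set(tree_grid) - set(tree_model.get_params())
print("unknown keys:", unknown)  # unknown keys: set()

print(len(ParameterGrid(linear_grid)), len(ParameterGrid(tree_grid)))  # 4 12
```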
Honest Tuning Protocol
Treat tuning as a controlled decision process:
- lock the split or CV design first
- choose one primary metric
- define a budget for candidates, not an unlimited search
- search only inside the training boundary
- compare the tuned winner against the untuned baseline
- evaluate once on the locked holdout after selection
If the tuned model cannot beat the untuned baseline honestly, the lesson is often about the representation or the split, not the grid size.
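The protocol above can be sketched end to end as follows. This is a sketch under stated assumptions: synthetic imbalanced data from `make_classification` stands in for a real dataset, and the split size, fold count, and `C` values are arbitrary illustrative choices. The baseline uses the default `C=1.0`, which is also a grid candidate, so baseline and search are scored on identical folds with the same metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import (
    GridSearchCV, StratifiedKFold, cross_val_score, train_test_split,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data stands in for a real dataset (assumption).
X, y = make_classification(n_samples=600, n_features=20, weights=[0.85], random_state=0)

# 1. Lock the holdout and the CV design before any tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metric = "average_precision"  # 2. one primary metric

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Untuned baseline, scored with the same folds and metric.
baseline_cv = cross_val_score(pipeline, X_train, y_train, cv=cv, scoring=metric).mean()

# 3-4. A small, fixed candidate budget, searched only on the training split.
search = GridSearchCV(pipeline, {"model__C": [0.01, 0.1, 1.0, 10.0]}, cv=cv, scoring=metric)
search.fit(X_train, y_train)

# 5. Compare the tuned winner with the untuned baseline.
print(f"baseline CV {baseline_cv:.3f}  tuned CV {search.best_score_:.3f}")

# 6. Touch the holdout once, after selection is finished.
holdout = average_precision_score(y_test, search.predict_proba(X_test)[:, 1])
print(f"holdout {holdout:.3f}")
```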
Minimal Example
```python
from sklearn.model_selection import GridSearchCV

# `model` is an estimator and `cv` a splitter, both defined beforehand.
search = GridSearchCV(model, {"C": [0.1, 1.0, 10.0]}, cv=cv, scoring="roc_auc")
```
Worked Pattern
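```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# X_train and y_train are the training split only; the holdout stays untouched.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # example splitter choice

# Scaling lives inside the pipeline, so it is refit on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

search = GridSearchCV(
    pipeline,
    {"model__C": [0.1, 1.0, 10.0]},  # "<step>__<param>" addresses the model step
    cv=cv,
    scoring="average_precision",
    return_train_score=True,  # keep train scores to inspect the train-validation gap
)
search.fit(X_train, y_train)
```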
The important part is not the exact grid. It is that preprocessing stays inside the pipeline and the search happens only inside the training boundary.
What To Read After Fitting
Read these outputs before you celebrate a winner:
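- `best_params_`: the winning parameter setting
- `best_score_`: the winner's mean cross-validated score
- `best_estimator_`: the winning pipeline refit on the full training split (when `refit=True`, the default)
- `cv_results_`: the full per-candidate table of scores and timings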
cv_results_ matters because it shows the whole candidate table, not just the winner. That makes it easier to see whether the gain is broad or just a one-point spike.
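A self-contained sketch of reading `cv_results_` as a table, with synthetic data from `make_classification` standing in for `X_train` and `y_train` (an assumption). The `gap` column is a helper introduced here, not a scikit-learn field: it is the mean train score minus the mean validation score for each candidate.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for X_train / y_train (assumption).
X_train, y_train = make_classification(n_samples=400, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(
    pipeline,
    {"model__C": [0.01, 0.1, 1.0, 10.0]},
    cv=cv,
    scoring="average_precision",
    return_train_score=True,
)
search.fit(X_train, y_train)

# One row per candidate: mean, spread, train score, and the train-validation gap.
columns = ["param_model__C", "mean_test_score", "std_test_score",
           "mean_train_score", "rank_test_score"]
table = pd.DataFrame(search.cv_results_)[columns].copy()
table["gap"] = table["mean_train_score"] - table["mean_test_score"]

print(table.sort_values("rank_test_score").to_string(index=False))
print("best:", search.best_params_, round(search.best_score_, 3))
```

A winner whose mean barely exceeds its neighbors, with a large `std_test_score` or a much larger `gap`, is the one-point spike the paragraph above warns about.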
One-Parameter Check
```python
from sklearn.model_selection import validation_curve

# Reuses pipeline, cv, X_train, and y_train from the worked pattern.
train_scores, valid_scores = validation_curve(
    pipeline,
    X_train,
    y_train,
    param_name="model__C",
    param_range=[0.01, 0.1, 1.0, 10.0],
    cv=cv,
    scoring="average_precision",
)
```
Use this when you want to answer one question first:
- is the model under-regularized
- is the model over-regularized
- is the gain broad enough to matter
If the curve is flat, more tuning may not be the right next move.
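A sketch of how those questions might be read off a validation curve, on synthetic data (an assumption). Both score arrays have one row per parameter value and one column per fold. Broadly, low train and validation scores at small `C` point to over-regularization, while a train score that keeps climbing as validation stalls at large `C` points to under-regularization. The flatness threshold at the end is a hypothetical rule of thumb, not a scikit-learn feature.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, validation_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the training split (assumption).
X_train, y_train = make_classification(n_samples=400, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

param_range = [0.001, 0.01, 0.1, 1.0, 10.0]
train_scores, valid_scores = validation_curve(
    pipeline, X_train, y_train,
    param_name="model__C", param_range=param_range,
    cv=cv, scoring="average_precision",
)

# Each array has shape (n_params, n_folds); summarize across folds.
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)
valid_std = valid_scores.std(axis=1)
for C, tr, va, sd in zip(param_range, train_mean, valid_mean, valid_std):
    print(f"C={C:<6} train={tr:.3f} valid={va:.3f} +/- {sd:.3f} gap={tr - va:.3f}")

# Hypothetical flatness check: if the whole validation range fits inside the
# typical fold-to-fold spread, more tuning of C is unlikely to pay off.
spread = valid_mean.max() - valid_mean.min()
print("flat" if spread <= valid_std.mean() else "not flat", round(spread, 3))
```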
Search Helper
```python
from sklearn.model_selection import ParameterGrid

# Expands the dict into the explicit list of candidates the search would try.
list(ParameterGrid({"model__C": [0.1, 1.0], "model__penalty": ["l2"]}))
```
This is useful when you want to sanity-check the search size before spending time on the run.
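Search cost is roughly candidates times folds, plus one refit of the winner when `refit=True`. A small sketch, assuming 5-fold CV and a hypothetical second parameter (`class_weight`) added to the grid:

```python
from sklearn.model_selection import ParameterGrid

grid = {
    "model__C": [0.01, 0.1, 1.0, 10.0],
    "model__class_weight": [None, "balanced"],  # hypothetical second knob
}
candidates = list(ParameterGrid(grid))
n_splits = 5  # assumed number of CV folds

# Every candidate is fit once per fold before the winner is refit.
print(len(candidates), "candidates ->", len(candidates) * n_splits, "fits (plus one refit)")
# 8 candidates -> 40 fits (plus one refit)
```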
Search Space Design Under Budget
A good search space is narrow enough to teach you something.
- use log-scale sweeps for regularization and learning-rate style parameters
- tune the one or two capacity controls most likely to matter before secondary knobs
- use coarse-to-fine search instead of a huge first grid
- keep a clear reason for every parameter in the search
Bad search spaces usually share one symptom: the candidate table is large, but none of the choices would be easy to defend to a teammate.
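A sketch of a log-scale, coarse-to-fine sweep over `C`, assuming synthetic data; the decade ranges and point counts are illustrative. The coarse pass places one value per decade, and the fine pass searches only a narrower log-spaced window around the coarse winner.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the training split (assumption).
X_train, y_train = make_classification(n_samples=400, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Coarse pass: one value per decade, 1e-3 .. 1e2.
coarse = np.logspace(-3, 2, 6)
coarse_search = GridSearchCV(pipeline, {"model__C": coarse}, cv=cv, scoring="average_precision")
coarse_search.fit(X_train, y_train)
best_c = coarse_search.best_params_["model__C"]

# Fine pass: 7 log-spaced values spanning one decade either side of the coarse winner.
fine = np.logspace(np.log10(best_c) - 1, np.log10(best_c) + 1, 7)
fine_search = GridSearchCV(pipeline, {"model__C": fine}, cv=cv, scoring="average_precision")
fine_search.fit(X_train, y_train)

print("coarse best C:", best_c, "fine best C:", fine_search.best_params_["model__C"])
```

If the fine pass barely moves the score, that is a signal to stop tuning `C` rather than to widen the grid.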
One-Standard-Error Rule
If several candidates are close, prefer the simplest candidate whose score is within one standard error of the best mean.
Practical version:
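```python
# `table` is a hypothetical DataFrame with one row per candidate and columns
# "mean_score" (mean CV score) and "sem_score" (standard error of that mean).
best_mean = table["mean_score"].max()
best_sem = table.loc[table["mean_score"].idxmax(), "sem_score"]
safe = table[table["mean_score"] >= best_mean - best_sem]
```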
Then choose the simplest row inside safe, not automatically the row with the very top mean. This protects you from over-reading small tuning differences.
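One way to build such a `table` from a real search and apply the rule, sketched on synthetic data (an assumption). The standard error is approximated as `std_test_score / sqrt(n_splits)`; `std_test_score` is a population standard deviation and CV folds are not independent, so treat it as a rough guide. "Simplest" is taken here to mean the smallest `C`, which is the strongest regularization for logistic regression.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the training split (assumption).
X_train, y_train = make_classification(n_samples=400, n_features=20, random_state=0)
n_splits = 5
cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(pipeline, {"model__C": [0.001, 0.01, 0.1, 1.0, 10.0]},
                      cv=cv, scoring="average_precision")
search.fit(X_train, y_train)

# Mean score and approximate standard error of the mean per candidate.
results = search.cv_results_
table = pd.DataFrame({
    "C": [p["model__C"] for p in results["params"]],
    "mean_score": results["mean_test_score"],
    "sem_score": results["std_test_score"] / np.sqrt(n_splits),
})

best_mean = table["mean_score"].max()
best_sem = table.loc[table["mean_score"].idxmax(), "sem_score"]
safe = table[table["mean_score"] >= best_mean - best_sem]

# Smallest C inside the safe zone = most regularized defensible candidate.
chosen = safe.loc[safe["C"].idxmin()]
top_c = table.loc[table["mean_score"].idxmax(), "C"]
print(table.to_string(index=False))
print("top-mean C:", top_c, "one-SE choice C:", chosen["C"])
```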
Split And Scoring Must Match The Task
Search quality depends on the scorer and the splitter:
- imbalanced queue: optimize `average_precision` or a threshold-aware metric, not plain accuracy
- grouped data: use `GroupKFold` or `StratifiedGroupKFold`
- time-aware data: use an ordered splitter such as `TimeSeriesSplit`, not shuffled CV
If the search uses the wrong split or the wrong metric, the best parameters are only best for the wrong problem.
What To Watch For
- a grid that is so wide it becomes hard to interpret
- a best setting that barely beats the default
- a tuning run that changes the validation story only by chance
- a pipeline that accidentally leaks preprocessing information
- a large train-validation gap hidden behind one average score
- a search that takes longer to explain than the gain is worth
The important signal is not "did the score move?" It is "did the score move in a way I can defend?"
When To Use Halving Search
Use halving search when:
- the grid is large enough that full search is expensive
- you want to eliminate weak candidates early
- you can accept a more aggressive search strategy
Use it carefully:
- keep the split fixed
- compare it against a smaller ordinary search first
- check whether the winner is stable enough to justify the shortcut
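A minimal sketch of halving search with a fixed split, checked against an ordinary full grid on the same folds, assuming synthetic data and an illustrative tree grid. Halving search is still experimental in scikit-learn, so the `enable_halving_search_cv` import has to run before `HalvingGridSearchCV` can be imported.

```python
from sklearn.experimental import enable_halving_search_cv  # noqa: F401  (required opt-in)
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, HalvingGridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for the training split (assumption).
X_train, y_train = make_classification(n_samples=1000, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # keep the split fixed

grid = {"max_depth": [2, 4, 6, 8, None], "min_samples_leaf": [1, 5, 20, 50]}

halving = HalvingGridSearchCV(
    DecisionTreeClassifier(random_state=0), grid,
    cv=cv, scoring="average_precision",
    factor=3,               # keep roughly the top third of candidates each round
    resource="n_samples",   # later rounds see more training rows
    random_state=0,
)
halving.fit(X_train, y_train)

# Ordinary search on the same folds, to check whether the shortcut changed the winner.
full = GridSearchCV(DecisionTreeClassifier(random_state=0), grid,
                    cv=cv, scoring="average_precision")
full.fit(X_train, y_train)

print("candidates per round:", halving.n_candidates_)
print("samples per round:", halving.n_resources_)
print("halving winner:", halving.best_params_, "full-search winner:", full.best_params_)
```

If the two winners disagree and their scores differ meaningfully, the shortcut is not yet trustworthy for this problem.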
What To Try
- tune `C` for logistic regression
- tune `max_depth` or `min_samples_leaf` for a tree model
- compare a small grid with a randomized search on the same metric
- inspect one parameter with `validation_curve` before tuning two at once
- use `ParameterGrid` to reason about the search space before the run
- try halving search only when the full search would be too slow
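For the grid-versus-randomized comparison, a sketch that gives both searches the same budget of four candidates, the same folds, and the same metric, assuming synthetic data. The randomized search samples `C` from `scipy.stats.loguniform` instead of a fixed list.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the training split (assumption).
X_train, y_train = make_classification(n_samples=400, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Fixed grid: 4 hand-picked values.
grid = GridSearchCV(pipeline, {"model__C": [0.01, 0.1, 1.0, 10.0]},
                    cv=cv, scoring="average_precision").fit(X_train, y_train)

# Same budget (n_iter=4), but C is sampled log-uniformly from [0.01, 10].
rand = RandomizedSearchCV(pipeline, {"model__C": loguniform(1e-2, 1e1)}, n_iter=4,
                          cv=cv, scoring="average_precision",
                          random_state=0).fit(X_train, y_train)

print(f"grid   best C={grid.best_params_['model__C']:.3g} score={grid.best_score_:.3f}")
print(f"random best C={rand.best_params_['model__C']:.3g} score={rand.best_score_:.3f}")
```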
Failure Pattern
One failure pattern is scaling or imputing on the full dataset before the search begins, which lets every validation fold influence the preprocessing. Preprocessing must stay inside the pipeline so each fold is treated honestly.
Another failure pattern is making the grid too wide. A search that is too big becomes a time sink and often rewards luck more than understanding.
Another failure pattern is tuning several knobs at once before you know which one matters. If you cannot explain why a parameter belongs in the search, it probably should not be there yet.
Another failure pattern is trusting the best score without checking the spread, the training score, and the candidate table.
Another common counterexample is a wide search where one extreme candidate wins by 0.002 on validation but loses the weak slice, inflates training score, or falls outside the one-standard-error safety zone. That is not a robust win.
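The first failure pattern is easiest to see side by side. This sketch, on synthetic data (an assumption), fits the scaler on all training rows before searching, then repeats the search with the scaler inside a pipeline. With plain scaling the score difference is often small; the point is the structure, because the same mistake with target-aware steps such as feature selection can inflate scores substantially.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the training split (assumption).
X_train, y_train = make_classification(n_samples=400, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
c_values = [0.01, 0.1, 1.0, 10.0]

# Leaky: the scaler sees every row, including each fold's validation rows.
X_leaky = StandardScaler().fit_transform(X_train)
leaky = GridSearchCV(LogisticRegression(max_iter=1000), {"C": c_values},
                     cv=cv, scoring="average_precision")
leaky.fit(X_leaky, y_train)

# Honest: the scaler is refit on the training part of every fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
honest = GridSearchCV(pipeline, {"model__C": c_values},
                      cv=cv, scoring="average_precision")
honest.fit(X_train, y_train)

print(f"leaky  {leaky.best_score_:.3f}")
print(f"honest {honest.best_score_:.3f}")
```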
Inspection Habits
- compare the best score with the baseline score, not just the neighboring candidates
- check whether the train score rises much faster than the validation score
- inspect whether one parameter dominates the result
- prefer the smallest setting that gives a repeatable gain
- read the whole candidate table before announcing a winner
If a smaller setting is nearly as good as the best one, the smaller setting is often the more defensible choice.
Practice
- Tune one hyperparameter grid for logistic regression.
- Tune one small tree-based grid.
- Explain why the search happens only inside the training boundary.
- Name one setting you would not tune on the first pass.
- Explain what a small but consistent gain means compared with a one-off large jump.
- Describe how you would decide whether `RandomizedSearchCV` is enough.
- State what you would lock before the second tuning pass.
- Explain when a default setting is already good enough.
- Use `validation_curve` to decide whether one parameter is worth tuning further.
- Explain what `best_estimator_` and `cv_results_` each tell you after a search.
Runnable Example
Open the matching example in AI Academy and run it from the platform.
Inspect the best parameter choice and the validation metrics after the search finishes.
Longer Connection
Continue with scikit-learn Validation and Tuning for a fuller tuning and calibration workflow.