Data drift is a change in the statistical distribution of a model's input features over time, while concept drift is a change in the relationship between inputs and the target. Both degrade a deployed model's accuracy and must be monitored in production.
What is Data Drift?
Data drift is a change over time in the statistical distribution of the input data a model sees in production compared with the data it was trained on. The features still mean the same thing, but their distribution shifts: a new user segment appears, a sensor recalibrates, prices inflate, or seasonal behavior changes. Because the model learned patterns on the old distribution, its predictions can become less reliable even though the code and weights are unchanged.
Data drift is distinct from concept drift. With data drift the input distribution P(X) changes; with concept drift the conditional relationship P(Y given X) changes, meaning the same inputs now map to different correct answers. A fraud model can suffer concept drift when fraudsters change tactics, so identical transaction features that were once benign become fraudulent.
- Data drift: the distribution of inputs P(X) shifts over time.
- Concept drift: the input-to-target relationship P(Y given X) shifts over time.
- Both can occur together and both degrade accuracy without any change to the model.
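The distinction can be written down with the usual factorization of the joint distribution. The subscripts "ref" (reference period) and "cur" (current production period) are notation introduced here for illustration, not taken from the original text:

```latex
\begin{align}
P(X, Y) &= P(Y \mid X)\, P(X) \\
\text{data drift:}\quad & P_{\text{cur}}(X) \neq P_{\text{ref}}(X) \\
\text{concept drift:}\quad & P_{\text{cur}}(Y \mid X) \neq P_{\text{ref}}(Y \mid X)
\end{align}
```

Either factor can change on its own, or both can change at once, and in each case the joint distribution the model was trained on no longer matches production.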
Types of drift and why they happen
Covariate shift is the most common form of data drift, where feature distributions move but the underlying labeling rule stays fixed. Label shift (prior probability shift) is a change in the target class frequencies P(Y), typically with the class-conditional feature distributions P(X given Y) assumed unchanged. Concept drift can be sudden, gradual, incremental, or recurring (for example, seasonal patterns that return each year).
A related but separate problem is training-serving skew, a discrepancy between how features are computed in the training and serving pipelines. Skew is not time-based drift; it is an engineering mismatch, such as a feature transformed one way in the training pipeline and another way in the live service. Google's Rules of Machine Learning notes that scoring the same example in both pipelines should give exactly the same result, so a difference probably indicates an engineering error. Skew produces drift-like symptoms, but it is fixed in code, not by retraining.
- Covariate shift: P(X) changes, labeling rule unchanged.
- Label shift: the class prior P(Y) changes.
- Training-serving skew: features computed differently in the training vs serving pipelines, an engineering bug, not temporal drift.
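One way to check for training-serving skew is to run the same raw examples through both feature pipelines and compare the results. The sketch below is a minimal illustration under stated assumptions: `training_features`, `serving_features`, and `find_feature_skew` are hypothetical names invented here, and the serving function contains a deliberate bug so the check has something to find.

```python
import math

# Hypothetical training-pipeline transform (illustrative only).
def training_features(raw):
    return {"log_amount": math.log1p(raw["amount"]),
            "country": raw["country"].upper()}

# Hypothetical serving-pipeline transform with a deliberate bug:
# it forgets the log transform that training applies.
def serving_features(raw):
    return {"log_amount": raw["amount"],
            "country": raw["country"].upper()}

def find_feature_skew(raw_examples, train_fn, serve_fn, tol=1e-9):
    """Return (example_index, feature, training_value, serving_value)
    for every feature whose value differs between the two pipelines."""
    mismatches = []
    for i, raw in enumerate(raw_examples):
        train_row = train_fn(raw)
        serve_row = serve_fn(raw)
        # Union of names so a feature missing from one side is also reported.
        for name in sorted(set(train_row) | set(serve_row)):
            a, b = train_row.get(name), serve_row.get(name)
            numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
            same = math.isclose(a, b, rel_tol=0.0, abs_tol=tol) if numeric else a == b
            if not same:
                mismatches.append((i, name, a, b))
    return mismatches

examples = [{"amount": 10.0, "country": "de"},
            {"amount": 250.0, "country": "us"}]
for index, name, train_value, serve_value in find_feature_skew(
        examples, training_features, serving_features):
    print(f"example {index}: {name} training={train_value} serving={serve_value}")
```

Any reported mismatch points to a code fix in one of the pipelines rather than to retraining.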
How to detect drift
Detection compares a reference window (often the training set or a known-good period) against a current production window. For numeric features, a two-sample test such as the Kolmogorov-Smirnov test, or a distance measure such as the Wasserstein distance, flags distribution change. For categorical features, the chi-squared test or the population stability index (PSI) is common. Concept drift is detected by monitoring model quality metrics over time when ground-truth labels eventually arrive, or by proxy signals like prediction confidence when labels are delayed.
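The numeric comparison can be sketched with a two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the empirical CDFs of the two windows. This is a standard-library sketch, not a production implementation: the function names are invented here, and the decision threshold uses the large-sample approximation for the critical value, which is an assumption that suits reasonably large windows. Libraries such as SciPy offer a tested version (`scipy.stats.ks_2samp`).

```python
import math
from bisect import bisect_right

def ks_statistic(reference, current):
    """Largest absolute gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    # The gap between two step functions peaks at an observed data point.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d

def ks_drift(reference, current, alpha=0.05):
    """Return (statistic, drift_flag) using the asymptotic critical value."""
    n, m = len(reference), len(current)
    d = ks_statistic(reference, current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return d, d > critical

reference_window = list(range(100))                 # illustrative data
current_window = [v + 50 for v in reference_window]  # same shape, shifted
print(ks_drift(reference_window, current_window))
```

With many features tested at once, some flags are expected by chance, so a stricter `alpha` or a correction for multiple comparisons is usually sensible.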
A widely used summary metric is the population stability index, which compares the binned distribution of a variable between reference and current sets. A common rule of thumb treats PSI below 0.1 as no significant shift, 0.1 to 0.25 as moderate, and above 0.25 as major.
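PSI is commonly computed as the sum over bins of (current share - reference share) × ln(current share / reference share). The sketch below assumes equal-width bins taken from the reference range and a small floor `eps` to avoid dividing by zero in empty bins. Both are illustrative choices (quantile bins are also common), and the function names are invented here.

```python
import math

def psi(reference, current, bins=10, eps=1e-6):
    """Population stability index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

def psi_severity(value):
    """Map a PSI value to the rule-of-thumb bands described above."""
    if value < 0.1:
        return "none"
    if value <= 0.25:
        return "moderate"
    return "major"

reference_window = [float(v) for v in range(100)]   # illustrative data
current_window = [v + 50 for v in reference_window]  # shifted upward
score = psi(reference_window, current_window)
print(round(score, 3), psi_severity(score))
```

The result depends on the binning, so the same bin edges should be reused whenever PSI values are compared over time.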
- Kolmogorov-Smirnov and Wasserstein distance for numeric feature drift.
- Chi-squared test and population stability index for categorical drift.
- Track accuracy, AUC, or error directly once labels arrive to catch concept drift.
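Tracking quality once labels arrive can be as simple as a rolling window over recent labeled predictions. The class below is a minimal sketch: `RollingAccuracy` and the window size are assumptions made for illustration, and a real system would also need to join delayed labels back to the predictions they belong to.

```python
from collections import deque

class RollingAccuracy:
    """Accuracy over the most recent `window` labeled predictions."""

    def __init__(self, window=500):
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window)

    def update(self, prediction, label):
        self.outcomes.append(prediction == label)

    def accuracy(self):
        if not self.outcomes:
            return None  # no labels have arrived yet
        return sum(self.outcomes) / len(self.outcomes)

monitor = RollingAccuracy(window=4)
for prediction, label in [(1, 1), (0, 0), (1, 0), (1, 1)]:
    monitor.update(prediction, label)
print(monitor.accuracy())
```

A sustained fall in this value while input distributions look stable is one hint that the input-to-target relationship, not the inputs, has moved.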
Responding to drift
Not every detected drift requires action. The decision should hinge on whether the business metric that matters has actually degraded. Cheap responses include alerting and investigating the root cause; heavier responses include retraining on recent data, adding the new segment to the training set, or rolling back to a safer model. Scheduled retraining and continuous evaluation pipelines turn drift handling into routine operations rather than emergencies.
Monitoring should cover inputs, predictions, and outcomes. Google's production ML guidance recommends monitoring the freshness and distribution of input features, the distribution of predictions, and downstream metrics, because drift can appear in any of these layers before it shows up as a visible accuracy drop.
- Alert on drift, but gate retraining on real performance loss to avoid churn.
- Retrain on recent labeled data or expand the training set to cover new segments.
- Monitor inputs, predictions, and outcomes together, since drift can surface in any layer.
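The "alert on drift, gate retraining on performance" policy can be sketched as a small decision function. Everything specific here is an assumption for illustration: the name `drift_response`, the 5% relative-drop threshold, and the convention that the metric is higher-is-better with a positive baseline.

```python
def drift_response(drift_detected, baseline_metric, current_metric,
                   max_relative_drop=0.05):
    """Return 'none', 'alert', or 'retrain'.

    current_metric may be None when ground-truth labels have not arrived yet.
    """
    if current_metric is None:
        # Without labels, performance loss cannot be confirmed: alert at most.
        return "alert" if drift_detected else "none"
    relative_drop = (baseline_metric - current_metric) / baseline_metric
    if relative_drop > max_relative_drop:
        return "retrain"  # confirmed loss on the metric that matters
    return "alert" if drift_detected else "none"

print(drift_response(True, 0.90, 0.89))   # drift, but performance holds
print(drift_response(True, 0.90, 0.80))   # drift with a real performance drop
```

Keeping the retraining trigger tied to the measured metric rather than to the drift flag is what prevents churn from harmless distribution changes.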
Key takeaways
- Data drift changes input distributions P(X); concept drift changes the input-to-target relationship P(Y given X).
- Both degrade a deployed model over time even though its code and weights never change.
- Detect drift by comparing a reference window to a current window using KS tests, chi-squared, PSI, or Wasserstein distance.
- Training-serving skew looks like drift but is an engineering mismatch fixed in code, not by retraining.
- Act on drift based on real performance loss, and monitor inputs, predictions, and outcomes together.