Draft:Purged cross-validation



Purged cross-validation is a variant of k-fold cross-validation designed to prevent look-ahead bias in time series and other structured data. It is primarily used in financial machine learning to ensure the independence of training and testing samples when labels depend on future events. It provides an alternative to conventional cross-validation and walk-forward backtesting methods, which often yield overly optimistic performance estimates due to information leakage and overfitting.^[1]^[2]

Motivation

Standard cross-validation assumes that observations are independent and identically distributed (IID), an assumption that often fails in time series and financial datasets. If the label of a test sample overlaps in time with the features or labels of samples in the training set, information leaks from the test set into training, and the model's performance is overstated. Purged cross-validation addresses this issue by removing overlapping observations and, optionally, adding a temporal buffer ("embargo") after the test set to further reduce the risk of leakage.^[1]^[2]^[3]^[4]

Figure: standard 5-fold cross-validation.^[5]

Purging

Purging removes from the training set any observation whose label is formed over a time interval that overlaps the interval over which a test-set label is formed. Such overlaps can occur in training observations both before and after the test set. Removing them ensures that the model cannot learn, during training, information that is later used to assess its performance.^[6]

Figure: purging of training observations that overlap the test set.
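The purging rule can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the cited sources: the function name `purge_train`, the `label_spans` layout (one inclusive `(start, end)` interval per observation), and the treatment of the test fold as a single contiguous block are all assumptions made for the example.

```python
def purge_train(label_spans, test_idx):
    """Return training indices whose label span does not overlap the test labels.

    label_spans[i] = (start, end) is the inclusive interval over which the
    label of observation i is formed (hypothetical layout for this sketch).
    The test fold is assumed to be one contiguous block.
    """
    test_set = set(test_idx)
    # Interval covered by the formation of all test labels.
    test_start = min(label_spans[i][0] for i in test_idx)
    test_end = max(label_spans[i][1] for i in test_idx)

    train = []
    for i, (start, end) in enumerate(label_spans):
        if i in test_set:
            continue
        overlaps = start <= test_end and end >= test_start
        if not overlaps:
            train.append(i)
    return train


# Ten observations; each label is formed over the observation and the next two.
spans = [(t, t + 2) for t in range(10)]
train = purge_train(spans, test_idx=[4, 5])
print(train)  # [0, 1, 8, 9]
```

Observations 2 and 3 are purged because their labels reach into the test fold, and observations 6 and 7 are purged because the test labels reach into them.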

Embargoing

Embargoing addresses a more subtle form of leakage: even if an observation does not directly overlap the test set, it may still be affected by test events due to market reaction lag or downstream dependencies. To guard against this, a percentage-based embargo is imposed after each test fold. For example, with a 5% embargo and 1000 observations, the 50 observations following each test fold are excluded from training.
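The arithmetic of the example above can be checked with a short sketch. The helper `embargo_indices` and its parameters are hypothetical names chosen for illustration; `test_end` is taken here to be the last index of the test fold, and would be the end of the last test label instead if labels extend past the fold.

```python
def embargo_indices(n_obs, test_end, pct):
    """Indices excluded from training after a test fold ending at test_end (inclusive)."""
    width = round(n_obs * pct)          # embargo size as a share of all observations
    first = test_end + 1                # embargo starts right after the test fold
    last = min(n_obs, first + width)    # do not run past the end of the sample
    return list(range(first, last))


emb = embargo_indices(n_obs=1000, test_end=399, pct=0.05)
print(len(emb), emb[0], emb[-1])  # 50 400 449
```

When the test fold is the last one in the sample, no observations follow it and the embargo is empty.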

Unlike purging, embargoing is applied only after the test set.^[6]

Figure: application of the embargo after each test fold.

Applications

Purged and embargoed cross-validation has been applied to:

Backtesting of trading strategies^[1]^[7]
Validation of classifiers on labeled event-driven returns^[4]^[8]
Any machine learning task with overlapping label horizons^[6]^[3]

Example

To illustrate the effect of purging and embargoing, consider the figures below. Both diagrams show the structure of 5-fold cross-validation over a 20-day period. In each row, blue squares indicate training samples and red squares denote test samples. Each label is defined based on the value of the next two observations, hence creating an overlap. If this overlap is left untreated, test set information leaks into the train set.

Standard K-Fold Cross-Validation: test samples are randomly partitioned with no attention to label overlap or time ordering. This can lead to contamination of the training set with future information.

The second figure applies the Purged CV procedure. Notice how purging removes overlapping observations from the training set and the embargo widens the gap between test and training data. This approach ensures that the evaluation more closely resembles a true out-of-sample test and reduces the risk of backtest overfitting.

Purged K-Fold Cross-Validation: training samples that overlap with the test label horizon are removed. Embargoing is applied to prevent leakage from immediately adjacent samples.
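The setting of this example (20 observations, 5 folds, labels that depend on the next two observations) can be reproduced with a small generator. This is a sketch under stated assumptions rather than a reference implementation: the name `purged_kfold`, the use of contiguous folds, and the choice to count the embargo from the end of the last test label are all assumptions of the example.

```python
def purged_kfold(n_obs, n_folds, horizon, embargo):
    """Yield (train, test) index lists for contiguous, purged and embargoed folds.

    horizon: number of future observations each label depends on.
    embargo: number of observations dropped after the last test label.
    """
    fold_size = n_obs // n_folds
    for f in range(n_folds):
        start = f * fold_size
        stop = n_obs if f == n_folds - 1 else start + fold_size
        test = list(range(start, stop))
        # Observation i has label span [i, i + horizon]; the test labels
        # span [start, stop - 1 + horizon].
        lo = start - horizon                 # first index whose label reaches the test fold
        hi = stop - 1 + horizon + embargo    # last index that is purged or embargoed
        train = [i for i in range(n_obs) if i < lo or i > hi]
        yield train, test


folds = list(purged_kfold(n_obs=20, n_folds=5, horizon=2, embargo=1))
train, test = folds[1]
print(test)   # [4, 5, 6, 7]
print(train)  # [0, 1, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```

For the second fold, observations 2 and 3 are purged before the test set, 8 and 9 are purged after it, and observation 10 is removed by the one-observation embargo.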

Combinatorial Purged Cross-Validation

Walk-forward backtesting analysis, another common cross-validation technique in finance, preserves temporal order but evaluates the model on a single sequence of test sets. This leads to high variance in performance estimation, as results are contingent on a specific historical path.^[1]

Combinatorial Purged Cross-Validation (CPCV) addresses this limitation by systematically constructing multiple train-test splits, purging overlapping samples, and enforcing an embargo period to prevent information leakage. The result is a distribution of out-of-sample performance estimates, enabling robust statistical inference and more realistic assessment of a model's predictive power.^[6]

Methodology

CPCV divides a time-series dataset into N sequential, non-overlapping groups. These groups preserve the temporal order of observations. Then, all combinations of k groups (where k < N) are selected as test sets, with the remaining N − k groups used for training. For each combination, the model is trained and evaluated under strict controls to prevent leakage.^[6]

To eliminate potential contamination between training and test sets, CPCV introduces two additional mechanisms:

Purging: Any training observations whose label horizon overlaps with the test period are excluded. This ensures that future information does not influence model training.
Embargoing: After the end of each test period, a fixed number of observations (typically a small percentage) are removed from the training set. This prevents leakage due to delayed market reactions or auto-correlated features.

Each data point appears in multiple test sets across different combinations. Because test groups are drawn combinatorially, this process produces multiple backtest "paths," each of which simulates a plausible market scenario. From these paths, practitioners can compute a distribution of performance statistics such as the Sharpe ratio, drawdown, or classification accuracy.
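The construction of splits described in this section can be sketched with `itertools.combinations`. The function `cpcv_splits`, its parameters, and the index-based definitions of the label horizon and embargo are assumptions made for illustration; they are not taken from the cited sources.

```python
from itertools import combinations


def cpcv_splits(n_obs, n_groups, k, horizon=0, embargo=0):
    """Yield (test_groups, train_idx, test_idx) for every choice of k test groups.

    horizon: number of future observations each label depends on.
    embargo: number of observations dropped after each test group's last label.
    """
    # Sequential, non-overlapping groups as half-open index ranges.
    bounds = [(g * n_obs // n_groups, (g + 1) * n_obs // n_groups)
              for g in range(n_groups)]
    for test_groups in combinations(range(n_groups), k):
        test_idx = []
        blocked = set()
        for g in test_groups:
            start, stop = bounds[g]
            test_idx.extend(range(start, stop))
            # Block the test group itself, the purged neighbours on both
            # sides, and the embargo that follows.
            blocked.update(range(start - horizon, stop + horizon + embargo))
        train_idx = [i for i in range(n_obs) if i not in blocked]
        yield test_groups, train_idx, test_idx


splits = list(cpcv_splits(n_obs=60, n_groups=6, k=2, horizon=2, embargo=1))
print(len(splits))         # 15
print(splits[0][0])        # (0, 1)
print(splits[0][1][:3])    # [23, 24, 25]
```

In the first split the test set covers observations 0 to 19, observations 20 and 21 are purged, and observation 22 is embargoed, so training starts at index 23.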

Formal definition

Let N be the number of sequential groups into which the dataset is divided, and let k be the number of groups selected as the test set for each split. Then:

The number of unique train-test combinations is given by the binomial coefficient:

$${\binom {N}{k}}$$

Each group appears in the test set of ${\binom {N-1}{k-1}}={\frac {k}{N}}{\binom {N}{k}}$ splits, so the out-of-sample results can be assembled into $\varphi [N,k]$ unique backtest paths:

$$\varphi [N,k]={\frac {k}{N}}{\binom {N}{k}}$$

This yields a distribution of performance metrics rather than a single point estimate, making it possible to apply Monte Carlo-based or probabilistic techniques to assess model robustness.
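Both counts are easy to evaluate numerically. The sketch below uses Python's `math.comb`; the function names `n_splits` and `n_paths` are illustrative. It relies on the identity $\frac{k}{N}\binom{N}{k}=\binom{N-1}{k-1}$, which guarantees that the number of paths is an integer.

```python
from math import comb


def n_splits(n_groups, k):
    """Number of train-test combinations: C(N, k)."""
    return comb(n_groups, k)


def n_paths(n_groups, k):
    """Number of backtest paths: (k / N) * C(N, k), always an integer."""
    return k * comb(n_groups, k) // n_groups


print(n_splits(6, 2), n_paths(6, 2))    # 15 5
print(n_splits(10, 2), n_paths(10, 2))  # 45 9
```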

Illustrative example

Consider the case where N = 6 and k = 2. The number of possible test set combinations is ${\binom {6}{2}}=15$. Each of the six groups appears in five test splits. Consequently, five distinct backtest paths can be constructed, each incorporating one appearance from every group.

Test group assignment matrix

This table shows the 15 test combinations. An "x" indicates that the corresponding group is included in the test set for that split.

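Paths generated for N = 6, k = 2

| Group | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13 | S14 | S15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G1 | x | x | x | x | x |   |   |   |   |   |   |   |   |   |   |
| G2 | x |   |   |   |   | x | x | x | x |   |   |   |   |   |   |
| G3 |   | x |   |   |   | x |   |   |   | x | x | x |   |   |   |
| G4 |   |   | x |   |   |   | x |   |   | x |   |   | x | x |   |
| G5 |   |   |   | x |   |   |   | x |   |   | x |   | x |   | x |
| G6 |   |   |   |   | x |   |   |   | x |   |   | x |   | x | x |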

Backtest path assignment

Each group contributes to five different backtest paths. The number in each cell indicates the path to which the group's result is assigned for that split.

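Path assignments for each group

| Group | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13 | S14 | S15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G1 | 1 | 2 | 3 | 4 | 5 |   |   |   |   |   |   |   |   |   |   |
| G2 | 1 |   |   |   |   | 2 | 3 | 4 | 5 |   |   |   |   |   |   |
| G3 |   | 1 |   |   |   | 2 |   |   |   | 3 | 4 | 5 |   |   |   |
| G4 |   |   | 1 |   |   |   | 2 |   |   | 3 |   |   | 4 | 5 |   |
| G5 |   |   |   | 1 |   |   |   | 2 |   |   | 3 |   | 4 |   | 5 |
| G6 |   |   |   |   | 1 |   |   |   | 2 |   |   | 3 |   | 4 | 5 |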
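The path assignments above follow a simple rule: for each group, the first split in which it is tested feeds path 1, the second feeds path 2, and so on. The sketch below regenerates the table under that rule. The function name `path_assignment` and the use of lexicographic split order (as produced by `itertools.combinations`) are assumptions of the example.

```python
from itertools import combinations


def path_assignment(n_groups, k):
    """Map (group, split) -> path number, filling paths in split order."""
    splits = list(combinations(range(n_groups), k))
    next_path = [1] * n_groups      # next free path number for each group
    table = {}
    for s, test_groups in enumerate(splits):
        for g in test_groups:
            table[(g, s)] = next_path[g]
            next_path[g] += 1
    return splits, table


splits, table = path_assignment(6, 2)
for g in range(6):
    # "." marks splits in which the group is part of the training set.
    row = [str(table.get((g, s), ".")) for s in range(len(splits))]
    print(f"G{g + 1}", " ".join(row))
```

Indices are zero-based in the code, so `table[(3, 12)]` corresponds to group G4 in split S13.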

Advantages

Combinatorial Purged Cross-Validation offers several key benefits over conventional methods:

It produces a distribution of performance metrics, enabling more rigorous statistical inference.
The method systematically eliminates lookahead bias through purging and embargoing.
By simulating multiple historical scenarios, it reduces the dependence on any single market regime or realization.
It supports high-confidence comparisons between competing models or strategies.

CPCV is commonly used in quantitative strategy research, especially for evaluating predictive models such as classifiers, regressors, and portfolio optimizers.^[3] It has been applied to estimate realistic Sharpe ratios, assess the risk of overfitting, and support the use of statistical tools such as the Deflated Sharpe Ratio (DSR).^[8]^[4]

Limitations

The main limitation of CPCV is its computational cost: the number of splits, ${\binom {N}{k}}$, grows combinatorially with N and k, and a model must be trained for each split. This cost can be managed by sampling a finite number of splits from the set of all possible combinations.
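One way to draw such a sample is sketched below. The function `sample_splits` and its parameters are hypothetical; the sketch enumerates all combinations before sampling, which is itself only practical for moderate N. A random subset does not, in general, test every group equally often, so complete backtest paths may not be recoverable from it.

```python
import random
from itertools import combinations


def sample_splits(n_groups, k, n_samples, seed=0):
    """Draw n_samples distinct test-group combinations without replacement."""
    all_splits = list(combinations(range(n_groups), k))
    rng = random.Random(seed)       # fixed seed for reproducibility
    return rng.sample(all_splits, min(n_samples, len(all_splits)))


subset = sample_splits(n_groups=12, k=4, n_samples=20)
print(len(subset))  # 20 of the 495 possible splits
```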

References

[1] Joubert, J., Sestovic, D., Barziy, I., Distaso, W., & López de Prado, M. (2024). "Enhanced Backtesting for Practitioners." The Journal of Portfolio Management, Quantitative Tools 51(2), pp. 12–27. DOI: 10.3905/jpm.2024.1.637
[2] Bailey, D. H., Borwein, J. M., López de Prado, M., & Zhu, Q. J. (2014). "The Probability of Backtest Overfitting." Journal of Computational Finance, 20(4), pp. 39–69.
[3] López de Prado, M. (2018). "The 10 Reasons Most Machine Learning Funds Fail." The Journal of Portfolio Management, 44(6), pp. 120–133. DOI: 10.3905/jpm.2018.44.6.120
[4] López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press. https://www.amazon.com/Machine-Learning-Managers-Elements-Quantitative/dp/1108792898
[5] "KFold CV Illustration by Scikit-Learn". Scikit-Learn. 20 May 2025.
[6] López de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons. ISBN 978-1-119-48208-6.
[7] Arnott, R. D., Beck, N., Kalesnik, V., & West, J. (2019). "Alice’s Adventures in Factorland: Three Blunders That Plague Factor Investing." The Journal of Portfolio Management, 45(4), pp. 129–144.
[8] López de Prado, M. & Zoonekynd, V. (2025). "Correcting the Factor Mirage: A Research Protocol for Causal Factor Investing." Available at SSRN: https://ssrn.com/abstract=4697929 or http://dx.doi.org/10.2139/ssrn.4697929
