There is no single source of truth when generating corporate bond factors from the underlying data. Handling potential data uncertainty is crucial for generating reliable and reproducible asset pricing research in corporate bonds.
We propose a new methodology that applies well-known corporate bond filters ex ante. This precludes look-ahead bias and ensures that the resulting bond factors and strategies are tradable in real time.
We introduce PyBondLab, a new Python package designed for portfolio sorting, with functionality to generate hundreds of factors, anomalies and strategies (all from the same sorting variable) across any bond dataset. You can find detailed examples, walk-throughs and other details on the PyBondLab GitHub repository. PyBondLab handles data uncertainty for you and works with any dataset and any variable on which you wish to form a factor or strategy. The package has primarily been developed, and is maintained, by Giulio Rossetti at Warwick Business School.
Ex-Ante vs. Ex-Post Filtering
Ex-post data filtering introduces look-ahead bias into any “out-of-sample” asset pricing result. Ex-post filtering means winsorizing, trimming or excluding data (primarily future returns) using information beyond what is observable to an econometrician at portfolio formation month t. This is illustrated in the figure below:

This implies that the representative trader will never realize a future return beyond the chosen ex-post winsorization or exclusion threshold: prices and returns after the portfolio formation date t are winsorized, trimmed or excluded altogether from the data sample.
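As a minimal sketch of the mechanism, the snippet below winsorizes a set of hypothetical month t+1 returns at their own 90th percentile. The function name, the data and the threshold are illustrative assumptions and are not part of PyBondLab; the point is only that the cap is computed from returns that are not yet observable at month t.

```python
import numpy as np

def ex_post_winsorize_right_tail(future_returns, upper_quantile):
    """Cap future returns at a quantile computed from those same returns.

    This is an EX-POST filter: the threshold depends on month t+1 data,
    which a trader forming portfolios at month t cannot observe.
    """
    r = np.asarray(future_returns, dtype=float)
    threshold = np.quantile(r, upper_quantile)
    return np.minimum(r, threshold), threshold

# Hypothetical month t+1 returns for ten bonds, one with a large rebound.
future_returns = np.array([-0.04, -0.02, -0.01, 0.00, 0.01,
                           0.01, 0.02, 0.03, 0.05, 0.60])

winsorized, threshold = ex_post_winsorize_right_tail(future_returns, 0.90)

raw_mean = future_returns.mean()     # what a holder of all ten bonds earns
winsorized_mean = winsorized.mean()  # what the filtered sample reports
print(f"threshold={threshold:.4f} raw={raw_mean:.4f} winsorized={winsorized_mean:.4f}")
```

In this toy sample the 0.60 return is replaced by the threshold, so the reported mean is lower than the return an investor holding all ten bonds would actually have earned.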
Ex-ante filtering explicitly filters the data (the investment universe) using only information that is available up to the portfolio formation month t. It avoids look-ahead bias, catches potential data errors and is implementable in real time. PyBondLab includes functionality to implement this methodology.
The difference between these two filtering methods is not “semantic”. Using ex-post filters fundamentally destroys the concept of a feasible, implementable “out-of-sample” asset pricing test.
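The idea can be sketched with plain pandas. The example below is not PyBondLab's API: the function `ex_ante_universe`, the column names (`bond_id`, `month`, `ret`) and the rule "drop bonds whose month t return exceeds 50% in absolute value" are all assumptions chosen for illustration. What matters is that the filter reads only rows dated at or before t, and that month t+1 returns are left untouched.

```python
import pandas as pd

def ex_ante_universe(panel, formation_month, max_abs_ret):
    """Bond ids eligible at `formation_month`, using only data dated <= that month."""
    observed = panel[panel["month"] <= formation_month]  # discard everything after t
    at_formation = observed[observed["month"] == formation_month]
    eligible = at_formation[at_formation["ret"].abs() <= max_abs_ret]
    return sorted(eligible["bond_id"])

# Hypothetical two-month panel: month 1 is formation month t, month 2 is t+1.
panel = pd.DataFrame({
    "bond_id": ["A", "A", "B", "B", "C", "C"],
    "month":   [1, 2, 1, 2, 1, 2],
    "ret":     [0.01, 0.50, 0.80, 0.02, -0.02, -0.01],
})

# Bond B is dropped because of a suspicious return observed AT month t.
universe = ex_ante_universe(panel, formation_month=1, max_abs_ret=0.5)

# Realized month t+1 returns of the eligible bonds are not filtered at all.
held = panel[(panel["month"] == 2) & (panel["bond_id"].isin(universe))]
realized = held.set_index("bond_id")["ret"]
print(universe)
print(realized)
```

Bond A's large month t+1 return stays in the sample because nothing at month t flagged it, which is exactly the outcome a real-time investor would have faced. Changing any month t+1 value leaves the selected universe unchanged.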
An Example with Non-Investment Grade (NIG) Momentum
The figure below presents the “High-Low” average return (premium) from sorting bonds into decile (10) portfolios based on their prior six-month momentum, across filters applied to the right tail of the return distribution. The premium is defined as the average “out-of-sample” return difference between bonds in the “High” and “Low” momentum portfolios. The holding period is six months. The data are publicly available (with subscription) via the WRDS Bond Returns Module. The underlying strategy returns can all be generated with PyBondLab.
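For readers unfamiliar with portfolio sorts, the following is a simplified sketch of a single-month decile sort in pandas. It does not use PyBondLab and does not reproduce the figure: the helper names, the toy data, the one-month holding period and the definition of momentum as the compounded return over the six most recent months are all simplifying assumptions.

```python
import numpy as np
import pandas as pd

def momentum_signal(past_returns):
    """Compound each bond's past monthly returns (rows: bonds, columns: months)."""
    return (1.0 + past_returns).prod(axis=1) - 1.0

def high_minus_low(signal, future_ret, n_portfolios=10):
    """Equal-weighted return of the top-signal portfolio minus the bottom one."""
    ranks = signal.rank(method="first")  # break ties so bins are equal-sized
    portfolio = pd.qcut(ranks, n_portfolios, labels=False)  # 0 = Low, 9 = High
    ew_returns = future_ret.groupby(portfolio).mean()
    return ew_returns.loc[n_portfolios - 1] - ew_returns.loc[0]

# Hypothetical cross-section of 20 bonds with six past monthly returns each.
n_bonds = 20
monthly = 0.001 * np.arange(n_bonds)
past_returns = pd.DataFrame(np.tile(monthly[:, None], (1, 6)))
future_ret = pd.Series(0.001 * np.arange(n_bonds))  # hypothetical month t+1 returns

signal = momentum_signal(past_returns)
premium = high_minus_low(signal, future_ret)
print(f"High-Low premium for this month: {premium:.4f}")
```

Averaging this monthly difference over time gives the premium; a six-month holding period would additionally require averaging across overlapping portfolios formed in earlier months, which is omitted here.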

The equal-weighted (EW) NIG momentum premium is consistent across ex-ante filters: the premia are always statistically indistinguishable from zero. With ex-post filtering, the premia increase monotonically with the aggressiveness of the asymmetric (right-tail) ex-post exclusion threshold. Additional results and Python code are available under the examples section of the dedicated PyBondLab GitHub repository.
If you make use of PyBondLab, please cite the companion paper as follows:
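A toy calculation shows how a right-tail exclusion applied to future returns can mechanically raise a High-Low premium. The numbers below are invented to make the mechanism visible and are not taken from the paper or from the WRDS data; the function name and thresholds are likewise hypothetical. The sketch assumes that the large positive future returns sit mostly in the Low portfolio, so excluding them lowers the Low leg and lifts the spread.

```python
import numpy as np

def premium_after_exclusion(high, low, max_ret):
    """High-Low mean return after dropping future returns above `max_ret`."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    return high[high <= max_ret].mean() - low[low <= max_ret].mean()

# Hypothetical month t+1 returns of the two extreme momentum portfolios.
high_future = [-0.02, 0.00, 0.01, 0.02, 0.04]
low_future = [-0.10, -0.05, 0.00, 0.20, 0.45]  # past losers with large rebounds

# From no exclusion (inf) to an increasingly aggressive right-tail cutoff.
thresholds = [np.inf, 0.30, 0.10]
premia = [premium_after_exclusion(high_future, low_future, c) for c in thresholds]

for cutoff, premium in zip(thresholds, premia):
    print(f"exclude returns > {cutoff}: High-Low = {premium:.4f}")
```

In this example the spread rises as the cutoff tightens, even though the returns a trader would actually have realized never change.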
@article{Dickerson-Robotti-Rosseti-2024,
title={Common pitfalls in the evaluation of corporate bond strategies},
author={Alexander Dickerson and Cesare Robotti and Giulio Rossetti},
journal={Working Paper},
volume={},
pages={},
year={2024},
publisher={}}