Forecasters
This module implements the simple forecasting models used by Cvxportfolio.
These are standard ones like historical mean, variance, and covariance. In
most cases the models implemented here are equivalent to the relevant Pandas
DataFrame methods, including (most importantly) the logic used to skip over any
np.nan values. There are some subtle differences, explained below.
Our forecasters are optimized to be evaluated sequentially in time: at each point in time in a back-test the forecast computed at the previous time step is updated with the most recent observation. This is in some cases (e.g., covariances) much more efficient than computing from scratch.
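A minimal sketch of the sequential idea, assuming no missing values and no rolling window or exponential smoothing (all of which Cvxportfolio does handle): keep running sums and fold in one new observation per time step, instead of recomputing from the whole past. The class RunningKellyCovariance is hypothetical and is not Cvxportfolio's implementation.

```python
import numpy as np


class RunningKellyCovariance:
    """Hypothetical online estimator of E[r r^T] (no NaNs, no windowing).

    Each update costs O(n^2), instead of the O(t n^2) needed to recompute
    the estimate from all t past observations.
    """

    def __init__(self, n_assets: int):
        self.count = 0
        self.sum_outer = np.zeros((n_assets, n_assets))

    def update(self, r: np.ndarray) -> None:
        """Fold in the most recent return vector r, of shape (n_assets,)."""
        self.sum_outer += np.outer(r, r)
        self.count += 1

    def value(self) -> np.ndarray:
        """Current estimate of E[r r^T]."""
        return self.sum_outer / self.count


rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=(250, 3))  # synthetic returns

online = RunningKellyCovariance(n_assets=3)
for r in returns:  # one cheap update per time step, as in a back-test
    online.update(r)

from_scratch = returns.T @ returns / len(returns)
print(np.allclose(online.value(), from_scratch))  # True
```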
Most of our forecasters implement both rolling window and exponential moving
average logic. These are specified by the rolling and half_life parameters,
respectively, which are either Pandas Timedeltas or np.inf.
The latter is the default, and means that the whole past is used, with no
exponential smoothing. Note that it’s possible to use both, e.g., to
estimate covariance matrices ignoring past returns older than 5 years and
smoothing the recent ones using an exponential kernel with a half-life of 1 year.
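To make the interaction of the two parameters concrete, here is a minimal sketch for a single series, assuming that an observation of age a (relative to the evaluation time t) receives weight 0.5 ** (a / half_life) and is dropped when a exceeds rolling. The helper smoothed_windowed_mean is hypothetical; Cvxportfolio's exact boundary conventions may differ.

```python
import numpy as np
import pandas as pd


def smoothed_windowed_mean(past: pd.Series, t: pd.Timestamp,
                           half_life=np.inf, rolling=np.inf) -> float:
    """Hypothetical exponentially weighted, windowed mean of `past` at `t`.

    With both half_life and rolling equal to np.inf this reduces to
    past.mean(), which skips NaNs.
    """
    # Age of each observation relative to t, as a float number of days.
    age = np.asarray((t - past.index) / pd.Timedelta(days=1), dtype=float)
    values = past.to_numpy(dtype=float)

    keep = ~np.isnan(values)  # skip NaNs, like Pandas
    if isinstance(rolling, pd.Timedelta):
        keep &= age <= rolling / pd.Timedelta(days=1)

    if isinstance(half_life, pd.Timedelta):
        weights = 0.5 ** (age / (half_life / pd.Timedelta(days=1)))
    else:  # np.inf: no exponential smoothing
        weights = np.ones_like(age)

    return float(np.sum(weights[keep] * values[keep]) / np.sum(weights[keep]))


idx = pd.date_range("2020-01-01", periods=6, freq="D")
past = pd.Series([1.0, np.nan, 3.0, 4.0, 5.0, 6.0], index=idx)
t = pd.Timestamp("2020-01-07")

print(smoothed_windowed_mean(past, t))  # 3.8, same as past.mean()
print(smoothed_windowed_mean(past, t, rolling=pd.Timedelta(days=3)))  # 5.0
print(smoothed_windowed_mean(past, t, half_life=pd.Timedelta(days=1)))
```

Setting both parameters at once, as in the 5-year window with 1-year half-life above, simply applies the mask and the weights together.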
Finally, we note that the covariance, variance and standard deviation
forecasters implement the kelly parameter, which is True by default.
This is a simple trick, explained in section 4.2 (page 28) of the paper, that
simplifies the computation and in general provides (slightly) higher performance.
For example, using the notation of the paper, the classical definition of
covariance is
\[\Sigma = \mathbf{E}(r_t - \mu)(r_t - \mu)^T,\]
where \(\mu = \mathbf{E}\, r_t\); this is what you get by setting kelly=False.
The default, kelly=True, gives instead
\[\Sigma^\text{kelly} = \mathbf{E}\, r_t r_t^T = \Sigma + \mu \mu^T,\]
so that the resulting Markowitz-style optimization problem corresponds to the second order Taylor approximation of a (risk-constrained) Kelly objective, as is explained briefly on page 28 of the paper, or with more detail (and hard-to-read math) in section 6 of the Risk-Constrained Kelly Gambling paper.
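The relation between the two definitions, \(\mathbf{E}\, r_t r_t^T = \Sigma + \mu\mu^T\) with \(\mu = \mathbf{E}\, r_t\), can be checked numerically. This sketch uses plain NumPy on synthetic returns with no missing values; it is not Cvxportfolio code.

```python
import numpy as np

rng = np.random.default_rng(42)
returns = rng.normal(loc=0.0005, scale=0.01, size=(1000, 4))  # synthetic data

mu = returns.mean(axis=0)
sigma_classic = np.cov(returns, rowvar=False, ddof=0)  # like kelly=False
sigma_kelly = returns.T @ returns / len(returns)       # like kelly=True: E[r r^T]

print(np.allclose(sigma_kelly, sigma_classic + np.outer(mu, mu)))  # True
```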
Lastly, some forecasters implement a basic caching mechanism.
This is used in two ways. First, online (e.g., in a back-test): if multiple
copies of the same forecaster need access to the estimated value, as is the
case in cvxportfolio.MultiPeriodOptimization policies, the expensive
evaluation is only done once. Second, offline, provided that the
cvxportfolio.data.MarketData server used implements the
cvxportfolio.data.MarketData.partial_universe_signature() method
(so that we can certify which market data the cached values are computed on).
This type of caching simply saves the forecasted values on disk, and makes them
available automatically the next time the user runs a back-test on the same market
data (and same universe). This is especially useful when doing hyper-parameter
optimization, so that expensive computations like evaluating large covariance
matrices are only done once.
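The two uses of caching can be illustrated with a small dictionary keyed by forecaster, market data signature and time, optionally persisted with pickle. This is a hypothetical sketch: ForecastCache and its methods are invented for illustration, and the signature is just a caller-supplied string standing in for whatever partial_universe_signature() returns.

```python
import pickle
import tempfile
from pathlib import Path


class ForecastCache:
    """Hypothetical cache of forecasted values, keyed by data signature and time."""

    def __init__(self, path=None):
        self.path = Path(path) if path is not None else None
        self.store = {}
        if self.path is not None and self.path.exists():
            self.store = pickle.loads(self.path.read_bytes())

    def get_or_compute(self, forecaster_name, signature, t, compute):
        key = (forecaster_name, signature, t)
        if key not in self.store:  # the expensive evaluation runs only once
            self.store[key] = compute()
        return self.store[key]

    def save(self):
        if self.path is not None:
            self.path.write_bytes(pickle.dumps(self.store))


calls = []


def expensive():
    calls.append(1)  # count how many times the "expensive" work runs
    return 42.0


# Online: two copies of the same forecaster share one evaluation.
cache = ForecastCache()
a = cache.get_or_compute("HistoricalVariance", "sig-abc", "2024-01-02", expensive)
b = cache.get_or_compute("HistoricalVariance", "sig-abc", "2024-01-02", expensive)

# Offline: persist to disk and reuse in a later run on the same data.
with tempfile.TemporaryDirectory() as tmp:
    first_run = ForecastCache(Path(tmp) / "cache.pkl")
    first_run.get_or_compute("HistoricalVariance", "sig-abc", "2024-01-02", expensive)
    first_run.save()
    second_run = ForecastCache(Path(tmp) / "cache.pkl")
    c = second_run.get_or_compute("HistoricalVariance", "sig-abc", "2024-01-02", expensive)

print(a, b, c, len(calls))  # 42.0 42.0 42.0 2
```

Keying on the data signature is what makes the disk cache safe: a different universe or different market data yields a different key, so stale values are never returned.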
How to use them
These forecasters are each the default option of some Cvxportfolio optimization
term; for example, HistoricalMeanReturn is the default used by
cvxportfolio.ReturnsForecast. In this way each is used with its
default options. If you want to change the options you can simply pass
the relevant forecaster class, instantiated with the options of your choice,
to the Cvxportfolio object. For example
import cvxportfolio as cvx
from cvxportfolio.forecast import HistoricalMeanReturn
import pandas as pd

returns_forecast = cvx.ReturnsForecast(
    r_hat=HistoricalMeanReturn(
        half_life=pd.Timedelta(days=365),
        rolling=pd.Timedelta(days=365*5)))
if you want to apply exponential smoothing to the mean returns forecaster with half-life of 1 year, and skip over all observations older than 5 years. Both are relative to each point in time at which the policy is evaluated.
Forecasters Documentation
- class cvxportfolio.forecast.HistoricalMeanReturn(half_life=inf, rolling=inf)
Historical means of non-cash returns.
Added in version 1.2.0: Added the half_life and rolling parameters.
When both half_life and rolling are infinity, this is equivalent to
past_returns.iloc[:,:-1].mean(), where past_returns is a time-indexed
dataframe containing the past returns (in a back-test, that’s relative to each
point in time), and whose last column, which we skip over, contains the cash
returns. We use the same logic as Pandas to handle np.nan values.
- Parameters:
half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. If in back-test, that is with respect to each point in time. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.
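The stated Pandas equivalence (both parameters infinite) looks like this on a tiny synthetic dataframe; the column names, including the name of the last (cash) column, are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic past returns: two assets, plus a last column of cash returns.
past_returns = pd.DataFrame(
    {"AAA": [0.01, np.nan, 0.03],
     "BBB": [0.02, 0.00, -0.01],
     "USDOLLAR": [0.0001, 0.0001, 0.0001]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"))

# Drop the cash column; the NaN in AAA is skipped, so AAA averages 2 values.
mean_return = past_returns.iloc[:, :-1].mean()
print(mean_return)
```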
- estimate(market_data, t=None)
Estimate the forecaster at given time on given market data.
This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.
- Parameters:
market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).
Note
This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.
- Raises:
ValueError – If the provided time t is not in the trading calendar.
- Returns:
Forecasted value and time at which the forecast is made (for safety checking).
- Return type:
(np.array, pd.Timestamp)
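The documented contract for t (None picks the last valid timestamp, anything else must be in the trading calendar or ValueError is raised) can be sketched with a hypothetical helper. For simplicity it assumes the last valid timestamp is the last one in the calendar, which need not hold with default market data servers unless online_usage=True.

```python
import pandas as pd


def resolve_estimate_time(trading_calendar: pd.DatetimeIndex, t=None) -> pd.Timestamp:
    """Hypothetical helper mirroring the documented behavior of ``t``."""
    if t is None:
        # Assumption: the last calendar timestamp is the last valid one.
        return trading_calendar[-1]
    t = pd.Timestamp(t)
    if t not in trading_calendar:
        raise ValueError(f"{t} is not in the trading calendar.")
    return t


calendar = pd.bdate_range("2024-01-01", "2024-01-05")  # Monday to Friday
print(resolve_estimate_time(calendar))                # 2024-01-05 00:00:00
print(resolve_estimate_time(calendar, "2024-01-03"))  # 2024-01-03 00:00:00
try:
    resolve_estimate_time(calendar, "2024-01-06")     # a Saturday
except ValueError as exc:
    print(exc)
```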
- class cvxportfolio.forecast.HistoricalMeanVolume(half_life=inf, rolling=inf)
Historical means of traded volume in units of value (e.g., dollars).
Added in version 1.2.0.
- Parameters:
half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. If in back-test, that is with respect to each point in time. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.
- estimate(market_data, t=None)
Estimate the forecaster at given time on given market data.
This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.
- Parameters:
market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).
Note
This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.
- Raises:
ValueError – If the provided time t is not in the trading calendar.
- Returns:
Forecasted value and time at which the forecast is made (for safety checking).
- Return type:
(np.array, pd.Timestamp)
- class cvxportfolio.forecast.HistoricalVariance(rolling=inf, half_life=inf, kelly=True)
Historical variances of non-cash returns.
Added in version 1.2.0: Added the half_life and rolling parameters.
When both half_life and rolling are infinity, this is equivalent to
past_returns.iloc[:,:-1].var(ddof=0) if you set kelly=False, and to
(past_returns**2).iloc[:,:-1].mean() otherwise (we use the same logic to
handle np.nan values).
- Parameters:
half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. If in back-test, that is with respect to each point in time. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.
kelly (bool) – If True, compute \(\mathbf{E}[r^2]\), else \(\mathbf{E}[r^2] - {\mathbf{E}[r]}^2\). The second corresponds to the classic definition of variance, while the first is what is obtained by Taylor approximation of the Kelly gambling objective. See discussion above.
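The two Pandas expressions for the kelly=False and kelly=True cases differ exactly by the squared mean, column by column, even with missing values, because NaNs are skipped consistently in each column. A sketch on synthetic data (column names are illustrative):

```python
import numpy as np
import pandas as pd

past_returns = pd.DataFrame(
    {"AAA": [0.01, np.nan, 0.03, -0.02],
     "BBB": [0.02, 0.00, -0.01, 0.01],
     "USDOLLAR": [0.0001] * 4})  # last column: cash, skipped

asset_returns = past_returns.iloc[:, :-1]
var_classic = asset_returns.var(ddof=0)  # kelly=False
var_kelly = (asset_returns**2).mean()    # kelly=True

# Per column: E[r^2] = Var[r] + E[r]^2, with the same NaNs skipped throughout.
print(np.allclose(var_kelly, var_classic + asset_returns.mean()**2))  # True
```

Taking np.sqrt of either result gives the corresponding HistoricalStandardDeviation quantity.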
- estimate(market_data, t=None)
Estimate the forecaster at given time on given market data.
This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.
- Parameters:
market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).
Note
This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.
- Raises:
ValueError – If the provided time t is not in the trading calendar.
- Returns:
Forecasted value and time at which the forecast is made (for safety checking).
- Return type:
(np.array, pd.Timestamp)
- class cvxportfolio.forecast.HistoricalStandardDeviation(rolling=inf, half_life=inf, kelly=True)
Historical standard deviation of non-cash returns.
Added in version 1.2.0: Added the half_life and rolling parameters.
When both half_life and rolling are infinity, this is equivalent to
past_returns.iloc[:,:-1].std(ddof=0) if you set kelly=False, and to
np.sqrt((past_returns**2).iloc[:,:-1].mean()) otherwise (we use the same
logic to handle np.nan values).
- Parameters:
half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. If in back-test, that is with respect to each point in time. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.
kelly (bool) – Same as in cvxportfolio.forecast.HistoricalVariance. Default True.
- estimate(market_data, t=None)
Estimate the forecaster at given time on given market data.
This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.
- Parameters:
market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).
Note
This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.
- Raises:
ValueError – If the provided time t is not in the trading calendar.
- Returns:
Forecasted value and time at which the forecast is made (for safety checking).
- Return type:
(np.array, pd.Timestamp)
- class cvxportfolio.forecast.HistoricalMeanError(rolling=inf, half_life=inf, kelly=False)
Historical standard deviations of the mean of non-cash returns.
Added in version 1.2.0: Added the half_life and rolling parameters.
For a given time series of past returns \(r_{t-1}, r_{t-2}, \ldots, r_0\) this is \(\sqrt{\text{Var}[r]/t}\). When there are missing values we ignore them, both to compute the variance and the count.
- Parameters:
half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. If in back-test, that is with respect to each point in time. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.
kelly (bool) – Same as in cvxportfolio.forecast.HistoricalVariance. Default False.
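For the unsmoothed, unwindowed case with the default kelly=False, \(\sqrt{\text{Var}[r]/t}\) with missing values ignored in both the variance and the count can be sketched in Pandas as follows (synthetic data; this mirrors the formula, not Cvxportfolio's code):

```python
import numpy as np
import pandas as pd

asset_returns = pd.DataFrame(
    {"AAA": [0.01, np.nan, 0.03, -0.02],
     "BBB": [0.02, 0.00, -0.01, 0.01]})

variance = asset_returns.var(ddof=0)  # classic variance, NaNs skipped
count = asset_returns.count()         # non-NaN observations per column
mean_error = np.sqrt(variance / count)
print(mean_error)
```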
- estimate(market_data, t=None)
Estimate the forecaster at given time on given market data.
This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.
- Parameters:
market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).
Note
This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.
- Raises:
ValueError – If the provided time t is not in the trading calendar.
- Returns:
Forecasted value and time at which the forecast is made (for safety checking).
- Return type:
(np.array, pd.Timestamp)
- class cvxportfolio.forecast.HistoricalFactorizedCovariance(rolling=inf, half_life=inf, kelly=True)
Historical covariance matrix of non-cash returns, factorized.
Added in version 1.2.0: Added the half_life and rolling parameters.
When both half_life and rolling are infinity, this is equivalent, before
factorization, to past_returns.iloc[:,:-1].cov(ddof=0) if you set
kelly=False. We use the same logic to handle np.nan values. For kelly=True
the result cannot be reproduced with a single Pandas method (but we do test
against Pandas in the unit tests).
- Parameters:
half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.
kelly (bool) – If True, each element of the covariance matrix \(\Sigma_{i,j}\) is equal to \(\mathbf{E} r^{i} r^{j}\); otherwise it is \(\mathbf{E} r^{i} r^{j} - \mathbf{E} r^{i} \mathbf{E} r^{j}\). The second case corresponds to the classic definition of covariance, while the first is what is obtained by Taylor approximation of the Kelly gambling objective. (See discussion above.) In the second case, the estimated covariance is the same as what is returned by pandas.DataFrame.cov(ddof=0), i.e., we use the same logic to handle missing data.
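A sketch of both kelly options with missing data handled pairwise (for each pair of assets, only the rows where both returns are observed are used, as pandas.DataFrame.cov does), followed by one possible factorization \(\Sigma = F F^T\). The functions are hypothetical, and the symmetric eigendecomposition is an assumption: it is not necessarily the factorization Cvxportfolio returns.

```python
import numpy as np


def pairwise_covariance(returns: np.ndarray, kelly: bool = True) -> np.ndarray:
    """Covariance of the columns of `returns`, using pairwise-complete rows."""
    n = returns.shape[1]
    sigma = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            both = ~np.isnan(returns[:, i]) & ~np.isnan(returns[:, j])
            ri, rj = returns[both, i], returns[both, j]
            sigma[i, j] = np.mean(ri * rj)  # E[r^i r^j]
            if not kelly:
                sigma[i, j] -= ri.mean() * rj.mean()  # minus E[r^i] E[r^j]
    return sigma


def factorize(sigma: np.ndarray) -> np.ndarray:
    """Return F such that F @ F.T approximates sigma (eigendecomposition).

    Negative eigenvalues, which pairwise-complete estimates can produce,
    are clipped to zero; in that case F @ F.T is only an approximation.
    """
    eigval, eigvec = np.linalg.eigh(sigma)
    return eigvec * np.sqrt(np.clip(eigval, 0.0, None))


rng = np.random.default_rng(1)
returns = rng.normal(scale=0.01, size=(500, 3))
returns[:20, 0] = np.nan  # the first asset has a shorter history

sigma_classic = pairwise_covariance(returns, kelly=False)
sigma_kelly = pairwise_covariance(returns, kelly=True)
F = factorize(sigma_kelly)
print(np.abs(F @ F.T - sigma_kelly).max())  # small when sigma_kelly is PSD
```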
- estimate(market_data, t=None)
Estimate the forecaster at given time on given market data.
This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.
- Parameters:
market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).
Note
This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.
- Raises:
ValueError – If the provided time t is not in the trading calendar.
- Returns:
Forecasted value and time at which the forecast is made (for safety checking).
- Return type:
(np.array, pd.Timestamp)
- class cvxportfolio.forecast.HistoricalLowRankCovarianceSVD(num_factors, svd_iters=10, svd='numpy')
Build factor model covariance using truncated SVD.
Note
This forecaster is experimental and not covered by semantic versioning; we may change it without warning.
- Parameters:
num_factors (int) – How many factors in the low rank model.
svd_iters (int) – How many iterations of truncated SVD to apply. If you get a badly conditioned covariance you may want to lower this.
svd (str) – Which SVD routine to use, currently only dense (LAPACK) via Numpy.
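As a rough illustration of a factor model built with a truncated SVD, here is a single-pass, low-rank-plus-diagonal approximation \(\Sigma \approx F F^T + \text{diag}(d)\) of a covariance matrix. This is an assumption-laden sketch: the helper is hypothetical, and it does not reproduce the iterative procedure controlled by svd_iters.

```python
import numpy as np


def low_rank_plus_diagonal(sigma: np.ndarray, num_factors: int):
    """Hypothetical one-pass approximation sigma ~ F @ F.T + np.diag(d).

    Keeps the top `num_factors` singular directions of the (symmetric PSD)
    covariance; d holds the leftover idiosyncratic variances, so that the
    diagonal of the approximation matches the diagonal of sigma.
    """
    u, s, _ = np.linalg.svd(sigma)
    F = u[:, :num_factors] * np.sqrt(s[:num_factors])
    d = np.diag(sigma) - np.sum(F**2, axis=1)
    return F, d


rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=(500, 6))  # synthetic returns
sigma = returns.T @ returns / len(returns)

F, d = low_rank_plus_diagonal(sigma, num_factors=2)
approx = F @ F.T + np.diag(d)
print(F.shape, np.allclose(np.diag(approx), np.diag(sigma)))  # (6, 2) True
```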
- estimate(market_data, t=None)
Estimate the forecaster at given time on given market data.
This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.
- Parameters:
market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).
Note
This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.
- Raises:
ValueError – If the provided time t is not in the trading calendar.
- Returns:
Forecasted value and time at which the forecast is made (for safety checking).
- Return type:
(np.array, pd.Timestamp)
Base forecaster classes
Work in progress.
- class cvxportfolio.forecast.BaseForecast
Base class for forecasters.