Forecasters

This module implements the simple forecasting models used by Cvxportfolio.

These are standard ones like historical mean, variance, and covariance. In most cases the models implemented here are equivalent to the relevant Pandas DataFrame methods, including (most importantly) the logic used to skip over any np.nan. There are some subtle differences explained below.

Our forecasters are optimized to be evaluated sequentially in time: at each point in time in a back-test the forecast computed at the previous time step is updated with the most recent observation. This is in some cases (e.g., covariances) much more efficient than computing from scratch.
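As a rough illustration of why sequential updating is cheaper, the sketch below keeps a running sum of outer products and adds one observation per time step, then checks the result against a from-scratch computation. It is a simplified stand-in (no missing values, no rolling window or smoothing) and not Cvxportfolio's internal code; the class name `RunningSecondMoment` is hypothetical.

```python
import numpy as np


class RunningSecondMoment:
    """Hypothetical helper: running estimate of E[r r^T] (not Cvxportfolio code)."""

    def __init__(self, n_assets):
        self.count = 0
        self.sum_outer = np.zeros((n_assets, n_assets))

    def update(self, r):
        # One rank-one update per new observation: O(n^2) work,
        # instead of re-reading the whole history at every step.
        self.sum_outer += np.outer(r, r)
        self.count += 1
        return self.sum_outer / self.count


rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=(250, 4))

running = RunningSecondMoment(n_assets=4)
for r in returns:
    estimate = running.update(r)

# Same result as recomputing from scratch on the full history.
from_scratch = returns.T @ returns / len(returns)
```

With T past observations and n assets, the from-scratch computation costs on the order of T n^2 operations at every step, while the update costs on the order of n^2.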

Most of our forecasters implement both a rolling window and exponential moving average logic. These are specified by the rolling and half_life parameters respectively, which are either Pandas Timedeltas or np.inf. The latter is the default, and means that the whole past is used, with no exponential smoothing. Note that it’s possible to use both, e.g., estimate covariance matrices ignoring past returns older than 5 years and smoothing the recent ones using an exponential kernel with half-life of 1 year.
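To make the interaction of the two parameters concrete, here is a minimal sketch of a weighted mean that first discards observations older than `rolling` and then down-weights the remaining ones with an exponential kernel. The function `smoothed_mean`, the inclusive window boundary, and the exact form of the weights are assumptions made for illustration; Cvxportfolio's own conventions may differ in the details.

```python
import numpy as np
import pandas as pd


def smoothed_mean(series, t, half_life=np.inf, rolling=np.inf):
    """Illustrative weighted mean of the observations strictly before time t."""
    past = series[series.index < t].dropna()
    age = t - past.index  # TimedeltaIndex: how old each observation is
    if isinstance(rolling, pd.Timedelta):
        keep = age <= rolling  # assumption: the window boundary is inclusive
        past, age = past[keep], age[keep]
    if isinstance(half_life, pd.Timedelta):
        # An observation that is one half-life old gets half the weight.
        weights = 0.5 ** np.asarray(age / half_life, dtype=float)
    else:
        weights = np.ones(len(past))
    return float(np.sum(weights * past.values) / np.sum(weights))


index = pd.date_range("2020-01-01", periods=4, freq="365D")
series = pd.Series([0.04, 0.02, 0.01, 0.03], index=index)
t = index[-1] + pd.Timedelta(days=365)

plain = smoothed_mean(series, t)
windowed = smoothed_mean(series, t, rolling=pd.Timedelta(days=730))
both = smoothed_mean(
    series, t,
    half_life=pd.Timedelta(days=365), rolling=pd.Timedelta(days=730))
```

Here `plain` averages all four observations, `windowed` only the two most recent, and `both` additionally gives the one-year-old observation twice the weight of the two-year-old one.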

The covariance, variance and standard deviation forecasters also implement the kelly parameter, which is True by default. This simple trick, explained in section 4.2 (page 28) of the paper, simplifies the computation and in general provides (slightly) higher performance. For example, using the notation of the paper, the classical definition of covariance is

\[\Sigma = \mathbf{E}(r_t - \mu)(r_t - \mu)^T,\]

this is what you get by setting kelly=False. The default, kelly=True, gives instead

\[\Sigma^\text{kelly} = \mathbf{E}r_t r_t^T = \Sigma + \mu \mu^T,\]

so that the resulting Markowitz-style optimization problem corresponds to the second-order Taylor approximation of a (risk-constrained) Kelly objective, as explained briefly on page 28 of the paper, or in more detail (and with hard-to-read math) in section 6 of the Risk-Constrained Kelly Gambling paper.
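The identity between the two definitions also holds exactly for sample moments when the classical covariance is computed with ddof=0. The following check uses synthetic returns (the data and variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# 500 periods of returns for 3 assets, with nonzero means.
returns = rng.normal(loc=[0.001, 0.002, -0.001], scale=0.01, size=(500, 3))

mu = returns.mean(axis=0)
sigma = np.cov(returns, rowvar=False, ddof=0)     # classical covariance
sigma_kelly = returns.T @ returns / len(returns)  # uncentered second moment

# The difference is the rank-one, positive semidefinite matrix mu mu^T.
gap = sigma_kelly - sigma
```

Since the difference is positive semidefinite, the Kelly-style estimate never assigns less risk to a portfolio than the classical one.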

Lastly, some forecasters implement a basic caching mechanism, which is used in two ways. First, online (e.g., in a back-test): if multiple copies of the same forecaster need access to the estimated value, as is the case in cvxportfolio.MultiPeriodOptimization policies, the expensive evaluation is done only once. Second, offline, provided that the cvxportfolio.data.MarketData server used implements the cvxportfolio.data.MarketData.partial_universe_signature() method (so that we can certify which market data the cached values were computed on): the forecasted values are saved on disk and made available automatically the next time the user runs a back-test on the same market data (and same universe). This is especially useful for hyper-parameter optimization, because expensive computations like evaluating large covariance matrices are done only once.
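The online case can be pictured with a small memoization sketch. The class `CachedForecaster` and its method `values_at` are hypothetical names used only for this illustration, and the "expensive" computation is a placeholder; the sketch shows the general idea of keying results by timestamp, not how Cvxportfolio implements it.

```python
class CachedForecaster:
    """Hypothetical wrapper illustrating per-timestamp caching (not Cvxportfolio code)."""

    def __init__(self, compute):
        self._compute = compute  # expensive function of the timestamp
        self._cache = {}
        self.evaluations = 0

    def values_at(self, t):
        # Evaluate only on the first request for a given timestamp.
        if t not in self._cache:
            self.evaluations += 1
            self._cache[t] = self._compute(t)
        return self._cache[t]


# Placeholder for an expensive estimate; here t is just an integer time step.
forecaster = CachedForecaster(compute=lambda t: sum(range(t)))

# Several consumers (e.g., the planning steps of a multi-period policy)
# ask for the forecast at the same time; it is computed only once.
results = [forecaster.values_at(10) for _ in range(5)]
```

An on-disk variant would additionally need a key that identifies the market data and universe, which is the role the text above assigns to the signature method.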

How to use them

Each of these forecasters is the default option of some Cvxportfolio optimization term; for example, HistoricalMeanReturn is the default used by cvxportfolio.ReturnsForecast. When used this way, each forecaster runs with its default options. To change the options, pass the relevant forecaster class, instantiated with the options of your choice, to the Cvxportfolio object. For example

import cvxportfolio as cvx
import pandas as pd
from cvxportfolio.forecast import HistoricalMeanReturn

# Mean-return forecaster with a 1-year half-life and a 5-year rolling window.
returns_forecast = cvx.ReturnsForecast(
    r_hat=HistoricalMeanReturn(
        half_life=pd.Timedelta(days=365),
        rolling=pd.Timedelta(days=365*5)))

This applies exponential smoothing to the mean returns forecaster with a half-life of 1 year, and skips over all observations older than 5 years. Both are relative to each point in time at which the policy is evaluated.

Forecasters Documentation

class cvxportfolio.forecast.HistoricalMeanReturn(half_life=inf, rolling=inf)

Historical means of non-cash returns.

Added in version 1.2.0: Added the half_life and rolling parameters.

When both half_life and rolling are infinity, this is equivalent to

past_returns.iloc[:,:-1].mean()

where past_returns is a time-indexed dataframe containing the past returns (in a back-test, relative to each point in time), and its last column, which we skip over, holds the cash returns. We use the same logic as Pandas to handle np.nan values.
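A small example of the Pandas expression above, on made-up data with missing values (the tickers AAA and BBB and all numbers are hypothetical):

```python
import numpy as np
import pandas as pd

past_returns = pd.DataFrame(
    {
        "AAA": [0.01, np.nan, 0.03],
        "BBB": [np.nan, np.nan, 0.02],
        "cash": [0.0001, 0.0001, 0.0001],
    },
    index=pd.date_range("2024-01-01", periods=3),
)

# Drop the last (cash) column, then average each asset
# over its non-missing values only.
historical_mean = past_returns.iloc[:, :-1].mean()
```

Each asset is averaged over its own available observations, so AAA uses two values and BBB only one.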

Parameters:

half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. If in back-test, that is with respect to each point in time. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.

estimate(market_data, t=None)

Estimate the forecaster at given time on given market data.

This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.

Parameters:

market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).

Note

This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.

Raises: ValueError – If the provided time t is not in the trading calendar.
Returns: Forecasted value and time at which the forecast is made (for safety checking).
Return type: (np.array, pd.Timestamp)

class cvxportfolio.forecast.HistoricalMeanVolume(half_life=inf, rolling=inf)

Historical means of traded volume in units of value (e.g., dollars).

Added in version 1.2.0.

Parameters:

half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. If in back-test, that is with respect to each point in time. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.

estimate(market_data, t=None)

Estimate the forecaster at given time on given market data.

This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.

Parameters:

market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).

Note

This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.

Raises: ValueError – If the provided time t is not in the trading calendar.
Returns: Forecasted value and time at which the forecast is made (for safety checking).
Return type: (np.array, pd.Timestamp)

class cvxportfolio.forecast.HistoricalVariance(rolling=inf, half_life=inf, kelly=True)

Historical variances of non-cash returns.

Added in version 1.2.0: Added the half_life and rolling parameters.

When both half_life and rolling are infinity, this is equivalent to

past_returns.iloc[:,:-1].var(ddof=0)

if you set kelly=False and

(past_returns**2).iloc[:,:-1].mean()

otherwise (we use the same logic to handle np.nan values).
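The two Pandas expressions can be compared directly on a toy dataframe (hypothetical tickers and values). Because both skip missing values per column, they differ exactly by the squared historical mean of each asset:

```python
import numpy as np
import pandas as pd

past_returns = pd.DataFrame(
    {
        "AAA": [0.01, -0.02, np.nan, 0.03],
        "BBB": [0.00, 0.01, 0.02, np.nan],
        "cash": [0.0001] * 4,
    }
)
non_cash = past_returns.iloc[:, :-1]

classical = non_cash.var(ddof=0)  # what kelly=False corresponds to
kelly = (non_cash**2).mean()      # what kelly=True corresponds to

# The two differ by the squared historical mean of each asset.
difference = kelly - classical
```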

Parameters:

half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. If in back-test, that is with respect to each point in time. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.
kelly (bool) – If True, compute \(\mathbf{E}[r^2]\); otherwise compute \(\mathbf{E}[r^2] - {\mathbf{E}[r]}^2\). The second corresponds to the classic definition of variance, while the first is what is obtained by Taylor approximation of the Kelly gambling objective. See discussion above. Default True.

estimate(market_data, t=None)

Estimate the forecaster at given time on given market data.

This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.

Parameters:

market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).

Note

This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.

Raises: ValueError – If the provided time t is not in the trading calendar.
Returns: Forecasted value and time at which the forecast is made (for safety checking).
Return type: (np.array, pd.Timestamp)

class cvxportfolio.forecast.HistoricalStandardDeviation(rolling=inf, half_life=inf, kelly=True)

Historical standard deviation of non-cash returns.

Added in version 1.2.0: Added the half_life and rolling parameters.

When both half_life and rolling are infinity, this is equivalent to

past_returns.iloc[:,:-1].std(ddof=0)

if you set kelly=False and

np.sqrt((past_returns**2).iloc[:,:-1].mean())

otherwise (we use the same logic to handle np.nan values).

Parameters:

half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. If in back-test, that is with respect to each point in time. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.
kelly (bool) – Same as in cvxportfolio.forecast.HistoricalVariance. Default True.

estimate(market_data, t=None)

Estimate the forecaster at given time on given market data.

This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.

Parameters:

market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).

Note

This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.

Raises: ValueError – If the provided time t is not in the trading calendar.
Returns: Forecasted value and time at which the forecast is made (for safety checking).
Return type: (np.array, pd.Timestamp)

class cvxportfolio.forecast.HistoricalMeanError(rolling=inf, half_life=inf, kelly=False)

Historical standard deviations of the mean of non-cash returns.

Added in version 1.2.0: Added the half_life and rolling parameters.

For a given time series of past returns \(r_{t-1}, r_{t-2}, \ldots, r_0\) this is \(\sqrt{\text{Var}[r]/t}\). When there are missing values we ignore them, both to compute the variance and the count.
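In Pandas terms, and for the default case (infinite half_life and rolling, kelly=False), the formula can be sketched as follows. The use of ddof=0 is an assumption made by analogy with HistoricalVariance, and the data are made up:

```python
import numpy as np
import pandas as pd

past_returns = pd.DataFrame(
    {
        "AAA": [0.01, -0.02, np.nan, 0.03],
        "BBB": [0.00, 0.01, 0.02, 0.01],
        "cash": [0.0001] * 4,
    }
)
non_cash = past_returns.iloc[:, :-1]

# Missing values are excluded from both the variance and the count.
count = non_cash.count()
mean_error = np.sqrt(non_cash.var(ddof=0) / count)
```

Here AAA has three valid observations and BBB four, so the two columns are divided by different counts.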

Parameters:

half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. If in back-test, that is with respect to each point in time. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.
kelly (bool) – Same as in cvxportfolio.forecast.HistoricalVariance. Default False.

estimate(market_data, t=None)

Estimate the forecaster at given time on given market data.

This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.

Parameters:

market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).

Note

This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.

Raises: ValueError – If the provided time t is not in the trading calendar.
Returns: Forecasted value and time at which the forecast is made (for safety checking).
Return type: (np.array, pd.Timestamp)

class cvxportfolio.forecast.HistoricalFactorizedCovariance(rolling=inf, half_life=inf, kelly=True)

Historical covariance matrix of non-cash returns, factorized.

Added in version 1.2.0: Added the half_life and rolling parameters.

When both half_life and rolling are infinity, this is equivalent, before factorization, to

past_returns.iloc[:,:-1].cov(ddof=0)

if you set kelly=False. We use the same logic to handle np.nan values. For kelly=True it is not possible to reproduce with one single Pandas method (but we do test against Pandas in the unit tests).
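For kelly=True, one way to picture the computation is as pairwise second moments, where each entry averages the products of two assets' returns over the periods in which both are observed. The sketch below follows that reading and then builds one possible factor via an eigendecomposition. Both the pairwise convention and the choice of factorization are assumptions made for illustration, and the data are made up.

```python
import numpy as np
import pandas as pd

past_returns = pd.DataFrame(
    {
        "AAA": [0.01, -0.02, np.nan, 0.03, 0.00],
        "BBB": [0.00, 0.01, 0.02, np.nan, -0.01],
        "cash": [0.0001] * 5,
    }
)
non_cash = past_returns.iloc[:, :-1]

observed = (~non_cash.isnull()).astype(float)
filled = non_cash.fillna(0.0)

# Pairwise-complete counts and sums of the products r_i * r_j.
pair_counts = observed.T @ observed
pair_sums = filled.T @ filled
sigma_kelly = pair_sums / pair_counts  # estimate of E[r_i r_j]

# One possible factorization: sigma_kelly = F @ F.T from the eigendecomposition,
# clipping any tiny negative eigenvalues that pairwise NaN handling can produce.
eigenvalues, eigenvectors = np.linalg.eigh(sigma_kelly.values)
factor = eigenvectors * np.sqrt(np.clip(eigenvalues, 0.0, None))
```

The diagonal of `sigma_kelly` coincides with `(non_cash**2).mean()`, i.e., with the kelly=True variances described for HistoricalVariance.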

Parameters:

half_life (pandas.Timedelta or np.inf) – Half-life of exponential smoothing, expressed as Pandas Timedelta. Default np.inf, meaning no exponential smoothing.
rolling (pandas.Timedelta or np.inf) – Rolling window used: observations older than this Pandas Timedelta are skipped over. If in back-test, that is with respect to each point in time. Default np.inf, meaning that all past is used.
kelly (bool) – If True, each element of the covariance matrix \(\Sigma_{i,j}\) is equal to \(\mathbf{E} r^{i} r^{j}\); otherwise it is \(\mathbf{E} r^{i} r^{j} - \mathbf{E} r^{i} \mathbf{E} r^{j}\). The second case corresponds to the classic definition of covariance, while the first is what is obtained by Taylor approximation of the Kelly gambling objective. (See discussion above.) In the second case, the estimated covariance is the same as what is returned by pandas.DataFrame.cov(ddof=0), i.e., we use the same logic to handle missing data.

estimate(market_data, t=None)

Estimate the forecaster at given time on given market data.

This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.

Parameters:

market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).

Note

This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.

Raises: ValueError – If the provided time t is not in the trading calendar.
Returns: Forecasted value and time at which the forecast is made (for safety checking).
Return type: (np.array, pd.Timestamp)

class cvxportfolio.forecast.HistoricalLowRankCovarianceSVD(num_factors, svd_iters=10, svd='numpy')

Build factor model covariance using truncated SVD.

Note

This forecaster is experimental and not covered by semantic versioning; we may change it without warning.

Parameters:

num_factors (int) – How many factors in the low rank model.
svd_iters (int) – How many iterations of truncated SVD to apply. If you get a badly conditioned covariance you may want to lower this.
svd (str) – Which SVD routine to use; currently only dense (LAPACK) via NumPy.
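For orientation, here is a generic single-pass sketch of a low-rank-plus-diagonal covariance built from a truncated SVD with NumPy. It is not the algorithm implemented by this class (in particular, it does not reproduce whatever the svd_iters iterations do, and it ignores missing values); the synthetic data, the uncentered second-moment convention, and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_periods, n_assets, num_factors = 300, 6, 2

# Synthetic returns driven by two common factors plus idiosyncratic noise.
exposures = rng.normal(size=(n_assets, num_factors))
factor_returns = rng.normal(scale=0.01, size=(n_periods, num_factors))
noise = rng.normal(scale=0.005, size=(n_periods, n_assets))
returns = factor_returns @ exposures.T + noise

# Truncated SVD of the scaled returns matrix: keep the top singular triplets.
u, s, vt = np.linalg.svd(returns / np.sqrt(n_periods), full_matrices=False)
F = vt[:num_factors].T * s[:num_factors]  # (n_assets, num_factors) loadings

# Idiosyncratic variances: the part of the diagonal of the full
# second-moment matrix that the low-rank term does not explain.
full = returns.T @ returns / n_periods
d = np.diag(full) - np.sum(F**2, axis=1)

low_rank_covariance = F @ F.T + np.diag(d)
```

By construction the model matches the full matrix on the diagonal and approximates the off-diagonal entries with num_factors factors.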

estimate(market_data, t=None)

Estimate the forecaster at given time on given market data.

This uses the same logic used by a trading policy to evaluate the forecaster at a given point in time.

Parameters:

market_data (cvx.MarketData instance) – Market data server, used to provide data to the forecaster.
t (pd.Timestamp or None) – Time at which to estimate the forecaster. Must be among the ones returned by market_data.trading_calendar(). Default is None, meaning that the last valid timestamp is chosen. Note that with default market data servers you need to set online_usage=True if forecasting on the last timestamp (usually, today).

Note

This method is not finalized! It is still experimental, and not covered by semantic versioning guarantees.

Raises: ValueError – If the provided time t is not in the trading calendar.
Returns: Forecasted value and time at which the forecast is made (for safety checking).
Return type: (np.array, pd.Timestamp)

Base forecaster classes

Work in progress.

class cvxportfolio.forecast.BaseForecast

Base class for forecasters.