Data Interfaces
This module includes classes that download, store, and serve market data.
The two main abstractions are SymbolData and MarketData.
Neither is exposed outside this module; their derived classes are.
If you want to interface cvxportfolio with a financial data source other
than the ones we provide, you should derive from either of those two classes.
Single-symbol data download and storage
- class cvxportfolio.YahooFinance(symbol, storage_backend='pickle', base_location=PosixPath('/home/docs/cvxportfolio_data'), grace_period=Timedelta('1 days 00:00:00'))
Yahoo Finance symbol data.
Added in version 1.2.0: The data cleaning logic has been significantly improved; see the data_cleaning.py example to view what’s done on any given name (or enable 'INFO' logging messages). It is recommended to delete the ~/cvxportfolio_data folder with data files downloaded by previous Cvxportfolio versions.
- Parameters:
symbol (str) – The symbol to download.
storage_backend (str) – The storage backend; implemented ones are 'pickle', 'csv', and 'sqlite'.
base_location (pathlib.Path) – The location of the storage. We store in a subdirectory named after the class which derives from this.
grace_period (pandas.Timedelta) – If the age of the most recent observation in the data is less than this, we do not download new data.
- Attribute data:
The downloaded, and cleaned, data for the symbol.
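As a rough illustration of the grace_period rule described above, the following sketch decides whether stored data is stale. The helper name needs_download is hypothetical and not part of Cvxportfolio; it only mirrors the documented behavior, assuming timezone-aware timestamps.

```python
import pandas as pd


def needs_download(last_observation, now, grace_period=pd.Timedelta("1 days")):
    """Return True if the newest stored observation is at least grace_period old.

    Hypothetical helper, not part of Cvxportfolio: it only mirrors the
    documented rule that recent-enough data is not downloaded again.
    """
    return (now - last_observation) >= grace_period


now = pd.Timestamp("2024-03-15 14:30", tz="UTC")
fresh = pd.Timestamp("2024-03-15 00:00", tz="UTC")  # 14.5 hours old
stale = pd.Timestamp("2024-03-12 00:00", tz="UTC")  # more than 3 days old

fresh_needs_update = needs_download(fresh, now)
stale_needs_update = needs_download(stale, now)
```

With the default of one day, only the second timestamp would trigger a new download in this sketch.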
- class cvxportfolio.Fred(symbol, storage_backend='pickle', base_location=PosixPath('/home/docs/cvxportfolio_data'), grace_period=Timedelta('1 days 00:00:00'))
Fred single-symbol data.
- Parameters:
symbol (str) – The symbol to download.
storage_backend (str) – The storage backend; implemented ones are 'pickle', 'csv', and 'sqlite'. By default 'pickle'.
base_location (pathlib.Path) – The location of the storage. We store in a subdirectory named after the class which derives from this. By default it’s a directory named cvxportfolio_data in your home folder.
grace_period (pandas.Timedelta) – If the age of the most recent observation in the data is less than this, we do not download new data. By default it’s one day.
- Attribute data:
The downloaded data for the symbol.
Market data servers
- class cvxportfolio.UserProvidedMarketData(returns, volumes=None, prices=None, copy_dataframes=True, trading_frequency=None, min_history=Timedelta('365 days 05:45:36'), base_location=PosixPath('/home/docs/cvxportfolio_data'), grace_period=Timedelta('1 days 00:00:00'), cash_key='USDOLLAR', online_usage=False, universe_selection_in_time=None)
User-provided market data.
Added in version 1.3.0: The new parameter universe_selection_in_time can be used to optionally exclude assets from the trading universe at different points in time.
- Parameters:
returns (pandas.DataFrame) – Historical open-to-open returns. The return at time \(t\) is \(r_t = p_{t+1}/p_t - 1\), where \(p_t\) is the (open) price at time \(t\). Must have a datetime index. You can also include cash returns as its last column, and set cash_key below to the last column’s name.
volumes (pandas.DataFrame or None) – Historical market volumes, expressed in units of value (e.g., US dollars).
prices (pandas.DataFrame or None) – Historical open prices (e.g., used for rounding trades in the MarketSimulator).
trading_frequency (str or None) – Instead of using the frequency implied by the index of the returns, down-sample all dataframes. We implement 'weekly', 'monthly', 'quarterly' and 'annual'. By default (None) don’t down-sample.
min_history (pandas.Timedelta) – Minimum amount of time for which the returns are not np.nan before each asset enters a back-test.
base_location (pathlib.Path) – The location of the storage, only used in case it downloads the cash returns. By default it’s a directory named cvxportfolio_data in your home folder.
cash_key (str) – Name of the cash account. If it is not the last column of the provided returns, it will be added. In that case you should make sure your provided dataframes have a timezone-aware datetime index. Its returns are the risk-free rate. If not in the original dataframe, choice of USDOLLAR, EURO, JPYEN, GBPOUND (Cvxportfolio downloads the historical central bank rates from FRED), or cash, which sets the cash returns to 0. Default USDOLLAR.
online_usage (bool) – Disable removal of assets that have np.nan returns for the given time. Default False.
universe_selection_in_time (pd.DataFrame or None) – Boolean dataframe used to specify which assets are to be included in the trading universe at each point in time. The columns are the full universe (same columns as prices, volumes, or returns without the last column, cash). The index is datetime and, differently from the usual convention, need not be the same as the index of the other dataframes: at each point in time of a back-test the last valid observation (before, or at, the trading time) is selected. (You still need to provide a timezoned index if the returns’ index is, otherwise time comparisons can’t be done.) The entries are boolean; True means that the corresponding asset can be invested in at the time, False that it can’t. Note that this is more fundamental than imposing time-varying position limits, for example, in an optimization-based policy. Non-investable assets are removed by the MarketSimulator: their positions are converted to cash, the result.BacktestResult dataframes will have np.nan on those dates for those assets, and the policy is re-compiled without those assets. You shouldn’t use it to make frequent changes to the trading universe; if that’s your use case, impose time-varying position limits via constraints.MinWeights and constraints.MaxWeights instead. Also, note that the filtering implied by min_history is still applied (in addition to this), as well as the requirement of non-nan returns for the period (which can be disabled by online_usage). Default, None, don’t use this filtering.
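The open-to-open convention for returns can be easy to get wrong by one period, so here is a minimal pandas sketch of it. The prices, the asset names AAA and BBB, and the choice to drop the final row (which has no next open price) are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical open prices for two assets on four consecutive days.
index = pd.date_range("2024-01-02", periods=4, freq="D", tz="UTC")
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0, 99.0], "BBB": [50.0, 50.0, 55.0, 44.0]},
    index=index,
)

# r_t = p_{t+1} / p_t - 1: the return indexed at t uses the NEXT open price,
# so the last row cannot be computed and is dropped in this sketch.
returns = (prices.shift(-1) / prices - 1).iloc[:-1]

# Zero-return cash account placed as the last column; the name "cash" is
# the one this sketch would then pass as cash_key.
returns["cash"] = 0.0
```

A dataframe built this way has the cash returns as its last column, which is the layout the returns parameter describes when cash_key is set to that column’s name.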
- serve(t)
Serve data for policy and simulator at time \(t\).
- Parameters:
t (pandas.Timestamp) – Time of execution, e.g., stock market open of a given day.
- Returns:
(past_returns, current_returns, past_volumes, current_volumes, current_prices)
- Return type:
(pandas.DataFrame, pandas.Series, pandas.DataFrame or None, pandas.Series or None, pandas.Series or None)
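To make the shape of the returned tuple more concrete, the sketch below splits a returns dataframe at a time \(t\). It assumes that past_returns holds the rows strictly before \(t\) and current_returns the row at \(t\), which is consistent with the open-to-open convention but is an interpretation, not a copy of the library’s code. The function name split_at and the data are hypothetical.

```python
import pandas as pd

index = pd.date_range("2024-01-02", periods=4, freq="D", tz="UTC")
returns = pd.DataFrame(
    {"AAA": [0.01, -0.02, 0.03, 0.00], "cash": [0.0, 0.0, 0.0, 0.0]},
    index=index,
)


def split_at(returns, t):
    """Split returns into rows before t and the row at t (assumed semantics)."""
    past_returns = returns.loc[returns.index < t]  # pandas.DataFrame
    current_returns = returns.loc[t]               # pandas.Series
    return past_returns, current_returns


t = index[2]
past_returns, current_returns = split_at(returns, t)
```

Under this reading, a policy deciding trades at \(t\) would only see past_returns, while current_returns is what a simulator would use to propagate the portfolio to the next period.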
- trading_calendar(start_time=None, end_time=None, include_end=True)
Get trading calendar from market data.
- Parameters:
start_time (pandas.Timestamp) – Initial time of the trading calendar. Always inclusive if present. If None, use the first available time.
end_time (pandas.Timestamp) – Final time of the trading calendar. If None, use the last available time.
include_end (bool) – Include end time.
- Returns:
Trading calendar.
- Return type:
pandas.DatetimeIndex
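The three parameters above can be read as a simple slice of a datetime index. The following sketch shows one plausible reading; the function calendar_between is hypothetical, and in particular the handling of include_end (dropping only a timestamp equal to the end time) is an assumption of this sketch.

```python
import pandas as pd


def calendar_between(index, start_time=None, end_time=None, include_end=True):
    """Slice a DatetimeIndex following the parameter descriptions above."""
    start = index[0] if start_time is None else start_time
    end = index[-1] if end_time is None else end_time
    after_start = index >= start  # the start is always inclusive
    before_end = (index <= end) if include_end else (index < end)
    return index[after_start & before_end]


index = pd.date_range("2024-01-01", periods=5, freq="D", tz="UTC")
full_calendar = calendar_between(index)
open_ended = calendar_between(index, end_time=index[3], include_end=False)
```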
- class cvxportfolio.DownloadedMarketData(universe=(), datasource='YahooFinance', cash_key='USDOLLAR', base_location=PosixPath('/home/docs/cvxportfolio_data'), storage_backend='pickle', min_history=Timedelta('365 days 05:45:36'), grace_period=Timedelta('1 days 00:00:00'), trading_frequency=None, online_usage=False, universe_selection_in_time=None)
Market data that is downloaded.
Added in version 1.3.0: The new parameter universe_selection_in_time can be used to optionally exclude assets from the trading universe at different points in time.
- Parameters:
universe (list) – List of names as understood by the data source used, e.g., ['AAPL', 'GOOG'] if using the default Yahoo Finance data source.
datasource (str or SymbolData class) – The data source used.
cash_key (str) – Name of the cash account; its rates will be added as the last column of the returns, as the risk-free rate. Choice of USDOLLAR, EURO, JPYEN, GBPOUND (Cvxportfolio downloads the historical central bank rates from FRED), or cash, which sets the cash returns to 0. Default USDOLLAR.
base_location (pathlib.Path) – The location of the storage. By default it’s a directory named cvxportfolio_data in your home folder.
storage_backend (str) – The storage backend; implemented ones are 'pickle', 'csv', and 'sqlite'. By default 'pickle'.
min_history (pandas.Timedelta) – Minimum amount of time for which the returns are not np.nan before each asset enters a back-test.
grace_period (pandas.Timedelta) – If the age of the most recent observation of each symbol’s data is less than this, we do not download new data. By default it’s one day.
trading_frequency (str or None) – Instead of using the frequency implied by the index of the returns, down-sample all dataframes. We implement 'weekly', 'monthly', 'quarterly' and 'annual'. By default (None) don’t down-sample.
online_usage (bool) – Disable removal of assets that have np.nan returns for the given time. Default False.
universe_selection_in_time (pd.DataFrame or None) – Boolean dataframe used to specify which assets are to be included in the trading universe at each point in time. The columns are the full universe. The index is datetime and, differently from the usual convention, need not be the same as the index of the other dataframes: at each point in time of a back-test the last valid observation (before, or at, the trading time) is selected. (You still need to provide a timezoned index if the returns’ index is, otherwise time comparisons can’t be done.) The entries are boolean; True means that the corresponding asset can be invested in at the time, False that it can’t. Note that this is more fundamental than imposing time-varying position limits, for example, in an optimization-based policy. Non-investable assets are removed by the MarketSimulator: their positions are converted to cash, the result.BacktestResult dataframes will have np.nan on those dates for those assets, and the policy is re-compiled without those assets. You shouldn’t use it to make frequent changes to the trading universe; if that’s your use case, impose time-varying position limits via constraints.MinWeights and constraints.MaxWeights instead. Also, note that the filtering implied by min_history is still applied (in addition to this), as well as the requirement of non-nan returns for the period (which can be disabled by online_usage). Default, None, don’t use this filtering.
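Because the index of universe_selection_in_time need not match the trading times, the lookup rule (last observation before, or at, the trading time) is worth seeing once in code. This is a minimal pandas sketch of that rule, not the library’s implementation; the table, the asset names, and the helper investable_at are hypothetical.

```python
import pandas as pd

# Hypothetical selection table: it changes on two dates only, not every day.
selection = pd.DataFrame(
    {"AAA": [True, True], "BBB": [False, True]},
    index=pd.to_datetime(["2024-01-01", "2024-02-01"], utc=True),
)


def investable_at(selection, t):
    """Names marked True in the last row dated at or before t."""
    past = selection.loc[selection.index <= t]
    if past.empty:
        # No observation yet; this sketch then treats every asset as excluded.
        return []
    row = past.iloc[-1]
    return row[row].index.tolist()


mid_january = investable_at(selection, pd.Timestamp("2024-01-15", tz="UTC"))
february = investable_at(selection, pd.Timestamp("2024-02-01", tz="UTC"))
```

Note that both the table’s index and the query time are timezone-aware here, matching the requirement that time comparisons be possible.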
- serve(t)
Serve data for policy and simulator at time \(t\).
- Parameters:
t (pandas.Timestamp) – Time of execution, e.g., stock market open of a given day.
- Returns:
(past_returns, current_returns, past_volumes, current_volumes, current_prices)
- Return type:
(pandas.DataFrame, pandas.Series, pandas.DataFrame or None, pandas.Series or None, pandas.Series or None)
- trading_calendar(start_time=None, end_time=None, include_end=True)
Get trading calendar from market data.
- Parameters:
start_time (pandas.Timestamp) – Initial time of the trading calendar. Always inclusive if present. If None, use the first available time.
end_time (pandas.Timestamp) – Final time of the trading calendar. If None, use the last available time.
include_end (bool) – Include end time.
- Returns:
Trading calendar.
- Return type:
pandas.DatetimeIndex
Base classes (for using other data sources)
- class cvxportfolio.data.SymbolData(symbol, storage_backend='pickle', base_location=PosixPath('/home/docs/cvxportfolio_data'), grace_period=Timedelta('1 days 00:00:00'))
Base class for a single symbol time series data.
The data is either in the form of a Pandas Series or DataFrame and has datetime index.
This class needs to be derived. At a minimum, one should redefine the _download method, which implements the downloading of the symbol’s time series from an external source. The method takes the current (already downloaded and stored) data and is supposed to only append to it. In this way we only store new data and don’t modify already downloaded data.
Additionally one can redefine the _preload method, which prepares data to serve to the user (so the data is stored in a different format than what the user sees). We found that this separation can be useful.
This class interacts with module-level functions named _loader_BACKEND and _storer_BACKEND, where BACKEND is the name of the storage system used. We define pickle, csv, and sqlite backends. These may have limitations. See their docstrings for more information.
- Parameters:
symbol (str) – The symbol to download.
storage_backend (str) – The storage backend; implemented ones are 'pickle', 'csv', and 'sqlite'. By default 'pickle'.
base_location (pathlib.Path) – The location of the storage. We store in a subdirectory named after the class which derives from this. By default it’s a directory named cvxportfolio_data in your home folder.
grace_period (pandas.Timedelta) – If the age of the most recent observation in the data is less than this, we do not download new data. By default it’s one day.
- Attribute data:
The downloaded data for the symbol.
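The append-only contract described for _download can be sketched independently of any data source. The helper below is hypothetical and does not reproduce the signature of SymbolData._download; it only shows one way to keep stored observations untouched while adding newer ones.

```python
import pandas as pd


def append_only(current, fetched):
    """Append rows of fetched that come after the last stored timestamp.

    Hypothetical helper illustrating the append-only contract; already
    stored observations are never overwritten, even if the source revised them.
    """
    if current is None or len(current) == 0:
        return fetched.sort_index()
    new_rows = fetched.loc[fetched.index > current.index[-1]]
    return pd.concat([current, new_rows])


days = pd.date_range("2024-01-01", periods=3, freq="D", tz="UTC")
current = pd.Series([1.0, 2.0], index=days[:2])
fetched = pd.Series([9.0, 2.5, 3.0], index=days)  # first two values revised

updated = append_only(current, fetched)
```

In this example the revised values for the first two days are ignored and only the third observation is appended.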
- class cvxportfolio.data.MarketData
Prepare, hold, and serve market data.
- Method serve:
Serve data for policy and simulator at time \(t\).
- serve(t)
Serve data for policy and simulator at time \(t\).
- Parameters:
t (pandas.Timestamp) – Trading time. It must be included in the timestamps returned by trading_calendar().
- Returns:
past_returns, current_returns, past_volumes, current_volumes, current_prices
- Return type:
(pd.DataFrame, pd.Series, pd.DataFrame, pd.Series, pd.Series)
- universe_at_time(t)
Return trading universe at given time.
Base implementation simply returns the index of current_returns returned by serve(). Typically a more efficient implementation can be made available.
- Parameters:
t (pandas.Timestamp) – Trading time. It must be included in the timestamps returned by trading_calendar().
- Returns:
Trading universe at time t.
- Return type:
pd.Index
- trading_calendar(start_time=None, end_time=None, include_end=True)
Get trading calendar between times.
- Parameters:
start_time (pandas.Timestamp) – Initial time of the trading calendar. Always inclusive if present. If None, use the first available time.
end_time (pandas.Timestamp) – Final time of the trading calendar. If None, use the last available time.
include_end (bool) – Include end time.
- Returns:
Trading calendar.
- Return type:
pd.DatetimeIndex
- property periods_per_year
Average trading periods per year.
- Return type:
int
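One way such an average could be computed from a datetime index is sketched below. The function average_periods_per_year is hypothetical, and the use of 365.24 days as the length of a year is an assumption (it matches the default min_history shown in the signatures above, but the library’s formula is not stated here).

```python
import pandas as pd


def average_periods_per_year(index):
    """Number of observations divided by the span of the index in years."""
    span_in_years = (index[-1] - index[0]) / pd.Timedelta(days=365.24)
    return int(round(len(index) / span_in_years))


# Two years of business days and of weekly (Friday) observations.
daily = pd.bdate_range("2020-01-01", "2021-12-31", tz="UTC")
weekly = pd.date_range("2020-01-03", "2021-12-31", freq="W-FRI", tz="UTC")

daily_periods = average_periods_per_year(daily)
weekly_periods = average_periods_per_year(weekly)
```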
- property full_universe
Full universe, which might not be available for trading.
- Returns:
Full universe.
- Return type:
pd.Index
- partial_universe_signature(partial_universe)
Unique signature of this instance with a partial universe.
A partial universe is a subset of the full universe that is available at some time for trading.
This is used in cvxportfolio.cache to sign back-test caches that are saved on disk. If not redefined, it returns None, which disables on-disk caching.
- Parameters:
partial_universe (pandas.Index) – A subset of the full universe.
- Returns:
Signature.
- Return type:
str
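For a derived class that wants on-disk caching, the signature needs to be a string that is stable across runs and changes whenever the partial universe changes. The sketch below shows one possible construction using a hash of the asset names; the function universe_signature and the prefix are hypothetical, and a real signature would likely also need to encode anything else that affects the data, such as the trading frequency.

```python
import hashlib


def universe_signature(partial_universe, prefix="MyMarketData"):
    """Deterministic, order-sensitive string signature of a universe.

    Hypothetical helper: hashes the asset names so that the same universe
    always maps to the same cache key across Python sessions.
    """
    joined = ",".join(str(name) for name in partial_universe)
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:16]}"


first = universe_signature(["AAPL", "GOOG", "USDOLLAR"])
second = universe_signature(["AAPL", "GOOG", "USDOLLAR"])
smaller = universe_signature(["AAPL", "USDOLLAR"])
```

A cryptographic digest is used here instead of Python’s built-in hash because the latter is randomized between interpreter sessions for strings, which would make cache keys unstable.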