dalio.pipe package¶
Submodules¶
dalio.pipe.builders module¶
Builder Pipes
-
class
dalio.pipe.builders.
CovShrink
(frequency=252)¶ Bases:
dalio.pipe.pipe.PipeBuilder
Perform Covariance Shrinkage on data
Builder with a single piece: shrinkage. The shrinkage piece defines what kind of shrinkage to apply to the resulting covariance matrix. If none is set, the covariance will not be shrunk.
-
frequency
¶ data time period frequency
- Type
int
-
build_model
(data, **kwargs)¶ Builds Covariance Shrinkage object and returns selected shrinkage strategy
- Returns
Function fitted on the data.
-
check_name
(param, name)¶ Check if name and parameter combination is valid.
This will always be called upon setting a new piece to ensure this piece is present in the piece dictionary and that the name is valid. Subclasses will often override this method to implement the name checks in accordance with their specific name and parameter combination options. Notice that checks cannot be done on arguments before running the _Builder. This can also be called from outside of a _Builder instance to check the validity of settings.
- Parameters
piece (str) – name of the key in the piece dictionary.
name (str) – name option to be set to the piece.
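The validation described above can be sketched roughly as follows. This is a minimal illustration, not dalio's implementation: the class `SketchBuilder`, the `valid_names` table, the `set_piece` method and the option names `option_a` and `option_b` are all hypothetical.

```python
class SketchBuilder:
    """Hypothetical stand-in for a builder; this is not dalio's _Builder."""

    # Illustrative piece names and the name options each one accepts.
    valid_names = {"shrinkage": {"option_a", "option_b"}}

    def __init__(self):
        # Every piece starts unset.
        self._pieces = {piece: None for piece in self.valid_names}

    def check_name(self, piece, name):
        """Raise if the piece is unknown or the name is not an allowed option."""
        if piece not in self._pieces:
            raise KeyError(f"unknown piece: {piece!r}")
        if name not in self.valid_names[piece]:
            raise ValueError(f"invalid name {name!r} for piece {piece!r}")

    def set_piece(self, piece, name, *args, **kwargs):
        """Validate the combination, then store the name with its arguments."""
        self.check_name(piece, name)
        self._pieces[piece] = {"name": name, "args": args, "kwargs": kwargs}
        return self


builder = SketchBuilder().set_piece("shrinkage", "option_a", delta=0.1)
```

Only the piece and name are validated here; the stored arguments are left unchecked until the model is actually built, which mirrors the note that argument checks cannot happen before the builder runs.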
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
frequency
: int = None
-
transform
(data, **kwargs)¶ Build model using data and get results.
- Returns
A covariance matrix
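To illustrate the general idea of covariance shrinkage, the sketch below blends the sample covariance with a scaled-identity target at a fixed intensity and annualises the result with `frequency`. The function name `shrunk_covariance`, the `delta` parameter and the choice of target are assumptions made for illustration; the strategies this class actually offers may differ.

```python
import numpy as np


def shrunk_covariance(returns, delta=0.2, frequency=252):
    """Shrink the sample covariance toward a scaled identity matrix.

    delta is the shrinkage intensity in [0, 1]; 0 leaves the sample
    covariance unchanged. The result is scaled by `frequency`.
    """
    returns = np.asarray(returns, dtype=float)
    sample = np.cov(returns, rowvar=False)       # columns are assets
    n_assets = sample.shape[0]
    mean_variance = np.trace(sample) / n_assets
    target = mean_variance * np.eye(n_assets)    # equal variances, no correlation
    shrunk = delta * target + (1.0 - delta) * sample
    return shrunk * frequency


rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0, 0.01, size=(500, 3))
cov = shrunk_covariance(daily_returns, delta=0.2)
```

Because the target has the same trace as the sample covariance, this particular blend keeps total variance unchanged and only pulls the off-diagonal terms toward zero.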
-
-
class
dalio.pipe.builders.
ExpectedReturns
¶ Bases:
dalio.pipe.pipe.PipeBuilder
Get stock’s time series expected returns.
Builder with a single piece: return_model. return_model is what model to get the expected returns from.
-
build_model
(data, **kwargs)¶ Assemble pieces into a model given some data
The data will often be optional, but several builder models require it in order to be fitted on initialization, which further shows why builders are necessary for context-agnostic graphs.
- Parameters
data – data that might be used to build the model.
**kwargs – any additional argument used in building
-
check_name
(param, name)¶ Check if name and parameter combination is valid.
This will always be called upon setting a new piece to ensure this piece is present in the piece dictionary and that the name is valid. Subclasses will often override this method to implement the name checks in accordance with their specific name and parameter combination options. Notice that checks cannot be done on arguments before running the _Builder. This can also be called from outside of a _Builder instance to check the validity of settings.
- Parameters
piece (str) – name of the key in the piece dictionary.
name (str) – name option to be set to the piece.
-
transform
(data, **kwargs)¶ Builds model using data and gets expected returns from it
-
-
class
dalio.pipe.builders.
ExpectedShortfall
(quantiles=None)¶ Bases:
dalio.pipe.builders.ValueAtRisk
Get expected shortfall for given quantiles
See base class for a more in-depth explanation.
-
transform
(data, **kwargs)¶ Get the value at risk given by an arch model and calculate the expected shortfall at given quantiles.
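For intuition about how expected shortfall relates to value at risk, here is an empirical sketch on a plain array of returns. It does not use an ARCH model, so it is a simplified stand-in rather than this class's computation; the function name `empirical_var_es` is hypothetical.

```python
import numpy as np


def empirical_var_es(returns, quantile=0.05):
    """Return (value at risk, expected shortfall) at the given quantile.

    VaR is the empirical quantile of the returns; expected shortfall is
    the mean of the returns at or below that quantile.
    """
    returns = np.asarray(returns, dtype=float)
    var = np.quantile(returns, quantile)
    tail = returns[returns <= var]
    return var, tail.mean()


returns = np.array([-0.10, -0.05, -0.02, 0.00, 0.01,
                    0.02, 0.03, 0.04, 0.05, 0.06])
var_20, es_20 = empirical_var_es(returns, quantile=0.2)
```

Expected shortfall is never less severe than the value at risk at the same quantile, since it averages only the outcomes beyond that threshold.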
-
-
class
dalio.pipe.builders.
MakeARCH
¶ Bases:
dalio.pipe.pipe.PipeBuilder
Build arch model and make it based on input data.
This class allows for the creation of arch models by configuring three pieces: the mean, volatility and distribution. These are set after initialization through the _Builder interface.
-
_piece
¶ see _Builder class.
- Type
list
-
assimilate
(model)¶ Assimilate core pieces of an existing ARCH Model.
Assimilation means setting this model’s pieces in accordance with an existing model’s pieces. Assimilation is shallow, so only the main pieces are assimilated, not their parameters.
- Parameters
model (ARCHModel) – Existing ARCH Model.
-
build_model
(data, **kwargs)¶ Build ARCH Model using data, set pieces and their arguments
- Returns
A built arch model from the arch package.
-
transform
(data, **kwargs)¶ Build model with sourced data
-
-
class
dalio.pipe.builders.
OptimumWeights
¶ Bases:
dalio.pipe.pipe.PipeBuilder
Get optimum portfolio weights from an efficient frontier or CLA. This is also a builder with one piece: strategy. The strategy piece refers to the optimization strategy.
-
build_model
(data, **kwargs)¶ Assemble pieces into a model given some data
The data will often be optional, but several builder models require it in order to be fitted on initialization, which further shows why builders are necessary for context-agnostic graphs.
- Parameters
data – data that might be used to build the model.
**kwargs – any additional argument used in building
-
check_name
(param, name)¶ Check if name and parameter combination is valid.
This will always be called upon setting a new piece to ensure this piece is present in the piece dictionary and that the name is valid. Subclasses will often override this method to implement the name checks in accordance with their specific name and parameter combination options. Notice that checks cannot be done on arguments before running the _Builder. This can also be called from outside of a _Builder instance to check the validity of settings.
- Parameters
piece (str) – name of the key in the piece dictionary.
name (str) – name option to be set to the piece.
-
transform
(data, **kwargs)¶ Get efficient frontier, fit it to model and get weights
-
-
class
dalio.pipe.builders.
PandasLinearModel
¶ Bases:
dalio.pipe.pipe.PipeBuilder
Create a linear model from input pandas dataframe, using its index as the X value.
This builder is made up of a single piece: strategy. This piece sets which linear model should be used to fit the data.
-
build_model
(data, **kwargs)¶ Build model by returning the chosen model and initialization parameters
- Returns
Unfitted linear model
-
transform
(data, **kwargs)¶ Set up fitting parameters and fit built model.
- Returns
Fitted linear model
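As a rough picture of "using the index as the X value", the sketch below fits a straight line to a DataFrame column against its index with `numpy.polyfit`. The helper `fit_line_on_index` is hypothetical, and the linear models selectable through the strategy piece may be different estimators.

```python
import numpy as np
import pandas as pd


def fit_line_on_index(df, column):
    """Fit y = slope * x + intercept, taking x from the DataFrame index."""
    x = np.asarray(df.index, dtype=float)
    y = df[column].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


df = pd.DataFrame({"value": [1.0, 3.0, 5.0, 7.0]}, index=[0, 1, 2, 3])
slope, intercept = fit_line_on_index(df, "value")
```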
-
-
class
dalio.pipe.builders.
StockComps
(strategy='sic_code', max_ticks=6)¶ Bases:
dalio.pipe.pipe.Pipe
Get a list of a ticker’s comparable stocks
This can utilize any strategy for finding a stock’s comparable companies and returns up to a certain amount of comps.
-
_strategy
¶ comparison strategy name or function.
- Type
str, callable
-
max_ticks
¶ maximum number of tickers to return.
- Type
int
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
max_ticks
: int = None
-
run
(**kwargs)¶ Gets ticker argument and passes an empty ticker request to transform.
Empty ticker requests are supposed to return all tickers available in a source, so this allows the comparison to be made across all stocks from a certain source.
- Raises
ValueError – if ticker is more than a single symbol.
-
transform
(data, **kwargs)¶ Get comps according to the set strategy
-
-
class
dalio.pipe.builders.
ValueAtRisk
(quantiles=None)¶ Bases:
dalio.pipe.pipe.Pipe
Get the value at risk for data based on an ARCH Model
This takes in an ARCH Model maker, not data, which might be unintuitive, yet necessary, as this allows users to modify the ARCH model generating these values separately. A useful strategy is to use a pipeline with an ARCH model maker as its first input and a ValueAtRisk instance as its second layer. This allows the PipeLine to be treated as a data input with VaR output while still keeping control over the ARCH Model pieces (provided you kept a local variable referencing it).
-
_quantiles
¶ list of quantiles to check the value at risk for.
- Type
list
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
transform
(data, **kwargs)¶ Get values at risk at each quantile and each result’s maximum exceedance from the mean.
The maximum exceedance column tells which quantile the loss is placed on. The word “maximum” might be misleading as it is compared to the minimum quantile; however, this definition is accurate as the column essentially answers the question: “what quantile furthest away from the mean does the data exceed?”
Thank you to the creators of the arch package for the beautiful visualizations and ideas!
- Raises
ValueError – if ARCH model does not have returns. This is often the case for unfitted models. Ensure your graph is complete.
TypeError – if ARCH model has unsupported distribution parameter.
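The "maximum exceedance" idea can be sketched in plain Python: given value-at-risk thresholds per quantile, find the most extreme (smallest) quantile whose threshold an observed return breaches. The function `max_exceedance` and the threshold values are hypothetical and only illustrate the definition above.

```python
def max_exceedance(observed_return, var_by_quantile):
    """Return the smallest quantile whose VaR threshold the return breaches.

    var_by_quantile maps a quantile (e.g. 0.05) to its value-at-risk
    threshold expressed as a return. None means nothing was breached.
    """
    exceeded = [quantile for quantile, threshold in var_by_quantile.items()
                if observed_return <= threshold]
    return min(exceeded) if exceeded else None


# Illustrative thresholds: more extreme quantiles have lower thresholds.
thresholds = {0.01: -0.05, 0.05: -0.03, 0.10: -0.02}
placed_on = max_exceedance(-0.04, thresholds)
```

A return of -0.04 breaches the 10% and 5% thresholds but not the 1% one, so it is placed on the 5% quantile, the one furthest from the mean that it exceeds.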
-
dalio.pipe.col_generation module¶
Implement transformations that generate new columns from existing ones
-
class
dalio.pipe.col_generation.
Bin
(bin_map, *args, bin_strat='normal', columns=None, new_cols=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation.Custom
A pipeline stage that adds a binned version of a column or columns.
If drop is set to True, the new columns retain the names of the source columns; otherwise, the resulting columns gain the suffix ‘_bin’
-
bin_map
¶ implicitly projects a left-most bin containing all elements smaller than the left-most end point and a right-most bin containing all elements larger than the right-most end point. For example, the list [0, 5, 8] is interpreted as the bins (-∞, 0), [0, 5), [5, 8) and [8, ∞).
- Type
array-like
-
bin_strat
¶ binning strategy to use. “normal” uses the default binning strategy per a list of value separations or number of bins. “quantile” uses a list of quantiles or a preset quantile range (4 for quartiles and 10 for deciles).
- Type
str, default “normal”
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[-3],[4],[5], [9]], [1,2,3, 4], ['speed'])
>>> pdp.Bin({'speed': [5]}, drop=False).apply(df)
   speed speed_bin
1     -3        <5
2      4        <5
3      5        5≤
4      9        5≤
>>> pdp.Bin({'speed': [0,5,8]}, drop=False).apply(df)
   speed speed_bin
1     -3        <0
2      4       0-5
3      5       5-8
4      9        8≤
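The implicit outer bins described for bin_map can be reproduced with plain pandas by padding the edge list with infinities and using left-closed intervals. This is a sketch of the binning rule only; the helper `bin_column` is hypothetical and the labels it produces are pandas intervals rather than the strings shown in the example above.

```python
import numpy as np
import pandas as pd


def bin_column(series, edges):
    """Bin values into (-inf, e0), [e0, e1), ..., [en, inf)."""
    full_edges = [-np.inf, *edges, np.inf]
    # right=False makes each interval closed on the left, open on the right.
    return pd.cut(series, bins=full_edges, right=False)


speed = pd.Series([-3, 4, 5, 9], index=[1, 2, 3, 4], name="speed")
binned = bin_column(speed, [0, 5, 8])
```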
-
-
class
dalio.pipe.col_generation.
BoxCox
(*args, columns=None, new_cols=None, non_neg=False, const_shift=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation.Custom
A pipeline stage that applies the BoxCox transformation on data.
-
const_shift
¶ If given, each transformed column is first shifted by this constant. If non_neg is True then that transformation is applied first, and only then is the column shifted by this constant.
- Type
int, optional
-
-
class
dalio.pipe.col_generation.
Change
(*args, strategy='diff', columns=None, new_cols=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation._ColGeneration
Perform item-by-item change
This has two main forms, percentage change and absolute change (difference).
-
_strategy
¶ change strategy.
- Type
str, callable
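The two main forms of change correspond to two standard pandas operations, shown here on a small series. This only illustrates the difference between the forms; how the strategy names map onto these calls inside the class is not shown in this reference.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0], name="price")

# Absolute change: difference between each item and the previous one.
absolute_change = prices.diff()

# Percentage change: difference relative to the previous item.
percent_change = prices.pct_change()
```

In both cases the first item has no predecessor, so its change is NaN.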
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
-
class
dalio.pipe.col_generation.
Custom
(func, *args, columns=None, new_cols=None, strategy='apply', axis=0, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation._ColGeneration
Apply custom function.
-
strategy
¶ strategy for applying value function. One of [“apply”, “transform”, “agg”, “pipe”]
- Type
str, default “apply”
Example
>>> import pandas as pd; from dalio.pipe import Custom;
>>> data = [[3, 2143], [10, 1321], [7, 1255]]
>>> df = pd.DataFrame(data, [1,2,3], ['years', 'avg_revenue'])
>>> total_rev = lambda row: row['years'] * row['avg_revenue']
>>> add_total_rev = Custom(total_rev, 'total_revenue', axis=1)
>>> add_total_rev.transform(df)
   years  avg_revenue  total_revenue
1      3         2143           6429
2     10         1321          13210
3      7         1255           8785
>>> def halfer(row):
...     new = {'year/2': row['years']/2,
...            'rev/2': row['avg_revenue']/2}
...     return pd.Series(new)
>>> half_cols = Custom(halfer, axis=1, drop=False)
>>> half_cols.transform(df)
   years  avg_revenue   rev/2  year/2
1      3         2143  1071.5     1.5
2     10         1321   660.5     5.0
3      7         1255   627.5     3.5
>>> data = [[3, 3], [2, 4], [1, 5]]
>>> df = pd.DataFrame(data, [1,2,3], ["A","B"])
>>> func = lambda df: df['A'] == df['B']
>>> add_equal = Custom(func, "A==B", strategy="pipe", drop=False)
>>> add_equal.transform(df)
   A  B   A==B
1  3  3   True
2  2  4  False
3  1  5  False
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
-
class
dalio.pipe.col_generation.
CustomByCols
(func, *args, strategy='apply', columns=None, new_cols=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation.Custom
A pipeline stage applying a function to individual columns iteratively.
-
func
¶ The function to be applied to each element of the given columns.
- Type
function
-
strategy
¶ Application strategy. Different from Custom class’ strategy parameter (which here is kept at “apply”) as this will now be done on a series (each column). Extra care should be taken to ensure resulting column lengths match.
- Type
str
Example
>>> import pandas as pd; import pdpipe as pdp; import math;
>>> data = [[3.2, "acd"], [7.2, "alk"], [12.1, "alk"]]
>>> df = pd.DataFrame(data, [1,2,3], ["ph","lbl"])
>>> round_ph = pdp.ApplyByCols("ph", math.ceil)
>>> round_ph(df)
   ph  lbl
1   4  acd
2   8  alk
3  13  alk
-
-
class
dalio.pipe.col_generation.
Index
(index_at, *args, columns=None, new_cols=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation._ColGeneration
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
-
class
dalio.pipe.col_generation.
Log
(*args, columns=None, new_cols=None, non_neg=False, const_shift=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation.Custom
A pipeline stage that log-transforms numeric data.
-
non_neg
¶ If True, each transformed column is first shifted by the smallest negative value it includes (non-negative columns are thus not shifted).
- Type
bool, default False
-
const_shift
¶ If given, each transformed column is first shifted by this constant. If non_neg is True then that transformation is applied first, and only then is the column shifted by this constant.
- Type
int, optional
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> data = [[3.2, "acd"], [7.2, "alk"], [12.1, "alk"]]
>>> df = pd.DataFrame(data, [1,2,3], ["ph","lbl"])
>>> log_stage = pdp.Log("ph", drop=True)
>>> log_stage(df)
         ph  lbl
1  1.163151  acd
2  1.974081  alk
3  2.493205  alk
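The order in which non_neg and const_shift are applied can be sketched with NumPy and pandas as below. The helper `log_transform` is hypothetical, and reading "shifted by the smallest negative value" as adding its absolute value is an assumption.

```python
import numpy as np
import pandas as pd


def log_transform(series, non_neg=False, const_shift=None):
    """Natural log of a series, with the optional shifts applied first."""
    out = series.astype(float)
    if non_neg:
        smallest = out.min()
        if smallest < 0:
            # Assumed meaning: lift the column so its minimum becomes zero.
            out = out + abs(smallest)
    if const_shift is not None:
        # The constant shift comes after the non-negative shift.
        out = out + const_shift
    return np.log(out)


ph = pd.Series([3.2, 7.2, 12.1], index=[1, 2, 3], name="ph")
logged = log_transform(ph)
shifted = log_transform(pd.Series([-2.0, 0.0, 2.0]), non_neg=True, const_shift=1)
```

With both options set, the column [-2, 0, 2] first becomes [0, 2, 4] and then [1, 3, 5] before the logarithm is taken, which avoids taking the log of zero.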
-
-
class
dalio.pipe.col_generation.
MapColVals
(value_map, *args, columns=None, new_cols=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation.Custom
A pipeline stage that replaces the values of a column by a map.
-
value_map
¶ A dictionary mapping existing values to new ones. Values not in the dictionary as keys will be converted to NaN. If a function is given, it is applied element-wise to given columns. If a Series is given, values are mapped by its index to its values.
- Type
dict, function or pandas.Series
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1], [3], [2]], ['UK', 'USSR', 'US'], ['Medal'])
>>> value_map = {1: 'Gold', 2: 'Silver', 3: 'Bronze'}
>>> pdp.MapColVals('Medal', value_map).apply(df)
       Medal
UK      Gold
USSR  Bronze
US    Silver
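The three kinds of value_map behave the same way as the argument to pandas' `Series.map`, which the sketch below uses directly. It shows the mapping semantics only, including the conversion of unmapped values to NaN; the data is made up for illustration.

```python
import pandas as pd

medals = pd.Series([1, 3, 4], index=["UK", "USSR", "US"], name="Medal")

# dict: values missing from the keys (here 4) become NaN.
by_dict = medals.map({1: "Gold", 3: "Bronze"})

# function: applied element-wise.
by_func = medals.map(lambda rank: rank * 10)

# Series: values are looked up in its index and replaced by its values.
by_series = medals.map(pd.Series(["Gold", "Bronze"], index=[1, 3]))
```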
-
-
class
dalio.pipe.col_generation.
Period
(period, *args, agg_func=<function mean>, columns=None, new_cols=None, axis=0, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation._ColGeneration
Resample input time series data to a different period
-
agg_func
¶ function to aggregate data to one period. Default set to np.mean.
- Type
callable
-
_period
¶ period to resample data to. Can be either daily, monthly, quarterly or yearly.
- Type
str
-
agg_func
: Callable[[Iterable], Any] = None¶
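Resampling to a coarser period with an aggregation function can be sketched with a pandas group-by on the period of each timestamp. The helper `resample_period` and its period codes are assumptions for illustration; it uses the string "mean" as the default aggregation, which matches the averaging default described above.

```python
import pandas as pd


def resample_period(df, period="M", agg_func="mean"):
    """Aggregate a datetime-indexed frame to one row per period.

    period is a pandas period code such as "M" (monthly), "Q" (quarterly)
    or "Y" (yearly); agg_func is anything accepted by GroupBy.agg.
    """
    return df.groupby(df.index.to_period(period)).agg(agg_func)


dates = pd.date_range("2020-01-01", periods=60, freq="D")
daily = pd.DataFrame({"price": [float(i) for i in range(60)]}, index=dates)
monthly = resample_period(daily, period="M")
```

The 60 daily rows cover January and February 2020, so the result has two rows, each holding that month's mean.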
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
class
dalio.pipe.col_generation.
Rolling
(func, *args, columns=None, new_cols=None, rolling_window=2, axis=0, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation._ColGeneration
Apply rolling function
-
rolling_window
¶ rolling window to apply function. If none, no rolling window is applied.
- Type
int, default 2
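Applying a function over a rolling window looks like this in plain pandas. The sketch uses a window of 2, matching the default in the signature above, and a sum as the applied function; both are only illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0])

# Each output item is the function applied to the current and previous item.
rolling_sum = values.rolling(window=2).apply(np.sum, raw=True)
```

The first item has no full window behind it, so its result is NaN.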
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
-
class
dalio.pipe.col_generation.
StockReturns
(columns=None, new_cols=None, drop=True, reintegrate=False)¶ Bases:
dalio.pipe.col_generation._ColGeneration
Perform percent change and minor aesthetic changes to data
dalio.pipe.forecast module¶
Transformations that make forecasts based on data
-
class
dalio.pipe.forecast.
Forecast
(horizon=10)¶ Bases:
dalio.pipe.pipe.Pipe
Generalized forecasting class.
This should be used mostly for subclassing or very generic forecasting interfaces.
-
horizon
¶ how many steps ahead to forecast
- Type
int
-
horizon
: int = None
-
transform
(data, **kwargs)¶ Return forecast of data
-
-
class
dalio.pipe.forecast.
GARCHForecast
(start=None, horizon=1)¶ Bases:
dalio.pipe.forecast.Forecast
Forecast data based on a fitted GARCH model
-
_start
¶ forecast start time and date.
- Type
pd.Timestamp
-
transform
(data, **kwargs)¶ Make a mean, variance and residual variance forecast.
The forecast will be made for the specified horizon starting at the specified time. This means that only data for the step at the specified start date and the steps after it will be returned.
- Returns
A DataFrame with the columns MEAN, VARIANCE and RESIDUAL_VARIANCE for the time horizon after the start date.
-
dalio.pipe.pipe module¶
Defines the Pipe and PipeLine classes
Pipes are perhaps the most common classes in graphs and represent any transformation with one input and one output. A Pipe’s main functionality revolves around the .transform() method, which actually applies a transformation to data retrieved from a source. Pipes must also implement proper data checks by adding descriptions to their source.
-
class
dalio.pipe.pipe.
Pipe
¶ Bases:
dalio.base.transformer._Transformer
Pipes represent data modifications with one internal input and one internal output.
-
_source
¶ input data definition
- Type
_DataDef
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
get_input
()¶ Return the input transformer
-
pipeline
(*args)¶ Returns a PipeLine instance with self as the input source and any other Pipe instances as part of its pipeline.
- Parameters
*args – any additional Pipe to be added to the pipeline, in that order.
-
run
(**kwargs)¶ Get data from source, transform it, and return it
This will often be left alone unless there are specific keyword arguments or checks done in addition to the actual transformation. Keep in mind this is rare, as keyword arguments are often required by Translators, and checks are performed by DataDefs.
-
set_input
(new_input)¶ Set the input data source in place.
- Parameters
new_input (_Transformer) – new transformer to be set as input to source connection.
- Raises
TypeError – if new_input is not an instance of _Transformer.
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of functionality in a Pipe lies, and it allows a Pipe to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
with_input
(new_input)¶ Return copy of this transformer with the new data source.
-
-
class
dalio.pipe.pipe.
PipeBuilder
¶ Bases:
dalio.pipe.pipe.Pipe, dalio.base.builder._Builder
Hybrid builder type for complementing _Transformer instances.
These specify extra methods implemented by _Transformer instances.
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
with_piece
(param, name, *args, **kwargs)¶ Copy self and return with a new piece set
-
-
class
dalio.pipe.pipe.
PipeLine
(*args)¶ Bases:
dalio.pipe.pipe.Pipe
Collection of Pipe transformations.
PipeLine instances represent multiple Pipe transformations being performed consecutively. Pipelines essentially execute multiple transformations one after the other, and thus do not check for data integrity in between them; so keep in mind that order matters and only the first data definition will be enforced.
-
pipeline
¶ list of Pipe instances this pipeline is composed of
- Type
list
-
copy
(*args, **kwargs)¶ Make a copy of this Pipeline
-
extend
(*args, deep=False)¶ Extend existing pipeline with one or more Pipe instances
Keep in mind that this will not mean that
-
transform
(data, **kwargs)¶ Pass data sourced from first pipe through every Pipe’s .transform() method in order.
- Parameters
data – data sourced and checked from first source.
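The point that order matters in a pipeline can be seen in a minimal sketch where each stage's transform is applied in sequence. The classes `SketchPipeLine`, `AddOne` and `Double` are hypothetical and leave out sourcing and data checks entirely.

```python
class SketchPipeLine:
    """Hypothetical pipeline that applies each stage's transform in order."""

    def __init__(self, *stages):
        self.pipeline = list(stages)

    def extend(self, *stages):
        self.pipeline.extend(stages)
        return self

    def transform(self, data, **kwargs):
        # No checks between stages: each output feeds the next stage as is.
        for stage in self.pipeline:
            data = stage.transform(data, **kwargs)
        return data


class AddOne:
    def transform(self, data, **kwargs):
        return [value + 1 for value in data]


class Double:
    def transform(self, data, **kwargs):
        return [value * 2 for value in data]


add_then_double = SketchPipeLine(AddOne(), Double()).transform([1, 2])
double_then_add = SketchPipeLine(Double(), AddOne()).transform([1, 2])
```

The same two stages give different results depending on their order, which is why only the arrangement of stages, not intermediate validation, protects the data passed between them.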
-
dalio.pipe.select module¶
Defines various ways of getting a subset of data based on some condition
-
class
dalio.pipe.select.
ColDrop
(columns)¶ Bases:
dalio.pipe.select._ColSelection
A pipeline stage that drops columns by name.
- Parameters
columns (str, iterable or callable) – The label, or an iterable of labels, of columns to drop. Alternatively, columns can be assigned a callable returning bool values for pandas.Series objects; if this is the case, every column for which it returns True will be dropped.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[8,'a'],[5,'b']], [1,2], ['num', 'char'])
>>> pdp.ColDrop('num').apply(df)
  char
1    a
2    b
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of functionality in a Pipe lies, and it allows a Pipe to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
class
dalio.pipe.select.
ColRename
(map_dict)¶ Bases:
dalio.pipe.pipe.Pipe
A pipeline stage that renames a column or columns.
-
rename_map
¶ Maps old column names to new ones.
- Type
dict
- Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[8,'a'],[5,'b']], [1,2], ['num', 'char'])
>>> pdp.ColRename({'num': 'len', 'char': 'initial'}).apply(df)
   len initial
1    8       a
2    5       b
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of functionality in a Pipe lies, and it allows a Pipe to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
-
class
dalio.pipe.select.
ColReorder
(map_dict, level=0)¶ Bases:
dalio.pipe.select._ColSelection
A pipeline stage that reorders columns.
-
positions
¶ A mapping of column names to their desired positions after reordering. Columns not included in the mapping will maintain their relative positions over the non-mapped columns.
- Type
dict
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[8,4,3,7]], columns=['a', 'b', 'c', 'd'])
>>> pdp.ColReorder({'b': 0, 'c': 3}).apply(df)
   b  a  d  c
0  4  8  7  3
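The reordering rule (mapped columns go to their target positions, the rest keep their relative order in the remaining slots) can be sketched on a plain list of column names. The function `reorder_columns` is a hypothetical illustration, not the class's code, and assumes no column is literally named None.

```python
def reorder_columns(columns, positions):
    """Return column names with mapped ones moved to their target positions."""
    result = [None] * len(columns)
    for name, position in positions.items():
        result[position] = name
    # Unmapped columns fill the free slots in their original order.
    remaining = iter(col for col in columns if col not in positions)
    return [col if col is not None else next(remaining) for col in result]


new_order = reorder_columns(["a", "b", "c", "d"], {"b": 0, "c": 3})
```

The resulting list can then be used to index a DataFrame, for example `df[new_order]`, to obtain the reordered frame.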
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of functionality in a Pipe lies, and it allows a Pipe to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
-
class
dalio.pipe.select.
ColSelect
(columns)¶ Bases:
dalio.pipe.select._ColSelection
Select columns
-
transform
(data, **kwargs)¶ Selects the specified columns or returns data as is if no column was specified.
- Returns
Data of the same format as before but only containing the specified columns.
-
-
class
dalio.pipe.select.
DateSelect
(start=None, end=None)¶ Bases:
dalio.pipe.pipe.Pipe
Select a date range.
This is commonly left as a local variable to control the date range being used in a part of a graph.
-
_start
¶ start date.
- Type
pd.Timestamp
-
_end
¶ end date.
- Type
pd.Timestamp
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
set_end
(end)¶ Set the _end attribute
-
set_start
(start)¶ Set the _start attribute
-
transform
(data, **kwargs)¶ Slices time series into selected date range.
- Returns
Time series of the same format as input containing a subset of the original dates.
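Slicing a time series to a date range can be sketched with pandas label-based slicing, where an unset start or end leaves that side of the range open. The helper `select_dates` is hypothetical; note that pandas label slices include both end points.

```python
import pandas as pd


def select_dates(df, start=None, end=None):
    """Slice a datetime-indexed frame; None leaves that side open."""
    return df.loc[start:end]


dates = pd.date_range("2021-01-01", periods=10, freq="D")
series = pd.DataFrame({"value": range(10)}, index=dates)
window = select_dates(series,
                      start=pd.Timestamp("2021-01-03"),
                      end=pd.Timestamp("2021-01-05"))
```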
-
-
class
dalio.pipe.select.
DropNa
(**kwargs)¶ Bases:
dalio.pipe.pipe.Pipe
A pipeline stage that drops null values.
Supports all parameters supported by the pandas.dropna function.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1,4],[4,None],[1,11]], [1,2,3], ['a','b'])
>>> pdp.DropNa().apply(df)
   a     b
1  1   4.0
3  1  11.0
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of functionality in a Pipe lies, and it allows a Pipe to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
-
class
dalio.pipe.select.
FreqDrop
(values, columns=None)¶ Bases:
dalio.pipe.select._ColValSelection
A pipeline stage that drops rows by value frequency.
- Parameters
threshold (int) – The minimum frequency required for a value to be kept.
column (str) – The name of the column to check for the given value frequency.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1,4],[4,5],[1,11]], [1,2,3], ['a','b'])
>>> pdp.FreqDrop(2, 'a').apply(df)
   a   b
1  1   4
3  1  11
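Dropping rows by value frequency can be sketched in pandas by counting how often each value occurs in the column and keeping only rows whose value meets the threshold. The helper `drop_rare_values` is hypothetical and follows the threshold and column parameters as documented above.

```python
import pandas as pd


def drop_rare_values(df, threshold, column):
    """Keep rows whose value in `column` occurs at least `threshold` times."""
    counts = df[column].map(df[column].value_counts())
    return df[counts >= threshold]


df = pd.DataFrame([[1, 4], [4, 5], [1, 11]], index=[1, 2, 3], columns=["a", "b"])
kept = drop_rare_values(df, threshold=2, column="a")
```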
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
class
dalio.pipe.select.
RowDrop
(conditions, columns=None, reduce_strat=None)¶ Bases:
dalio.pipe.select._ColSelection
A pipeline stage that drops rows by callable conditions.
- Parameters
conditions (list-like or dict) – The list of conditions that make a row eligible to be dropped. Each condition must be a callable that takes a cell value and returns a bool value. If a list of callables is given, the conditions are checked for each column value of each row. If a dict mapping column labels to callables is given, then each condition is only checked for the column values of the designated column.
reduce ('any', 'all' or 'xor', default 'any') – Determines how row conditions are reduced. If set to ‘all’, a row must satisfy all given conditions to be dropped. If set to ‘any’, rows satisfying at least one of the conditions are dropped. If set to ‘xor’, rows satisfying exactly one of the conditions will be dropped. Set to ‘any’ by default.
columns (str or iterable, optional) – The label, or an iterable of labels, of columns. Optional. If given, input conditions will be applied to the sub-dataframe made up of these columns to determine which rows to drop.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1,4],[4,5],[5,11]], [1,2,3], ['a','b'])
>>> pdp.RowDrop([lambda x: x < 2]).apply(df)
   a   b
2  4   5
3  5  11
>>> pdp.RowDrop({'a': lambda x: x == 4}).apply(df)
   a   b
1  1   4
3  5  11
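The three reduce strategies can be compared with a small pandas sketch. It covers only the dict form of `conditions`, and the function name `drop_rows` is hypothetical rather than part of dalio.

```python
import pandas as pd


def drop_rows(df, conditions, reduce="any"):
    """Drop rows whose per-column conditions hold, combined by `reduce`."""
    # One boolean column per condition: does this row's cell satisfy it?
    hits = pd.DataFrame({col: df[col].apply(cond) for col, cond in conditions.items()})
    n_true = hits.sum(axis=1)
    if reduce == "any":
        to_drop = n_true >= 1
    elif reduce == "all":
        to_drop = n_true == len(conditions)
    elif reduce == "xor":
        to_drop = n_true == 1
    else:
        raise ValueError(f"unknown reduce strategy: {reduce!r}")
    return df[~to_drop]


df = pd.DataFrame([[1, 4], [4, 5], [5, 11]], [1, 2, 3], ["a", "b"])
conds = {"a": lambda x: x > 3, "b": lambda x: x > 10}
kept_any = drop_rows(df, conds, "any")  # rows 2 and 3 satisfy at least one condition
kept_all = drop_rows(df, conds, "all")  # only row 3 satisfies both
kept_xor = drop_rows(df, conds, "xor")  # only row 2 satisfies exactly one
```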
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
class
dalio.pipe.select.
ValDrop
(values, columns=None)¶ Bases:
dalio.pipe.select._ColValSelection
A pipeline stage that drops rows by value.
- Parameters
values (list-like) – A list of the values to drop.
columns (str or list-like, default None) – The name, or an iterable of names, of columns to check for the given values. If set to None, all columns are checked.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1,4],[4,5],[18,11]], [1,2,3], ['a','b'])
>>> pdp.ValDrop([4], 'a').apply(df)
    a   b
1   1   4
3  18  11
>>> pdp.ValDrop([4]).apply(df)
    a   b
3  18  11
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
class
dalio.pipe.select.
ValKeep
(values, columns=None)¶ Bases:
dalio.pipe.select._ColValSelection
A pipeline stage that keeps rows by value.
- Parameters
values (list-like) – A list of the values to keep.
columns (str or list-like, default None) – The name, or an iterable of names, of columns to check for the given values. If set to None, all columns are checked.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1,4],[4,5],[5,11]], [1,2,3], ['a','b'])
>>> pdp.ValKeep([4, 5], 'a').apply(df)
   a   b
2  4   5
3  5  11
>>> pdp.ValKeep([4, 5]).apply(df)
   a  b
2  4  5
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
Module contents¶
-
class
dalio.pipe.
PipeLine
(*args)¶ Bases:
dalio.pipe.pipe.Pipe
Collection of Pipe transformations.
PipeLine instances represent multiple Pipe transformations being performed consecutively. Pipelines essentially execute multiple transformations one after the other, and thus do not check for data integrity in between them; so keep in mind that order matters and only the first data definition will be enforced.
-
pipeline
¶ list of Pipe instances this pipeline is composed of
- Type
list
-
copy
(*args, **kwargs)¶ Make a copy of this Pipeline
-
extend
(*args, deep=False)¶ Extend existing pipeline with one or more Pipe instances
Keep in mind that this will not mean that
-
transform
(data, **kwargs)¶ Pass data sourced from first pipe through every Pipe’s .transform() method in order.
- Parameters
data – data sourced and checked from first source.
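The consecutive execution described for PipeLine can be sketched in a few lines of plain Python. `Step` and `MiniPipeLine` are hypothetical stand-ins used only for illustration; they are not dalio classes and skip sourcing and data checks entirely.

```python
class Step:
    """Hypothetical stand-in for a Pipe: wraps a function applied in transform()."""

    def __init__(self, func):
        self.func = func

    def transform(self, data, **kwargs):
        return self.func(data)


class MiniPipeLine:
    """Runs each step's transform() on the previous step's output, in order."""

    def __init__(self, *steps):
        self.pipeline = list(steps)

    def transform(self, data, **kwargs):
        for step in self.pipeline:
            data = step.transform(data, **kwargs)
        return data


add_one = Step(lambda xs: [x + 1 for x in xs])
double = Step(lambda xs: [x * 2 for x in xs])

# Order matters: (x + 1) * 2 differs from (x * 2) + 1.
first = MiniPipeLine(add_one, double).transform([1, 2, 3])
second = MiniPipeLine(double, add_one).transform([1, 2, 3])
```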
-
-
class
dalio.pipe.
Custom
(func, *args, columns=None, new_cols=None, strategy='apply', axis=0, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation._ColGeneration
Apply custom function.
-
strategy
¶ strategy for applying value function. One of [“apply”, “transform”, “agg”, “pipe”]
- Type
str, default “apply”
Example
>>> import pandas as pd; from dalio.pipe import Custom;
>>> data = [[3, 2143], [10, 1321], [7, 1255]]
>>> df = pd.DataFrame(data, [1,2,3], ['years', 'avg_revenue'])
>>> total_rev = lambda row: row['years'] * row['avg_revenue']
>>> add_total_rev = Custom(total_rev, 'total_revenue', axis=1)
>>> add_total_rev.transform(df)
   years  avg_revenue  total_revenue
1      3         2143           6429
2     10         1321          13210
3      7         1255           8785
>>> def halfer(row):
...     new = {'year/2': row['years']/2,
...            'rev/2': row['avg_revenue']/2}
...     return pd.Series(new)
>>> half_cols = Custom(halfer, axis=1, drop=False)
>>> half_cols.transform(df)
   years  avg_revenue   rev/2  year/2
1      3         2143  1071.5     1.5
2     10         1321   660.5     5.0
3      7         1255   627.5     3.5
>>> data = [[3, 3], [2, 4], [1, 5]]
>>> df = pd.DataFrame(data, [1,2,3], ["A","B"])
>>> func = lambda df: df['A'] == df['B']
>>> add_equal = Custom(func, "A==B", strategy="pipe", drop=False)
>>> add_equal.transform(df)
   A  B   A==B
1  3  3   True
2  2  4  False
3  1  5  False
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
-
class
dalio.pipe.
Rolling
(func, *args, columns=None, new_cols=None, rolling_window=2, axis=0, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation._ColGeneration
Apply rolling function
-
rolling_window
¶ rolling window to apply function. If none, no rolling window is applied.
- Type
int, default 2
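For intuition, a rolling window of 2 combined with a mean looks like this in plain pandas. This is a sketch of the general technique, not of how Rolling is implemented; the column name and values are made up.

```python
import pandas as pd

prices = pd.DataFrame({"price": [10.0, 12.0, 11.0, 15.0]})

# Window of 2 rows: each output is the mean of the current and previous row.
# The first row has no predecessor, so its result is NaN.
rolled = prices.rolling(window=2).mean()
```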
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
-
class
dalio.pipe.
ColSelect
(columns)¶ Bases:
dalio.pipe.select._ColSelection
Select columns
-
transform
(data, **kwargs)¶ Selects the specified columns or returns data as is if no column was specified.
- Returns
Data of the same format as before but only containing the specified columns.
-
-
class
dalio.pipe.
DateSelect
(start=None, end=None)¶ Bases:
dalio.pipe.pipe.Pipe
Select a date range.
An instance is commonly kept as a local variable to control the date range used in one part of a graph.
-
_start
¶ start date.
- Type
pd.Timestamp
-
_end
¶ end date.
- Type
pd.Timestamp
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
set_end
(end)¶ Set the _end attribute
-
set_start
(start)¶ Set the _start attribute
-
transform
(data, **kwargs)¶ Slices time series into selected date range.
- Returns
Time series of the same format as input containing a subset of the original dates.
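Slicing a time series to a date range can be sketched with pandas label-based indexing. Whether DateSelect treats its bounds as inclusive is an assumption here; the sketch shows pandas behavior only, with made-up data.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=6, freq="D")
series = pd.DataFrame({"close": [1, 2, 3, 4, 5, 6]}, index=dates)

start = pd.Timestamp("2020-01-02")
end = pd.Timestamp("2020-01-04")

# Label-based slicing on a DatetimeIndex includes both endpoints.
subset = series.loc[start:end]

# Leaving one bound as None keeps that side of the range open.
from_start = series.loc[start:None]
```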
-
-
class
dalio.pipe.
ColDrop
(columns)¶ Bases:
dalio.pipe.select._ColSelection
A pipeline stage that drops columns by name.
- Parameters
columns (str, iterable or callable) – The label, or an iterable of labels, of columns to drop. Alternatively, columns can be assigned a callable returning bool values for pandas.Series objects; if this is the case, every column for which it returns True will be dropped.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[8,'a'],[5,'b']], [1,2], ['num', 'char'])
>>> pdp.ColDrop('num').apply(df)
  char
1    a
2    b
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
class
dalio.pipe.
ValDrop
(values, columns=None)¶ Bases:
dalio.pipe.select._ColValSelection
A pipeline stage that drops rows by value.
- Parameters
values (list-like) – A list of the values to drop.
columns (str or list-like, default None) – The name, or an iterable of names, of columns to check for the given values. If set to None, all columns are checked.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1,4],[4,5],[18,11]], [1,2,3], ['a','b'])
>>> pdp.ValDrop([4], 'a').apply(df)
    a   b
1   1   4
3  18  11
>>> pdp.ValDrop([4]).apply(df)
    a   b
3  18  11
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
class
dalio.pipe.
ValKeep
(values, columns=None)¶ Bases:
dalio.pipe.select._ColValSelection
A pipeline stage that keeps rows by value.
- Parameters
values (list-like) – A list of the values to keep.
columns (str or list-like, default None) – The name, or an iterable of names, of columns to check for the given values. If set to None, all columns are checked.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1,4],[4,5],[5,11]], [1,2,3], ['a','b'])
>>> pdp.ValKeep([4, 5], 'a').apply(df)
   a   b
2  4   5
3  5  11
>>> pdp.ValKeep([4, 5]).apply(df)
   a  b
2  4  5
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
class
dalio.pipe.
ColRename
(map_dict)¶ Bases:
dalio.pipe.pipe.Pipe
A pipeline stage that renames a column or columns.
-
rename_map
¶ Maps old column names to new ones.
- Type
dict
- Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[8,'a'],[5,'b']], [1,2], ['num', 'char'])
>>> pdp.ColRename({'num': 'len', 'char': 'initial'}).apply(df)
   len initial
1    8       a
2    5       b
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
-
class
dalio.pipe.
DropNa
(**kwargs)¶ Bases:
dalio.pipe.pipe.Pipe
A pipeline stage that drops null values.
Supports all parameters supported by the pandas dropna function.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1,4],[4,None],[1,11]], [1,2,3], ['a','b'])
>>> pdp.DropNa().apply(df)
   a     b
1  1   4.0
3  1  11.0
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
-
class
dalio.pipe.
FreqDrop
(values, columns=None)¶ Bases:
dalio.pipe.select._ColValSelection
A pipeline stage that drops rows by value frequency.
- Parameters
threshold (int) – The minimum frequency required for a value to be kept.
column (str) – The name of the column to check for the given value frequency.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1,4],[4,5],[1,11]], [1,2,3], ['a','b'])
>>> pdp.FreqDrop(2, 'a').apply(df)
   a   b
1  1   4
3  1  11
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
class
dalio.pipe.
ColReorder
(map_dict, level=0)¶ Bases:
dalio.pipe.select._ColSelection
A pipeline stage that reorders columns.
-
positions
¶ A mapping of column names to their desired positions after reordering. Columns not included in the mapping will maintain their relative positions among the non-mapped columns.
- Type
dict
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[8,4,3,7]], columns=['a', 'b', 'c', 'd'])
>>> pdp.ColReorder({'b': 0, 'c': 3}).apply(df)
   b  a  d  c
0  4  8  7  3
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
-
class
dalio.pipe.
RowDrop
(conditions, columns=None, reduce_strat=None)¶ Bases:
dalio.pipe.select._ColSelection
A pipeline stage that drops rows by callable conditions.
- Parameters
conditions (list-like or dict) – The list of conditions that make a row eligible to be dropped. Each condition must be a callable that takes a cell value and returns a bool value. If a list of callables is given, the conditions are checked for each column value of each row. If a dict mapping column labels to callables is given, then each condition is only checked for the column values of the designated column.
reduce ('any', 'all' or 'xor', default 'any') – Determines how row conditions are reduced. If set to ‘all’, a row must satisfy all given conditions to be dropped. If set to ‘any’, rows satisfying at least one of the conditions are dropped. If set to ‘xor’, rows satisfying exactly one of the conditions will be dropped. Set to ‘any’ by default.
columns (str or iterable, optional) – The label, or an iterable of labels, of columns. Optional. If given, input conditions will be applied to the sub-dataframe made up of these columns to determine which rows to drop.
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1,4],[4,5],[5,11]], [1,2,3], ['a','b'])
>>> pdp.RowDrop([lambda x: x < 2]).apply(df)
   a   b
2  4   5
3  5  11
>>> pdp.RowDrop({'a': lambda x: x == 4}).apply(df)
   a   b
1  1   4
3  5  11
-
transform
(data, **kwargs)¶ Apply a transformation to data returned from source.
This is where the bulk of a Pipe's functionality lies, which allows it to be highly customizable. This will often be the only method that needs to be overridden in subclasses.
- Parameters
data – data returned by source.
-
class
dalio.pipe.
Change
(*args, strategy='diff', columns=None, new_cols=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation._ColGeneration
Perform item-by-item change
This has two main forms, percentage change and absolute change (difference).
-
_strategy
¶ change strategy.
- Type
str, callable
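The two main forms of change can be illustrated with plain pandas. This sketch uses pandas' own `diff` and `pct_change`; how Change maps its strategy names onto them is not shown in this reference, so treat the correspondence as an assumption.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0], name="price")

# Absolute change: each value minus the previous value.
absolute = prices.diff()

# Percentage change: absolute change divided by the previous value.
relative = prices.pct_change()
```

In both cases the first element is NaN, since it has no previous value to compare against.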
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
-
class
dalio.pipe.
StockReturns
(columns=None, new_cols=None, drop=True, reintegrate=False)¶ Bases:
dalio.pipe.col_generation._ColGeneration
Perform percent change and minor aesthetic changes to data
-
class
dalio.pipe.
Period
(period, *args, agg_func=<function mean>, columns=None, new_cols=None, axis=0, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation._ColGeneration
Resample input time series data to a different period
- Attributes
agg_func (callable) – function to aggregate data to one period. Default set to np.mean.
_period (str) – period to resample data to. Can be either daily, monthly, quarterly or yearly.
-
agg_func
: Callable[[Iterable], Any] = None¶
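Resampling to a coarser period amounts to grouping observations by period and aggregating each group. A minimal pandas sketch with made-up data follows; it groups by calendar month and uses the mean, matching the documented default aggregation, but it is not dalio's implementation.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-10", "2020-01-20", "2020-02-05", "2020-02-25"])
daily = pd.Series([1.0, 3.0, 10.0, 20.0], index=dates)

# Group observations by calendar month, then aggregate each group with the mean.
monthly = daily.groupby(daily.index.to_period("M")).mean()
```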
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
class
dalio.pipe.
Index
(index_at, *args, columns=None, new_cols=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation._ColGeneration
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
-
class
dalio.pipe.
Bin
(bin_map, *args, bin_strat='normal', columns=None, new_cols=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation.Custom
A pipeline stage that adds a binned version of a column or columns.
If drop is set to True the new columns retain the names of the source columns; otherwise, the resulting columns gain the suffix ‘_bin’.
-
bin_map
¶ array of bin edges. It implicitly projects a left-most bin containing all elements smaller than the left-most end point and a right-most bin containing all elements larger than the right-most end point. For example, the list [0, 5, 8] is interpreted as the bins (-∞, 0), [0, 5), [5, 8) and [8, ∞).
- Type
array-like
-
bin_strat
¶ binning strategy to use. “normal” uses the default binning strategy per a list of value separations or number of bins. “quantile” uses a list of quantiles or a preset quantile range (4 for quartiles and 10 for deciles).
- Type
str, default “normal”
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[-3],[4],[5], [9]], [1,2,3, 4], ['speed'])
>>> pdp.Bin({'speed': [5]}, drop=False).apply(df)
   speed speed_bin
1     -3        <5
2      4        <5
3      5        5≤
4      9        5≤
>>> pdp.Bin({'speed': [0,5,8]}, drop=False).apply(df)
   speed speed_bin
1     -3        <0
2      4       0-5
3      5       5-8
4      9        8≤
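The half-open bins described for bin_map can be reproduced with the standard library. The function `bin_label` is a hypothetical helper written for this sketch; it mimics the labels in the example above but is not taken from dalio.

```python
from bisect import bisect_right


def bin_label(value, edges):
    """Label the half-open bin containing `value` for sorted `edges`."""
    # bisect_right counts how many edges are <= value, which is the bin index:
    # 0 means below the first edge, len(edges) means at or above the last one.
    i = bisect_right(edges, value)
    if i == 0:
        return f"<{edges[0]}"
    if i == len(edges):
        return f"{edges[-1]}≤"
    return f"{edges[i - 1]}-{edges[i]}"


labels = [bin_label(v, [0, 5, 8]) for v in [-3, 4, 5, 9]]
```

Because each bin is closed on the left, the value 5 falls in the 5-8 bin rather than 0-5.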
-
-
class
dalio.pipe.
MapColVals
(value_map, *args, columns=None, new_cols=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation.Custom
A pipeline stage that reintegrates the values of a column by a map.
-
value_map
¶ A dictionary mapping existing values to new ones. Values not in the dictionary as keys will be converted to NaN. If a function is given, it is applied element-wise to given columns. If a Series is given, values are mapped by its index to its values.
- Type
dict, function or pandas.Series
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> df = pd.DataFrame([[1], [3], [2]], ['UK', 'USSR', 'US'], ['Medal'])
>>> value_map = {1: 'Gold', 2: 'Silver', 3: 'Bronze'}
>>> pdp.MapColVals('Medal', value_map).apply(df)
       Medal
UK      Gold
USSR  Bronze
US    Silver
-
-
class
dalio.pipe.
CustomByCols
(func, *args, strategy='apply', columns=None, new_cols=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation.Custom
A pipeline stage applying a function to individual columns iteratively.
-
func
¶ The function to be applied to each element of the given columns.
- Type
function
-
strategy
¶ Application strategy. Different from Custom class’ strategy parameter (which here is kept at “apply”) as this will now be done on a series (each column). Extra care should be taken to ensure resulting column lengths match.
- Type
str
Example
>>> import pandas as pd; import pdpipe as pdp; import math;
>>> data = [[3.2, "acd"], [7.2, "alk"], [12.1, "alk"]]
>>> df = pd.DataFrame(data, [1,2,3], ["ph","lbl"])
>>> round_ph = pdp.ApplyByCols("ph", math.ceil)
>>> round_ph(df)
   ph  lbl
1   4  acd
2   8  alk
3  13  alk
-
-
class
dalio.pipe.
Log
(*args, columns=None, new_cols=None, non_neg=False, const_shift=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation.Custom
A pipeline stage that log-transforms numeric data.
-
non_neg
¶ If True, each transformed column is first shifted by the smallest negative value it includes (non-negative columns are thus not shifted).
- Type
bool, default False
-
const_shift
¶ If given, each transformed column is first shifted by this constant. If non_neg is True then that transformation is applied first, and only then is the column shifted by this constant.
- Type
int, optional
Example
>>> import pandas as pd; import pdpipe as pdp;
>>> data = [[3.2, "acd"], [7.2, "alk"], [12.1, "alk"]]
>>> df = pd.DataFrame(data, [1,2,3], ["ph","lbl"])
>>> log_stage = pdp.Log("ph", drop=True)
>>> log_stage(df)
         ph  lbl
1  1.163151  acd
2  1.974081  alk
3  2.493205  alk
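The order of the two shifts can be made concrete with a standard-library sketch. `log_transform` is a hypothetical helper operating on a plain list; it follows the attribute descriptions above (non-negativity shift first, constant shift second) and is not dalio's code.

```python
import math


def log_transform(values, non_neg=False, const_shift=None):
    """Natural-log-transform values after the optional shifts described above."""
    shifted = list(values)
    if non_neg:
        lowest = min(shifted)
        # Only data containing negative values is shifted.
        if lowest < 0:
            shifted = [v - lowest for v in shifted]
    if const_shift is not None:
        # The constant shift is applied after the non-negativity shift.
        shifted = [v + const_shift for v in shifted]
    return [math.log(v) for v in shifted]


# [-2, 0, 6] is shifted to [0, 2, 8], then to [1, 3, 9], before taking logs.
result = log_transform([-2.0, 0.0, 6.0], non_neg=True, const_shift=1)
```

Note that with non_neg alone the smallest value becomes 0, whose logarithm is undefined, which is one reason a constant shift is useful alongside it.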
-
-
class
dalio.pipe.
BoxCox
(*args, columns=None, new_cols=None, non_neg=False, const_shift=None, drop=True, reintegrate=False, **kwargs)¶ Bases:
dalio.pipe.col_generation.Custom
A pipeline stage that applies the BoxCox transformation on data.
-
const_shift
¶ If given, each transformed column is first shifted by this constant. If non_neg is True then that transformation is applied first, and only then is the column shifted by this constant.
- Type
int, optional
-
-
class
dalio.pipe.
StockComps
(strategy='sic_code', max_ticks=6)¶ Bases:
dalio.pipe.pipe.Pipe
Get a list of a ticker’s comparable stocks
This can utilize any strategy for finding comparable companies and returns up to a set number of comps.
-
_strategy
¶ comparison strategy name or function.
- Type
str, callable
-
max_ticks
¶ maximum number of tickers to return.
- Type
int
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
max_ticks
: int = None
-
run
(**kwargs)¶ Gets ticker argument and passes an empty ticker request to transform.
Empty ticker requests are supposed to return all tickers available in a source, so this allows the comparison to be made across all stocks from a certain source.
- Raises
ValueError – if ticker is more than a single symbol.
-
transform
(data, **kwargs)¶ Get comps according to the set strategy
-
-
class
dalio.pipe.
CovShrink
(frequency=252)¶ Bases:
dalio.pipe.pipe.PipeBuilder
Perform Covariance Shrinkage on data
Builder with a single piece: shrinkage. Shrinkage defines what kind of shrinkage to apply on a resultant covariance matrix. If none is set, covariance will not be shrunk.
-
frequency
¶ data time period frequency
- Type
int
-
build_model
(data, **kwargs)¶ Builds Covariance Shrinkage object and returns selected shrinkage strategy
- Returns
Function fitted on the data.
-
check_name
(param, name)¶ Check if name and parameter combination is valid.
This will always be called upon setting a new piece to ensure this piece is present in the piece dictionary and that the name is valid. Subclasses will often override this method to implement name checks in accordance with their specific name and parameter combination options. Note that checks cannot be done on arguments before running the _Builder. This can also be called from outside of a _Builder instance to check the validity of settings.
- Parameters
piece (str) – name of the key in the piece dictionary.
name (str) – name option to be set to the piece.
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
frequency
: int = None
-
transform
(data, **kwargs)¶ Build model using data and get results.
- Returns
A covariance matrix
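As a rough illustration of covariance shrinkage, the sketch below blends the sample covariance with a scaled-identity target using a fixed weight. The fixed `delta`, the identity target, and reading `frequency` as an annualization factor are all assumptions made for this example; the shrinkage strategies CovShrink actually offers are not listed in this reference.

```python
import numpy as np


def shrink_covariance(returns, delta=0.2, frequency=252):
    """Annualized sample covariance shrunk toward a scaled identity target."""
    sample = np.cov(returns, rowvar=False) * frequency
    n_assets = sample.shape[0]
    # Target: identity scaled by the average variance (mean of the diagonal).
    target = np.eye(n_assets) * (np.trace(sample) / n_assets)
    # delta = 0 returns the sample covariance, delta = 1 returns the target.
    return delta * target + (1 - delta) * sample


rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(100, 3))  # synthetic returns, 3 assets
shrunk = shrink_covariance(returns, delta=0.2)
```

Shrinking pulls off-diagonal entries toward zero while leaving the total variance (the trace) unchanged.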
-
-
class
dalio.pipe.
ExpectedReturns
¶ Bases:
dalio.pipe.pipe.PipeBuilder
Get stock’s time series expected returns.
Builder with a single piece: return_model. return_model is what model to get the expected returns from.
-
build_model
(data, **kwargs)¶ Assemble pieces into a model given some data
The data will often be optional, but several builder models require it in order to be fitted on initialization, which further shows why builders are necessary for context-agnostic graphs.
- Parameters
data – data that might be used to build the model.
**kwargs – any additional argument used in building
-
check_name
(param, name)¶ Check if name and parameter combination is valid.
This will always be called upon setting a new piece to ensure this piece is present in the piece dictionary and that the name is valid. Subclasses will often override this method to implement name checks in accordance with their specific name and parameter combination options. Note that checks cannot be done on arguments before running the _Builder. This can also be called from outside of a _Builder instance to check the validity of settings.
- Parameters
piece (str) – name of the key in the piece dictionary.
name (str) – name option to be set to the piece.
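The piece-and-name validation described above can be sketched as a tiny class. Everything here is hypothetical: `MiniBuilder`, its `set_piece` method, and the allowed names are invented for illustration and do not reflect the actual _Builder interface.

```python
class MiniBuilder:
    """Hypothetical builder: named pieces are set first, the model is built later."""

    # Allowed name options for each piece (illustrative values only).
    valid_names = {"return_model": {"mean", "ema"}}

    def __init__(self):
        self._pieces = {piece: None for piece in self.valid_names}

    def check_name(self, piece, name):
        # Reject pieces this builder does not have and names it does not know.
        if piece not in self._pieces:
            raise KeyError(f"unknown piece: {piece!r}")
        if name not in self.valid_names[piece]:
            raise ValueError(f"invalid name {name!r} for piece {piece!r}")

    def set_piece(self, piece, name):
        # Validation runs every time a piece is set.
        self.check_name(piece, name)
        self._pieces[piece] = name


builder = MiniBuilder()
builder.set_piece("return_model", "mean")
```

Because `check_name` is a regular method, it can also be called on its own to test a setting without changing the builder.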
-
transform
(data, **kwargs)¶ Builds model using data and gets expected returns from it
-
-
class
dalio.pipe.
MakeARCH
¶ Bases:
dalio.pipe.pipe.PipeBuilder
Build arch model and make it based on input data.
This class allows for the creation of arch models by configuring three pieces: the mean, volatility and distribution. These are set after initialization through the _Builder interface.
-
_piece
¶ see _Builder class.
- Type
list
-
assimilate
(model)¶ Assimilate core pieces of an existent ARCH Model.
Assimilation means setting this model’s pieces in accordance with an existing model’s pieces. Assimilation is shallow, so only the main pieces are assimilated, not their parameters.
- Parameters
model (ARCHModel) – Existing ARCH Model.
-
build_model
(data, **kwargs)¶ Build ARCH Model using data, set pieces and their arguments
- Returns
A built arch model from the arch package.
-
transform
(data, **kwargs)¶ Build model with sourced data
-
-
class
dalio.pipe.
ValueAtRisk
(quantiles=None)¶ Bases:
dalio.pipe.pipe.Pipe
Get the value at risk for data based on an ARCH Model
This takes in an ARCH Model maker, not data, which might be unintuitive, yet is necessary, as it allows users to modify the ARCH model generating these values separately. A useful strategy that allows for this is using a pipeline with an ARCH model as its first input and a ValueAtRisk instance as its second layer. This allows us to treat the PipeLine as a data input with VaR output and still have control over the ARCH Model pieces (provided you kept a local variable for it).
-
_quantiles
¶ list of quantiles to check the value at risk for.
- Type
list
-
copy
(*args, **kwargs)¶ Makes a copy of transformer, copying its attributes to a new instance.
This copy should essentially create a new transformation node, not an entire new graph, so the _source attribute of the returned instance should be assigned without being copied. This is also made to be built upon by subclasses, such that only new attributes need to be added to a class’ copy method.
- Parameters
*args – Positional arguments to be passed to initialize copy
**kwargs – Keyword arguments to be passed to initialize copy
- Returns
A copy of this _Transformer instance with copies of necessary attributes and empty input.
-
transform
(data, **kwargs)¶ Get values at risk at each quantile and each result’s maximum exceedance from the mean.
The maximum exceedance column tells which quantile the loss is placed on. The word “maximum” might be misleading, as it is compared to the minimum quantile; however, this definition is accurate, as the column essentially answers the question: “what quantile furthest away from the mean does the data exceed?”
Thank you to the creators of the arch package for the beautiful visualizations and ideas!
- Raises
ValueError – if ARCH model does not have returns. This is often the case for unfitted models. Ensure your graph is complete.
TypeError – if ARCH model has unsupported distribution parameter.
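The maximum exceedance idea can be illustrated with a small function. This is a sketch under stated assumptions, not the library's code: max_exceedance is a hypothetical helper, VaR values are given as positive losses keyed by quantile, and None marks an observation that exceeds no quantile. How dalio actually encodes this column is not shown in this reference.

```python
def max_exceedance(ret, var_by_quantile):
    """Return the most extreme quantile whose VaR the loss exceeds, or None."""
    loss = -ret
    exceeded = [q for q, var in var_by_quantile.items() if loss > var]
    # The smallest quantile is the one furthest away from the mean.
    return min(exceeded) if exceeded else None


# Illustrative VaR levels: the 1% VaR is a larger loss than the 5% VaR.
var_levels = {0.01: 0.04, 0.05: 0.028}

mild = max_exceedance(-0.03, var_levels)    # beyond the 5% VaR only
severe = max_exceedance(-0.05, var_levels)  # beyond both levels
calm = max_exceedance(0.01, var_levels)     # a gain, exceeds nothing
```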
-
-
class
dalio.pipe.
ExpectedShortfall
(quantiles=None)¶ Bases:
dalio.pipe.builders.ValueAtRisk
Get expected shortfall for given quantiles
See base class for a more in-depth explanation.
-
transform
(data, **kwargs)¶ Get the value at risk given by an arch model and calculate the expected shortfall at given quantiles.
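As a rough illustration of how expected shortfall relates to value at risk, the sketch below averages the losses that fall beyond a given VaR level. It assumes an empirical definition and a hypothetical helper named expected_shortfall; the method dalio applies may differ.

```python
def expected_shortfall(returns, var):
    """Mean of the losses that exceed the VaR level (var is a positive loss)."""
    tail = [-r for r in returns if -r > var]
    # With no observations beyond the VaR there is nothing to average.
    return sum(tail) / len(tail) if tail else None


returns = [-0.05, -0.03, 0.01, 0.02, -0.01]
es = expected_shortfall(returns, var=0.02)
no_tail = expected_shortfall(returns, var=0.10)
```

By construction, the expected shortfall is never smaller than the VaR it is computed from.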
-
-
class
dalio.pipe.
PandasLinearModel
¶ Bases:
dalio.pipe.pipe.PipeBuilder
Create a linear model from an input pandas DataFrame, using its index as the X value.
This builder is made up of a single piece: strategy. This piece sets which linear model should be used to fit the data.
-
build_model
(data, **kwargs)¶ Build model by returning the chosen model and initialization parameters
- Returns
Unfitted linear model
-
transform
(data, **kwargs)¶ Set up fitting parameters and fit built model.
- Returns
Fitted linear model
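The split between build_model (unfitted model) and transform (fitted model) can be sketched without any dependencies. Everything here is hypothetical: ToyLinearModel is a one-variable least-squares fit, and a plain dict mapping index to value stands in for a pandas object, with the keys playing the role of X.

```python
class ToyLinearModel:
    """Hypothetical ordinary least squares model with one regressor."""

    def __init__(self):
        self.slope = None
        self.intercept = None

    def fit(self, x, y):
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self


def build_model():
    """Return the chosen model, still unfitted."""
    return ToyLinearModel()


def transform(series):
    """Fit the built model, using the index (dict keys) as X."""
    x = list(series.keys())
    y = list(series.values())
    return build_model().fit(x, y)


fitted = transform({0: 1.0, 1: 3.0, 2: 5.0, 3: 7.0})
```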
-
-
class
dalio.pipe.
OptimumWeights
¶ Bases:
dalio.pipe.pipe.PipeBuilder
Get optimum portfolio weights from an efficient frontier or CLA. This is also a builder with one piece: strategy. The strategy piece refers to the optimization strategy.
-
build_model
(data, **kwargs)¶ Assemble pieces into a model given some data
The data will often be optional, but several builder models require it in order to be fitted on initialization. This is a further reason why builders are necessary for context-agnostic graphs.
- Parameters
data – data that might be used to build the model.
**kwargs – any additional argument used in building
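A minimal sketch of why a builder defers model construction until data arrives. ToyWeightsBuilder and its strategy names ("equal", "inverse_variance") are assumptions made for illustration and are not dalio options; the data is a dict mapping asset names to variances.

```python
class ToyWeightsBuilder:
    """Hypothetical builder: the strategy is chosen now, the model built later."""

    def __init__(self):
        self.pieces = {"strategy": None}

    def set_piece(self, piece, name, **kwargs):
        self.pieces[piece] = {"name": name, "kwargs": kwargs}
        return self

    def build_model(self, data, **kwargs):
        strategy = self.pieces["strategy"]
        if strategy is None:
            raise ValueError("no strategy set")
        if strategy["name"] == "equal":
            # Only the number of assets is needed from the data.
            return [1 / len(data)] * len(data)
        if strategy["name"] == "inverse_variance":
            # This strategy cannot be built without the data itself.
            inverse = [1 / v for v in data.values()]
            total = sum(inverse)
            return [w / total for w in inverse]
        raise ValueError("unknown strategy: " + strategy["name"])


builder = ToyWeightsBuilder().set_piece("strategy", "inverse_variance")
weights = builder.build_model({"A": 1.0, "B": 4.0})
```

The same builder can sit in a graph before any data exists, since nothing is constructed until build_model is called.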
-
check_name
(param, name)¶ Check if name and parameter combination is valid.
This will always be called upon setting a new piece to ensure that the piece is present in the piece dictionary and that the name is valid. Subclasses will often override this method to implement name checks in accordance with their specific name and parameter combination options. Notice that checks cannot be done on arguments before running the _Builder. This can also be called from outside of a _Builder instance to check the validity of settings.
- Parameters
piece (str) – name of the key in the piece dictionary.
name (str) – name option to be set to the piece.
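The kind of validation described above might look like the following sketch. VALID_NAMES and its contents are hypothetical placeholders, not the options dalio accepts, and the choice of exception types is an assumption.

```python
# Hypothetical table of allowed names per piece, for illustration only.
VALID_NAMES = {"strategy": {"equal", "inverse_variance"}}


def check_name(piece, name):
    """Raise if the piece is unknown or the name is not valid for it."""
    if piece not in VALID_NAMES:
        raise KeyError("unknown piece: " + piece)
    if name not in VALID_NAMES[piece]:
        raise ValueError("invalid name " + repr(name) + " for piece " + repr(piece))


def is_valid(piece, name):
    """Convenience wrapper for checking settings from outside a builder."""
    try:
        check_name(piece, name)
    except (KeyError, ValueError):
        return False
    return True
```

Only the piece and name are checked here; keyword arguments are left unvalidated until the model is built, matching the note that argument checks cannot be done beforehand.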
-
transform
(data, **kwargs)¶ Get efficient frontier, fit it to model and get weights
-