autocpd package
Submodules
autocpd.neuralnetwork module
- autocpd.neuralnetwork.compile_and_fit(model, x_train, y_train, batch_size, lr, name, log_dir, epochdots, optimizer=None, validation_split=0.2, max_epochs=10000)[source]
To compile and fit the model
Parameters
- modelModels object
the simple neural network
- x_traintf.Tensor
the tensor of training data
- y_traintf.Tensor
the tensor of training labels
- batch_sizeint
the batch size
- lrfloat
the learning rate
- namestr
the model name
- log_dirstr
the path of log files
- epochdotsobject
the EpochDots object from tensorflow_docs
- optimizeroptimizer object or str, optional
the optimizer, by default None
- validation_splitfloat, optional
the fraction of the training data held out for validation, by default 0.2
- max_epochsint, optional
the maximum number of epochs, by default 10000
Returns
- model.fit object
a fitted model object
- autocpd.neuralnetwork.deep_nn(n, n_trans, kernel_size, n_filter, dropout_rate, n_classes, m, l, model_name='deep_nn')[source]
This function is used to construct the deep neural network with 21 residual blocks.
Parameters
- nint
the length of time series
- n_transint
the number of transformations
- kernel_sizeint
the kernel size
- n_filterint
the filter size
- dropout_ratefloat
the dropout rate
- n_classesint
the number of classes
- marray
the width vector
- lint
the number of dense layers
- model_namestr, optional
the model name, by default “deep_nn”
Returns
- model
the model of deep neural network
- autocpd.neuralnetwork.general_deep_nn(n, n_trans, kernel_size, n_filter, dropout_rate, n_classes, n_resblock, m, l, model_name='deep_nn')[source]
This function is used to construct the deep neural network with a user-specified number of residual blocks (n_resblock).
Parameters
- nint
the length of time series
- n_transint
the number of transformations
- kernel_sizeint
the kernel size
- n_filterint
the filter size
- dropout_ratefloat
the dropout rate
- n_classesint
the number of classes
- n_resblockint
the number of residual blocks
- marray
the width vector
- lint
the number of dense layers
- model_namestr, optional
the model name, by default “deep_nn”
Returns
- model
the model of deep neural network
- autocpd.neuralnetwork.general_simple_nn(n, l, m, num_classes, model_name='simple_nn')[source]
To construct a simple neural network.
Parameters
- nscalar
the input size
- lscalar
the number of hidden layers
- mscalar or 1D array
the width vector of hidden layers, if it is a scalar, then the hidden layers of simple neural network have the same nodes.
- num_classesscalar
the nodes of output layers, i.e., the number of classes
- model_namestr, optional
the model name, by default “simple_nn”
Returns
- model
the simple neural network
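The rule for m described above (a scalar gives every hidden layer the same width) can be sketched as a small helper that expands the arguments into a list of layer sizes. The helper name `layer_sizes` is hypothetical and is not part of autocpd; it illustrates the broadcasting rule only and does not build a model.

```python
import numpy as np


def layer_sizes(n, l, m, num_classes):
    """Hypothetical helper: list the layer sizes implied by (n, l, m, num_classes)."""
    widths = np.atleast_1d(m).astype(int)
    if widths.size == 1:
        # A scalar m means every hidden layer has the same number of nodes.
        widths = np.repeat(widths, l)
    if widths.size != l:
        raise ValueError("m must be a scalar or a 1D array of length l")
    return [int(n)] + [int(w) for w in widths] + [int(num_classes)]


print(layer_sizes(100, 3, 50, 2))        # [100, 50, 50, 50, 2]
print(layer_sizes(100, 2, [64, 32], 2))  # [100, 64, 32, 2]
```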
- autocpd.neuralnetwork.get_callbacks(name, log_dir, epochdots)[source]
Get callbacks. This function reports the result of each epoch during training and stops training early if certain conditions are satisfied. Meanwhile, it also saves the training results to TensorBoard and CSV files.
Parameters
- namestr
the model name
- log_dirstr
the path of log files
- epochdotsobject
the EpochDots object from tensorflow_docs
Returns
- list
the list of callbacks
autocpd.pre_trained_model module
autocpd.utils module
- autocpd.utils.ComputeCUSUM(x)[source]
Compute the CUSUM statistics with O(n) time complexity
Parameters
- xvector
the time series
Returns
- vector
a: the CUSUM statistics vector.
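A minimal sketch of how CUSUM statistics can be computed in O(n) time with cumulative sums. The function name `cusum_stats` and the normalisation sqrt(t(n - t)/n) are assumptions made for illustration; the scaling and sign convention used by ComputeCUSUM may differ.

```python
import numpy as np


def cusum_stats(x):
    """CUSUM statistics for every split point t = 1, ..., n - 1 in O(n) time."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t = np.arange(1, n)                      # candidate split points
    s = np.cumsum(x)                         # partial sums S_1, ..., S_n
    mean_left = s[:-1] / t                   # mean of x[:t]
    mean_right = (s[-1] - s[:-1]) / (n - t)  # mean of x[t:]
    # Assumed normalisation: sqrt(t (n - t) / n) * (left mean - right mean).
    return np.sqrt(t * (n - t) / n) * (mean_left - mean_right)


x = np.array([0.0, 0.0, 0.0, 2.0, 2.0, 2.0])
a = cusum_stats(x)
print(int(np.argmax(np.abs(a))) + 1)  # 3: the split after the third observation
```

Because every prefix mean comes from a single cumulative sum, no split point requires a fresh pass over the data.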
- autocpd.utils.ComputeMeanVarNorm(x, minseglen=2)[source]
Compute the likelihood for change in variance. Translated from the R function single.var.norm.calc() in the changepoint package.
Parameters
- xnumpy array
the time series
- minseglenint
the minimum length of segment
Returns
- scalar
the likelihood ratio
- autocpd.utils.ComputeMosum(x, G)[source]
Compute the mosum statistic, rewritten according to mosum.stat function in mosum R package.
Parameters
- xnumpy array
The time series
- Gscalar
the width of moving window
Returns
- int
the location of maximum mosum statistics
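The moving-sum idea can be sketched as follows: at each position k, compare the sum of the G observations after k with the sum of the G observations before k, and report the position where the absolute difference is largest. The name `mosum_location` is hypothetical, and this sketch omits any variance estimation, so it should not be read as a reproduction of mosum.stat or of ComputeMosum.

```python
import numpy as np


def mosum_location(x, G):
    """Location k maximising |sum(x[k:k+G]) - sum(x[k-G:k])| / sqrt(2 G)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.concatenate(([0.0], np.cumsum(x)))  # s[k] = sum(x[:k])
    k = np.arange(G, n - G + 1)                # positions with two full windows
    right = s[k + G] - s[k]
    left = s[k] - s[k - G]
    stat = np.abs(right - left) / np.sqrt(2 * G)
    return int(k[np.argmax(stat)])


x = np.concatenate((np.zeros(20), np.ones(20)))
print(mosum_location(x, G=5))  # 20
```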
- autocpd.utils.DataGenAlternative(N_sub, B, mu_L, n, B_bound, ARcoef=0.0, tau_bound=2, ar_model='Gaussian', scale=0.1, sigma=1.0)[source]
This function generates the simulation data from the alternative model of change in mean.
Parameters
- N_subint
The sample size of simulation data.
- Bfloat
The signal-to-noise ratio of parameter space.
- mu_Lfloat
The signal to the left of the change point.
- nint
The length of time series.
- B_boundlist
The upper and lower bound scalars of signal-to-noise.
- ARcoeffloat, optional
The autoregressive parameter of AR(1) model, by default 0.0
- tau_boundint, optional
The lower bound of change point, by default 2
- ar_modelstr, optional
The noise model, by default ‘Gaussian’. ar_model="AR0" means AR(1) noise with autoregressive parameter ‘ARcoef’; ar_model="ARH" means Cauchy noise with scale parameter ‘scale’; ar_model="ARrho" means AR(1) noise with random autoregressive parameter ‘scale’.
- scalefloat, optional
The scale parameter of Cauchy distribution, by default 0.1
- sigmafloat, optional
The standard deviation of normal distribution, by default 1.0
Returns
- dict
data: size (N_sub, n); tau_alt: size (N_sub,), the change points; mu_R: size (N_sub,), the signal to the right of the change point.
- autocpd.utils.DataGenScenarios(scenario, N, B, mu_L, n, B_bound, rho, tau_bound)[source]
This function generates the data based on Scenarios 1, 2 and 3 in “Automatic Change-point Detection in Time Series via Deep Learning” (Jie et al., 2023)
Parameters
- scenariostring
the scenario label: ‘A0’ is Scenario 1 with ‘rho=0’, ‘A07’ is Scenario 1 with ‘rho=0.7’, ‘C’ is Scenario 2 and ‘D’ is Scenario 3 with heavy-tailed noise.
- Nint
the sample size
- Bfloat
The signal-to-noise ratio of parameter space.
- mu_Lfloat
The signal to the left of the change point.
- nint
The length of time series.
- B_boundlist
The upper and lower bound scalars of signal-to-noise.
- rhoscalar
the autocorrelation of AR(1) model
- tau_boundint
The lower bound of change point
Returns
- dict
data_all: the time series; y_all: the label array.
- autocpd.utils.DataSetGen(N_sub, n, mean_arg, var_arg, slope_arg, n_trim, seed=2022)[source]
This function generates the simulation dataset for change in mean, change in variance and change in non-zero slope. For more details, see Table S1 in the supplement of “Automatic Change-point Detection in Time Series via Deep Learning” (Jie et al., 2023)
Parameters
- N_subint
the sample size of each class
- nint
the length of time series
- mean_argarray
the hyperparameters for generating data of change in mean and null
- var_argarray
the hyperparameters for generating data of change in variance and null
- slope_argarray
the hyperparameters for generating data of change in slope and null
- n_trimint
the trim size
- seedint, optional
the random seed, by default 2022
Returns
- dictionary
the simulation data and corresponding changes
- autocpd.utils.ExtractSubject(subject_path, length, size)[source]
To extract the null labels without change-points from one subject
Parameters
- subject_pathstring
the path of subject data
- lengthint
the length of extracted time series
- sizeint
the sample size
Returns
- dict
ts: time series; label: the labels.
- autocpd.utils.GenDataMean(N, n, cp, mu, sigma)[source]
The function generates the data for change in mean with Gaussian noise. When “cp” is None, it generates the data without change point.
Parameters
- Nint
the sample size
- nint
the length of time series
- cpint
the change point, only 1 change point is accepted in this function.
- mufloat
the piecewise mean
- sigmafloat
the standard deviation of Gaussian distribution
Returns
- numpy array
2D array with size (N, n)
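A minimal sketch of the generating model described for GenDataMean: a piecewise-constant mean plus independent Gaussian noise. The name `gen_data_mean`, the extra `rng` argument and the convention that mu is a pair (mean before cp, mean from cp onwards) are assumptions for illustration, not the library's documented behaviour.

```python
import numpy as np


def gen_data_mean(N, n, cp, mu, sigma, rng=None):
    """Sketch: N series of length n with mean mu[0] before cp and mu[1] from cp on."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.full(n, mu[0], dtype=float)
    if cp is not None:
        signal[cp:] = mu[1]  # the mean switches at index cp (assumed convention)
    noise = rng.normal(0.0, sigma, size=(N, n))
    return signal + noise    # broadcasts the signal over the N rows


data = gen_data_mean(N=4, n=10, cp=6, mu=(0.0, 3.0), sigma=1.0,
                     rng=np.random.default_rng(0))
print(data.shape)  # (4, 10)
```

Passing cp=None leaves the mean at mu[0] throughout, which corresponds to the no-change case.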
- autocpd.utils.GenDataMeanAR(N, n, cp, mu, sigma, coef)[source]
The function generates the data for change in mean with AR(1) noise. When “cp” is None, it generates the data without change point.
Parameters
- Nint
the sample size
- nint
the length of time series
- cpint
the change point, only 1 change point is accepted in this function.
- mufloat
the piecewise mean
- sigmafloat
the standard deviation of Gaussian innovations in AR(1) noise
- coeffloat scalar
the coefficients of AR(1) model
Returns
- numpy array
2D array with size (N, n)
- autocpd.utils.GenDataMeanARH(N, n, cp, mu, coef, scale)[source]
The function generates the data for change in mean + Cauchy noise with location parameter 0 and scale parameter ‘scale’. When “cp” is None, it generates the data without change point.
Parameters
- Nint
the sample size
- nint
the length of time series
- cpint
the change point, only 1 change point is accepted in this function.
- mufloat
the piecewise mean
- coeffloat array
the coefficients of AR(1) model
- scalefloat
the scale parameter of Cauchy distribution
Returns
- numpy array
2D array with size (N, n)
- autocpd.utils.GenDataMeanARrho(N, n, cp, mu, sigma)[source]
The function generates the data for change in mean with AR(1) noise. The autoregressive coefficient is generated from standard uniform distribution. When “cp” is None, it generates the data without change point.
Parameters
- Nint
the sample size
- nint
the length of time series
- cpint
the change point, only 1 change point is accepted in this function.
- mufloat
the piecewise mean
- sigmafloat
the standard deviation of normal distribution
Returns
- numpy array
2D array with size (N, n)
- autocpd.utils.GenDataSlope(N, n, cp, slopes, sigma, start)[source]
The function generates the data for change in slope with Gaussian noise. When “cp” is None, it generates the data without change point in slope.
Parameters
- Nint
the sample size
- nint
the length of time series
- cpint
the change point, only 1 change point is accepted in this function.
- slopesfloat
the slopes before and after the change point
- sigmafloat
the standard deviation of Gaussian distribution
- startfloat
the y-intercept of linear model
Returns
- numpy array
2D array with size (N, n)
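The change-in-slope model can be sketched as a piecewise-linear trend plus Gaussian noise. This sketch assumes that slopes is a pair, that the trend is continuous at cp, and that time is indexed 0, ..., n - 1; the name `gen_data_slope` and the `rng` argument are hypothetical.

```python
import numpy as np


def gen_data_slope(N, n, cp, slopes, sigma, start, rng=None):
    """Sketch: linear trend whose slope changes from slopes[0] to slopes[1] at cp."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(n, dtype=float)
    signal = start + slopes[0] * t
    if cp is not None:
        # After cp the line continues from its value at cp with the new slope.
        signal[cp:] = start + slopes[0] * cp + slopes[1] * (t[cp:] - cp)
    return signal + rng.normal(0.0, sigma, size=(N, n))


noiseless = gen_data_slope(N=1, n=6, cp=3, slopes=(1.0, -1.0), sigma=0.0, start=0.0)
print(noiseless[0].tolist())  # [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
```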
- autocpd.utils.GenDataVariance(N, n, cp, mu, sigma)[source]
The function generates the data for change in variance with piecewise constant signal. When “cp” is None, it generates the data without change point in variance.
Parameters
- Nint
the sample size
- nint
the length of time series
- cpint
the change point, only 1 change point is accepted in this function.
- mufloat
the piecewise mean
- sigmafloat
the standard deviation of Gaussian distribution
Returns
- numpy array
2D array with size (N, n)
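A sketch of the change-in-variance model: the mean stays fixed while the noise standard deviation switches at cp. Here mu is treated as a single constant and sigma as a pair (before, after); both conventions, the name `gen_data_variance` and the `rng` argument are assumptions for illustration.

```python
import numpy as np


def gen_data_variance(N, n, cp, mu, sigma, rng=None):
    """Sketch: constant mean mu, noise sd sigma[0] before cp and sigma[1] from cp on."""
    rng = np.random.default_rng() if rng is None else rng
    scale = np.full(n, sigma[0], dtype=float)
    if cp is not None:
        scale[cp:] = sigma[1]  # the standard deviation switches at index cp
    return mu + scale * rng.standard_normal((N, n))


sample = gen_data_variance(N=2000, n=8, cp=4, mu=0.0, sigma=(1.0, 3.0),
                           rng=np.random.default_rng(0))
print(sample.shape)  # (2000, 8)
```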
- autocpd.utils.GenerateAR(n, coef_left, coef_right, tau, sigma)[source]
This function generates the signal of AR(1) model
Parameters
- ninteger
The length of time series
- coef_leftfloat
The AR coefficient before the change-point
- coef_rightfloat
The AR coefficient after the change-point
- tauinteger
The location of change-point
- sigmafloat
The standard deviation of noise
Returns
- array
The time series with length n.
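A minimal sketch of an AR(1) recursion whose coefficient changes at tau. It assumes that the series starts from its first innovation with no burn-in and that the new coefficient applies from index tau onwards; the name `generate_ar` and the `rng` argument are hypothetical.

```python
import numpy as np


def generate_ar(n, coef_left, coef_right, tau, sigma, rng=None):
    """Sketch: x[t] = coef * x[t-1] + eps[t], with coef switching at index tau."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, sigma, size=n)
    x = np.empty(n)
    x[0] = eps[0]  # assumed start: no burn-in period
    for t in range(1, n):
        coef = coef_left if t < tau else coef_right
        x[t] = coef * x[t - 1] + eps[t]
    return x


series = generate_ar(n=30, coef_left=0.2, coef_right=0.8, tau=15, sigma=2.0,
                     rng=np.random.default_rng(1))
print(series.shape)  # (30,)
```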
- autocpd.utils.GenerateARAll(N, n, coef_left, coef_right, sigma, tau_bound)[source]
This function generates N AR(1) signals
Parameters
- Ninteger
The number of observations
- ninteger
The length of time series
- coef_leftfloat
The AR coefficient before the change-point
- coef_rightfloat
The AR coefficient after the change-point
- sigmafloat
The standard deviation of noise
- tau_boundinteger
The bound of change-point
Returns
- 2D array and change-points
dataset with size (2*N, n), N change-points
- autocpd.utils.MaxCUSUM(x)[source]
To return the maximum of CUSUM
Parameters
- xvector
the time series
Returns
- scalar
the maximum of CUSUM
- autocpd.utils.Standardize(data)[source]
Data standardization
Parameters
- datanumpy array
the data set with size (N, …, n)
Returns
- data
standardized data
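One plausible reading of standardization for data of size (N, …, n) is to centre and scale each series along its last (time) axis. The sketch below makes that assumption explicit; the name `standardize` is hypothetical and the axis used by Standardize may differ.

```python
import numpy as np


def standardize(data):
    """Sketch: zero mean and unit standard deviation along the last axis."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=-1, keepdims=True)
    std = data.std(axis=-1, keepdims=True)
    return (data - mean) / std  # a constant series would divide by zero here


raw = np.random.default_rng(0).normal(5.0, 2.0, size=(3, 4, 10))
z = standardize(raw)
print(z.shape)  # (3, 4, 10)
```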
- autocpd.utils.Transform2D(data_y, rescale=False, cumsum=False)[source]
Apply 4 transformations (original, squared, log squared, tanh) to the same dataset
Parameters
- data_ynumpy array
the 2-D array
- rescalelogical bool
default False
- cumsumlogical bool
replace tanh transformation with cusum transformation, default False
Returns
- numpy array
3-D array with size (N, 4, n)
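The four transformations named above can be stacked along a new axis to obtain the (N, 4, n) layout. This sketch covers only the default case (no rescaling, tanh kept); the name `transform_2d` and the channel order are assumptions.

```python
import numpy as np


def transform_2d(data_y):
    """Sketch: stack original, squared, log-squared and tanh versions of the data."""
    data_y = np.asarray(data_y, dtype=float)
    squared = data_y ** 2
    # log(squared) is -inf wherever an observation is exactly zero.
    channels = [data_y, squared, np.log(squared), np.tanh(data_y)]
    return np.stack(channels, axis=1)  # shape (N, 4, n)


y = np.array([[1.0, -2.0, 0.5], [3.0, 1.0, -1.0]])
out = transform_2d(y)
print(out.shape)  # (2, 4, 3)
```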
- autocpd.utils.Transform2D2TR(data_y, rescale=False, times=2)[source]
Apply 2 transformations (original, squared) to the same dataset, each transformation is repeated user-specified times.
Parameters
- data_ynumpy array
the 2-D array
- rescalelogical bool
default False
- timesinteger
the number of repetitions
Returns
- numpy array
3-D array with size (N, 2*times, n)
- autocpd.utils.extract(n1, n2, length, size, ntrim)[source]
This function randomly extracts samples (consecutive segments) of length ‘length’ from a time series formed by concatenating two time series of lengths ‘n1’ and ‘n2’ respectively. The argument ‘ntrim’ controls the minimum distance between the change-point and the start or end point of each segment. It returns a dictionary containing two arrays: cp is an array of change points, and sample is a 2D array in which each row holds the indices of one consecutive segment.
Parameters
- n1int
the length of time series before change-point
- n2int
the length of time series after change-point
- lengthint
the length of time series segment that we want to extract
- sizeint
the sample size
- ntrimint
the number of observations to be trimmed before and after the change-point
Returns
- dict
‘cp’ is the set of change-points. ‘sample’ is a matrix of indices
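The sampling constraint can be sketched as follows. With the change-point at index n1 of the concatenated series, a segment starting at s keeps the change-point at least ntrim away from both ends when s + ntrim <= n1 <= s + length - ntrim. The name `extract_segments`, the `rng` argument and the choice to report cp relative to the start of each segment are assumptions.

```python
import numpy as np


def extract_segments(n1, n2, length, size, ntrim, rng=None):
    """Sketch: draw `size` index windows of width `length` that contain index n1."""
    rng = np.random.default_rng() if rng is None else rng
    lo = max(0, n1 - length + ntrim)        # earliest admissible start
    hi = min(n1 - ntrim, n1 + n2 - length)  # latest admissible start
    if lo > hi:
        raise ValueError("no valid segment for these arguments")
    starts = rng.integers(lo, hi + 1, size=size)
    sample = starts[:, None] + np.arange(length)  # row i: indices of segment i
    cp = n1 - starts                              # change-point within each segment
    return {"cp": cp, "sample": sample}


res = extract_segments(n1=200, n2=300, length=100, size=50, ntrim=10,
                       rng=np.random.default_rng(0))
print(res["sample"].shape)  # (50, 100)
```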
- autocpd.utils.get_asyvar_window(x, momentp=1)[source]
This function computes the asymptotic variance of a time series with long-run dependence using the “window” method. It is translated from the R function “asymvar.window” and has been tested with overlapping=F and obs="ranks".
Parameters
- x1D array
The time series
- momentpint, optional
which centred mean should be used, see Peligrad and Shao (1995) for details, by default 1
Returns
- scalar
The asymptotic variance of time series.
- autocpd.utils.get_cusum_location(x)[source]
This function returns the estimate of the change-point location based on CUSUM.
Parameters
- xnumpy array
The time series
Returns
- int
change-point location
- autocpd.utils.get_key(y_pred, label_dict)[source]
To get the labels according to the predicted value
Parameters
- y_predint
the value of prediction
- label_dictdict
the label dictionary
Returns
- list
the label list
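A reverse lookup of this kind can be sketched in a few lines. The sketch assumes that label_dict maps label names to integer codes; the name `get_keys` and the example dictionary are hypothetical.

```python
def get_keys(y_pred, label_dict):
    """Sketch: return every key of label_dict whose value equals y_pred."""
    return [key for key, value in label_dict.items() if value == y_pred]


label_dict = {"walk": 0, "stay": 1, "jog": 2}  # hypothetical label coding
print(get_keys(1, label_dict))  # ['stay']
```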
- autocpd.utils.get_label(model, x_test, n)[source]
This function gets the predicted label for the testing time series x_test
Parameters
- modeltensorflow model
The trained tensorflow model
- x_testvector
The vector of time series
- nint
The width of moving window
Returns
- arrays
two arrays, one is predicted label, the other is probabilities.
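The moving-window prediction can be sketched with NumPy's `sliding_window_view`: slice x_test into overlapping windows of width n, classify each window, and keep the most likely class with its probability. The `toy_classifier` below is a stand-in for a trained model, and both function names and the exact form of the returned probabilities are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_labels(predict_proba, x_test, n):
    """Sketch: classify every window of width n and return (labels, probabilities)."""
    windows = sliding_window_view(np.asarray(x_test, dtype=float), n)
    prob = predict_proba(windows)  # shape (num_windows, num_classes)
    return prob.argmax(axis=1), prob.max(axis=1)


def toy_classifier(windows):
    """Stand-in for a trained model: class 1 if the window halves differ in mean."""
    half = windows.shape[1] // 2
    gap = np.abs(windows[:, half:].mean(axis=1) - windows[:, :half].mean(axis=1))
    p1 = 1.0 - np.exp(-gap)
    return np.column_stack((1.0 - p1, p1))


x_test = np.concatenate((np.zeros(30), np.ones(30)))
labels, probs = sliding_labels(toy_classifier, x_test, n=10)
print(labels.shape)  # (51,)
```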
- autocpd.utils.get_label_hasc(model, x_test, label_dict)[source]
This function gets the predicted label for the HASC data
Parameters
- modeltensorflow model
The trained tensorflow model
- x_test2D array
The array of test dataset
- label_dictdict
The label dictionary
Returns
- arrays
two arrays, one is predicted label, the other is probabilities.
- autocpd.utils.get_loc_3(model, x_test, n, width)[source]
This function obtains the change-point locations given by three methods: NN, and double MOSUM based on the predicted labels and on the probabilities.
Parameters
- modelmodel
The trained model
- x_testvector
The vector of time series
- nint
The length of x_test
- widthint
The width of second moving window.
Returns
- array
3 locations.
- autocpd.utils.get_mosum_loc_double(x, n, width, use_prob)[source]
This function returns the estimate of the change-point based on MOSUM with a second moving average.
Parameters
- xarray
either the predicted labels or probabilities
- nint
The width of moving window
Returns
- int
change-point location
- autocpd.utils.get_mosum_loc_nn(pred, n)[source]
This function returns the estimate of the change-point based on MOSUM using NN.
Parameters
- predvector
The vector of predicted labels
- nint
The width of moving window
Returns
- int
change-point location
- autocpd.utils.get_wilcoxon_test(x)[source]
Compute the Wilcoxon statistics
Parameters
- xarray
the time series
Returns
- scalar
the maximum Wilcoxon statistics
- autocpd.utils.labelSubject(subject_path, length, size, num_trim=100)[source]
Obtain the transition labels, change-points and time series from one subject.
Parameters
- subject_pathstring
the path of subject data
- lengthint
the length of extracted time series
- sizeint
the sample size
- num_trimint, optional
the number of observations to be trimmed before and after the change-point, by default 100
Returns
- dictionary
cp: the change-points; ts: time series; label: the transition labels.
- autocpd.utils.labelTransition(data, label, ind, length, size, num_trim=100)[source]
Get the transition labels, change-points and time series from one subject.
Parameters
- dataDataFrame
the time series.
- labelDataFrame
the states of the subject
- indscalar
the index of state
- lengthint
the length of extracted time series
- sizeint
the sample size
- num_trimint, optional
the number of observations to be trimmed before and after the change-point, by default 100
Returns
- dictionary
cp: the change-points; ts: time series; label: the transition labels.
- autocpd.utils.seqPlot(sequences_list, cp_list, label_list, y_pos=0.93)[source]
This function plots the sequence given change-points and label list.
Parameters
- sequences_listDataFrame
the time series
- cp_listlist
the list of change-point
- label_listlist
the list of labels
- y_posfloat, optional
the position of y, used in matplotlib, by default 0.93
- autocpd.utils.tsExtract(data_trim, new_label, length, size, len0)[source]
To extract the labels without change-points
Parameters
- data_trimDataFrame
the dataset of one specific state
- new_labelDataFrame
the label, not transition label.
- lengthint
the length of extracted time series
- sizeint
the sample size
- len0int
the length of time series for one specific state
Returns
- dict
ts: time series; label: the labels.
- autocpd.utils.wilcoxon(x)[source]
This function implements the Wilcoxon cumulative sum statistic (Dehling et al, 2013, Eq (20)) for nonparametric change point detection. The code is translated from the C function “wilcoxsukz” in the R package “robts”. The accuracy of this function has already been tested.
Parameters
- xarray
time series
Returns
- 1D array
the test statistic for each potential change point.
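One common form of a Wilcoxon-type change-point statistic sums 1{x_i < x_j} - 1/2 over all pairs with i before and j after a candidate split. The sketch below computes that unnormalised sum directly; the name `wilcoxon_cusum`, the strict inequality and the absence of any scaling are assumptions, so the values need not match those of the library function.

```python
import numpy as np


def wilcoxon_cusum(x):
    """Sketch: sum over i <= k < j of (1{x_i < x_j} - 1/2) for each split k."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # h[i, j] = 1{x_i < x_j} - 1/2 for every ordered pair of observations.
    h = (x[:, None] < x[None, :]).astype(float) - 0.5
    stats = np.empty(n - 1)
    for k in range(1, n):
        stats[k - 1] = h[:k, k:].sum()  # pairs straddling the split after k points
    return stats


w = wilcoxon_cusum([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
print(int(np.argmax(np.abs(w))) + 1)  # 3
```

This direct version favours readability over speed; its cost grows quickly with the length of the series.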