Bout analysis
Tools and classes for the identification of behavioural bouts
A histogram of log-transformed frequencies of x with a chosen bin width and upper limit forms the basis for models. Histogram bins following empty ones have their frequencies averaged over the number of previous empty bins plus one. Models attempt to discern the number of random Poisson processes, and their parameters, generating the underlying distribution of log-transformed frequencies.
The abstract class Bouts provides basic methods.
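The histogram construction described above can be sketched with NumPy. This is a minimal illustration and not the library's code: the function name `log_frequency_histogram`, its `upper` argument, and the choice to start bins at zero are assumptions made for the example.

```python
import numpy as np


def log_frequency_histogram(x, bw, upper=None):
    """Return bin centers and log frequencies for the non-empty bins.

    Hypothetical helper: frequencies of bins that follow empty ones are
    divided by the number of preceding empty bins plus one.
    """
    x = np.asarray(x, dtype=float)
    upper = x.max() if upper is None else upper
    # Bins of width bw, starting at zero and covering `upper` (assumption)
    n_bins = int(np.ceil(upper / bw))
    edges = np.arange(n_bins + 1) * bw
    counts, _ = np.histogram(x[x <= upper], bins=edges)
    centers = edges[:-1] + bw / 2
    nonempty = np.flatnonzero(counts > 0)
    # Number of empty bins immediately preceding each non-empty bin
    n_empty = np.diff(np.concatenate(([-1], nonempty))) - 1
    freq = counts[nonempty] / (n_empty + 1)
    return centers[nonempty], np.log(freq)


centers, lnfreq = log_frequency_histogram([0.5, 0.6, 1.5, 4.5], bw=1.0)
```

In this small example the last bin follows two empty bins, so its single observation is averaged over three bins before the log transform.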
Abstract class & methods summary
Bouts | Abstract base class for models of log-transformed frequencies
Bouts.init_pars | Find starting values for mixtures of random Poisson processes
Bouts.fit | Fit Poisson mixture model to log frequencies
Bouts.bec | Calculate bout ending criteria from model coefficients
Bouts.plot_fit | Plot log frequency histogram and fitted model
Nonlinear least squares models
Currently, the model describing the histogram as it is built is implemented in the BoutsNLS class. For the case of a mixture of two Poisson processes, this class sets up the model:
\[y = \log\left[N_f \lambda_f e^{-\lambda_f t} + N_s \lambda_s e^{-\lambda_s t}\right] \tag{1}\]
where \(y\) is the log frequency at interval \(t\); \(N_f\) and \(N_s\) are the number of events belonging to process \(f\) and \(s\), respectively; and \(\lambda_f\) and \(\lambda_s\) are the probabilities of an event occurring in each process. Mixtures of more processes can also be added to the model.
The bout-ending criterion (BEC) corresponding to equation (1) is:
\[BEC = \frac{1}{\lambda_f - \lambda_s} \log \frac{N_f \lambda_f}{N_s \lambda_s} \tag{2}\]
Note that there is one BEC per transition between Poisson processes.
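One way to read the BEC for a two-process mixture is as the interval at which the fast and slow components contribute equally to the modelled frequency. The sketch below works under that assumption; the function names and the parameter values are illustrative and are not part of the bouts API.

```python
import numpy as np


def two_process_log_freq(t, n_f, lambda_f, n_s, lambda_s):
    """Log frequency at interval t for a fast/slow two-process mixture."""
    fast = n_f * lambda_f * np.exp(-lambda_f * t)
    slow = n_s * lambda_s * np.exp(-lambda_s * t)
    return np.log(fast + slow)


def two_process_bec(n_f, lambda_f, n_s, lambda_s):
    """Interval at which the fast and slow components are equal."""
    return np.log((n_f * lambda_f) / (n_s * lambda_s)) / (lambda_f - lambda_s)


# Illustrative (made-up) coefficients
bec = two_process_bec(n_f=1000, lambda_f=0.5, n_s=100, lambda_s=0.01)
```

Intervals shorter than `bec` are dominated by the fast process, and longer ones by the slow process.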
This subclass inherits its methods from the abstract superclass Bouts, and adds the methods below.
Methods summary
BoutsNLS.plot_ecdf | Plot observed and modelled empirical cumulative frequencies
Maximum likelihood models
This is the preferred approach to modelling mixtures of random Poisson processes, as it does not rely on the subjective construction of a histogram. The histogram is used only to generate reasonable starting values; the underlying parameters of the model are obtained via maximum likelihood, which makes the approach more robust.
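For the case of a mixture of two processes, as above, the log likelihood of all the \(N_t\) in a mixture can be expressed as:
\[\log L_2 = \sum_{i=1}^{N_t} \log\left[p \lambda_f e^{-\lambda_f t_i} + (1 - p) \lambda_s e^{-\lambda_s t_i}\right] \tag{3}\]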
where \(p\) is a mixing parameter indicating the proportion of fast to slow process events in the sampled population.
The BEC in this case can be estimated as:
\[BEC = \frac{1}{\lambda_f - \lambda_s} \log \frac{p \lambda_f}{(1 - p) \lambda_s} \tag{4}\]
The subclass BoutsMLE offers the framework for these models.
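A rough sketch of the maximum likelihood approach for two processes follows. It is not the BoutsMLE implementation: it runs a single optimization on simulated data, whereas the class documented below performs two fits. The logit/log parameter transformation, the function names, the simulated rates, and the starting values are all assumptions made for the illustration.

```python
import numpy as np
from scipy.optimize import minimize


def neg_loglik(params, t):
    """Negative log likelihood of a two-process mixture.

    params holds (logit(p), log(lambda_f), log(lambda_s)), so the
    optimizer can search without bounds (an assumed transformation).
    """
    p = 1.0 / (1.0 + np.exp(-params[0]))
    lambda_f, lambda_s = np.exp(params[1:])
    dens = (p * lambda_f * np.exp(-lambda_f * t)
            + (1 - p) * lambda_s * np.exp(-lambda_s * t))
    return -np.sum(np.log(dens))


def mle_bec(p, lambda_f, lambda_s):
    """Interval at which both mixture components have equal density."""
    return np.log((p * lambda_f) / ((1 - p) * lambda_s)) / (lambda_f - lambda_s)


# Simulated intervals: 70% from a fast process, 30% from a slow one
rng = np.random.default_rng(0)
t = np.concatenate([rng.exponential(1 / 0.5, 700),
                    rng.exponential(1 / 0.02, 300)])

# Rough starting values; in practice these could come from a broken stick
start = np.array([0.0, np.log(0.2), np.log(0.05)])
fit = minimize(neg_loglik, start, args=(t,), method="Nelder-Mead")

# Undo the transformation to recover p and the two lambdas
p_hat = 1.0 / (1.0 + np.exp(-fit.x[0]))
lambda_f_hat, lambda_s_hat = np.exp(fit.x[1:])
bec = mle_bec(p_hat, lambda_f_hat, lambda_s_hat)
```

Working on transformed parameters keeps \(p\) within (0, 1) and both \(\lambda\) values positive without constraining the optimizer.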
Class & methods summary
BoutsMLE.loglik_fun | Log likelihood function of parameters given observed data
BoutsMLE.fit | Maximum likelihood estimation of log frequencies
BoutsMLE.bec | Calculate bout ending criteria from model coefficients
BoutsMLE.plot_fit | Plot log frequency histogram and fitted model
BoutsMLE.plot_ecdf | Plot observed and modelled empirical cumulative frequencies
API
class bouts.Bouts(x, bw, method='standard')
Abstract base class for models of log-transformed frequencies
Bouts is an abstract base class that sets up bout identification procedures for subclasses to build on. Subclasses must implement the fit and bec methods, or re-use the default NLS methods in Bouts.
x
1D array with input data.
- Type
array_like
method
Method used for calculating the histogram.
- Type
str
lnfreq
DataFrame with the centers of histogram bins, and corresponding log-frequencies of x.
- Type
pandas.DataFrame
abstract bec(coefs)
Calculate bout ending criteria from model coefficients
The default implementation follows the NLS method.
- Parameters
coefs (pandas.DataFrame) – DataFrame with model coefficients in columns, and indexed by parameter names “a” and “lambda”.
- Returns
out – 1-D array with BECs implied by coefs. Length is coefs.shape[1]
- Return type
ndarray, shape (n,)
abstract fit(start)
Fit Poisson mixture model to log frequencies
The default is the non-linear least squares method.
- Parameters
start (pandas.DataFrame) – DataFrame with coefficients for each process in columns.
- Returns
coefs (pandas.DataFrame) – Coefficients of the model.
pcov (2D array) – Covariance of coefs.
init_pars(x_break, plot=True, ax=None, **kwargs)
Find starting values for mixtures of random Poisson processes
Starting values are calculated using the “broken stick” method.
- Parameters
x_break (array_like) – One- or two-element array with values determining the break(s) for the broken stick model, such that x < x_break[0] is the first process, x >= x_break[0] & x < x_break[1] is the second process, and x >= x_break[1] is the third one.
plot (bool, optional) – Whether to plot the broken stick model.
ax (matplotlib.Axes, optional) – An Axes instance to use as target. Default is to create one.
**kwargs (optional keyword arguments) – Passed to plotting function.
- Returns
out – DataFrame with coefficients for each process.
- Return type
pandas.DataFrame
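The idea behind broken stick starting values can be sketched as follows, assuming each segment of the log frequency histogram is fitted with a straight line. Because \(\log(N \lambda e^{-\lambda t}) = \log(N \lambda) - \lambda t\), the slope gives \(-\lambda\) and the intercept gives \(\log(N \lambda)\). The helper name `broken_stick_start` and its return format are hypothetical and do not reproduce init_pars.

```python
import numpy as np


def broken_stick_start(centers, lnfreq, x_break):
    """Return a list of (N, lambda) pairs, one per segment.

    Hypothetical helper: fits a straight line to the log frequencies
    within each segment delimited by x_break.
    """
    centers = np.asarray(centers, dtype=float)
    lnfreq = np.asarray(lnfreq, dtype=float)
    edges = np.concatenate(([-np.inf], np.atleast_1d(x_break), [np.inf]))
    start = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = (centers >= lo) & (centers < hi)
        # Each segment needs at least two points for the line fit
        slope, intercept = np.polyfit(centers[seg], lnfreq[seg], 1)
        lam = -slope
        start.append((np.exp(intercept) / lam, lam))
    return start


# Noise-free example built from two known processes, broken at x = 10
x = np.arange(0.5, 60, 1.0)
y = np.where(x < 10,
             np.log(1000 * 0.5) - 0.5 * x,
             np.log(100 * 0.01) - 0.01 * x)
start = broken_stick_start(x, y, [10])
```

With real histograms the segments are noisy, so these values serve only as a starting point for the model fit.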
plot_fit(coefs, ax=None)
Plot log frequency histogram and fitted model
- Parameters
coefs (pandas.DataFrame) – DataFrame with model coefficients in columns, and indexed by parameter names “a” and “lambda”.
ax (matplotlib.Axes instance) – An Axes instance to use as target.
- Returns
ax
- Return type
matplotlib.Axes
class bouts.BoutsMLE(x, bw, method='standard')
Maximum likelihood bout identification
bec(fit)
Calculate bout ending criteria from model coefficients
- Parameters
fit (scipy.optimize.OptimizeResult) – Object with the optimization result, having an x attribute with coefficients of the solution.
- Returns
out
- Return type
ndarray
Notes
Current implementation is for a two-process mixture, hence an array of a single float is returned.
fit(start, fit1_opts=None, fit2_opts=None)
Maximum likelihood estimation of log frequencies
- Parameters
start (pandas.DataFrame) – DataFrame with starting values for coefficients of each process in columns. These can come from the “broken stick” method as in Bouts.init_pars(), and will be transformed to minimize the first log likelihood function.
fit1_opts, fit2_opts (dict) – Dictionaries with keywords to be passed to scipy.optimize.minimize(), for the first and second fits.
- Returns
fit1, fit2 – Objects with the optimization result from the first and second fit, having an x attribute with coefficients of the solution.
- Return type
scipy.optimize.OptimizeResult
Notes
Current implementation handles mixtures of two Poisson processes.
loglik_fun(params, x, transformed=True)
Log likelihood function of parameters given observed data
- Parameters
params (array_like) – 1-D array with parameters to fit. Currently must be 3-length, with mixing parameter \(p\), and density parameters \(\lambda_f\) and \(\lambda_s\), in that order.
x (array_like) – Independent data array described by parameters p and lambdas.
transformed (bool) – Whether params are transformed and need to be un-transformed to calculate the likelihood.
- Returns
- Return type
out
plot_ecdf(fit, ax=None)
Plot observed and modelled empirical cumulative frequencies
- Parameters
fit (scipy.optimize.OptimizeResult) – Object with the optimization result, having an x attribute with coefficients of the solution.
ax (matplotlib.Axes instance) – An Axes instance to use as target.
- Returns
ax
- Return type
matplotlib.Axes
plot_fit(fit, ax=None)
Plot log frequency histogram and fitted model
- Parameters
fit (scipy.optimize.OptimizeResult) – Object with the optimization result, having an x attribute with coefficients of the solution.
ax (matplotlib.Axes instance) – An Axes instance to use as target.
- Returns
ax
- Return type
matplotlib.Axes
class bouts.BoutsNLS(x, bw, method='standard')
Nonlinear least squares bout identification
bec(coefs)
Calculate bout ending criteria from model coefficients
The abstract base class bouts.Bouts implements this method.
- Parameters
coefs (pandas.DataFrame) – DataFrame with model coefficients in columns.
- Returns
out – List of BEC’s implied by coefs.
- Return type
list
fit(start)
Fit non-linear least squares to log frequencies
The abstract base class bouts.Bouts implements this method.
- Parameters
start (pandas.DataFrame) – DataFrame with coefficients for each process in columns.
- Returns
coefs (pandas.DataFrame) – Coefficients of the model.
pcov (2D array) – Covariance of coefs.
plot_ecdf(coefs, ax=None, **kwargs)
Plot observed and modelled empirical cumulative frequencies
- Parameters
coefs (pandas.DataFrame) – DataFrame with model coefficients in columns.
ax (matplotlib.Axes instance) – An Axes instance to use as target.
**kwargs (optional keyword arguments) – Passed to matplotlib.pyplot.gca.
- Returns
ax
- Return type
matplotlib.Axes
bouts.label_bouts(x, bec, as_diff=False)
Classify data into bouts based on bout ending criteria
- Parameters
x (pandas.Series) – Series with data to classify according to bec.
bec (array_like) – Array with bout-ending criteria. It is assumed to be sorted.
as_diff (bool, optional) – Whether to apply diff on x so it matches bec’s scale.
- Returns
out – Integer array with the same shape as x.
- Return type
ndarray
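The labelling step can be illustrated with a simplified sketch for a single BEC, in which a new bout starts whenever the interval since the previous event exceeds the criterion. This is an assumed simplification, not the label_bouts implementation, which accepts an array of criteria; the function name `label_bouts_sketch` is hypothetical.

```python
import numpy as np


def label_bouts_sketch(x, bec, as_diff=False):
    """Return an integer bout label for each element of x (single BEC).

    Hypothetical helper: with as_diff=True, x holds event times and is
    differenced first, so the first interval is zero.
    """
    x = np.asarray(x, dtype=float)
    intervals = np.diff(x, prepend=x[0]) if as_diff else x
    # Every interval longer than bec opens a new bout
    return np.cumsum(intervals > bec) + 1


# Intervals between consecutive events, with a criterion of 10
labels = label_bouts_sketch([0, 1, 2, 50, 1, 1, 80, 2], bec=10)
# labels is [1, 1, 1, 2, 2, 2, 3, 3]
```

Passing event times with `as_diff=True` gives the same labels as passing the precomputed intervals.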