Bout analysis¶
This is a brief demo of bout analysis with skdiveMove.bouts, using data generated by mixtures of random Poisson processes.
Set up the environment. Consider loading the logging module and setting up a logger to monitor progress in this section.
# Set up
import os
import os.path as osp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skdiveMove.tests import diveMove2skd
import skdiveMove.bouts as skbouts
# For figure sizes
_FIG1X1 = (7, 6)
_FIG1X2 = (12, 5)
_FIG3X1 = (11, 11)
pd.set_option("display.precision", 3)
np.set_printoptions(precision=3, sign="+")
%matplotlib inline
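The logger suggested above could be set up along these lines. This is a minimal sketch using only the standard library. The logger name `boutsdemo` is a hypothetical choice for this demo, and nothing is assumed here about the logger names skdiveMove itself may use.

```python
import logging

# Send records of level INFO and above to the console, with a timestamp
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)

# "boutsdemo" is a hypothetical name chosen for this demo
logger = logging.getLogger("boutsdemo")
logger.setLevel(logging.INFO)
logger.info("Environment ready")
```

Calling `logger.info()` before and after the slower steps (calibration, model fitting) is one simple way to monitor progress.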
Calculate postdive duration¶
Create a TDR object to easily calculate the necessary statistics:
tdrX = diveMove2skd()
pars = {"offset_zoc": 3,
        "dry_thr": 70,
        "wet_thr": 3610,
        "dive_thr": 3,
        "dive_model": "unimodal",
        "smooth_par": 0.1,
        "knot_factor": 20,
        "descent_crit_q": 0.01,
        "ascent_crit_q": 0}
tdrX.calibrate(zoc_method="offset", offset=pars["offset_zoc"],
               dry_thr=pars["dry_thr"], wet_thr=pars["wet_thr"],
               dive_thr=pars["dive_thr"],
               dive_model=pars["dive_model"],
               smooth_par=pars["smooth_par"],
               knot_factor=pars["knot_factor"],
               descent_crit_q=pars["descent_crit_q"],
               ascent_crit_q=pars["ascent_crit_q"])
stats = tdrX.dive_stats()
stamps = tdrX.stamp_dives(ignore_z=True)
stats_tab = pd.concat((stamps, stats), axis=1)
stats_tab.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 426 entries, 1 to 426
Data columns (total 49 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 phase_id 426 non-null int64
1 beg 426 non-null datetime64[ns]
2 end 426 non-null datetime64[ns]
3 begdesc 426 non-null datetime64[ns]
4 enddesc 426 non-null datetime64[ns]
5 begasc 426 non-null datetime64[ns]
6 desctim 426 non-null float64
7 botttim 144 non-null float64
8 asctim 426 non-null float64
9 divetim 426 non-null float64
10 descdist 426 non-null float64
11 bottdist 144 non-null float64
12 ascdist 426 non-null float64
13 bottdep_mean 144 non-null float64
14 bottdep_median 144 non-null float64
15 bottdep_sd 144 non-null float64
16 maxdep 426 non-null float64
17 desc_tdist 242 non-null float64
18 desc_mean_speed 242 non-null float64
19 desc_angle 206 non-null float64
20 bott_tdist 144 non-null float64
21 bott_mean_speed 144 non-null float64
22 asc_tdist 205 non-null float64
23 asc_mean_speed 205 non-null float64
24 asc_angle 185 non-null float64
25 descD_mean 426 non-null float64
26 descD_std 426 non-null float64
27 descD_min 426 non-null float64
28 descD_25% 426 non-null float64
29 descD_50% 426 non-null float64
30 descD_75% 426 non-null float64
31 descD_max 426 non-null float64
32 bottD_mean 144 non-null float64
33 bottD_std 144 non-null float64
34 bottD_min 144 non-null float64
35 bottD_25% 144 non-null float64
36 bottD_50% 144 non-null float64
37 bottD_75% 144 non-null float64
38 bottD_max 144 non-null float64
39 ascD_mean 426 non-null float64
40 ascD_std 426 non-null float64
41 ascD_min 426 non-null float64
42 ascD_25% 426 non-null float64
43 ascD_50% 426 non-null float64
44 ascD_75% 426 non-null float64
45 ascD_max 426 non-null float64
46 postdive_dur 426 non-null timedelta64[ns]
47 postdive_tdist 426 non-null float64
48 postdive_mean_speed 426 non-null float64
dtypes: datetime64[ns](5), float64(42), int64(1), timedelta64[ns](1)
memory usage: 166.4 KB
Extract postdive duration for further analysis.
postdives = stats_tab["postdive_dur"][stats_tab["phase_id"] == 4]
postdives_diff = postdives.dt.total_seconds().diff()[1:].abs()
# Remove isolated dives
postdives_diff = postdives_diff[postdives_diff < 2000]
Non-linear least squares via “broken-stick” model¶
skdiveMove provides the BoutsNLS class for fitting non-linear least squares (NLS) models to a modified histogram of a given variable.
The first step is to generate a modified histogram of postdive duration, and this requires choosing the bin width for the histogram.
postdives_nlsbouts = skbouts.BoutsNLS(postdives_diff, 0.1)
print(postdives_nlsbouts)
Class BoutsNLS object
histogram method: standard
log-frequency histogram:
x lnfreq
count 50.00 50.000
mean 284.54 -3.684
std 376.22 2.632
min 0.05 -8.216
25% 42.55 -5.521
50% 122.55 -3.912
75% 328.80 -2.798
max 1449.95 3.258
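To make the idea of a "modified" log-frequency histogram more concrete, here is a minimal sketch of one construction used in bout analysis: count observations per bin, drop empty bins, divide each count by the number of bins elapsed since the previous non-empty bin, and take the logarithm. The function name `log_frequency_histogram` is hypothetical, and the sketch is an assumption about the general technique, not a description of what BoutsNLS does internally.

```python
import math


def log_frequency_histogram(values, bin_width):
    """Return (bin midpoint, log adjusted frequency) for non-empty bins.

    Assumes non-negative values and bins starting at zero.
    """
    counts = {}
    for v in values:
        idx = int(v // bin_width)          # index of the bin holding v
        counts[idx] = counts.get(idx, 0) + 1

    result = []
    prev_idx = -1
    for idx in sorted(counts):
        gap = idx - prev_idx               # bins since last non-empty bin
        midpoint = (idx + 0.5) * bin_width
        result.append((midpoint, math.log(counts[idx] / gap)))
        prev_idx = idx
    return result


# Hypothetical data, only to show the shape of the result
demo = log_frequency_histogram([0.5, 1.5, 1.7, 4.2], bin_width=1.0)
for midpoint, lnfreq in demo:
    print(f"{midpoint:6.2f} {lnfreq:+.3f}")
```

Dividing by the gap spreads a count over the empty bins that precede it, which keeps sparse tails from looking artificially frequent. The choice of bin width still affects the result, which is why it has to be supplied.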
Two-process model¶
Assuming a two-process model, calculate starting values, providing an initial guess of 50 s for the interdive interval that separates the two processes.
fig, ax = plt.subplots(figsize=_FIG1X1)
init_pars2 = postdives_nlsbouts.init_pars([50], plot=True, ax=ax)
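One plausible way to obtain starting values from such a guess is to split the log-frequency histogram at the break point and fit a straight line to each segment. For a single process, ln(frequency) = ln(a * lambda) - lambda * x, so the slope gives lambda and the intercept gives a. The sketch below illustrates that idea with hypothetical helper names (`fit_line`, `starting_values`) and synthetic data. It is an assumption about the approach, not the implementation behind `init_pars`.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def starting_values(xs, lnfreqs, break_points):
    """Return one (a, lambda) pair per segment delimited by break_points."""
    edges = [-math.inf] + sorted(break_points) + [math.inf]
    pars = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = [(x, y) for x, y in zip(xs, lnfreqs) if lo < x <= hi]
        intercept, slope = fit_line([s[0] for s in seg], [s[1] for s in seg])
        lam = -slope
        pars.append((math.exp(intercept) / lam, lam))
    return pars


# Synthetic, noise-free histogram from two known processes
xs = [5.0 * i for i in range(1, 10)] + [100.0 * i for i in range(1, 10)]
ys = [math.log(40 * 0.1) - 0.1 * x if x <= 50
      else math.log(8 * 0.003) - 0.003 * x for x in xs]
pars = starting_values(xs, ys, [50])
print(pars)
```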

Fit the two-process model.
coefs2, pcov2 = postdives_nlsbouts.fit(init_pars2)
# Coefficients
print(coefs2)
(0.049, 50.0] (50.0, 1449.95]
a 41.686 8.108
lambda 0.115 0.003
# Covariance between parameters
print(pcov2)
[[+4.375e+02 +4.039e-01 +1.464e+00 +3.388e-04]
[+4.039e-01 +1.110e-03 +1.139e-02 +2.793e-06]
[+1.464e+00 +1.139e-02 +3.759e+00 +1.511e-04]
[+3.388e-04 +2.793e-06 +1.511e-04 +3.574e-07]]
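The diagonal of a parameter covariance matrix holds the variances of the estimates, so approximate standard errors are the square roots of those entries. The sketch below applies this to the matrix printed above. The parameter order (a1, lambda1, a2, lambda2) is an assumption inferred from the magnitudes and is not stated by the output itself.

```python
import math

# Covariance matrix as printed above
pcov2 = [
    [4.375e+02, 4.039e-01, 1.464e+00, 3.388e-04],
    [4.039e-01, 1.110e-03, 1.139e-02, 2.793e-06],
    [1.464e+00, 1.139e-02, 3.759e+00, 1.511e-04],
    [3.388e-04, 2.793e-06, 1.511e-04, 3.574e-07],
]

# Assumed parameter order; not confirmed by the printed output
names = ["a1", "lambda1", "a2", "lambda2"]

std_errors = {name: math.sqrt(pcov2[i][i]) for i, name in enumerate(names)}
for name, se in std_errors.items():
    print(f"{name:8s} SE = {se:.4g}")
```

Under that assumed ordering, the standard error of the first amplitude is large relative to its estimate, which suggests that the fast process is the less precisely determined of the two.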
Calculate bout-ending criterion.
# `bec` returns ndarray, and we have only one here
print("bec = {[0]:.2f}".format(postdives_nlsbouts.bec(coefs2)))
bec = 47.50
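For a two-process broken-stick model, a commonly used bout-ending criterion is the interval at which the fast and slow components contribute equally, that is, where a1 * lambda1 * exp(-lambda1 * x) equals a2 * lambda2 * exp(-lambda2 * x). The sketch below evaluates that expression with the rounded coefficients printed above. The function names are hypothetical, and the formula is assumed to match what `bec` computes, not confirmed from the library source.

```python
import math


def component(x, a, lam):
    """Expected frequency contributed by one Poisson process at interval x."""
    return a * lam * math.exp(-lam * x)


def broken_stick(x, pars):
    """Log frequency predicted by a sum of processes [(a, lambda), ...]."""
    return math.log(sum(component(x, a, lam) for a, lam in pars))


def bec_two_process(fast, slow):
    """Interval at which the fast and slow components are equal."""
    (a1, lam1), (a2, lam2) = fast, slow
    return math.log((a1 * lam1) / (a2 * lam2)) / (lam1 - lam2)


fast = (41.686, 0.115)   # rounded coefficients from the fit above
slow = (8.108, 0.003)
bec = bec_two_process(fast, slow)
print(f"bec = {bec:.2f}")
```

With the rounded coefficients, this gives a value of roughly 47 s. The small difference from the reported 47.50 is presumably due to rounding in the printed table, lambda of the slow process in particular.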
Plot the fit.
fig, ax = plt.subplots(figsize=_FIG1X1)
postdives_nlsbouts.plot_fit(coefs2, ax=ax);

Three-process model¶
Attempt to discern three processes in the data.
fig, ax = plt.subplots(figsize=_FIG1X1)
init_pars3 = postdives_nlsbouts.init_pars([50, 550], plot=True, ax=ax)

Fit three-process model.
coefs3, pcov3 = postdives_nlsbouts.fit(init_pars3)
# Coefficients
print(coefs3)
(0.049, 50.0] (50.0, 550.0] (550.0, 1449.95]
a 43.033 5.836 3.732
lambda 0.136 0.011 0.001
# Covariance between parameters
print(pcov3)
[[+5.457e+02 +5.911e-01 +1.012e+00 -5.469e-05 -8.834e-02 -1.191e-04]
[+5.911e-01 +2.317e-03 +2.922e-02 +1.217e-04 +9.120e-03 +1.012e-05]
[+1.012e+00 +2.922e-02 +6.778e+00 +4.515e-04 -8.611e-01 -1.570e-03]
[-5.469e-05 +1.217e-04 +4.515e-04 +6.806e-05 +6.797e-03 +8.297e-06]
[-8.834e-02 +9.120e-03 -8.611e-01 +6.797e-03 +2.934e+00 +6.962e-04]
[-1.191e-04 +1.012e-05 -1.570e-03 +8.297e-06 +6.962e-04 +2.382e-06]]
Plot the fit.
fig, ax = plt.subplots(figsize=_FIG1X1)
postdives_nlsbouts.plot_fit(coefs3, ax=ax);

Compare the cumulative frequency distributions of two- vs three-process models.
fig, axs = plt.subplots(1, 2, figsize=_FIG1X2)
postdives_nlsbouts.plot_ecdf(coefs2, ax=axs[0])
postdives_nlsbouts.plot_ecdf(coefs3, ax=axs[1]);

The three-process model does not seem appropriate.
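The comparison above sets an empirical cumulative distribution against the one implied by the fitted model. As a rough sketch for the two-process case, the fitted CDF of a mixture of two exponential processes is 1 - p * exp(-lambda1 * x) - (1 - p) * exp(-lambda2 * x). Taking the mixing proportion as p = a1 / (a1 + a2) is an assumption made here for illustration, as are the function names and the sample intervals.

```python
import math


def ecdf(values):
    """Empirical CDF as a list of (x, cumulative proportion) pairs."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def mixture_cdf(x, p, lam1, lam2):
    """CDF of a two-process exponential mixture."""
    return 1.0 - p * math.exp(-lam1 * x) - (1.0 - p) * math.exp(-lam2 * x)


# Rounded NLS coefficients from the two-process fit
a1, lam1, a2, lam2 = 41.686, 0.115, 8.108, 0.003
p = a1 / (a1 + a2)   # assumed mixing proportion

# Hypothetical intervals (seconds), only to exercise the functions
intervals = [2.0, 4.0, 9.0, 15.0, 30.0, 120.0, 400.0, 900.0]
max_gap = max(abs(f - mixture_cdf(x, p, lam1, lam2))
              for x, f in ecdf(intervals))
print(f"p = {p:.3f}, largest ECDF gap = {max_gap:.3f}")
```

The largest vertical gap between the two curves is a simple numeric summary of what the plots show visually.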
Maximum likelihood estimation¶
Another way to model Poisson mixtures, which does not rely on a subjectively constructed histogram and involves fewer parameters, is to fit the model via the maximum likelihood method (MLM). This approach is available in BoutsMLE.
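As a sketch of what is being optimized, the density of a two-process mixture can be written as p * lambda1 * exp(-lambda1 * x) + (1 - p) * lambda2 * exp(-lambda2 * x), which has three parameters instead of the four (two amplitudes, two rates) in the broken-stick model. The code below simulates intervals from such a mixture and evaluates the negative log-likelihood. The function names and parameter values are hypothetical, and the parameterization is assumed, not taken from BoutsMLE.

```python
import math
import random


def mixture_nll(x, p, lam1, lam2):
    """Negative log-likelihood of a two-process exponential mixture."""
    return -sum(
        math.log(p * lam1 * math.exp(-lam1 * xi)
                 + (1.0 - p) * lam2 * math.exp(-lam2 * xi))
        for xi in x
    )


def simulate(n, p, lam1, lam2, seed=0):
    """Draw n intervals: fast process with probability p, else slow."""
    rng = random.Random(seed)
    return [rng.expovariate(lam1 if rng.random() < p else lam2)
            for _ in range(n)]


x = simulate(2000, p=0.7, lam1=0.07, lam2=0.004)
nll_true = mixture_nll(x, 0.7, 0.07, 0.004)   # generating parameters
nll_off = mixture_nll(x, 0.3, 0.5, 0.05)      # deliberately poor guess
print(f"NLL at generating values: {nll_true:.1f}")
print(f"NLL at poor guess:        {nll_off:.1f}")
```

Maximum likelihood fitting searches for the parameter values that minimize this quantity, working on the raw intervals without binning them first.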
Set up an instance.
postdives_mlebouts = skbouts.BoutsMLE(postdives_diff, 0.1)
print(postdives_mlebouts)
Class BoutsMLE object
histogram method: standard
log-frequency histogram:
x lnfreq
count 50.00 50.000
mean 284.54 -3.684
std 376.22 2.632
min 0.05 -8.216
25% 42.55 -5.521
50% 122.55 -3.912
75% 328.80 -2.798
max 1449.95 3.258
Again, assuming a 2-process model, calculate starting values.
fig, ax = plt.subplots(figsize=_FIG1X1)
init_pars = postdives_mlebouts.init_pars([50], plot=True, ax=ax)

Fit the two-process model. Supplying reasonable bounds is optional, but it helps the optimization algorithm, which may otherwise fail to converge. The fitting procedure is done in two steps: first with a reparameterized log-likelihood function, and then without the reparameterization. Two sets of bounds are therefore required, one per step.
# Bounds for the first fit (reparameterized log-likelihood)
p_bnd = (-2, None) # bounds for `p`
lda1_bnd = (-5, None) # bounds for `lambda1`
lda2_bnd = (-10, None) # bounds for `lambda2`
bnd1 = (p_bnd, lda1_bnd, lda2_bnd)
# Bounds for the second fit (original parameter scale)
p_bnd = (1e-8, None)
lda1_bnd = (1e-8, None)
lda2_bnd = (1e-8, None)
bnd2 = (p_bnd, lda1_bnd, lda2_bnd)
fit1, fit2 = postdives_mlebouts.fit(init_pars,
                                    fit1_opts=dict(method="L-BFGS-B",
                                                   bounds=bnd1),
                                    fit2_opts=dict(method="L-BFGS-B",
                                                   bounds=bnd2))
# First fit
print(fit1)
fun: 917.8524699075529
hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
jac: array([+0., -0., +0.])
message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 40
nit: 8
njev: 10
status: 0
success: True
x: array([+0.826, -2.69 , -5.629])
# Second fit
print(fit2)
fun: 917.8524699073647
hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
jac: array([+0.001, +0.001, +0.003])
message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 16
nit: 1
njev: 4
status: 0
success: True
x: array([+0.696, +0.068, +0.004])
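The two sets of estimates are on different scales. The printed numbers are consistent with the first fit working on logit(p) and log(lambda), and the second fit on the natural scale. This is an inference from the output above, not a statement taken from the library documentation. The sketch below back-transforms the first-fit estimates and also expresses the lower bounds in `bnd1` on the natural scale under the same assumption.

```python
import math


def expit(z):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-z))


fit1_x = (0.826, -2.69, -5.629)   # as printed for the first fit
fit2_x = (0.696, 0.068, 0.004)    # as printed for the second fit

# Assumed transforms: logit for p, log for both lambdas
back = (expit(fit1_x[0]), math.exp(fit1_x[1]), math.exp(fit1_x[2]))
for name, b, f in zip(("p", "lambda1", "lambda2"), back, fit2_x):
    print(f"{name:8s} back-transformed = {b:.4f}   second fit = {f:.3f}")

# Lower bounds from bnd1, expressed on the natural scale
lower_bnd1 = (expit(-2), math.exp(-5), math.exp(-10))
print("natural-scale lower bounds:",
      ", ".join(f"{v:.3g}" for v in lower_bnd1))
```

Seen this way, a bound such as -2 for `p` in the first fit keeps the mixing proportion above roughly 0.12, while the `1e-8` bounds in the second fit only keep the parameters strictly positive.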
Calculate bout-ending criterion (BEC).
print("bec = {:.2f}".format(postdives_mlebouts.bec(fit2)))
bec = 58.55
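For the mixture parameterization, the analogous criterion is the interval at which p * lambda1 * exp(-lambda1 * x) equals (1 - p) * lambda2 * exp(-lambda2 * x). The sketch below evaluates it with the rounded second-fit estimates. The function name is hypothetical, and the formula is assumed to correspond to what `bec` returns.

```python
import math


def bec_mixture(p, lam1, lam2):
    """Interval at which the fast and slow mixture components are equal."""
    return math.log((p * lam1) / ((1.0 - p) * lam2)) / (lam1 - lam2)


# Rounded estimates printed for the second fit
p, lam1, lam2 = 0.696, 0.068, 0.004
bec_rounded = bec_mixture(p, lam1, lam2)
print(f"bec from rounded estimates = {bec_rounded:.2f}")
```

The result is close to, but not identical to, the reported 58.55. The gap is consistent with rounding of the printed estimates, since `lambda2` is shown with a single significant digit.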
Plot the fit.
fig, ax = plt.subplots(figsize=_FIG1X1)
postdives_mlebouts.plot_fit(fit2, ax=ax);

Compare the cumulative frequency distribution between NLS and MLM model estimates.
fig, axs = plt.subplots(1, 2, figsize=_FIG1X2)
postdives_nlsbouts.plot_ecdf(coefs2, ax=axs[0])
axs[0].set_title("NLS")
postdives_mlebouts.plot_ecdf(fit2, ax=axs[1])
axs[1].set_title("MLM");

Label bouts based on the BEC from the last MLM model. Note that the Timedelta values must be converted to total seconds to allow comparison with the BEC.
bec = postdives_mlebouts.bec(fit2)
skbouts.label_bouts(postdives.dt.total_seconds(), bec, as_diff=True)
236 1
237 2
238 3
239 4
240 4
..
422 51
423 51
424 51
425 52
426 52
Name: postdive_dur, Length: 191, dtype: int64
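The labelling logic can be sketched in a few lines: starting from bout 1, a new bout begins whenever the absolute difference between consecutive values exceeds the BEC. The function below is a hypothetical, simplified stand-in that mirrors the `as_diff=True` behaviour as understood here. It is not the implementation of `skbouts.label_bouts`, and the input values are made up.

```python
def label_bouts_sketch(x, bec):
    """Label each element with a bout number, comparing successive
    absolute differences against a single bout-ending criterion."""
    if not x:
        return []
    labels = [1]
    for prev, curr in zip(x, x[1:]):
        starts_new_bout = abs(curr - prev) > bec
        labels.append(labels[-1] + 1 if starts_new_bout else labels[-1])
    return labels


# Hypothetical postdive durations in seconds
durations = [10.0, 12.0, 200.0, 205.0, 900.0]
print(label_bouts_sketch(durations, bec=58.55))   # prints [1, 1, 2, 2, 3]
```

Consecutive dives that share a label belong to the same bout, so the labels can be used directly as a grouping key for per-bout summaries.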
Feel free to download a copy of this demo (boutsdemo.py).