Bout analysis

Here is a brief demo on bout analysis with skdiveMove.bouts for data generated by mixtures of random Poisson processes.

Set up the environment. Consider loading the logging module and setting up a logger to monitor progress in this section.

# Set up
import os
import os.path as osp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skdiveMove.tests import diveMove2skd
import skdiveMove.bouts as skbouts

# For figure sizes
_FIG1X1 = (7, 6)
_FIG1X2 = (12, 5)
_FIG3X1 = (11, 11)

pd.set_option("display.precision", 3)
np.set_printoptions(precision=3, sign="+")
%matplotlib inline
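
The logger suggested above could be configured along these lines. This is a minimal sketch using only the standard library; the message format and the INFO level are illustrative choices, not requirements of skdiveMove.

```python
import logging

# Assumption: configuring the root logger is enough to surface progress
# messages sent through the standard logging module at INFO level or above.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)
logger = logging.getLogger(__name__)
logger.info("Environment ready")
```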

Calculate postdive duration

Create a TDR object to easily calculate the necessary statistics:

tdrX = diveMove2skd()
pars = {"offset_zoc": 3,
        "dry_thr": 70,
        "wet_thr": 3610,
        "dive_thr": 3,
        "dive_model": "unimodal",
        "smooth_par": 0.1,
        "knot_factor": 20,
        "descent_crit_q": 0.01,
        "ascent_crit_q": 0}

tdrX.calibrate(zoc_method="offset", offset=pars["offset_zoc"],
               dry_thr=pars["dry_thr"], wet_thr=pars["wet_thr"],
               dive_thr=pars["dive_thr"],
               dive_model=pars["dive_model"],
               smooth_par=pars["smooth_par"],
               knot_factor=pars["knot_factor"],
               descent_crit_q=pars["descent_crit_q"],
               ascent_crit_q=pars["ascent_crit_q"])
stats = tdrX.dive_stats()
stamps = tdrX.stamp_dives(ignore_z=True)
stats_tab = pd.concat((stamps, stats), axis=1)
stats_tab.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 426 entries, 1 to 426
Data columns (total 49 columns):
 #   Column               Non-Null Count  Dtype          
---  ------               --------------  -----          
 0   phase_id             426 non-null    int64          
 1   beg                  426 non-null    datetime64[ns] 
 2   end                  426 non-null    datetime64[ns] 
 3   begdesc              426 non-null    datetime64[ns] 
 4   enddesc              426 non-null    datetime64[ns] 
 5   begasc               426 non-null    datetime64[ns] 
 6   desctim              426 non-null    float64        
 7   botttim              144 non-null    float64        
 8   asctim               426 non-null    float64        
 9   divetim              426 non-null    float64        
 10  descdist             426 non-null    float64        
 11  bottdist             144 non-null    float64        
 12  ascdist              426 non-null    float64        
 13  bottdep_mean         144 non-null    float64        
 14  bottdep_median       144 non-null    float64        
 15  bottdep_sd           144 non-null    float64        
 16  maxdep               426 non-null    float64        
 17  desc_tdist           242 non-null    float64        
 18  desc_mean_speed      242 non-null    float64        
 19  desc_angle           206 non-null    float64        
 20  bott_tdist           144 non-null    float64        
 21  bott_mean_speed      144 non-null    float64        
 22  asc_tdist            205 non-null    float64        
 23  asc_mean_speed       205 non-null    float64        
 24  asc_angle            185 non-null    float64        
 25  descD_mean           426 non-null    float64        
 26  descD_std            426 non-null    float64        
 27  descD_min            426 non-null    float64        
 28  descD_25%            426 non-null    float64        
 29  descD_50%            426 non-null    float64        
 30  descD_75%            426 non-null    float64        
 31  descD_max            426 non-null    float64        
 32  bottD_mean           144 non-null    float64        
 33  bottD_std            144 non-null    float64        
 34  bottD_min            144 non-null    float64        
 35  bottD_25%            144 non-null    float64        
 36  bottD_50%            144 non-null    float64        
 37  bottD_75%            144 non-null    float64        
 38  bottD_max            144 non-null    float64        
 39  ascD_mean            426 non-null    float64        
 40  ascD_std             426 non-null    float64        
 41  ascD_min             426 non-null    float64        
 42  ascD_25%             426 non-null    float64        
 43  ascD_50%             426 non-null    float64        
 44  ascD_75%             426 non-null    float64        
 45  ascD_max             426 non-null    float64        
 46  postdive_dur         426 non-null    timedelta64[ns]
 47  postdive_tdist       426 non-null    float64        
 48  postdive_mean_speed  426 non-null    float64        
dtypes: datetime64[ns](5), float64(42), int64(1), timedelta64[ns](1)
memory usage: 166.4 KB

Extract postdive duration for further analysis.

postdives = stats_tab["postdive_dur"][stats_tab["phase_id"] == 4]
postdives_diff = postdives.dt.total_seconds().diff()[1:].abs()
# Remove isolated dives
postdives_diff = postdives_diff[postdives_diff < 2000]
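
To see what the differencing and filtering steps do, the same operations can be applied to a handful of made-up durations. The values below are hypothetical and chosen only to make the result easy to check by hand.

```python
import pandas as pd

# Hypothetical postdive durations, for illustration only
toy = pd.Series(pd.to_timedelta([10, 25, 20, 3000, 40], unit="s"))

# Absolute difference between consecutive durations, in seconds;
# the first element of diff() is NaN, hence the [1:] slice
toy_diff = toy.dt.total_seconds().diff()[1:].abs()

# Drop differences of 2000 s or more, as in the demo
toy_diff = toy_diff[toy_diff < 2000]
print(toy_diff.tolist())  # [15.0, 5.0]
```

The two large differences (2980 s and 2960 s) both involve the 3000 s duration, so a single isolated value removes two entries.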

Non-linear least squares via “broken-stick” model

skdiveMove provides the BoutsNLS class for fitting non-linear least squares (NLS) models to a modified histogram of a given variable.

The first step is to generate a modified histogram of postdive duration, and this requires choosing the bin width for the histogram.

postdives_nlsbouts = skbouts.BoutsNLS(postdives_diff, 0.1)
print(postdives_nlsbouts)
Class BoutsNLS object
histogram method:    standard
log-frequency histogram:
             x  lnfreq
count    50.00  50.000
mean    284.54  -3.684
std     376.22   2.632
min       0.05  -8.216
25%      42.55  -5.521
50%     122.55  -3.912
75%     328.80  -2.798
max    1449.95   3.258
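
As a rough illustration of what a log-frequency histogram is, the sketch below bins a simulated sample and takes the log of the counts in non-empty bins. It is a simplification and not the library's implementation: the negative lnfreq values printed above cannot come from raw counts, so the "standard" method evidently rescales the frequencies further. The function name and the simulated data are hypothetical.

```python
import numpy as np

def log_freq_hist(x, bw):
    """Return bin mid-points and ln(count) for non-empty bins of width bw."""
    x = np.asarray(x, dtype=float)
    edges = np.arange(0, x.max() + bw, bw)
    counts, edges = np.histogram(x, bins=edges)
    mids = edges[:-1] + bw / 2
    keep = counts > 0              # log is undefined for empty bins
    return mids[keep], np.log(counts[keep])

# Simulated intervals from a single exponential process (illustration only)
rng = np.random.default_rng(0)
sample = rng.exponential(scale=20, size=500)
mids, lnfreq = log_freq_hist(sample, bw=5)
```

The bin width controls how many bins are empty and how noisy the log-frequencies are, which is why its choice matters for the NLS approach.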

Two-process model

Assuming a two-process model, calculate starting values, using 50 s as an initial guess for the interdive interval that separates the two processes.

fig, ax = plt.subplots(figsize=_FIG1X1)
init_pars2 = postdives_nlsbouts.init_pars([50], plot=True, ax=ax)
_images/boutsdemo_4_0.png
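
One plausible way to obtain starting values, stated here as an assumption about the general technique rather than a description of init_pars, is to fit a straight line to the log-frequency histogram on each side of the break point. For a single process with density a·λ·exp(-λx), the log-frequency is ln(a·λ) - λx, so the slope gives λ and the intercept gives a. The sketch checks this on noise-free synthetic values.

```python
import numpy as np

# Hypothetical parameters of one process
a_true, lam_true = 40.0, 0.1

# Noise-free log-frequencies on one side of the break point
x = np.linspace(0.5, 49.5, 50)
lnfreq = np.log(a_true * lam_true) - lam_true * x

# Straight-line fit: slope = -lambda, intercept = ln(a * lambda)
slope, intercept = np.polyfit(x, lnfreq, 1)
lam0 = -slope
a0 = np.exp(intercept) / lam0
```

With real histograms the recovered values are only approximate, which is why they serve as starting values for the non-linear fit rather than as final estimates.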

Fit the two-process model.

coefs2, pcov2 = postdives_nlsbouts.fit(init_pars2)
# Coefficients
print(coefs2)
        (0.049, 50.0]  (50.0, 1449.95]
a              41.686            8.108
lambda          0.115            0.003
# Covariance between parameters
print(pcov2)
[[+4.375e+02 +4.039e-01 +1.464e+00 +3.388e-04]
 [+4.039e-01 +1.110e-03 +1.139e-02 +2.793e-06]
 [+1.464e+00 +1.139e-02 +3.759e+00 +1.511e-04]
 [+3.388e-04 +2.793e-06 +1.511e-04 +3.574e-07]]
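
Approximate standard errors follow from the square roots of the diagonal of the covariance matrix. The sketch below re-enters the printed matrix by hand; the parameter order (a and lambda of the first process, then a and lambda of the second) is an assumption inferred from the magnitudes, not something the output states.

```python
import numpy as np

# Covariance matrix copied from the printed output above
pcov2_printed = np.array([
    [4.375e+02, 4.039e-01, 1.464e+00, 3.388e-04],
    [4.039e-01, 1.110e-03, 1.139e-02, 2.793e-06],
    [1.464e+00, 1.139e-02, 3.759e+00, 1.511e-04],
    [3.388e-04, 2.793e-06, 1.511e-04, 3.574e-07],
])

# Standard error of each parameter = sqrt of its variance
se = np.sqrt(np.diag(pcov2_printed))
print(se)
```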

Calculate bout-ending criterion.

# `bec` returns ndarray, and we have only one here
print("bec = {[0]:.2f}".format(postdives_nlsbouts.bec(coefs2)))
bec = 47.50
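
For a two-process model, a common definition of the bout-ending criterion is the interval at which the two component curves cross, that is, where a1·λ1·exp(-λ1·t) = a2·λ2·exp(-λ2·t). Solving for t gives ln(a1·λ1 / (a2·λ2)) / (λ1 - λ2). The sketch below applies this formula to the rounded coefficients printed above; it is assumed, not confirmed, that the bec method uses the same formula, and the function name is hypothetical.

```python
import math

def bec_two_process(a1, lam1, a2, lam2):
    """Interval at which the fast and slow process curves intersect."""
    return math.log((a1 * lam1) / (a2 * lam2)) / (lam1 - lam2)

# Rounded coefficients from the printed table
bec_nls = bec_two_process(41.686, 0.115, 8.108, 0.003)
print("bec = {:.2f}".format(bec_nls))
```

The result is close to, but not identical with, the value reported above because the printed coefficients are rounded to three decimals.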

Plot the fit.

fig, ax = plt.subplots(figsize=_FIG1X1)
postdives_nlsbouts.plot_fit(coefs2, ax=ax);
_images/boutsdemo_8_0.png

Three-process model

Attempt to discern three processes in the data.

fig, ax = plt.subplots(figsize=_FIG1X1)
init_pars3 = postdives_nlsbouts.init_pars([50, 550], plot=True, ax=ax)
_images/boutsdemo_9_0.png

Fit the three-process model.

coefs3, pcov3 = postdives_nlsbouts.fit(init_pars3)
# Coefficients
print(coefs3)
        (0.049, 50.0]  (50.0, 550.0]  (550.0, 1449.95]
a              43.033          5.836             3.732
lambda          0.136          0.011             0.001
# Covariance between parameters
print(pcov3)
[[+5.457e+02 +5.911e-01 +1.012e+00 -5.469e-05 -8.834e-02 -1.191e-04]
 [+5.911e-01 +2.317e-03 +2.922e-02 +1.217e-04 +9.120e-03 +1.012e-05]
 [+1.012e+00 +2.922e-02 +6.778e+00 +4.515e-04 -8.611e-01 -1.570e-03]
 [-5.469e-05 +1.217e-04 +4.515e-04 +6.806e-05 +6.797e-03 +8.297e-06]
 [-8.834e-02 +9.120e-03 -8.611e-01 +6.797e-03 +2.934e+00 +6.962e-04]
 [-1.191e-04 +1.012e-05 -1.570e-03 +8.297e-06 +6.962e-04 +2.382e-06]]

Plot the fit.

fig, ax = plt.subplots(figsize=_FIG1X1)
postdives_nlsbouts.plot_fit(coefs3, ax=ax);
_images/boutsdemo_12_0.png

Compare the cumulative frequency distributions of two- vs three-process models.

fig, axs = plt.subplots(1, 2, figsize=_FIG1X2)
postdives_nlsbouts.plot_ecdf(coefs2, ax=axs[0])
postdives_nlsbouts.plot_ecdf(coefs3, ax=axs[1]);
_images/boutsdemo_13_0.png

The three-process model does not seem appropriate.
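
A visual comparison like this can be complemented with a number: the largest vertical gap between the empirical cumulative distribution and the model's. The sketch below does this for simulated data and a two-process mixture with mixing proportion p; all names and parameter values are hypothetical, and it does not reproduce what plot_ecdf computes internally.

```python
import numpy as np

def ecdf(x):
    """Sorted values and their empirical cumulative probabilities."""
    xs = np.sort(np.asarray(x, dtype=float))
    return xs, np.arange(1, xs.size + 1) / xs.size

def mixture_cdf(t, p, lam1, lam2):
    """CDF of a two-component exponential mixture."""
    return 1 - p * np.exp(-lam1 * t) - (1 - p) * np.exp(-lam2 * t)

# Simulate intervals from a fast (rate 0.1) and a slow (rate 0.005) process
rng = np.random.default_rng(1)
n = 2000
fast = rng.random(n) < 0.7
sim = np.where(fast, rng.exponential(1 / 0.1, n), rng.exponential(1 / 0.005, n))

xs, obs = ecdf(sim)
max_dev = np.max(np.abs(obs - mixture_cdf(xs, 0.7, 0.1, 0.005)))
```

A model that adds a process without reducing this gap noticeably gains little for its extra parameters.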

Maximum likelihood estimation

Another way to model Poisson mixtures is maximum likelihood estimation (MLE), which does not rely on a subjectively constructed histogram and involves fewer parameters. This approach is available in BoutsMLE.
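
To make the idea concrete, the sketch below writes the negative log-likelihood of a two-process mixture with three parameters (a mixing proportion p and two rates), compared with four parameters in the two-process NLS model. It is an illustration under that assumed parameterization, with simulated data and hypothetical names; it is not the library's code.

```python
import numpy as np

def neg_loglik(params, x):
    """Negative log-likelihood of a two-component exponential mixture."""
    p, lam1, lam2 = params
    dens = p * lam1 * np.exp(-lam1 * x) + (1 - p) * lam2 * np.exp(-lam2 * x)
    return -np.sum(np.log(dens))

# Simulated intervals: 70% from a fast process, 30% from a slow one
rng = np.random.default_rng(42)
n = 1000
is_fast = rng.random(n) < 0.7
x = np.where(is_fast, rng.exponential(1 / 0.1, n), rng.exponential(1 / 0.005, n))

nll_true = neg_loglik((0.7, 0.1, 0.005), x)   # generating parameters
nll_off = neg_loglik((0.2, 0.5, 0.05), x)     # deliberately poor guess
```

An optimizer searches for the parameters that minimize this function, so the generating parameters should score lower than a poor guess. No binning is involved, since the function works directly on the observed intervals.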

Set up an instance.

postdives_mlebouts = skbouts.BoutsMLE(postdives_diff, 0.1)
print(postdives_mlebouts)
Class BoutsMLE object
histogram method:    standard
log-frequency histogram:
             x  lnfreq
count    50.00  50.000
mean    284.54  -3.684
std     376.22   2.632
min       0.05  -8.216
25%      42.55  -5.521
50%     122.55  -3.912
75%     328.80  -2.798
max    1449.95   3.258

Again, assuming a 2-process model, calculate starting values.

fig, ax = plt.subplots(figsize=_FIG1X1)
init_pars = postdives_mlebouts.init_pars([50], plot=True, ax=ax)
_images/boutsdemo_15_0.png

Fit the two-process model. Supplying bounds is optional, but reasonable bounds help the optimization algorithm, which may otherwise fail to converge. The fit proceeds in two steps, first with a reparameterized log-likelihood function and then with the original one, so two sets of bounds are required.

# Bounds for the first fit (reparameterized log-likelihood)
p_bnd = (-2, None)                 # bounds for `p`
lda1_bnd = (-5, None)              # bounds for `lambda1`
lda2_bnd = (-10, None)             # bounds for `lambda2`
bnd1 = (p_bnd, lda1_bnd, lda2_bnd)
# Bounds for the second fit (original parameters, kept positive)
p_bnd = (1e-8, None)
lda1_bnd = (1e-8, None)
lda2_bnd = (1e-8, None)
bnd2 = (p_bnd, lda1_bnd, lda2_bnd)
fit1, fit2 = postdives_mlebouts.fit(init_pars,
                                    fit1_opts=dict(method="L-BFGS-B",
                                                   bounds=bnd1),
                                    fit2_opts=dict(method="L-BFGS-B",
                                                   bounds=bnd2))
# First fit
print(fit1)
      fun: 917.8524699075529
 hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
      jac: array([+0., -0., +0.])
  message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 40
      nit: 8
     njev: 10
   status: 0
  success: True
        x: array([+0.826, -2.69 , -5.629])
# Second fit
print(fit2)
      fun: 917.8524699073647
 hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
      jac: array([+0.001, +0.001, +0.003])
  message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 16
      nit: 1
     njev: 4
   status: 0
  success: True
        x: array([+0.696, +0.068, +0.004])
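
The two solutions describe the same fit on different scales. The printed values are consistent with a logit transform of p and a log transform of each lambda in the first step; this is inferred from the numbers above and is an assumption about the reparameterization, not a documented fact. The sketch back-transforms the first solution by hand.

```python
import numpy as np

# Solution of the first (reparameterized) fit, copied from the output above
fit1_x = np.array([0.826, -2.69, -5.629])

# Assumed transforms: inverse logit for p, exponential for the rates
p = 1 / (1 + np.exp(-fit1_x[0]))
lams = np.exp(fit1_x[1:])
back = np.concatenate(([p], lams))
print(back)
```

The back-transformed values agree with the second solution to within the printed precision, which also explains why the first set of bounds admits negative values while the second does not.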

Calculate bout-ending criterion (BEC).

print("bec = {:.2f}".format(postdives_mlebouts.bec(fit2)))
bec = 58.55
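
Under the mixture parameterization, the crossing point of the two component densities is ln(p·λ1 / ((1 - p)·λ2)) / (λ1 - λ2). The sketch below evaluates it with the rounded estimates from the second fit; whether the bec method uses exactly this expression is an assumption, and the function name is hypothetical.

```python
import math

def bec_mixture(p, lam1, lam2):
    """Interval at which the two mixture components have equal density."""
    return math.log((p * lam1) / ((1 - p) * lam2)) / (lam1 - lam2)

# Rounded estimates from the second fit
bec_mle = bec_mixture(0.696, 0.068, 0.004)
print("bec = {:.2f}".format(bec_mle))
```

The value differs somewhat from the one reported above because the slow rate is printed with a single significant digit.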

Plot the fit.

fig, ax = plt.subplots(figsize=_FIG1X1)
postdives_mlebouts.plot_fit(fit2, ax=ax);
_images/boutsdemo_20_0.png

Compare the cumulative frequency distributions of the NLS and MLE model estimates.

fig, axs = plt.subplots(1, 2, figsize=_FIG1X2)
postdives_nlsbouts.plot_ecdf(coefs2, ax=axs[0])
axs[0].set_title("NLS")
postdives_mlebouts.plot_ecdf(fit2, ax=axs[1])
axs[1].set_title("MLE");
_images/boutsdemo_21_0.png

Label bouts based on the BEC from the MLE model above. Note that the Timedelta values must be converted to total seconds to allow comparison with the BEC.

bec = postdives_mlebouts.bec(fit2)
skbouts.label_bouts(postdives.dt.total_seconds(), bec, as_diff=True)
236     1
237     2
238     3
239     4
240     4
       ..
422    51
423    51
424    51
425    52
426    52
Name: postdive_dur, Length: 191, dtype: int64
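
The labelling logic can be pictured as a running count of the times the absolute difference between consecutive values exceeds the BEC. The sketch below is an assumed simplification of that idea with hypothetical names and data; it is not the source of label_bouts.

```python
import pandas as pd

def label_bouts_sketch(x, bec):
    """Start a new bout whenever the absolute difference exceeds bec."""
    exceeds = x.diff().abs() > bec     # first element is NaN, so False
    return exceeds.cumsum() + 1

# Hypothetical durations in seconds
toy = pd.Series([10.0, 20.0, 500.0, 510.0, 30.0])
labels = label_bouts_sketch(toy, bec=100)
print(labels.tolist())  # [1, 1, 2, 2, 3]
```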
