This function fits a single season occupancy model using count data.

occuCOP(data, 
        psiformula = ~1, lambdaformula = ~1, 
        psistarts, lambdastarts, starts,
        method = "BFGS", se = TRUE, 
        engine = c("C", "R"), na.rm = TRUE, 
        return.negloglik = NULL, L1 = FALSE, ...)

Arguments

data

An unmarkedFrameOccuCOP object created with the unmarkedFrameOccuCOP function.

psiformula

Formula describing the occupancy covariates.

lambdaformula

Formula describing the detection covariates.

psistarts

Vector of starting values for likelihood maximisation with optim for occupancy probability \(\psi\). These values must be logit-transformed (with qlogis) (see details). By default, optimisation will start at 0, corresponding to an occupancy probability of 0.5 (plogis(0) is 0.5).

lambdastarts

Vector of starting values for likelihood maximisation with optim for detection rate \(\lambda\). These values must be log-transformed (with log) (see details). By default, optimisation will start at 0, corresponding to detection rate of 1 (exp(0) is 1).

starts

Vector of starting values for likelihood maximisation with optim. If psistarts and lambdastarts are provided, starts = c(psistarts, lambdastarts).

method

Optimisation method used by optim.

se

Logical specifying whether to compute (se=TRUE) standard errors or not (se=FALSE).

engine

Code to use for optimisation. Either "C" for fast C++ code, or "R" for native R code.

na.rm

Logical specifying whether to fit the model (na.rm=TRUE) or not (na.rm=FALSE) if there are NAs in the unmarkedFrameOccuCOP object.

return.negloglik

A list of vectors of parameters (c(psiparams, lambdaparams)). If specified, the function will not maximise likelihood but return the negative log-likelihood for the those parameters in the nll column of a dataframe. See an example below.

L1

Logical specifying whether the length of observations (L) are purposefully set to 1 (L1=TRUE) or not (L1=FALSE).

...

Additional arguments to pass to optim, such as lower and upper bounds or a list of control parameters.

Details

See unmarkedFrameOccuCOP for a description of how to supply data to the data argument. See unmarkedFrame for a more general documentation of unmarkedFrame objects for the different models implemented in unmarked.

The COP occupancy model

occuCOP fits a single season occupancy model using count data, as described in Pautrel et al. (2023).

The occupancy sub-model is:

$$z_i \sim \text{Bernoulli}(\psi_i)$$

  • With \(z_i\) the occupany state of site \(i\). \(z_i=1\) if site \(i\) is occupied by the species, i.e. if the species is present in site \(i\). \(z_i=0\) if site \(i\) is not occupied.

  • With \(\psi_i\) the occupancy probability of site \(i\).

The observation sub-model is:

$$ N_{ij} | z_i = 1 \sim \text{Poisson}(\lambda_{ij} L_{ij}) \\ N_{ij} | z_i = 0 \sim 0 $$

  • With \(N_{ij}\) the count of detection events in site \(i\) during observation \(j\).

  • With \(\lambda_{ij}\) the detection rate in site \(i\) during observation \(j\) (for example, 1 detection per day.).

  • With \(L_{ij}\) the length of observation \(j\) in site \(i\) (for example, 7 days.).

What we call "observation" (\(j\)) here can be a sampling occasion, a transect, a discretised session. Consequently, the unit of \(\lambda_{ij}\) and \(L_{ij}\) can be either a time-unit (day, hour, ...) or a space-unit (kilometer, meter, ...).

The transformation of parameters \(\psi\) and \(\lambda\)

In order to perform unconstrained optimisation, parameters are transformed.

The occupancy probability (\(\psi\)) is transformed with the logit function (psi_transformed = qlogis(psi)). It can be back-transformed with the "inverse logit" function (psi = plogis(psi_transformed)).

The detection rate (\(\lambda\)) is transformed with the log function (lambda_transformed = log(lambda)). It can be back-transformed with the exponential function (lambda = exp(lambda_transformed)).

Value

unmarkedFitOccuCOP object describing the model fit. See the unmarkedFit classes.

References

Pautrel, L., Moulherat, S., Gimenez, O. & Etienne, M.-P. Submitted. Analysing biodiversity observation data collected in continuous time: Should we use discrete or continuous-time occupancy models? Preprint at doi:10.1101/2023.11.17.567350 .

Author

Léa Pautrel

Examples

set.seed(123)
options(max.print = 50)

# We simulate data in 100 sites with 3 observations of 7 days per site.
nSites <- 100
nObs <- 3

# For an occupancy covariate, we associate each site to a land-use category.
landuse <- sample(factor(c("Forest", "Grassland", "City"), ordered = TRUE), 
                  size = nSites, replace = TRUE)
simul_psi <- ifelse(landuse == "Forest", 0.8, 
                    ifelse(landuse == "Grassland", 0.4, 0.1))
z <- rbinom(n = nSites, size = 1, prob = simul_psi)

# For a detection covariate, we create a fake wind variable.
wind <- matrix(rexp(n = nSites * nObs), nrow = nSites, ncol = nObs)
simul_lambda <- wind / 5
L = matrix(7, nrow = nSites, ncol = nObs)

# We now simulate count detection data
y <- matrix(rpois(n = nSites * nObs, lambda = simul_lambda * L), 
            nrow = nSites, ncol = nObs) * z

# We create our unmarkedFrameOccuCOP object
umf <- unmarkedFrameOccuCOP(
  y = y,
  L = L,
  siteCovs = data.frame("landuse" = landuse),
  obsCovs = list("wind" = wind)
)
print(umf)
#> Data frame representation of unmarkedFrame object.
#>   y.1 y.2 y.3 L.1 L.2 L.3   landuse     wind.1    wind.2     wind.3
#> 1   0   0   0   7   7   7      City 3.46359489 0.3615751 1.27416223
#> 2   3   2   0   7   7   7      City 1.27402772 0.9609442 0.62460966
#> 3   0   0   0   7   7   7      City 1.08148515 0.6631372 0.09404684
#> 4   0   0   0   7   7   7 Grassland 0.29695916 1.5776374 0.38352803
#> 5   0   0   0   7   7   7      City 0.08416073 1.9224809 0.17492273
#>  [ reached 'max' / getOption("max.print") -- omitted 95 rows ]

# We fit our model without covariates
fitNull <- occuCOP(data = umf)
print(fitNull)
#> 
#> Call:
#> occuCOP(data = umf)
#> 
#> Occupancy probability:
#>  Estimate    SE     z P(>|z|)
#>    -0.377 0.207 -1.82  0.0683
#> 
#> Detection rate:
#>  Estimate     SE     z  P(>|z|)
#>     -1.64 0.0805 -20.4 4.11e-92
#> 
#> AIC: 320.7247 

# We fit our model with covariates
fitCov <- occuCOP(data = umf, psiformula = ~ landuse, lambdaformula = ~ wind)
print(fitCov)
#> 
#> Call:
#> occuCOP(data = umf, psiformula = ~landuse, lambdaformula = ~wind)
#> 
#> Occupancy probability:
#>             Estimate    SE     z  P(>|z|)
#> (Intercept)   -0.417 0.268 -1.56 1.19e-01
#> landuse.L      1.073 0.461  2.33 1.98e-02
#> landuse.Q     -1.986 0.461 -4.31 1.66e-05
#> 
#> Detection rate:
#>             Estimate    SE     z  P(>|z|)
#> (Intercept)   -2.520 0.178 -14.2 1.16e-45
#> wind           0.651 0.088   7.4 1.41e-13
#> 
#> AIC: 262.6408 

# We back-transform the parameter's estimates
## Back-transformed occupancy probability with no covariates
backTransform(fitNull, "psi")
#> Backtransformed linear combination(s) of Occupancy probability estimate(s)
#> 
#>  Estimate     SE LinComb (Intercept)
#>     0.407 0.0499  -0.377           1
#> 
#> Transformation: logistic 

## Back-transformed occupancy probability depending on habitat use
predict(fitCov,
        "psi",
        newdata = data.frame("landuse" = c("Forest", "Grassland", "City")),
        appendData = TRUE)
#>    Predicted         SE      lower     upper   landuse
#> 1 0.65848269 0.10390413 0.43806132 0.8266564    Forest
#> 2 0.08296593 0.04043091 0.03094064 0.2040496 Grassland
#> 3 0.39724666 0.06412076 0.28053391 0.5269522      City

## Back-transformed detection rate with no covariates
backTransform(fitNull, "lambda")
#> Backtransformed linear combination(s) of Detection rate estimate(s)
#> 
#>  Estimate     SE LinComb (Intercept)
#>     0.194 0.0156   -1.64           1
#> 
#> Transformation: exp 

## Back-transformed detection rate depending on wind
predict(fitCov,
        "lambda",
        appendData = TRUE)
#>   Predicted         SE      lower     upper y.1 y.2 y.3   landuse    wind.1
#> 1 0.7660928 0.12844196 0.55153136 1.0641247   0   0   0      City 3.4635949
#> 2 0.1017691 0.01527607 0.07583087 0.1365795   3   2   0      City 1.2740277
#> 3 0.1842991 0.01719451 0.15350016 0.2212776   0   0   0      City 1.0814852
#> 4 0.1842830 0.01719408 0.15348497 0.2212608   0   0   0 Grassland 0.2969592
#>      wind.2     wind.3
#> 1 0.3615751 1.27416223
#> 2 0.9609442 0.62460966
#> 3 0.6631372 0.09404684
#> 4 1.5776374 0.38352803
#>  [ reached 'max' / getOption("max.print") -- omitted 296 rows ]

## This is not easily readable. We can show the results in a clearer way, by:
##  - adding the site and observation
##  - printing only the wind covariate used to get the predicted lambda
cbind(
  data.frame(
    "site" = rep(1:nSites, each = nObs),
    "observation" = rep(1:nObs, times = nSites),
    "wind" = getData(fitCov)@obsCovs
  ),
  predict(fitCov, "lambda", appendData = FALSE)
)
#>   site observation      wind Predicted         SE      lower     upper
#> 1    1           1 3.4635949 0.7660928 0.12844196 0.55153136 1.0641247
#> 2    1           2 0.3615751 0.1017691 0.01527607 0.07583087 0.1365795
#> 3    1           3 1.2741622 0.1842991 0.01719451 0.15350016 0.2212776
#> 4    2           1 1.2740277 0.1842830 0.01719408 0.15348497 0.2212608
#> 5    2           2 0.9609442 0.1503157 0.01646401 0.12127535 0.1863099
#> 6    2           3 0.6246097 0.1207681 0.01584938 0.09337750 0.1561933
#> 7    3           1 1.0814852 0.1625813 0.01669911 0.13293567 0.1988380
#>  [ reached 'max' / getOption("max.print") -- omitted 293 rows ]

# We can choose the initial parameters when fitting our model.
# For psi, intituively, the initial value can be the proportion of sites 
#          in which we have observations.
(psi_init <- mean(rowSums(y) > 0))
#> [1] 0.4

# For lambda, the initial value can be the mean count of detection events 
#             in sites in which there was at least one observation.
(lambda_init <- mean(y[rowSums(y) > 0, ]))
#> [1] 1.383333

# We have to transform them.
occuCOP(
  data = umf,
  psiformula = ~ 1,
  lambdaformula = ~ 1,
  psistarts = qlogis(psi_init),
  lambdastarts = log(lambda_init)
)
#> 
#> Call:
#> occuCOP(data = umf, psiformula = ~1, lambdaformula = ~1, psistarts = qlogis(psi_init), 
#>     lambdastarts = log(lambda_init))
#> 
#> Occupancy probability:
#>  Estimate    SE     z P(>|z|)
#>    -0.377 0.207 -1.82  0.0683
#> 
#> Detection rate:
#>  Estimate     SE     z  P(>|z|)
#>     -1.64 0.0805 -20.4 4.11e-92
#> 
#> AIC: 320.7247 

# If we have covariates, we need to have the right length for the start vectors.
# psi ~ landuse --> 3 param to estimate: Intercept, landuseForest, landuseGrassland
# lambda ~ wind --> 2 param to estimate: Intercept, wind
occuCOP(
  data = umf,
  psiformula = ~ landuse,
  lambdaformula = ~ wind,
  psistarts = rep(qlogis(psi_init), 3),
  lambdastarts = rep(log(lambda_init), 2)
)
#> 
#> Call:
#> occuCOP(data = umf, psiformula = ~landuse, lambdaformula = ~wind, 
#>     psistarts = rep(qlogis(psi_init), 3), lambdastarts = rep(log(lambda_init), 
#>         2))
#> 
#> Occupancy probability:
#>             Estimate    SE     z  P(>|z|)
#> (Intercept)   -0.416 0.268 -1.55 1.20e-01
#> landuse.L      1.074 0.460  2.33 1.96e-02
#> landuse.Q     -1.984 0.461 -4.30 1.68e-05
#> 
#> Detection rate:
#>             Estimate    SE      z  P(>|z|)
#> (Intercept)   -2.520 0.178 -14.18 1.18e-45
#> wind           0.651 0.088   7.39 1.43e-13
#> 
#> AIC: 262.6408 

# And with covariates, we could have chosen better initial values, such as the
# proportion of sites in which we have observations per land-use category.
(psi_init_covs <- c(
  "City" = mean(rowSums(y[landuse == "City", ]) > 0),
  "Forest" = mean(rowSums(y[landuse == "Forest", ]) > 0),
  "Grassland" = mean(rowSums(y[landuse == "Grassland", ]) > 0)
))
#>      City    Forest Grassland 
#> 0.1142857 0.7272727 0.3750000 
occuCOP(
  data = umf,
  psiformula = ~ landuse,
  lambdaformula = ~ wind,
  psistarts = qlogis(psi_init_covs))
#> 
#> Call:
#> occuCOP(data = umf, psiformula = ~landuse, lambdaformula = ~wind, 
#>     psistarts = qlogis(psi_init_covs))
#> 
#> Occupancy probability:
#>             Estimate    SE     z  P(>|z|)
#> (Intercept)   -0.416 0.268 -1.55 1.20e-01
#> landuse.L      1.074 0.460  2.33 1.96e-02
#> landuse.Q     -1.984 0.461 -4.30 1.68e-05
#> 
#> Detection rate:
#>             Estimate    SE      z  P(>|z|)
#> (Intercept)   -2.520 0.178 -14.18 1.18e-45
#> wind           0.651 0.088   7.39 1.43e-13
#> 
#> AIC: 262.6408 

# We can fit our model with a different optimisation algorithm.
occuCOP(data = umf, method = "Nelder-Mead")
#> 
#> Call:
#> occuCOP(data = umf, method = "Nelder-Mead")
#> 
#> Occupancy probability:
#>  Estimate    SE     z P(>|z|)
#>    -0.377 0.207 -1.82  0.0684
#> 
#> Detection rate:
#>  Estimate     SE     z  P(>|z|)
#>     -1.64 0.0805 -20.4 4.12e-92
#> 
#> AIC: 320.7247 

# We can run our model with a C++ or with a R likelihood function.
## They give the same result. 
occuCOP(data = umf, engine = "C", psistarts = 0, lambdastarts = 0)
#> 
#> Call:
#> occuCOP(data = umf, psistarts = 0, lambdastarts = 0, engine = "C")
#> 
#> Occupancy probability:
#>  Estimate    SE     z P(>|z|)
#>    -0.377 0.207 -1.82  0.0683
#> 
#> Detection rate:
#>  Estimate     SE     z  P(>|z|)
#>     -1.64 0.0805 -20.4 4.11e-92
#> 
#> AIC: 320.7247 
occuCOP(data = umf, engine = "R", psistarts = 0, lambdastarts = 0)
#> 
#> Call:
#> occuCOP(data = umf, psistarts = 0, lambdastarts = 0, engine = "R")
#> 
#> Occupancy probability:
#>  Estimate    SE     z P(>|z|)
#>    -0.377 0.207 -1.82  0.0683
#> 
#> Detection rate:
#>  Estimate     SE     z  P(>|z|)
#>     -1.64 0.0805 -20.4 4.11e-92
#> 
#> AIC: 320.7247 

## The C++ (the default) is faster.
system.time(occuCOP(data = umf, engine = "C", psistarts = 0, lambdastarts = 0))
#>    user  system elapsed 
#>   0.006   0.004   0.008 
system.time(occuCOP(data = umf, engine = "R", psistarts = 0, lambdastarts = 0))
#>    user  system elapsed 
#>   0.033   0.000   0.032 

## However, if you want to understand how the likelihood is calculated,
## you can easily access the R likelihood function.
print(occuCOP(data = umf, engine = "R", psistarts = 0, lambdastarts = 0)@nllFun)
#> function (params) 
#> {
#>     psi <- plogis(Xpsi %*% params[psiIdx])
#>     lambda <- exp(Xlambda %*% params[lambdaIdx])
#>     if (length(sitesRemoved) > 0) {
#>         siteAnalysed = (1:M)[-sitesRemoved]
#>     }
#>     else {
#>         siteAnalysed = (1:M)
#>     }
#>     iProb <- rep(NA, M)
#>     for (i in siteAnalysed) {
#>         iIdxall <- ((i - 1) * J + 1):(i * J)
#>         iIdx = iIdxall[!removed_obsvec[iIdxall]]
#>         if (SitesWithDetec[i]) {
#>             iProb[i] = psi[i] * ((sum(lambda[iIdx] * Lvec[iIdx]))^sum(yvec[iIdx])/factorial(sum(yvec[iIdx])) * 
#>                 exp(-sum(lambda[iIdx] * Lvec[iIdx])))
#>         }
#>         else {
#>             iProb[i] = psi[i] * exp(-sum(lambda[iIdx] * Lvec[iIdx])) + 
#>                 (1 - psi[i])
#>         }
#>     }
#>     ll = sum(log(iProb[siteAnalysed]))
#>     return(-ll)
#> }
#> <bytecode: 0x55c3b97185b0>
#> <environment: 0x55c3b96d97a0>

# Finally, if you do not want to fit your model but only get the likelihood,
# you can get the negative log-likelihood for a given set of parameters.
occuCOP(data = umf, return.negloglik = list(
  c("psi" = qlogis(0.25), "lambda" = log(2)),
  c("psi" = qlogis(0.5), "lambda" = log(1)),
  c("psi" = qlogis(0.75), "lambda" = log(0.5))
))
#>   logit(psi).(Intercept) log(lambda).(Intercept)       nll
#> 1              -1.098612               0.6931472 1294.2149
#> 2               0.000000               0.0000000  565.8794
#> 3               1.098612              -0.6931472  286.3071