Objectives

  • Brief review of the concept of hierarchical statistical models.
  • Example how to use JAGS together with R to do Bayesian inference for parameters and states of potentially hierarchical models.
  • Build an ecological occupancy model and the dynamic occupancy model of Kéry & Royle (2007) in JAGS.

Prerequisites

Hierarchical statistical models are sets of coupled probability models that are conditionally related to each other. As such, hierarchical models provide a broad framework that encompasses different kinds of models, from generalized linear mixed models (often called multilevel models) to Bayesian belief networks to customized process-based models in ecology, physiology, hydrology, etc.

Hierarchical modeling provides a general and flexible framework for building complex statistical models from simple building blocks. JAGS is a software that makes it easy to program and fit complex hierarchical models, because users don’t have to explicitly state the joint probability distribution of the entire model. Instead, the model is represented by submodels and their conditional dependencies. This allows for the creative development of complex models for which no standard packages are available.

JAGS uses Gibbs sampling to do Bayesian inference for parameters and states of (potentially) hierarchical models. Other softwares and packages can be used for probabilistic programming, such as Stan (interfaced to Rwith package rstan) or the python library PyMC3 (Krapu & Borsuk 2019). An advantage of JAGS over Stan is that JAGS can accommodate discrete variables, such as Bernoulli-distributed presence-absences.

1. Toad colonization 🐸

To demonstrate model development with JAGS with use we use field observations of toads. The dataset used here is a subset of the long-term monitoring data set of observations of the yellow-bellied toad (Bombina variegata) in the canton of Aargau in Switzerland.

The core data is detection/non-detection of this species observed at \(N=379\) sites (ponds) \(i\) during up to 3 repeat visits \(j\), over multiple years \(t\) from 1999 to 2009 (\(T=11\)). Sites are nested in four different regions \(R\).

We first collapse these data across years and visits to observed presence-absence per site, and fit a very simple model to estimate the average expected probability of occurrence (occupancy). We conduct a naive analysis that ignores detection error.

# read data
dat <- read.csv(file = "../../data/BomVarData.csv")

# number of sites
nsites <- length(unique(dat$SiteID))

# extract the site data only:
sites <- unique(dat[, !(names(dat) %in% c("Visit", "Year", "Detection", "ObsID", "SiteVisited"))])
sites <- sites[(order(sites$SiteID)),]

# extract detection/nondetection data per site, pooling all visits and years:
# was the toad ever seen in site i?
seen_i <- aggregate(dat$Detection, by = list(SiteID=dat$SiteID), FUN=max, na.rm=TRUE)

# number of regions
nregions <- length(unique(sites$Region))

# Region index: in JAGS we need to use integer indices to encode
#  categorical data; indices must start at 1 and be consecutive
Region <- as.integer(as.factor(sites$Region))

2. Presence/absence model

We describe the presence/absence of the toads by an generalized linear mixed model (GLMM), a relatively simple hierarchical model. It allows to account for differences between regions in the expected occurrence probability \(\theta\).

\[\begin{align*} y_{i} & \sim \text{Bernoulli}(\theta_{i}) \\ \theta_{i} & = \text{logit}^{-1}(\alpha_{R(i)}) \\ \alpha_{R(i)} & \sim \text{Normal}(\mu, \tau) \\ \mu & \sim \text{Normal}(0, 0.001)\\ \tau & \sim \text{Gamma}(0.5, 0.5) \end{align*}\]

where \(\text{logit}^{-1}(x) = \exp(x)/(1 + \exp(x))\).

Regional occurrence probabilities are generated by a common process with expectation \(\mu\) and precision \(\tau = 1 /\sigma^2\), called hyperparameters with their own prior distributions (hyperpriors).

Implementation in JAGS

A JAGS model must be defined in a separate text file; we can writ this file from R or use any text editor. The JAGS model description goes inside curly brackets, preceded with the word model:model { }. Use hashtags to comment code. The model is specified as a series of relations between variables. Every variable must be either defined in the model or given as data. Relations can be deterministic (<-) or stochastic (~). JAGS automatically detects from the data provided which variables are observed and which ones have to be inferred. The order in which the model components are specified does not matter, it gets compiled into a DAG before running. JAGS then does Gibbs sampling from the conditional distributions.

Steps:

  1. write the JAGS model to a file
    • Note: in JAGS, levels of categorical variables are specified as integers, that must be consecutive and start at 1
    • Note: JAGS specifies normal distributions by the mean and precision \(\tau = 1/\sigma^2\)
  2. run the model from R
    • bundle input data in a list JAGS
    • define initial values (optional)
    • list names of parameters to be monitored
    • run the model using function jags() from package jagsUI

We save the following JAGS code in a file called glmm_v1.jags. Note how closely the syntax follows the mathematical equations.

model{

    ## Is site i occupied?
    for(i in 1:nsites){
        y[i] ~ dbern(theta[i])
        logit(theta[i]) <- alpha_R[Region[i]]
    }

    ## Priors:
    for(r in 1:nregions){
        alpha_R[r] ~ dnorm(mu,tau)
    }

    ## Hyperpriors:
    mu ~ dnorm(0, 0.001) # corresponds to a Normal distribution with variance = 1/0.001
    tau ~ dgamma(0.5, 0.5) # gamma prior on precision (cf. Gelman 2006b)
}
library(jagsUI)

### Run the JAGS model from R:---------------------------
### Bundle data
in.data <- list(y = seen_i$x,
                nsites = nsites,
                Region = Region,
                nregions = nregions)

### Initial values
### -> can be omitted; if no inits are given, JAGS picks them automatically

### Parameters to monitor
params <- c("alpha_R", "mu", "tau") # "sd"

### Run the model
# MCMC settings
ni <- 3000  # iterations
nt <- 10    # thinning
nb <- 1000  # burn-in
nc <- 3     # chains


out.v1 <- jags(data = in.data,
                   parameters.to.save = params,
                   parallel = T,
                   model.file=paste0("./JAGS_models/glmm_v1.jags"),
                   n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
                   verbose=FALSE, store.data = F)

out.v1

# regional estimates:
plogis(out.v1$mean$alpha_R)  # plogis(x) = exp(x)/(1 + exp(x))
# grand mean:
plogis(out.v1$mean$mu)

Tasks

Extend the JAGS model from above to include the effect of pond size on occupancy.

Steps in extending a model:

  • JAGS:
    • write an updated version of the process or observation model
    • add priors for any new parameters
  • R:
    • update input data
    • update inits (optional)
    • update names of parameters that are monitored

Solution

Jags model:

model{

    ## Ecological process model:
    for (i in 1:nsites){
        y[i] ~ dbern(theta[i])
        logit(theta[i]) <- alpha_R[Region[i]] + beta*Area[i]
    }

    ## Priors:
    for (r in 1:nregions){
        alpha_R[r] ~ dnorm(mu,tau)
    }
    beta  ~ dnorm(0,0.001)
    ## Hyperpriors:
    mu ~ dnorm(0,0.001)
    tau ~ dgamma(0.5, 0.5) # gamma prior on precision (cf. Gelman 2006b)

}

Run the JAGS model from R:

### Bundle data (we here use data from the first year only)
in.data <- list(y = seen_i$x,
                Area = sites$log.Area.scaled,
                nsites = nsites,
                Region = Region,
                nregions = nregions
                )

### Initial values
### if no inits are given, JAGS picks them automatically

### Parameters to monitor
params <- c("alpha_R", "beta", "mu", "tau") # "sd"

### Run the model
# MCMC settings
ni <- 3000  # iterations
nt <- 10    # thinning
nb <- 1000  # burn-in
nc <- 3     # chains


out.v2 <- jags(data = in.data,
                   parameters.to.save = params,
                   parallel = T,
                   model.file=paste0("./JAGS_models/glmm_v2.jags"),
                   n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
                   verbose=FALSE, store.data = F)


out.v2
plot(out.v2) # --> no clear effect of pond size on occupancy

3. Static occupancy model

Occupancy models account for the observation process as separate from the true, latent ecological process. They do this by introducing a new random variable \(z\), which represents the true latent, but imperfectly observed occupancy state (MacKenzie et al. 2002). Detection/non-detection data from repeat visits to the same sites are exploited to estimate detection probability. At least some sites must have been visited repeatedly, during a period of closure (i.e., during a period within which the true state is assumed to remain constant).

Detection/non-detection data \(y_{ij}\) at site \(i\) during repeat surveys \(j\) are then described by a hierarchical model that contains one submodel for the only partially observed true state \(z\) (occupied or not occupied), and another submodel for the actual observations \(y\). The actual observations result from both the particular realization of the ecological process and of the observation process.

The observation process yields observations for each site \(i\) during visit \(j\), conditional on the true state \(z_{i}\) and detection probability \(p\)

\[ y_{ij} \sim \text{Bernoulli}(z_{i} p) \] Detection probability \(p\) is here assumed to be constant across sites and visits. We can easily formulate the model so as to let detection probability vary across sites and/ or visits (i.e., \(p_{ij}\)).

The ecological process yields the true, latent state \(z_{i}\) of site \(i\) (presence/ absence of a species) \[ z_{i} \sim \text{Bernoulli}(\psi) \] with occupancy probability \(\psi\) .

The model is completed with (flat) priors \(\psi \sim \text{Uniform}(0,1)\) and \(p \sim \text{Uniform}(0,1)\)

In JAGS, the components of this hierarchical model are encoded in the same way as the above algebraic statements, omitting the need to explicitly formulate the joint likelihood.

Data

We will now use the detection/non-detection data of this species during up to 3 repeat visits \(j\) in each site \(i\), over multiple years \(t\) from 1999 to 2009 (\(T=11\)), along with information on the sites and the observation process (the identity of the field worker/ observer).

The observation data is arranged in an array:

  • y_ijt: Array of observations (detection/ non-detection) in sites \(i\), repeat visit \(j\), year \(t\)

Covariate data, ecological model:

  • sites: Dataframe containing potentially informative environmental variables describing the sites; we are especially interested in the water surface area (or size) of the ponds and the effect of water table fluctuations. These are features that can be controlled when constructing ponds.

Covariate data, observation model:

  • obs_ijt: Array with identity of observer, dimensions [site \(i\), visit \(j\), year \(t\)]. Observer identity is encoded as an integer (1:134 over entire data set).

Important when formatting data:

  • JAGS loops over all dimensions of vectors, matrices, etc. Good bookkeeping is essential! We must ensure that the data are all in the right order, e.g. that the order of sites is the same in x (sites$Fluctuations) and in y. It is a good idea to always label data, e.g. use dimnames for arrays.
  • Categorical variables are encoded in JAGS as integer indices; they must be consecutive and start at 1 (not at zero).
  • Where possible, standardize covariates (to mean zero and variance = 1), in order to improve convergence.
nsites <- length(unique(dat$SiteID))
nyears <- length(unique(dat$Year))
nvisits <- length(unique(dat$Visit))

# extract the site data only:
sites <- unique(dat[, !(names(dat) %in% c("Visit", "Year", "Detection", "ObsID", "SiteVisited"))])
sites <- sites[(order(sites$SiteID)),]

# extract the detection/nondetection data and store in array
# with dimensions [sites, visits, years]
y_ijt <- array(data = dat$Detection,
               dim = c(nsites, nvisits, nyears),
               dimnames=list(sites$SiteID, c(1:3), 1999:2009))

dim(y_ijt) # sites, visits, years
dimnames(y_ijt)[[3]]


# extract the observer identity and store in array with dimensions [sites, visits, years]
obs_ijt <-  array(data = dat$ObsID,
                  dim = c(nsites, nvisits, nyears),
                  dimnames=list(sites$SiteID, c(1:3), 1999:2009))
nobservers <- length(unique(dat$ObsID))

Occupany model for a single year

We build the occupancy model for the first year, which will be needed to provide initial conditions for the dynamic model. We subset our data to the first year, and formulate the model for a single year only (note the indices!). We start with above model that accounts for regional differences in occupancy \(\psi_R\):

\[ z_{i,t=1} \sim \text{Bernoulli}(\psi_{R,t=1}) \] \[ y_{ij,t=1} \sim \text{Bernoulli}(z_{i,t=1} p) \]

While the indexing may seem tedious, it makes explicit over which dimensions the random variables or parameters vary. To begin with, we here assume detection probability to be constant across sites and visits.

We save the JAGS models as occ_v0.jags:

model{
    ## Ecological process model

    for(i in 1:nsites){
          z[i] ~ dbern(psi1[i])
          logit(psi1[i]) <- alpha.psi1[Region[i]]
    }

    ## Observation model:
    for(i in 1:nsites){
          for (j in 1:nvisits){
            y[i,j] ~ dbern(muy[i,j])
            muy[i,j] <- z[i] * p
          }
      }

    ## Priors:

    ## Ecological model
    for(r in 1:nregions){
        alpha.psi1[r] ~ dnorm(mu, tau)
    }
    ## Hyperpriors
    mu ~ dnorm(0,0.001)
    tau ~ dgamma(0.5, 0.5) # gamma prior on precision (cf. Gelman 2006b)

    ## Observation model
    p ~ dunif(0,1)    # (constant) detection probability
}

We compile all needed data in R as list and then run the JAGS model

### Bundle data (we here use data from the first year only)
in.data <- list(y = y_ijt[,,1],
                nsites = nsites,
                nvisits = nvisits,
                nregions= nregions,
                Region = Region
                )

### Initial values
### We use observed occurrences as inits for z
temp <- y_ijt; temp[is.na(temp)] <- 0  # remove NAs to avoid conflicts
z.inits <-apply(temp, c(1,3), max); rm(temp)

inits <- function() list(z = z.inits[,1],
                         p = runif(n = 1, 0, 1))

### Parameters to monitor
params <- c("p",
            "alpha.psi1")

### Run the model
# MCMC settings
ni <- 5000  # iterations
nt <- 10    # thinning
nb <- 1000  # burn-in
nc <- 3     # chains


out.v0 <- jags(data = in.data,
               inits = inits,
               parameters.to.save = params,
               parallel = TRUE,
               model.file=paste0("./JAGS_models/occ_v0.jags"),
               n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
               verbose=FALSE, store.data = TRUE)

Tasks 1

Extend the ecological process model to test whether the occurrence of this toad depends on the size (surface area) of the pond. Check whether there is a unimodal (quadratic) instead of linear response of occupancy probability to pond area.

\[ z_{i,t=1} \sim \text{Bernoulli}(\psi_{i,t=1}), \] \[ logit(\psi_{i,t=1}) = \alpha_{\psi1, R(i)} + \beta_{area} Area_{i}, \]

Additionally, try what happens when there are NAs in the explanatory variable!

Solution

### 1. UPDATE JAGS MODEL
### write JAGS model to text file from R
sink("./JAGS_models/occ_v1_1.jags")
cat("
    model{
    #------------------------------------------
    # Ecological process model:
    for (i in 1:nsites){
        z[i] ~ dbern(psi1[i])

        logit(psi1[i]) <- alpha.psi1[Region[i]] + beta.area.psi1 * Area[i]   #  + beta.area.psi1.2 * pow(Area[i],2)

    }# end i

    #------------------------------------------
    # Observation model:
    for (i in 1:nsites){
        for (j in 1:nvisits){
            y[i,j] ~ dbern(muy[i,j])
            muy[i,j] <- z[i] * p
        }# end j
      }# end i

    #------------------------------------------
    # Priors:

    # Ecological process model:
    for(r in 1:nregions){
    alpha.psi1[r] ~ dnorm(mu, tau)
    }
    beta.area.psi1 ~ dnorm(0, 0.001)  # normal distribution: defined in JAGS by mean and precision tau = 1/variance
    # beta.area.psi1.2 ~ dnorm(0, 0.001)

    # Hyperpriors:
    mu ~ dnorm(0,0.001)
    tau ~ dgamma(0.5, 0.5) # gamma prior on precision (cf. Gelman 2006b)

    # Observation model:
    p ~ dunif(0,1)               # (constant) detection probability, on probability scale

    }

    ", fill=TRUE)
sink()


### 2. UPDATE INDATA, INITS, MONTIORED PARAMS

### Bundle data (first year only)
in.data <- list(y = y_ijt[,,1],
                nsites = nsites,
                nvisits = nvisits,
                # now also needed:
                Area = sites$log.Area.scaled, # surface area of ponds (standardized)
                Region = Region,
                nregions = nregions
                )

### Initial values
### use observed occurrences as inits for z
temp <- y_ijt; temp[is.na(temp)] <- 0  # remove NAs to avoid conflicts
z.inits <-apply(temp, c(1,3), max)

inits <- function() list(z = z.inits[,1],
                         alpha.psi1 = runif(n=nregions, -3, 3),
                         beta.area.ps1 = rnorm(n=1, 0, 1))

### Parameters to monitor
params <- c("alpha.psi1", "beta.area.psi1", # parameters for occupancy probability in year 1
           #"beta.area.psi1.2", # quadratic term
            "p")

### Run the model
# MCMC settings
ni <- 8000  # iterations
nt <- 10    # thinning
nb <- 3000   # burn-in
nc <- 3     # chains

starttime <- Sys.time()

out.v11 <- jags(data = in.data,
            inits = inits,
            parameters.to.save = params,
            parallel = TRUE,
            model.file=paste0("./JAGS_models/occ_v1_1.jags"),
            n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
            verbose=FALSE, store.data = TRUE)

print(Sys.time()-starttime)

### inspect the outcome:
out.v11

plot(out.v11) # traceplots and densities

Task 2

Second, we want to improve the observation model, by accounting for variation in detection probability between observers. Observers are trained volunteers with different levels of experience. The identity of the observer may vary between sites, visits, and years: detection probability becomes \(p_{ijt}\) and we need to loop accordingly. We estimate detection probability with a random effect of observer identity, where observers vary around a common mean \(\mu_{obs}\):

\[ logit(p_{ij,t=1}) = \alpha_{obs} + \beta_{obs(ij,t=1)}, \\ \beta_{obs} \sim \text{Normal}(0,\sigma_{obs}^2). \] NOTE: indices for categories or factor levels in JAGS are encoded as consecutive integers, starting from 1.

Solution

### 1. UPDATE JAGS MODEL
sink("./JAGS_models/occ_v1_2.jags")
cat("
    model{
    #------------------------------------------
    # Ecological process model:
    for (i in 1:nsites){
        z[i] ~ dbern(psi1[i])
        logit(psi1[i]) <- alpha.psi1[Region[i]]
    }# end i

    #------------------------------------------
    # Observation model:
    for (i in 1:nsites){
          for (j in 1:nvisits){
            y[i,j] ~ dbern(muy[i,j])
            muy[i,j] <- z[i] * p[i,j]

            logit(p[i,j]) <- alpha.obs + beta.obs[obs[i,j]]

          }# end j
      }# end i

    #------------------------------------------
    # Priors:

    ## Ecological process model:
    for(r in 1:nregions){
    alpha.psi1[r] ~ dnorm(mu, tau)
    }
    # Hyperpriors:
    mu ~ dnorm(0,0.001)
    tau ~ dgamma(0.5, 0.5) # gamma prior on precision (cf. Gelman 2006b)

    ## Observation model:
    alpha.obs  ~ dnorm(0, 0.001) # dunif(-10,10)
    # random observer effects:
    for (obs in 1:nobservers){
      beta.obs[obs] ~ dnorm(0, tau.obs)
    }
    tau.obs <- 1/ pow(sd.obs,2) # tau = 1/variance = 1 / sd^2
    sd.obs ~ dunif(0,10)  # alternative : give prior on sd

    }

    ", fill=TRUE)
sink()


### 2. UPDATE INDATA, INITS, MONTIORED PARAMS

### Trying to run this with the original array results in an error: "Index out of range"
### obs_ijt consists of integer indices for each of 1:134 observers
### in year 1, only 31 different observers were active:
length(unique(c(obs_ijt[,,1])))
### -> passing obs_ijt[,,1] for first year results in non-consecutive indices, and JAGS error "Index out of range".

### to use this subset, we need to replace the indices for observers active in year 1 to be consecutive:
lookup <- data.frame(current = sort(unique(c(obs_ijt[,,1]))), new = 1:length(unique(c(obs_ijt[,,1]))) )  # lookup table
year1.observers <- matrix(lookup$new[match(obs_ijt[,,1], lookup$current)], nrow = dim(obs_ijt)[1] ) # replace indices, new matrix for year 1 [sites, visits]
nobservers.year1 <- length(unique(c(year1.observers)))

### Bundle data (first year only)
in.data <- list(y = y_ijt[,,1],
                nsites = nsites,
                nvisits = nvisits,
                Region = Region,
                nregions = nregions,
                obs = year1.observers, # observer identities in year 1 (re-labeled!)
                nobservers = nobservers.year1 # number of unique observers
                )


### Initial values
### use observed occurrences as inits for z
temp <- y_ijt; temp[is.na(temp)] <- 0  # remove NAs to avoid conflicts
z.inits <-apply(temp, c(1,3), max)

inits <- function() list(z = z.inits[,1],
                         alpha.psi1 = runif(n=nregions, -3, 3),
                         mu.obs = runif(n=1, -5, 5),
                         sd.obs = runif(n=1, 0, 5))

### Parameters to monitor
params <- c("alpha.psi1",  # parameters for occupancy probability in year 1
            "alpha.obs", "sd.obs", # hyperparameters for observer effect
            "beta.obs") # p estimates for individual observers

### Run the model
# MCMC settings
ni <- 8000  # iterations
nt <- 10    # thinning
nb <- 3000   # burn-in
nc <- 3     # chains


starttime <- Sys.time()

out.v12 <- jags(data = in.data,
                   inits = inits,
                   parameters.to.save = params,
                   parallel = TRUE,
                   model.file=paste0("./JAGS_models/occ_v1_2.jags"),
                   n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
                   verbose=FALSE, store.data = TRUE)

print(Sys.time()-starttime)

### inspect the outcome:
out.v12

# plot(out.v12) # traceplots and densities

# average detection probability, on probability scale
plogis(out.v12$mean$alpha.obs)
plogis(out.v12$mean$alpha.obs + out.v12$mean$beta.obs) # individual observer det probs

4. Dynamic occupancy model

The dynamic occupancy model extends the static, single-year model to the dynamic change in occupancy over time. It is based on metapopulation theory and describes how the true state (occupied or empty) of each site evolves over time as a function of two processes: colonization and persistence.

We initiate the dynamic model with a simple occupancy model for the occupancy probability in the first year. This provides the starting conditions \(z_{i, t=1}\) for the dynamic process.

\[ z_{i,t=1} \sim \text{Bernoulli}(\psi_{i,t=1}) \]

After the first year, the true state of each site depends only on its state in the previous year and the probabilities of colonization and persistence. Empty patches can get colonized in the next time step with probability \(\gamma\) or remain empty with probability \(1- \gamma\). Occupied patches can either persist (i.e., remain occupied) with probability \(\phi\) or go locally “extinct” with probability \(1- \phi\). The realization of this process over years \(t = 2, ...,T\) can be formulated for each site \(i\) and year \(t\) as conditional on the true state in the previous time step and the two process probabilities:

\[ z_{i, t} \sim \text{Bernoulli}( (1-z_{i, t-1}) \gamma_{it} + z_{i, t-1} \phi_{it} ) \]

This has the advantage that we can learn something about the dynamic process rates, not only the resulting occupancy probability. Finally, observations (detection/ non-detection data) are again related to the true, latent occupancy state via the observation model:

\[ y_{ijt} \sim \text{Bernoulli}(z_{it} p_{ijt}) \] Detection probability \(p_{ijt}\) is here allowed to vary between sites, visits, and years.

The aim of the dynamic occupancy model is to estimate the probabilities of colonization and persistence, and their dependence on site characteristics, using detection/ non-detection data and accounting for observation uncertainty.

Update the JAGS model to use all data and include the dynamic process. Start by assuming constant colonization and persistence probabilities.

Covariate effects on colonization and persistence probabilities can easily be included using GLMs. Note that these explanatory models don’t have to be linear models, but can take any functional form.

We save the JAGS model in dynocc_v1.jags.

model{
    # Ecological process model:
    for(i in 1:nsites){
        # initial occupany in year 1
        z[i,1] ~ dbern(psi1[i])
        logit(psi1[i]) <- alpha.psi1[Region[i]]

        # dynamic change year 2:T
        for (k in 1:(nyear-1)){
            z[i,(k+1)] ~ dbern(muZ[i,(k+1)])
            muZ[i,(k+1)]<- ((1-z[i,k])*gamma[i,k]) + (z[i,k]*phi[i,k])

            logit(gamma[i,k]) <- alpha.gamma
            logit(phi[i,k]) <- alpha.phi
        }
    }

    # Observation model:
    for(i in 1:nsites){
          for (j in 1:nvisits){
            for(k in 1:nyear){
              y[i,j,k] ~ dbern(muy[i,j,k])
              muy[i,j,k] <- z[i,k] * p[i,j,k]
              logit(p[i,j,k]) <- alpha.obs + beta.obs[obs[i,j,k]]
            }
         }
      }

    # Priors:

    ## Ecological process model:
    # Occupancy year 1:
    for(r in 1:nregions){
       alpha.psi1[r] ~ dnorm(mu, tau)
    }
    # Hyperpriors:
    mu ~ dnorm(0,0.001)
    tau ~ dgamma(0.5, 0.5)

    # Colonization gamma, persistence phi:
    alpha.gamma ~ dnorm(0, 0.001)
    alpha.phi ~ dnorm(0, 0.001)

    ## Observation model:
     alpha.obs ~ dnorm(0, 0.001)
    # random observer effects:
    for (obs in 1:nobservers){
      beta.obs[obs] ~ dnorm(0, tau.obs)
    }
    tau.obs <- 1/ pow(sd.obs,2) # tau = 1/variance
    sd.obs ~ dunif(0,10)
}

Again, we run this model from R:

Then we look at the results

Tasks

Extend the dynamic occupancy model

Test for effects of covariates on colonization and persistence probabilities:

  • Does colonization probability depend on
    • the size, i.e. area, of the pond? Linearly or unimodally?
    • whether the water table fluctuates?
    • (whether the pond is a newly constructed pond?)
  • Does persistence probability depend on
    • the size, i.e. area, of the pond? Linearly or unimodally?
    • whether the water table fluctuates?

We can additionally let JAGS calculate derived variables, such as a summary of the actual number of sites occupied. We will obtain full posteriors for such quantities as well.

  • Add a derived quantity: what is the estimated true number of occupied ponds in each year?

Solution

### 1. EXTEND THE JAGS MODEL

sink("./JAGS_models/dynocc_v2.jags")
cat("
    model{
    #------------------------------------------
    # Ecological process model:
    for (i in 1:nsites){
         # initial occupany in year 1
        z[i,1] ~ dbern(psi1[i])
        logit(psi1[i]) <- alpha.psi1[Region[i]]

        # dynamic change year 2:T
        for (k in 1:(nyear-1)){

            z[i,(k+1)] ~ dbern(muZ[i,(k+1)])
            muZ[i,(k+1)]<- ((1-z[i,k])*gamma[i,k]) + (z[i,k]*phi[i,k])

            logit(gamma[i,k]) <- alpha.gamma + beta.area*Area[i] + beta.area.2* pow(Area[i],2) + beta.fluct*Fluct[i]
              logit(phi[i,k]) <- alpha.phi + delta.area*Area[i] + delta.area.2* pow(Area[i],2)  + delta.fluct*Fluct[i]

        } #k
    }# end i

    #------------------------------------------
    # Observation model:
    for (i in 1:nsites){
          for (j in 1:nvisits){
            for(k in 1:nyear){

              y[i,j,k] ~ dbern(muy[i,j,k])
              muy[i,j,k] <- z[i,k] * p[i,j,k]

              logit(p[i,j,k]) <- alpha.obs + beta.obs[obs[i,j,k]]
            }#end k
          }# end j
      }# end i

    #------------------------------------------
    # Priors:

    ## Ecological process model:
    # Occupancy year 1:
    for(r in 1:nregions){
    alpha.psi1[r] ~ dnorm(mu, tau)
    }
    # Hyperpriors:
    mu ~ dnorm(0,0.001)
    tau ~ dgamma(0.5, 0.5)

    # Colonization gamma, persistence phi:
    alpha.gamma ~ dnorm(0, 0.01)
    beta.area ~ dnorm(0, 0.01) # dunif(-10,10)
    beta.area.2 ~ dnorm(0, 0.01) # dunif(-10,10)
    beta.fluct ~ dnorm(0, 0.01)

    alpha.phi ~ dnorm(0, 0.01)
    delta.area ~ dnorm(0, 0.01) # dunif(-10,10)
    delta.area.2 ~ dnorm(0, 0.01) # dunif(-10,10)
    delta.fluct ~ dnorm(0, 0.01) # dunif(-10,10)

    ## Observation model:
    alpha.obs ~ dnorm(0, 0.01) # dunif(-10,10)
    # random observer effects:
    for (obs in 1:nobservers){
      beta.obs[obs] ~ dnorm(0, tau.obs)
    }
    tau.obs <- 1/ pow(sd.obs,2) # tau = 1/variance
    sd.obs ~ dunif(0,10)

    #--------------------------------------------
    # Derived parameters:
    for (k in 1:nyear){
      n.occ[k] <-  sum(z[1:nsites,k]) # equivalent: sum(z[ ,k])
    }#k
    }
    ", fill=TRUE)
sink()


### 2. UDPATED INDATA, INITS, PARAMETERS
### Data prep:
# pass entire time series

### Bundle data (first year only)
in.data <- list(y = y_ijt,
                nsites = dim(y_ijt)[1],
                nvisits = dim(y_ijt)[2],
                nregions = nregions,
                Region = Region,
                obs = obs_ijt, # observer identities in all years
                nobservers = nobservers, # number of unique observers (all years)
                nyear = dim(y_ijt)[3],
                Area = sites$log.Area.scaled, # surface area of ponds (standardized)
                Fluct= sites$Fluctuations
               # New = sites$NewSite # new site? (0/1)
                )


### Initial values
### use observed occurrences as inits for z
temp <- y_ijt; temp[is.na(temp)] <- 0  # remove NAs to avoid conflicts
z.inits <-apply(temp, c(1,3), max)

inits <- function() list(z = z.inits,
                         alpha.psi1 = runif(n=nregions, -3, 3),
                         alpha.obs = runif(n=1, 0.1, 1),
                         sd.obs = runif(n = 1, 0.5,3))

### Parameters to monitor
params <- c("alpha.psi1",  # parameters for occupancy probability in year 1
            "alpha.gamma",
            "beta.area",
            "beta.area.2",
            "beta.fluct",
            "alpha.phi",
            "delta.area",
            "delta.area.2",
            "delta.fluct",
            "alpha.obs", "sd.obs",# hyperparameters for observer effect
            "n.occ")
            #"beta.obs") # p estimates for individual observers

### Run the model
# MCMC settings  (ca. 7 mins with 8000 time steps)
ni <- 8000  # iterations
nt <- 10    # thinning
nb <- 3000   # burn-in
nc <- 3     # chains

starttime <- Sys.time()

out.dyn2 <- jags(data = in.data,
                 inits = inits,
                 parameters.to.save = params,
                 parallel = TRUE,
                 model.file=paste0("./JAGS_models/dynocc_v2.jags"),
                 n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
                 verbose=F, store.data = TRUE)

print(Sys.time()-starttime)

# save(out.dyn2, file="./OUT/out_dyn2.RData")

Create partial dependence plots using joint posteriors.

5. Summary

  • JAGS allows you to do inference for complex hierarchical models.
  • Specify a JAGS model within model{ } as a series of stochastic (~) and deterministic (<-) relations.
  • Run JAGS from R:
    • edit and bundle required data (be careful with dimensions and order; consecutive integer indices for levels of categorical variables)
    • specify initial values to help convergence (else let JAGS choose automatically from the prior)
    • specify parameters to be monitored
    • set MCMC settings and run the model
  • Working with JAGS output:
    • the jagsUI model object contains a lot of helpful information; storing the used data and model aids reproducibility
    • working with the full joint posteriors (out$sims.list) is the most flexible

References

Gelman, A. (2006). Multilevel (hierarchical) modeling: What it can and cannot do. Technometrics, 48(3), 432–435. https://doi.org/10.1198/004017005000000661

Gelman, A. (2006b). Prior distributions for variance parameters in hierarchical models (Comment on Article by Browne and Draper). Bayesian Analysis, 1(3), 515–534. https://doi.org/10.1214/06-BA117A

Krapu, C., & Borsuk, M. (2019). Probabilistic programming: A review for environmental modellers. Environmental Modelling and Software, 114(January), 40–48. https://doi.org/10.1016/j.envsoft.2019.01.014

MacKenzie, D. I., Nichols, J. D., Lachman, G. B., Droege, S., Royle, A. A., & Langtimm, C. A. (2002). Estimating site occupancy rates when detection probabilities are less than one. Ecology, 83(8), 2248–2255. https://doi.org/10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2

Royle, J. A., & Kéry, M. (2007). A Bayesian state-space formulation of dynamic occupancy models. Ecology, 88(7), 1813–1823. https://doi.org/10.1890/06-0669.1