Objectives
JAGS
together with R to do Bayesian
inference for parameters and states of potentially hierarchical
models.JAGS
.Prerequisites
JAGS
installed from http://mcmc-jags.sourceforge.net/R
and the R
library jagsUI
to interface Jags (for Julia there is Jags.jl
)BomVarData.csv
Hierarchical statistical models are sets of coupled probability models that are conditionally related to each other. As such, hierarchical models provide a broad framework that encompasses different kinds of models, from generalized linear mixed models (often called multilevel models) to Bayesian belief networks to customized process-based models in ecology, physiology, hydrology, etc.
Hierarchical modeling provides a general and flexible framework for
building complex statistical models from simple building blocks.
JAGS
is a software that makes it easy to program and fit
complex hierarchical models, because users don’t have to explicitly
state the joint probability distribution of the entire model. Instead,
the model is represented by submodels and their conditional
dependencies. This allows for the creative development of complex models
for which no standard packages are available.
JAGS
uses Gibbs sampling to do Bayesian inference for
parameters and states of (potentially) hierarchical models. Other
softwares and packages can be used for probabilistic programming, such
as Stan
(interfaced to R
with package
rstan
) or the python library PyMC3
(Krapu
& Borsuk 2019). An advantage of JAGS
over Stan is that
JAGS
can accommodate discrete variables, such as
Bernoulli-distributed presence-absences.
To demonstrate model development with JAGS
with use we
use field observations of toads. The dataset used here is a subset of
the long-term monitoring data set of observations of the yellow-bellied
toad (Bombina variegata) in the canton of Aargau in
Switzerland.
The core data is detection/non-detection of this species observed at \(N=379\) sites (ponds) \(i\) during up to 3 repeat visits \(j\), over multiple years \(t\) from 1999 to 2009 (\(T=11\)). Sites are nested in four different regions \(R\).
We first collapse these data across years and visits to observed presence-absence per site, and fit a very simple model to estimate the average expected probability of occurrence (occupancy). We conduct a naive analysis that ignores detection error.
# read data
dat <- read.csv(file = "../../data/BomVarData.csv")
# number of sites
nsites <- length(unique(dat$SiteID))
# extract the site data only:
sites <- unique(dat[, !(names(dat) %in% c("Visit", "Year", "Detection", "ObsID", "SiteVisited"))])
sites <- sites[(order(sites$SiteID)),]
# extract detection/nondetection data per site, pooling all visits and years:
# was the toad ever seen in site i?
seen_i <- aggregate(dat$Detection, by = list(SiteID=dat$SiteID), FUN=max, na.rm=TRUE)
# number of regions
nregions <- length(unique(sites$Region))
# Region index: in JAGS we need to use integer indices to encode
# categorical data; indices must start at 1 and be consecutive
Region <- as.integer(as.factor(sites$Region))
We describe the presence/absence of the toads by an generalized linear mixed model (GLMM), a relatively simple hierarchical model. It allows to account for differences between regions in the expected occurrence probability \(\theta\).
\[\begin{align*} y_{i} & \sim \text{Bernoulli}(\theta_{i}) \\ \theta_{i} & = \text{logit}^{-1}(\alpha_{R(i)}) \\ \alpha_{R(i)} & \sim \text{Normal}(\mu, \tau) \\ \mu & \sim \text{Normal}(0, 0.001)\\ \tau & \sim \text{Gamma}(0.5, 0.5) \end{align*}\]
where \(\text{logit}^{-1}(x) = \exp(x)/(1 + \exp(x))\).
Regional occurrence probabilities are generated by a common process with expectation \(\mu\) and precision \(\tau = 1 /\sigma^2\), called hyperparameters with their own prior distributions (hyperpriors).
JAGS
A JAGS
model must be defined in a separate text file; we
can writ this file from R or use any text editor. The JAGS
model description goes inside curly brackets, preceded with the word
model:model { }
. Use hashtags to comment code. The model is
specified as a series of relations between variables. Every variable
must be either defined in the model or given as data. Relations can be
deterministic (<-
) or stochastic (~
).
JAGS
automatically detects from the data provided which
variables are observed and which ones have to be inferred. The order in
which the model components are specified does not matter, it gets
compiled into a DAG before running. JAGS
then does Gibbs
sampling from the conditional distributions.
Steps:
JAGS
model to a file
JAGS
, levels of categorical variables are
specified as integers, that must be consecutive and start at 1JAGS
specifies normal distributions by the mean
and precision \(\tau =
1/\sigma^2\)R
JAGS
jags()
from package
jagsUI
We save the following JAGS code in a file called
glmm_v1.jags
. Note how closely the syntax follows the
mathematical equations.
model{
## Is site i occupied?
for(i in 1:nsites){
y[i] ~ dbern(theta[i])
logit(theta[i]) <- alpha_R[Region[i]]
}
## Priors:
for(r in 1:nregions){
alpha_R[r] ~ dnorm(mu,tau)
}
## Hyperpriors:
mu ~ dnorm(0, 0.001) # corresponds to a Normal distribution with variance = 1/0.001
tau ~ dgamma(0.5, 0.5) # gamma prior on precision (cf. Gelman 2006b)
}
library(jagsUI)
### Run the JAGS model from R:---------------------------
### Bundle data
in.data <- list(y = seen_i$x,
nsites = nsites,
Region = Region,
nregions = nregions)
### Initial values
### -> can be omitted; if no inits are given, JAGS picks them automatically
### Parameters to monitor
params <- c("alpha_R", "mu", "tau") # "sd"
### Run the model
# MCMC settings
ni <- 3000 # iterations
nt <- 10 # thinning
nb <- 1000 # burn-in
nc <- 3 # chains
out.v1 <- jags(data = in.data,
parameters.to.save = params,
parallel = T,
model.file=paste0("./JAGS_models/glmm_v1.jags"),
n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
verbose=FALSE, store.data = F)
out.v1
# regional estimates:
plogis(out.v1$mean$alpha_R) # plogis(x) = exp(x)/(1 + exp(x))
# grand mean:
plogis(out.v1$mean$mu)
Extend the JAGS
model from above to include the effect
of pond size on occupancy.
Steps in extending a model:
JAGS
:
R
:
Jags model:
model{
## Ecological process model:
for (i in 1:nsites){
y[i] ~ dbern(theta[i])
logit(theta[i]) <- alpha_R[Region[i]] + beta*Area[i]
}
## Priors:
for (r in 1:nregions){
alpha_R[r] ~ dnorm(mu,tau)
}
beta ~ dnorm(0,0.001)
## Hyperpriors:
mu ~ dnorm(0,0.001)
tau ~ dgamma(0.5, 0.5) # gamma prior on precision (cf. Gelman 2006b)
}
Run the JAGS model from R:
### Bundle data (we here use data from the first year only)
in.data <- list(y = seen_i$x,
Area = sites$log.Area.scaled,
nsites = nsites,
Region = Region,
nregions = nregions
)
### Initial values
### if no inits are given, JAGS picks them automatically
### Parameters to monitor
params <- c("alpha_R", "beta", "mu", "tau") # "sd"
### Run the model
# MCMC settings
ni <- 3000 # iterations
nt <- 10 # thinning
nb <- 1000 # burn-in
nc <- 3 # chains
out.v2 <- jags(data = in.data,
parameters.to.save = params,
parallel = T,
model.file=paste0("./JAGS_models/glmm_v2.jags"),
n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
verbose=FALSE, store.data = F)
out.v2
plot(out.v2) # --> no clear effect of pond size on occupancy
Occupancy models account for the observation process as separate from the true, latent ecological process. They do this by introducing a new random variable \(z\), which represents the true latent, but imperfectly observed occupancy state (MacKenzie et al. 2002). Detection/non-detection data from repeat visits to the same sites are exploited to estimate detection probability. At least some sites must have been visited repeatedly, during a period of closure (i.e., during a period within which the true state is assumed to remain constant).
Detection/non-detection data \(y_{ij}\) at site \(i\) during repeat surveys \(j\) are then described by a hierarchical model that contains one submodel for the only partially observed true state \(z\) (occupied or not occupied), and another submodel for the actual observations \(y\). The actual observations result from both the particular realization of the ecological process and of the observation process.
The observation process yields observations for each site \(i\) during visit \(j\), conditional on the true state \(z_{i}\) and detection probability \(p\)
\[ y_{ij} \sim \text{Bernoulli}(z_{i} p) \] Detection probability \(p\) is here assumed to be constant across sites and visits. We can easily formulate the model so as to let detection probability vary across sites and/ or visits (i.e., \(p_{ij}\)).
The ecological process yields the true, latent state \(z_{i}\) of site \(i\) (presence/ absence of a species) \[ z_{i} \sim \text{Bernoulli}(\psi) \] with occupancy probability \(\psi\) .
The model is completed with (flat) priors \(\psi \sim \text{Uniform}(0,1)\) and \(p \sim \text{Uniform}(0,1)\)
In JAGS
, the components of this hierarchical model are
encoded in the same way as the above algebraic statements, omitting the
need to explicitly formulate the joint likelihood.
We will now use the detection/non-detection data of this species during up to 3 repeat visits \(j\) in each site \(i\), over multiple years \(t\) from 1999 to 2009 (\(T=11\)), along with information on the sites and the observation process (the identity of the field worker/ observer).
The observation data is arranged in an array:
y_ijt
: Array of observations (detection/ non-detection)
in sites \(i\), repeat visit \(j\), year \(t\)Covariate data, ecological model:
sites
: Dataframe containing potentially informative
environmental variables describing the sites; we are especially
interested in the water surface area (or size) of the ponds and the
effect of water table fluctuations. These are features that can be
controlled when constructing ponds.Covariate data, observation model:
obs_ijt
: Array with identity of observer, dimensions
[site \(i\), visit \(j\), year \(t\)]. Observer identity is encoded as an
integer (1:134 over entire data set).Important when formatting data:
JAGS
loops over all dimensions of vectors, matrices,
etc. Good bookkeeping is essential! We must ensure that the data are all
in the right order, e.g. that the order of sites is the same in x
(sites$Fluctuations
) and in y. It is a good idea to always
label data, e.g. use dimnames for arrays.JAGS
as integer
indices; they must be consecutive and start at 1 (not at
zero).nsites <- length(unique(dat$SiteID))
nyears <- length(unique(dat$Year))
nvisits <- length(unique(dat$Visit))
# extract the site data only:
sites <- unique(dat[, !(names(dat) %in% c("Visit", "Year", "Detection", "ObsID", "SiteVisited"))])
sites <- sites[(order(sites$SiteID)),]
# extract the detection/nondetection data and store in array
# with dimensions [sites, visits, years]
y_ijt <- array(data = dat$Detection,
dim = c(nsites, nvisits, nyears),
dimnames=list(sites$SiteID, c(1:3), 1999:2009))
dim(y_ijt) # sites, visits, years
dimnames(y_ijt)[[3]]
# extract the observer identity and store in array with dimensions [sites, visits, years]
obs_ijt <- array(data = dat$ObsID,
dim = c(nsites, nvisits, nyears),
dimnames=list(sites$SiteID, c(1:3), 1999:2009))
nobservers <- length(unique(dat$ObsID))
We build the occupancy model for the first year, which will be needed to provide initial conditions for the dynamic model. We subset our data to the first year, and formulate the model for a single year only (note the indices!). We start with above model that accounts for regional differences in occupancy \(\psi_R\):
\[ z_{i,t=1} \sim \text{Bernoulli}(\psi_{R,t=1}) \] \[ y_{ij,t=1} \sim \text{Bernoulli}(z_{i,t=1} p) \]
While the indexing may seem tedious, it makes explicit over which dimensions the random variables or parameters vary. To begin with, we here assume detection probability to be constant across sites and visits.
We save the JAGS models as occ_v0.jags
:
model{
## Ecological process model
for(i in 1:nsites){
z[i] ~ dbern(psi1[i])
logit(psi1[i]) <- alpha.psi1[Region[i]]
}
## Observation model:
for(i in 1:nsites){
for (j in 1:nvisits){
y[i,j] ~ dbern(muy[i,j])
muy[i,j] <- z[i] * p
}
}
## Priors:
## Ecological model
for(r in 1:nregions){
alpha.psi1[r] ~ dnorm(mu, tau)
}
## Hyperpriors
mu ~ dnorm(0,0.001)
tau ~ dgamma(0.5, 0.5) # gamma prior on precision (cf. Gelman 2006b)
## Observation model
p ~ dunif(0,1) # (constant) detection probability
}
We compile all needed data in R as list and then run the JAGS model
### Bundle data (we here use data from the first year only)
in.data <- list(y = y_ijt[,,1],
nsites = nsites,
nvisits = nvisits,
nregions= nregions,
Region = Region
)
### Initial values
### We use observed occurrences as inits for z
temp <- y_ijt; temp[is.na(temp)] <- 0 # remove NAs to avoid conflicts
z.inits <-apply(temp, c(1,3), max); rm(temp)
inits <- function() list(z = z.inits[,1],
p = runif(n = 1, 0, 1))
### Parameters to monitor
params <- c("p",
"alpha.psi1")
### Run the model
# MCMC settings
ni <- 5000 # iterations
nt <- 10 # thinning
nb <- 1000 # burn-in
nc <- 3 # chains
out.v0 <- jags(data = in.data,
inits = inits,
parameters.to.save = params,
parallel = TRUE,
model.file=paste0("./JAGS_models/occ_v0.jags"),
n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
verbose=FALSE, store.data = TRUE)
Extend the ecological process model to test whether the occurrence of this toad depends on the size (surface area) of the pond. Check whether there is a unimodal (quadratic) instead of linear response of occupancy probability to pond area.
\[ z_{i,t=1} \sim \text{Bernoulli}(\psi_{i,t=1}), \] \[ logit(\psi_{i,t=1}) = \alpha_{\psi1, R(i)} + \beta_{area} Area_{i}, \]
Additionally, try what happens when there are NAs in the explanatory variable!
### 1. UPDATE JAGS MODEL
### write JAGS model to text file from R
sink("./JAGS_models/occ_v1_1.jags")
cat("
model{
#------------------------------------------
# Ecological process model:
for (i in 1:nsites){
z[i] ~ dbern(psi1[i])
logit(psi1[i]) <- alpha.psi1[Region[i]] + beta.area.psi1 * Area[i] # + beta.area.psi1.2 * pow(Area[i],2)
}# end i
#------------------------------------------
# Observation model:
for (i in 1:nsites){
for (j in 1:nvisits){
y[i,j] ~ dbern(muy[i,j])
muy[i,j] <- z[i] * p
}# end j
}# end i
#------------------------------------------
# Priors:
# Ecological process model:
for(r in 1:nregions){
alpha.psi1[r] ~ dnorm(mu, tau)
}
beta.area.psi1 ~ dnorm(0, 0.001) # normal distribution: defined in JAGS by mean and precision tau = 1/variance
# beta.area.psi1.2 ~ dnorm(0, 0.001)
# Hyperpriors:
mu ~ dnorm(0,0.001)
tau ~ dgamma(0.5, 0.5) # gamma prior on precision (cf. Gelman 2006b)
# Observation model:
p ~ dunif(0,1) # (constant) detection probability, on probability scale
}
", fill=TRUE)
sink()
### 2. UPDATE INDATA, INITS, MONTIORED PARAMS
### Bundle data (first year only)
in.data <- list(y = y_ijt[,,1],
nsites = nsites,
nvisits = nvisits,
# now also needed:
Area = sites$log.Area.scaled, # surface area of ponds (standardized)
Region = Region,
nregions = nregions
)
### Initial values
### use observed occurrences as inits for z
temp <- y_ijt; temp[is.na(temp)] <- 0 # remove NAs to avoid conflicts
z.inits <-apply(temp, c(1,3), max)
inits <- function() list(z = z.inits[,1],
alpha.psi1 = runif(n=nregions, -3, 3),
beta.area.ps1 = rnorm(n=1, 0, 1))
### Parameters to monitor
params <- c("alpha.psi1", "beta.area.psi1", # parameters for occupancy probability in year 1
#"beta.area.psi1.2", # quadratic term
"p")
### Run the model
# MCMC settings
ni <- 8000 # iterations
nt <- 10 # thinning
nb <- 3000 # burn-in
nc <- 3 # chains
starttime <- Sys.time()
out.v11 <- jags(data = in.data,
inits = inits,
parameters.to.save = params,
parallel = TRUE,
model.file=paste0("./JAGS_models/occ_v1_1.jags"),
n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
verbose=FALSE, store.data = TRUE)
print(Sys.time()-starttime)
### inspect the outcome:
out.v11
plot(out.v11) # traceplots and densities
Second, we want to improve the observation model, by accounting for variation in detection probability between observers. Observers are trained volunteers with different levels of experience. The identity of the observer may vary between sites, visits, and years: detection probability becomes \(p_{ijt}\) and we need to loop accordingly. We estimate detection probability with a random effect of observer identity, where observers vary around a common mean \(\mu_{obs}\):
\[
logit(p_{ij,t=1}) = \alpha_{obs} +
\beta_{obs(ij,t=1)}, \\ \beta_{obs} \sim
\text{Normal}(0,\sigma_{obs}^2).
\] NOTE: indices for categories or factor levels in
JAGS
are encoded as consecutive integers, starting
from 1.
### 1. UPDATE JAGS MODEL
sink("./JAGS_models/occ_v1_2.jags")
cat("
model{
#------------------------------------------
# Ecological process model:
for (i in 1:nsites){
z[i] ~ dbern(psi1[i])
logit(psi1[i]) <- alpha.psi1[Region[i]]
}# end i
#------------------------------------------
# Observation model:
for (i in 1:nsites){
for (j in 1:nvisits){
y[i,j] ~ dbern(muy[i,j])
muy[i,j] <- z[i] * p[i,j]
logit(p[i,j]) <- alpha.obs + beta.obs[obs[i,j]]
}# end j
}# end i
#------------------------------------------
# Priors:
## Ecological process model:
for(r in 1:nregions){
alpha.psi1[r] ~ dnorm(mu, tau)
}
# Hyperpriors:
mu ~ dnorm(0,0.001)
tau ~ dgamma(0.5, 0.5) # gamma prior on precision (cf. Gelman 2006b)
## Observation model:
alpha.obs ~ dnorm(0, 0.001) # dunif(-10,10)
# random observer effects:
for (obs in 1:nobservers){
beta.obs[obs] ~ dnorm(0, tau.obs)
}
tau.obs <- 1/ pow(sd.obs,2) # tau = 1/variance = 1 / sd^2
sd.obs ~ dunif(0,10) # alternative : give prior on sd
}
", fill=TRUE)
sink()
### 2. UPDATE INDATA, INITS, MONTIORED PARAMS
### Trying to run this with the original array results in an error: "Index out of range"
### obs_ijt consists of integer indices for each of 1:134 observers
### in year 1, only 31 different observers were active:
length(unique(c(obs_ijt[,,1])))
### -> passing obs_ijt[,,1] for first year results in non-consecutive indices, and JAGS error "Index out of range".
### to use this subset, we need to replace the indices for observers active in year 1 to be consecutive:
lookup <- data.frame(current = sort(unique(c(obs_ijt[,,1]))), new = 1:length(unique(c(obs_ijt[,,1]))) ) # lookup table
year1.observers <- matrix(lookup$new[match(obs_ijt[,,1], lookup$current)], nrow = dim(obs_ijt)[1] ) # replace indices, new matrix for year 1 [sites, visits]
nobservers.year1 <- length(unique(c(year1.observers)))
### Bundle data (first year only)
in.data <- list(y = y_ijt[,,1],
nsites = nsites,
nvisits = nvisits,
Region = Region,
nregions = nregions,
obs = year1.observers, # observer identities in year 1 (re-labeled!)
nobservers = nobservers.year1 # number of unique observers
)
### Initial values
### use observed occurrences as inits for z
temp <- y_ijt; temp[is.na(temp)] <- 0 # remove NAs to avoid conflicts
z.inits <-apply(temp, c(1,3), max)
inits <- function() list(z = z.inits[,1],
alpha.psi1 = runif(n=nregions, -3, 3),
mu.obs = runif(n=1, -5, 5),
sd.obs = runif(n=1, 0, 5))
### Parameters to monitor
params <- c("alpha.psi1", # parameters for occupancy probability in year 1
"alpha.obs", "sd.obs", # hyperparameters for observer effect
"beta.obs") # p estimates for individual observers
### Run the model
# MCMC settings
ni <- 8000 # iterations
nt <- 10 # thinning
nb <- 3000 # burn-in
nc <- 3 # chains
starttime <- Sys.time()
out.v12 <- jags(data = in.data,
inits = inits,
parameters.to.save = params,
parallel = TRUE,
model.file=paste0("./JAGS_models/occ_v1_2.jags"),
n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
verbose=FALSE, store.data = TRUE)
print(Sys.time()-starttime)
### inspect the outcome:
out.v12
# plot(out.v12) # traceplots and densities
# average detection probability, on probability scale
plogis(out.v12$mean$alpha.obs)
plogis(out.v12$mean$alpha.obs + out.v12$mean$beta.obs) # individual observer det probs
The dynamic occupancy model extends the static, single-year model to the dynamic change in occupancy over time. It is based on metapopulation theory and describes how the true state (occupied or empty) of each site evolves over time as a function of two processes: colonization and persistence.
We initiate the dynamic model with a simple occupancy model for the occupancy probability in the first year. This provides the starting conditions \(z_{i, t=1}\) for the dynamic process.
\[ z_{i,t=1} \sim \text{Bernoulli}(\psi_{i,t=1}) \]
After the first year, the true state of each site depends only on its state in the previous year and the probabilities of colonization and persistence. Empty patches can get colonized in the next time step with probability \(\gamma\) or remain empty with probability \(1- \gamma\). Occupied patches can either persist (i.e., remain occupied) with probability \(\phi\) or go locally “extinct” with probability \(1- \phi\). The realization of this process over years \(t = 2, ...,T\) can be formulated for each site \(i\) and year \(t\) as conditional on the true state in the previous time step and the two process probabilities:
\[ z_{i, t} \sim \text{Bernoulli}( (1-z_{i, t-1}) \gamma_{it} + z_{i, t-1} \phi_{it} ) \]
This has the advantage that we can learn something about the dynamic process rates, not only the resulting occupancy probability. Finally, observations (detection/ non-detection data) are again related to the true, latent occupancy state via the observation model:
\[ y_{ijt} \sim \text{Bernoulli}(z_{it} p_{ijt}) \] Detection probability \(p_{ijt}\) is here allowed to vary between sites, visits, and years.
The aim of the dynamic occupancy model is to estimate the probabilities of colonization and persistence, and their dependence on site characteristics, using detection/ non-detection data and accounting for observation uncertainty.
Update the JAGS
model to use all data and include the
dynamic process. Start by assuming constant colonization and persistence
probabilities.
Covariate effects on colonization and persistence probabilities can easily be included using GLMs. Note that these explanatory models don’t have to be linear models, but can take any functional form.
We save the JAGS model in dynocc_v1.jags
.
model{
# Ecological process model:
for(i in 1:nsites){
# initial occupany in year 1
z[i,1] ~ dbern(psi1[i])
logit(psi1[i]) <- alpha.psi1[Region[i]]
# dynamic change year 2:T
for (k in 1:(nyear-1)){
z[i,(k+1)] ~ dbern(muZ[i,(k+1)])
muZ[i,(k+1)]<- ((1-z[i,k])*gamma[i,k]) + (z[i,k]*phi[i,k])
logit(gamma[i,k]) <- alpha.gamma
logit(phi[i,k]) <- alpha.phi
}
}
# Observation model:
for(i in 1:nsites){
for (j in 1:nvisits){
for(k in 1:nyear){
y[i,j,k] ~ dbern(muy[i,j,k])
muy[i,j,k] <- z[i,k] * p[i,j,k]
logit(p[i,j,k]) <- alpha.obs + beta.obs[obs[i,j,k]]
}
}
}
# Priors:
## Ecological process model:
# Occupancy year 1:
for(r in 1:nregions){
alpha.psi1[r] ~ dnorm(mu, tau)
}
# Hyperpriors:
mu ~ dnorm(0,0.001)
tau ~ dgamma(0.5, 0.5)
# Colonization gamma, persistence phi:
alpha.gamma ~ dnorm(0, 0.001)
alpha.phi ~ dnorm(0, 0.001)
## Observation model:
alpha.obs ~ dnorm(0, 0.001)
# random observer effects:
for (obs in 1:nobservers){
beta.obs[obs] ~ dnorm(0, tau.obs)
}
tau.obs <- 1/ pow(sd.obs,2) # tau = 1/variance
sd.obs ~ dunif(0,10)
}
Again, we run this model from R:
Then we look at the results
Test for effects of covariates on colonization and persistence probabilities:
We can additionally let JAGS
calculate derived
variables, such as a summary of the actual number of sites occupied. We
will obtain full posteriors for such quantities as well.
### 1. EXTEND THE JAGS MODEL
sink("./JAGS_models/dynocc_v2.jags")
cat("
model{
#------------------------------------------
# Ecological process model:
for (i in 1:nsites){
# initial occupany in year 1
z[i,1] ~ dbern(psi1[i])
logit(psi1[i]) <- alpha.psi1[Region[i]]
# dynamic change year 2:T
for (k in 1:(nyear-1)){
z[i,(k+1)] ~ dbern(muZ[i,(k+1)])
muZ[i,(k+1)]<- ((1-z[i,k])*gamma[i,k]) + (z[i,k]*phi[i,k])
logit(gamma[i,k]) <- alpha.gamma + beta.area*Area[i] + beta.area.2* pow(Area[i],2) + beta.fluct*Fluct[i]
logit(phi[i,k]) <- alpha.phi + delta.area*Area[i] + delta.area.2* pow(Area[i],2) + delta.fluct*Fluct[i]
} #k
}# end i
#------------------------------------------
# Observation model:
for (i in 1:nsites){
for (j in 1:nvisits){
for(k in 1:nyear){
y[i,j,k] ~ dbern(muy[i,j,k])
muy[i,j,k] <- z[i,k] * p[i,j,k]
logit(p[i,j,k]) <- alpha.obs + beta.obs[obs[i,j,k]]
}#end k
}# end j
}# end i
#------------------------------------------
# Priors:
## Ecological process model:
# Occupancy year 1:
for(r in 1:nregions){
alpha.psi1[r] ~ dnorm(mu, tau)
}
# Hyperpriors:
mu ~ dnorm(0,0.001)
tau ~ dgamma(0.5, 0.5)
# Colonization gamma, persistence phi:
alpha.gamma ~ dnorm(0, 0.01)
beta.area ~ dnorm(0, 0.01) # dunif(-10,10)
beta.area.2 ~ dnorm(0, 0.01) # dunif(-10,10)
beta.fluct ~ dnorm(0, 0.01)
alpha.phi ~ dnorm(0, 0.01)
delta.area ~ dnorm(0, 0.01) # dunif(-10,10)
delta.area.2 ~ dnorm(0, 0.01) # dunif(-10,10)
delta.fluct ~ dnorm(0, 0.01) # dunif(-10,10)
## Observation model:
alpha.obs ~ dnorm(0, 0.01) # dunif(-10,10)
# random observer effects:
for (obs in 1:nobservers){
beta.obs[obs] ~ dnorm(0, tau.obs)
}
tau.obs <- 1/ pow(sd.obs,2) # tau = 1/variance
sd.obs ~ dunif(0,10)
#--------------------------------------------
# Derived parameters:
for (k in 1:nyear){
n.occ[k] <- sum(z[1:nsites,k]) # equivalent: sum(z[ ,k])
}#k
}
", fill=TRUE)
sink()
### 2. UDPATED INDATA, INITS, PARAMETERS
### Data prep:
# pass entire time series
### Bundle data (first year only)
in.data <- list(y = y_ijt,
nsites = dim(y_ijt)[1],
nvisits = dim(y_ijt)[2],
nregions = nregions,
Region = Region,
obs = obs_ijt, # observer identities in all years
nobservers = nobservers, # number of unique observers (all years)
nyear = dim(y_ijt)[3],
Area = sites$log.Area.scaled, # surface area of ponds (standardized)
Fluct= sites$Fluctuations
# New = sites$NewSite # new site? (0/1)
)
### Initial values
### use observed occurrences as inits for z
temp <- y_ijt; temp[is.na(temp)] <- 0 # remove NAs to avoid conflicts
z.inits <-apply(temp, c(1,3), max)
inits <- function() list(z = z.inits,
alpha.psi1 = runif(n=nregions, -3, 3),
alpha.obs = runif(n=1, 0.1, 1),
sd.obs = runif(n = 1, 0.5,3))
### Parameters to monitor
params <- c("alpha.psi1", # parameters for occupancy probability in year 1
"alpha.gamma",
"beta.area",
"beta.area.2",
"beta.fluct",
"alpha.phi",
"delta.area",
"delta.area.2",
"delta.fluct",
"alpha.obs", "sd.obs",# hyperparameters for observer effect
"n.occ")
#"beta.obs") # p estimates for individual observers
### Run the model
# MCMC settings (ca. 7 mins with 8000 time steps)
ni <- 8000 # iterations
nt <- 10 # thinning
nb <- 3000 # burn-in
nc <- 3 # chains
starttime <- Sys.time()
out.dyn2 <- jags(data = in.data,
inits = inits,
parameters.to.save = params,
parallel = TRUE,
model.file=paste0("./JAGS_models/dynocc_v2.jags"),
n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb,
verbose=F, store.data = TRUE)
print(Sys.time()-starttime)
# save(out.dyn2, file="./OUT/out_dyn2.RData")
Create partial dependence plots using joint posteriors.
JAGS
allows you to do inference for complex
hierarchical models.JAGS
model within model{ }
as a
series of stochastic (~
) and deterministic
(<-
) relations.JAGS
from R
:
JAGS
choose automatically from the prior)JAGS
output:
jagsUI
model object contains a lot of helpful
information; storing the used data and model aids reproducibilityout$sims.list
)
is the most flexibleGelman, A. (2006). Multilevel (hierarchical) modeling: What it can and cannot do. Technometrics, 48(3), 432–435. https://doi.org/10.1198/004017005000000661
Gelman, A. (2006b). Prior distributions for variance parameters in hierarchical models (Comment on Article by Browne and Draper). Bayesian Analysis, 1(3), 515–534. https://doi.org/10.1214/06-BA117A
Krapu, C., & Borsuk, M. (2019). Probabilistic programming: A review for environmental modellers. Environmental Modelling and Software, 114(January), 40–48. https://doi.org/10.1016/j.envsoft.2019.01.014
MacKenzie, D. I., Nichols, J. D., Lachman, G. B., Droege, S., Royle, A. A., & Langtimm, C. A. (2002). Estimating site occupancy rates when detection probabilities are less than one. Ecology, 83(8), 2248–2255. https://doi.org/10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2
Royle, J. A., & Kéry, M. (2007). A Bayesian state-space formulation of dynamic occupancy models. Ecology, 88(7), 1813–1823. https://doi.org/10.1890/06-0669.1