Objectives:
Toy Models:
Some exercises will use the toy models “Monod”, “Growth”, or “Survival”, which were introduced in the previous lectures. You can see their definitions for instance in the slides “Mathematical Representation and Construction of Models”.
Construct the likelihood function for a simple one-dimensional linear model and additive, independent, identically distributed (i.i.d.) normal errors. That is, the deterministic part of the model is given by \[ y_{det} (x,\beta,\gamma) = \beta x + \gamma \,. \]
For every observation we have the probabilistic model \[ Y_i = y_{det} (x_i,\beta,\gamma) + \epsilon_i \,, \] where \(\epsilon_i \sim N(0, \sigma^2)\), i.e. we assume independent Gaussian noise terms.
Hint: Have a close look at the slides “Formulation of model likelihood functions” if you need inspiration.
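For reference, under these assumptions the likelihood factorises over the independent observations \(y_1,\dots,y_n\) at inputs \(x_1,\dots,x_n\): \[ L(\beta,\gamma,\sigma \mid \mathbf{y}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\big(y_i - y_{det}(x_i,\beta,\gamma)\big)^2}{2\sigma^2}\right) \,, \] so that the log-likelihood becomes \[ \log L(\beta,\gamma,\sigma \mid \mathbf{y}) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\big(y_i - \beta x_i - \gamma\big)^2 \,. \]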
For the linear model, implement a function that returns the logarithm of the likelihood, for given model parameters and measurement data. Read the data provided in the file model_linear.csv and use them as “observation data” to compute the log-likelihood for two parameter sets: \(\{\beta = 2,\, \gamma = 1,\, \sigma = 1\}\) and \(\{\beta = 3,\, \gamma = 2,\, \sigma = 1\}\). Which parameters are more likely to have generated this data?
You can read the external data using the following piece of code:
## read the data:
dat <- read.table("data/model_linear.csv", header=TRUE)
dat$x
dat$y
## plot the data:
plot(dat$x, dat$y, xlab='x', ylab='y')
Here is a template for the log-likelihood implementation – you still need to fill in the important bits!
loglikelihood.linear <- function(par, x, y){
## deterministic part:
y.det <- ... # first evaluate the deterministic model output, using the parameters par = c(beta, gamma, sigma)
## Calculate loglikelihood:
# put your expression for log-likelihood here
# Hint: use function dnorm()
}
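If you are unsure, one possible completion of this template is the following minimal sketch (assuming par = c(beta, gamma, sigma)):
loglikelihood.linear <- function(par, x, y){
    ## deterministic part of the model:
    y.det <- par[1]*x + par[2]
    ## log-likelihood: sum of the log-densities of the normal error model
    sum(dnorm(y, mean=y.det, sd=par[3], log=TRUE))
}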
Read the data into a DataFrame:
using DataFrames
import CSV
dat = DataFrame(CSV.File("data/model_linear.csv", delim=' '))
dat.x
dat.y
Your log-likelihood function could look like this:
using Distributions
function loglikelihood_linear(theta, x, y)
    y_det = ... # first evaluate the deterministic model output,
                # using the parameters theta = [beta, gamma, sigma]
    ## calculate the log-likelihood; use `logpdf(Normal(...))`
end
You can read the external data using the following piece of code:
import pandas as pd
# Specify the path to the CSV file
file_path = r"../../data/model_linear.csv"
# Load the CSV file into a pandas DataFrame
dat = pd.read_csv(file_path, sep=" ")
Here is a template for the log-likelihood implementation – you still need to fill in the important bits!
def loglikelihood_linear(x, y, par):
"""
Calculate the log-likelihood for a linear model.
Parameters:
x (array-like): Independent variable.
y (array-like): Dependent variable.
par (list or array-like): Parameters of the model [slope, intercept, standard deviation].
Returns:
float: The log-likelihood value.
"""
# Deterministic part
y_det = # put your expression for y_det here
# Calculate log-likelihood
log_likelihood = # put your expression for log-likelihood here
return log_likelihood
Use an optimizer to find the parameter values that maximise the likelihood (the so called maximum likelihood estimator, MLE). Plot the resulting linear model together with the data. Does the result look reasonable?
You can use, for instance, the optim
function:
# Define starting parameters for optimisation (e.g. beta_init, gamma_init, and sigma_init),
# then find maximising parameters using optim:
par.max <- optim(par = c(beta_init, gamma_init, sigma_init), fn = neg.loglikelihood.linear, x=x, y=y)
# You can look at the result like this:
par.max$par
Note that optim
does minimization by default! You will
have to create a negative-loglikelihood function to minimize.
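A minimal sketch of such a wrapper, reusing the loglikelihood.linear function from above:
neg.loglikelihood.linear <- function(par, x, y){
    ## optim minimises, so return the negative log-likelihood
    -loglikelihood.linear(par, x, y)
}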
Use the function optimize from the package Optim.jl and extract the optimum with Optim.minimizer. You will have to create a negative-loglikelihood function to minimize. This could look like this:
using Optim
param = Optim.minimizer(optimize(???, ???));
best_param = ???
You can use, for instance, the minimize() function from scipy.optimize:
from scipy.optimize import minimize
# Define starting parameters for optimisation (e.g. beta_init, gamma_init, and sigma_init),
# then find the maximising parameters by minimising the negative log-likelihood:
result = minimize(function_to_minimize, par_init, args=(x, y))
# You can look at the result like this:
result
Note that minimize
does minimization! You will have to
create a negative-loglikelihood function to minimize.
Use the standard linear regression function to estimate the parameters, and compare them to the ones you found through likelihood maximisation in Exercise 1.3.
The function lm
implements linear regression.
## Build a linear regression model
linearModel <- lm(y ~ x, data=dat)
## Inspect the model:
coef(linearModel)
summary(linearModel)
Use the function lm from the package GLM.jl (Generalized Linear Models) and the macro @formula.
The function sm.OLS from the statsmodels package (imported as statsmodels.api) implements linear regression.
## Build a linear regression model
import statsmodels.api as sm
# Add a constant term (intercept) to the independent variable
x_with_const = sm.add_constant(x)
# Fit the linear model
model = sm.OLS(y, x_with_const).fit()
Construct the likelihood function for the model “Monod”. Use the deterministic part of the model given in the introduction to the exercises and assume i.i.d. normal errors on the deterministic model output. The result will look very similar to the solution of Exercise 1.1!
Implement a function that returns the logarithm of the likelihood, for given model parameters and measurement data. Read the data provided in the file model_monod_stoch.dat and use them as “observation data” to compute the log-likelihood for the parameter sets \(\{r_{max} = 5,\, K = 3,\, \sigma = 0.2\}\) and \(\{r_{max} = 10,\, K = 4,\, \sigma = 0.2\}\). Which parameters are more likely to have generated this data?
Produce deterministic model outputs. Do this for the two example models:
Model “Monod” over a concentration (\(C\)) range from 0 to 10 with the default parameter values \(r_{\mathrm{max}} = 5\) and \(K = 3\).
Model “Growth” over a time interval from 0 to 2 with default parameter values \(\mu = 4\), \(K = 10\), \(b = 1\), \(Y = 0.6\), \(C_{\mathrm{M,ini}} = 10\), and \(C_{\mathrm{S,ini}} = 50\).
Plot and interpret the results.
Both deterministic models are already implemented as model.monod and model.growth in the file models.R.
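If you want a quick cross-check of model.monod, the Monod output can also be written out directly; a minimal sketch, assuming the standard Monod form \(r(C) = r_{\mathrm{max}}\, C / (K + C)\):
C <- seq(0, 10, length.out=100)
r.max <- 5; K <- 3
r <- r.max * C / (K + C)                 # deterministic Monod growth rate
plot(C, r, type="l", xlab="C", ylab="r")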
Both deterministic models are already implemented as model_monod and model_growth in the file models.jl.
Both deterministic models are already implemented as model_monod and model_growth in the file models.py.
For the model “Monod”, write a function that produces model outputs (i.e. samples from the probabilistic model). In other words, we want to simulate new observation data including observation noise.
Assume that the noise is i.i.d. normal with standard deviation \(\sigma = 0.2\). For the deterministic model use \(r_{\mathrm{max}} = 5\) and \(K = 3\).
Use your function to simulate several thousand probabilistic model realisations (hypothetical data sets), for fixed model parameters, and plot the 10% and 90% quantiles as continuous prediction bands.
For the computation of quantiles, you can use the R function quantile.
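A minimal sketch of the simulation and the 10%/90% bands, again assuming the Monod form above with i.i.d. normal noise:
C <- seq(0, 10, length.out=100)
r.det <- 5 * C / (3 + C)                          # deterministic Monod output
n.sim <- 5000
## one column per hypothetical data set:
sim <- replicate(n.sim, r.det + rnorm(length(C), mean=0, sd=0.2))
q10 <- apply(sim, 1, quantile, probs=0.1)
q90 <- apply(sim, 1, quantile, probs=0.9)
plot(C, r.det, type="l", ylim=range(c(q10, q90)), xlab="C", ylab="r")
lines(C, q10, lty=2)                              # 10% prediction band
lines(C, q90, lty=2)                              # 90% prediction band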
For the computation of quantiles, you can use the function quantile.
For the computation of quantiles, you can use the function np.quantile.
Construct the likelihood for the model “Survival”. The mortality rate \(\lambda\), upon multiplication with an infinitesimal time interval \(\Delta\), denotes the probability to die within this time interval, given that one is still alive at the beginning of it. If \(S(t)\) denotes the probability of an individual to be still alive at time point \(t\), this reads as \[ \text{Prob}\left(\text{die in } [t, t+\Delta) \mid \text{survived until } t\right) = \frac{S(t)-S(t+\Delta)}{S(t)} = \lambda \Delta \,. \] After some rearrangement, we let \(\Delta \rightarrow 0\) and obtain the differential equation \[ \dot S(t)=-\lambda S(t)\,. \] Solve this equation to find the time-dependence of \(S(t)\) (Hint: try an exponential ansatz or simply wolframalpha.com). From this solution, derive the probability for an individual to die within the time interval \([t_{i-1},t_i]\). Now, consider \(N\) independent individuals, each with the same mortality rate \(\lambda\). Derive the likelihood function for a vector of death counts \({\bf y}=(y_1,\dots,y_n)\), where \(y_i\) denotes the number of deaths occurring within time interval \([t_{i-1},t_i]\), for \(0=t_0<t_1<\dots<t_n\).
Hint: Look up the definition of the multinomial distribution!
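One possible route, as a sketch: solving \(\dot S(t) = -\lambda S(t)\) with \(S(0)=1\) gives \(S(t) = e^{-\lambda t}\), so the probability for an individual to die within \([t_{i-1}, t_i]\) is \[ p_i = S(t_{i-1}) - S(t_i) = e^{-\lambda t_{i-1}} - e^{-\lambda t_i} \,, \] and the probability of surviving beyond \(t_n\) is \(p_{n+1} = e^{-\lambda t_n}\). With \(N\) independent individuals the counts \((y_1,\dots,y_n,\, N-\sum_i y_i)\) then follow a multinomial distribution, \[ L(\lambda;\mathbf{y}) = \frac{N!}{y_1!\cdots y_n!\,\big(N-\sum_{i} y_i\big)!}\; \left(\prod_{i=1}^{n} p_i^{\,y_i}\right) p_{n+1}^{\,N-\sum_{i} y_i} \,. \]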
Write a function that simulates output from this probabilistic model, for given parameter values. Use this function to simulate several thousand model realisations, for fixed model parameters (use \(N=30\) individuals with mortality rate \(\lambda = 0.2d^{-1}\) and \(5\) subsequent observation windows of one day each). Use a boxplot to visualize the results.
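A minimal sketch of such a simulation, using the interval probabilities derived above (the variable names are illustrative):
lambda <- 0.2; N <- 30
t <- 0:5                                   # interval boundaries in days
S <- exp(-lambda*t)                        # survival probabilities S(t)
p <- c(-diff(S), S[length(S)])             # death prob. per interval, plus prob. of surviving t_n
sim <- rmultinom(5000, size=N, prob=p)     # one column per realisation
boxplot(t(sim[1:5, ]), names=paste0("[", t[1:5], ",", t[2:6], "]"),
        xlab="time interval [d]", ylab="number of deaths")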
Implement a function that returns the logarithm of the likelihood, for given parameter values and measurement data. Check that your log-likelihood is implemented correctly by generating model outputs and computing the likelihood for several parameters, including the one that you generated the data with.
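The corresponding log-likelihood could, for instance, be sketched as follows (t contains the interval boundaries \(t_0,\dots,t_n\), y the observed death counts):
loglikelihood.survival <- function(lambda, t, y, N){
    S <- exp(-lambda*t)
    p <- c(-diff(S), S[length(S)])
    ## append the survivors as an extra category and use the multinomial density:
    dmultinom(c(y, N - sum(y)), size=N, prob=p, log=TRUE)
}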
For the “Monod” model, try and compare different sensitivity analysis approaches.
Increase the model parameters by \(10 \%\) and by \(50 \%\), redo the forward simulations, and try to understand the effect of the parameter change on the model results.
A simple metric that measures the sensitivity of the model output \(Y\) to a model parameter \(\theta\) is defined as \[ s_{loc} = \frac{\Delta Y}{\Delta \theta} \,. \] Often a relative metric is easier to interpret: \[ s_{loc,rel} = \frac{\theta}{Y} \frac{\Delta Y}{\Delta \theta} \,. \]
Set the range \(\Delta \theta\) to \(10 \%\) and to \(50 \%\) of the parameter values and compute the metric for different concentrations (inputs). What is your interpretation?
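A minimal sketch of the relative local sensitivity with respect to \(K\), again writing out the Monod form directly (a 10% perturbation is shown):
C <- seq(0.1, 10, length.out=50)
r.max <- 5; K <- 3
delta <- 0.1 * K                           # 10% perturbation of K
Y  <- r.max * C / (K + C)
Y2 <- r.max * C / (K + delta + C)
s.rel <- (K / Y) * (Y2 - Y) / delta        # relative local sensitivity
plot(C, s.rel, type="l", xlab="C", ylab="relative sensitivity w.r.t. K")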
Conduct a variance-based regional (global) sensitivity analysis with uniform parameter distributions.
Use the fast99 function implemented in the package sensitivity. The template below should get you started. For details, see also the documentation of fast99.
library(sensitivity)
## 1) get sampling points in parameter space
lower = ...
upper = ...
res.fast <- fast99(factors = names(par.monod),
n = 100,
q = "qunif",
q.arg = list(list(min=lower[1], max=upper[1]),
list(min=lower[2], max=upper[2]))
)
## all parameter combinations to run the model with
head(res.fast$X)
## 2) run the model for all parameter combinations:
res.SA <- matrix(NA, nrow=nrow(res.fast$X), ncol=...)
for(i in 1:nrow(res.fast$X)){
res.SA[i,] <- model(unlist(res.fast$X[i,]), ...)
}
## 3) compute sensitivity index
S <- tell(res.fast, y = res.SA[, j])   # sensitivity indices for the j-th model output (e.g. one concentration)
S
plot(S)
Use the package GlobalSensitivity.jl and the method eFAST. Try to visualize the sensitivities with a plot.
using GlobalSensitivity
using Distributions
using ComponentArrays   # provides the ComponentVector used below
# 1) Define a vector of tuples that give the lower and
# upper bound for each parameter:
# Define a `Distribution` for each parameter
param_dists = [Uniform(...), # r_max = par[1]
Uniform(...)] # K = par[2]
# 2) Run SA. `gsa` can handle models with multiple outputs (i.e. concentrations)
res = gsa(θ -> model_monod(C_monod, ComponentVector(r_max = θ[1], K = θ[2])),
eFAST(num_harmonics=6),
param_dists,
samples = 200)
res.S1 # direct effect
res.ST # total effect
Use the SALib package. Try to visualize the sensitivities with a plot. Below is one example of how to set up a function for a local sensitivity analysis.
def local_sa(model, par_vector, par, delta_rel=0.1, *args, **kwargs):
"""
Perform local sensitivity analysis on the model function.
Parameters:
- model: Function representing the model.
- par_vector: Model parameters, indexable by parameter name (e.g. a dict or pandas Series).
- par: Name of the parameter to perform sensitivity analysis on.
- delta_rel: Relative change in the parameter value.
- *args, **kwargs: Additional arguments for the model function.
Returns:
Dictionary containing absolute and relative sensitivity indices.
"""
# Copy the parameter array to adjust the parameter of interest
par_vector2 = par_vector.copy()
# Adjust the specified parameter by the relative delta
par_vector2[par] *= (1 + delta_rel)
# Run the model with the original and adjusted parameter array
Y = model(par_vector, *args, **kwargs)
Y2 = model(par_vector2, *args, **kwargs)
# Compute sensitivity indices
S_abs = (Y2 - Y) / (par_vector2[par] - par_vector[par])
S_rel = (par_vector[par] / Y) * S_abs
# Return sensitivity indices as a dictionary
return {'S_abs': S_abs, 'S_rel': S_rel}