Objectives:
Toy Models:
Some exercises will use the toy models “Monod”, “Growth”, or “Survival”, which were introduced in the previous lectures. You can see their definitions for instance in the slides “Mathematical Representation and Construction of Models”.
Construct the likelihood function for a simple one-dimensional linear model and additive, independent, identically distributed (i.i.d.) normal errors. That is, the deterministic part of the model is given by \[ y_{det} (x,\beta,\gamma) = \beta x + \gamma \,. \]
For every observation we have the probabilistic model \[ Y_i = y_{det} (x_i,\beta,\gamma) + \epsilon_i \,, \] where \(\epsilon_i \sim N(0, \sigma^2)\), i.e. we assume independent Gaussian noise terms.
Hint: Have a close look at the slides “Formulation of model likelihood functions” if you need inspiration.
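For reference, under these assumptions the likelihood factorises over the independent observations \(y_1,\dots,y_n\) at inputs \(x_1,\dots,x_n\): \[ L(\beta,\gamma,\sigma \mid \mathbf{y}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\big(y_i - y_{det}(x_i,\beta,\gamma)\big)^2}{2\sigma^2}\right) \,, \] so that the log-likelihood becomes \[ \log L(\beta,\gamma,\sigma \mid \mathbf{y}) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\big(y_i - \beta x_i - \gamma\big)^2 \,. \]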
For the linear model, implement a function that returns the logarithm of the likelihood, for given model parameters and measurement data. Read the data provided in the file model_linear.csv and use them as “observation data” to compute the log-likelihood for two parameter sets: \(\{\beta = 2,\, \gamma = 1,\, \sigma = 1\}\) and \(\{\beta = 3,\, \gamma = 2,\, \sigma = 1\}\). Which parameters are more likely to have generated this data?
You can read the external data using the following piece of code:
## read the data:
dat <- read.table("data/model_linear.csv", header=TRUE)
dat$x
dat$y
## plot the data:
plot(dat$x, dat$y, xlab='x', ylab='y')
Here is a template for the log-likelihood implementation – you still need to fill in the important bits!
loglikelihood.linear <- function(par, x, y){
## deterministic part:
y.det <- ... # first evaluate the deterministic model output, using the parameters par = c(beta, gamma, sigma)
## Calculate loglikelihood:
# put your expression for log-likelihood here
# Hint: use function dnorm()
}
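If you are unsure, one possible completion of this template is the following minimal sketch (assuming par = c(beta, gamma, sigma)):
loglikelihood.linear <- function(par, x, y){
    ## deterministic part of the model:
    y.det <- par[1]*x + par[2]
    ## log-likelihood: sum of the log-densities of the normal error model
    sum(dnorm(y, mean=y.det, sd=par[3], log=TRUE))
}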
Read the data into a DataFrame:
using DataFrames
import CSV
dat = DataFrame(CSV.File("data/model_linear.csv", delim=' '))
dat.x
dat.y
Your log-likelihood function could look like this:
using Distributions
function loglikelihood_linear(theta, x, y)
    y_det = ... # first evaluate the deterministic model output,
                # using the parameters theta = [beta, gamma, sigma]
    ## calculate the log-likelihood; use `logpdf(Normal(...))`
end
You can read the external data using the following piece of code:
import pandas as pd
# Specify the path to the CSV file
file_path = r"../../data/model_linear.csv"
# Load the CSV file into a pandas DataFrame
dat = pd.read_csv(file_path, sep=" ")
Here is a template for the log-likelihood implementation – you still need to fill in the important bits!
def loglikelihood_linear(x, y, par):
"""
Calculate the log-likelihood for a linear model.
Parameters:
x (array-like): Independent variable.
y (array-like): Dependent variable.
par (list or array-like): Parameters of the model [slope, intercept, standard deviation].
Returns:
float: The log-likelihood value.
"""
# Deterministic part
y_det = # put your expression for y_det here
# Calculate log-likelihood
log_likelihood = # put your expression for log-likelihood here
return log_likelihood
Use an optimizer to find the parameter values that maximise the likelihood (the so called maximum likelihood estimator, MLE). Plot the resulting linear model together with the data. Does the result look reasonable?
You can use, for instance, the optim
function:
# Define starting parameters for optimisation (e.g. beta_init, gamma_init, and sigma_init),
# then find maximising parameters using optim:
par.max <- optim(par = c(beta_init, gamma_init, sigma_init), fn = neg.loglikelihood.linear, x=x, y=y)
# You can look at the result like this:
par.max$par
Note that optim
does minimization by default! You will
have to create a negative-loglikelihood function to minimize.
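A minimal sketch of such a wrapper, reusing the loglikelihood.linear function from above:
neg.loglikelihood.linear <- function(par, x, y){
    ## optim minimises, so return the negative log-likelihood
    -loglikelihood.linear(par, x, y)
}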
Use the function optimize from the package Optim.jl and extract the optimum with Optim.minimizer. You will have to create a negative-loglikelihood function to minimize. This could look like this:
using Optim
param = Optim.minimizer(optimize(???, ???));
best_param = ???
You can use, for instance, the minimize() function from scipy.optimize:
from scipy.optimize import minimize
# Define starting parameters for optimisation (e.g. beta_init, gamma_init, and sigma_init),
# then find the maximising parameters by minimising the negative log-likelihood:
result = minimize(function_to_minimize, par_init, args=(x, y))
# You can look at the result like this:
result
Note that minimize
does minimization! You will have to
create a negative-loglikelihood function to minimize.
Use the standard linear regression function to estimate the parameters, and compare them to the ones you found through likelihood maximisation in Exercise 1.3.
The function lm
implements linear regression.
## Build a linear regression model
linearModel <- lm(y ~ x, data=dat)
## Inspect the model:
coef(linearModel)
summary(linearModel)
Use the function lm from the package GLM.jl (Generalized Linear Models) and the macro @formula.
The function sm.OLS from the statsmodels package (imported as statsmodels.api) implements linear regression.
## Build a linear regression model
import statsmodels.api as sm
# Add a constant term (intercept) to the independent variable
x_with_const = sm.add_constant(x)
# Fit the linear model
model = sm.OLS(y, x_with_const).fit()
Construct the likelihood function for the model “Monod”. Use the deterministic part of the model given in the introduction to the exercises and assume i.i.d. normal errors on the deterministic model output. The result will look very similar to the solution of Exercise 1.1!
Implement a function that returns the logarithm of the likelihood, for given model parameters and measurement data. Read the data provided in the file model_monod_stoch.dat and use them as “observation data” to compute the log-likelihood for the parameter sets \(\{r_{max} = 5,\, K = 3,\, \sigma = 0.2\}\) and \(\{r_{max} = 10,\, K = 4,\, \sigma = 0.2\}\). Which parameters are more likely to have generated this data?
Produce deterministic model outputs. Do this for the two example models:
Model “Monod” over a concentration (\(C\)) range from 0 to 10 with the default parameter values \(r_{\mathrm{max}} = 5\) and \(K = 3\).
Model “Growth” over a time interval from 0 to 2 with default parameter values \(\mu = 4\), \(K = 10\), \(b = 1\), \(Y = 0.6\), \(C_{\mathrm{M,ini}} = 10\), and \(C_{\mathrm{S,ini}} = 50\).
Plot and interpret the results.
Both deterministic models are already implemented as model.monod and model.growth in the file models.R.
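If you want a quick cross-check of model.monod, the Monod output can also be written out directly; a minimal sketch, assuming the standard Monod form \(r(C) = r_{\mathrm{max}}\, C / (K + C)\):
C <- seq(0, 10, length.out=100)
r.max <- 5; K <- 3
r <- r.max * C / (K + C)                 # deterministic Monod growth rate
plot(C, r, type="l", xlab="C", ylab="r")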
Both deterministic models are already implemented as model_monod and model_growth in the file models.jl.
Both deterministic models are already implemented as model_monod and model_growth in the file models.py.
For the model “Monod”, write a function that produces model outputs (i.e. samples from the probabilistic model). In other words, we want to simulate new observation data including observation noise.
Assume that the noise is i.i.d. normal with standard deviation \(\sigma = 0.2\). For the deterministic model use \(r_{\mathrm{max}} = 5\) and \(K = 3\).
Use your function to simulate several thousand probabilistic model realisations (hypothetical data sets), for fixed model parameters, and plot the 10% and 90% quantiles as continuous prediction bands.
For the computation of quantiles, you can use the R function quantile.
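A minimal sketch of the simulation and the 10%/90% bands, again assuming the Monod form above with i.i.d. normal noise:
C <- seq(0, 10, length.out=100)
r.det <- 5 * C / (3 + C)                          # deterministic Monod output
n.sim <- 5000
## one column per hypothetical data set:
sim <- replicate(n.sim, r.det + rnorm(length(C), mean=0, sd=0.2))
q10 <- apply(sim, 1, quantile, probs=0.1)
q90 <- apply(sim, 1, quantile, probs=0.9)
plot(C, r.det, type="l", ylim=range(c(q10, q90)), xlab="C", ylab="r")
lines(C, q10, lty=2)                              # 10% prediction band
lines(C, q90, lty=2)                              # 90% prediction band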
For the computation of quantiles, you can use the function quantile.
For the computation of quantiles, you can use the function np.quantile.
Construct the likelihood for the model “Survival”. The mortality rate \(\lambda\), upon multiplication with an infinitesimal time interval \(\Delta\), denotes the probability to die within this time interval, given that one is still alive at the beginning of it. If \(S(t)\) denotes the probability of an individual to be still alive at time point \(t\), this reads as \[ \text{Prob}\left(\text{die in } [t, t+\Delta) \mid \text{survived until } t\right) = \frac{S(t)-S(t+\Delta)}{S(t)} = \lambda \Delta \,. \] After some rearrangement, we let \(\Delta \rightarrow 0\) and obtain the differential equation \[ \dot S(t)=-\lambda S(t)\,. \] Solve this equation to find the time-dependence of \(S(t)\) (Hint: try an exponential ansatz or simply wolframalpha.com). From this solution, derive the probability for an individual to die within the time interval \([t_{i-1},t_i]\). Now, consider \(N\) independent individuals, each with the same mortality rate \(\lambda\). Derive the likelihood function for a vector of death counts \({\bf y}=(y_1,\dots,y_n)\), where \(y_i\) denotes the number of deaths occurring within time interval \([t_{i-1},t_i]\), for \(0=t_0<t_1<\dots<t_n\).
Hint: Look up the definition of the multinomial distribution!
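One possible route, as a sketch: solving \(\dot S(t) = -\lambda S(t)\) with \(S(0)=1\) gives \(S(t) = e^{-\lambda t}\), so the probability for an individual to die within \([t_{i-1}, t_i]\) is \[ p_i = S(t_{i-1}) - S(t_i) = e^{-\lambda t_{i-1}} - e^{-\lambda t_i} \,, \] and the probability of surviving beyond \(t_n\) is \(p_{n+1} = e^{-\lambda t_n}\). With \(N\) independent individuals the counts \((y_1,\dots,y_n,\, N-\sum_i y_i)\) then follow a multinomial distribution, \[ L(\lambda;\mathbf{y}) = \frac{N!}{y_1!\cdots y_n!\,\big(N-\sum_{i} y_i\big)!}\; \left(\prod_{i=1}^{n} p_i^{\,y_i}\right) p_{n+1}^{\,N-\sum_{i} y_i} \,. \]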
Write a function that simulates output from this probabilistic model, for given parameter values. Use this function to simulate several thousand model realisations, for fixed model parameters (use \(N=30\) individuals with mortality rate \(\lambda = 0.2d^{-1}\) and \(5\) subsequent observation windows of one day each). Use a boxplot to visualize the results.
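A minimal sketch of such a simulation, using the interval probabilities derived above (the variable names are illustrative):
lambda <- 0.2; N <- 30
t <- 0:5                                   # interval boundaries in days
S <- exp(-lambda*t)                        # survival probabilities S(t)
p <- c(-diff(S), S[length(S)])             # death prob. per interval, plus prob. of surviving t_n
sim <- rmultinom(5000, size=N, prob=p)     # one column per realisation
boxplot(t(sim[1:5, ]), names=paste0("[", t[1:5], ",", t[2:6], "]"),
        xlab="time interval [d]", ylab="number of deaths")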
Implement a function that returns the logarithm of the likelihood, for given parameter values and measurement data. Check that your log-likelihood is implemented correctly by generating model outputs and computing the likelihood for several parameters, including the one that you generated the data with.
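The corresponding log-likelihood could, for instance, be sketched as follows (t contains the interval boundaries \(t_0,\dots,t_n\), y the observed death counts):
loglikelihood.survival <- function(lambda, t, y, N){
    S <- exp(-lambda*t)
    p <- c(-diff(S), S[length(S)])
    ## append the survivors as an extra category and use the multinomial density:
    dmultinom(c(y, N - sum(y)), size=N, prob=p, log=TRUE)
}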
For the “Monod” model, try and compare different sensitivity analysis approaches.
Increase the model parameters by \(10 \%\) and by \(50 \%\), redo the forward simulations, and try to understand the effect of the parameter change on the model results.
A simple metric that measures the sensitivity of the model output \(Y\) to a model parameter \(\theta\) is defined as \[ s_{loc} = \frac{\Delta Y}{\Delta \theta} \,. \] Often a relative metric is easier to interpret: \[ s_{loc,rel} = \frac{\theta}{Y} \frac{\Delta Y}{\Delta \theta} \,. \]
Set the range \(\Delta \theta\) to \(10 \%\) and to \(50 \%\) of the parameter values and compute the metric for different concentrations (inputs). What is your interpretation?
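A minimal sketch of the relative local sensitivity with respect to \(K\), again writing out the Monod form directly (a 10% perturbation is shown):
C <- seq(0.1, 10, length.out=50)
r.max <- 5; K <- 3
delta <- 0.1 * K                           # 10% perturbation of K
Y  <- r.max * C / (K + C)
Y2 <- r.max * C / (K + delta + C)
s.rel <- (K / Y) * (Y2 - Y) / delta        # relative local sensitivity
plot(C, s.rel, type="l", xlab="C", ylab="relative sensitivity w.r.t. K")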
Conduct a variance-based regional (global) sensitivity analysis with uniform parameter distributions.
Use the fast99 function implemented in the package sensitivity. The template below should get you started. For details, see also the documentation of fast99.
library(sensitivity)
## 1) get sampling points in parameter space
lower = ...
upper = ...
res.fast <- fast99(factors = names(par.monod),
n = 100,
q = "qunif",
q.arg = list(list(min=lower[1], max=upper[1]),
list(min=lower[2], max=upper[2]))
)
## all parameter combinations to run the model with
head(res.fast$X)
## 2) run the model for all parameter combinations:
res.SA <- matrix(NA, nrow=nrow(res.fast$X), ncol=...)
for(i in 1:nrow(res.fast$X)){
res.SA[i,] <- model(unlist(res.fast$X[i,]), ...)
}
## 3) compute sensitivity index
S <- tell(res.fast, y = res.SA[, j])   # sensitivity indices for the j-th model output (e.g. one concentration)
S
plot(S)
Use the package GlobalSensitivity.jl and the method eFAST. Try to visualize the sensitivities with a plot.
using GlobalSensitivity
using Distributions
using ComponentArrays   # provides the ComponentVector used below
# 1) Define a vector of tuples that give the lower and
# upper bound for each parameter:
# Define a `Distribution` for each parameter
param_dists = [Uniform(...), # r_max = par[1]
Uniform(...)] # K = par[2]
# 2) Run SA. `gsa` can handle models with multiple outputs (i.e. concentrations)
res = gsa(θ -> model_monod(C_monod, ComponentVector(r_max = θ[1], K = θ[2])),
eFAST(num_harmonics=6),
param_dists,
samples = 200)
res.S1 # direct effect
res.ST # total effect
Use the SALib package. Try to visualize the sensitivities with a plot. Below is one example of how to set up a function for a local sensitivity analysis.
def local_sa(model, par_vector, par, delta_rel=0.1, *args, **kwargs):
"""
Perform local sensitivity analysis on the model function.
Parameters:
- model: Function representing the model.
- par_vector: Model parameters, indexable by parameter name (e.g. a dict or pandas Series).
- par: Name of the parameter to perform sensitivity analysis on.
- delta_rel: Relative change in the parameter value.
- *args, **kwargs: Additional arguments for the model function.
Returns:
Dictionary containing absolute and relative sensitivity indices.
"""
# Copy the parameter array to adjust the parameter of interest
par_vector2 = par_vector.copy()
# Adjust the specified parameter by the relative delta
par_vector2[par] *= (1 + delta_rel)
# Run the model with the original and adjusted parameter array
Y = model(par_vector, *args, **kwargs)
Y2 = model(par_vector2, *args, **kwargs)
# Compute sensitivity indices
S_abs = (Y2 - Y) / (par_vector2[par] - par_vector[par])
S_rel = (par_vector[par] / Y) * S_abs
# Return sensitivity indices as a dictionary
return {'S_abs': S_abs, 'S_rel': S_rel}