Objectives:
Toy Models:
Some exercises will use the toy models “Monod”, “Growth”, or “Survival”, which were introduced in the previous lectures. You can see their definitions for instance in the slides “Mathematical Representation and Construction of Models”.
Construct the likelihood function for a simple one-dimensional linear model with additive, independent, identically distributed (i.i.d.) normal errors. That is, the deterministic part of the model is given by \[ y_{det} (x,\beta,\gamma) = \beta x + \gamma \,. \]
For every observation we have the probabilistic model \[ Y_i = y_{det} (x_i,\beta,\gamma) + \epsilon_i \] where \(\epsilon_i \sim N(0, \sigma^2)\), i.e. we assume independent Gaussian noise terms.
Hint: Have a close look at the slides “Formulation of model likelihood functions” if you need inspiration.
For the linear model, implement a function that returns the logarithm
of the likelihood, for given model parameters and measurement data. Read
the data provided in the file model_linear.csv
and use them
as “observation data” to compute the log-likelihood for two parameter
sets: \(\{\beta = 2,\, \gamma = 1, \, \sigma =
1\}\) and \(\{ \beta = 3,\, \gamma = 2,
\, \sigma = 1 \}\). Which parameters are more likely to have
generated this data?
You can read the external data using the following piece of code:
## read the data:
dat <- read.table("data/model_linear.csv", header=TRUE)
dat$x
dat$y
## plot the data:
plot(dat$x, dat$y, xlab='x', ylab='y')
Here is a template for the log-likelihood implementation – you still need to fill in the important bits!
loglikelihood.linear <- function(par, x, y){
## deterministic part:
y.det <- ... # first evaluate the deterministic model output, using the parameters par = c(beta, gamma, sigma)
## Calculate loglikelihood:
# put your expression for log-likelihood here
# Hint: use function dnorm()
}
Read the data into a DataFrame:
using DataFrames
import CSV
dat = DataFrame(CSV.File("data/model_linear.csv", delim=' '))
dat.x
dat.y
Your loglikelihood function should look like this
using Distributions
function loglikelihood_linear(theta, x, y)
y_det = ... # first evaluate the deterministic model output,
# using the parameters theta = [beta, gamma, sigma]
## Calculate the loglikelihood, e.g. using `logpdf.(Normal.(...))`
end
You can read the external data using the following piece of code:
# Specify the path to the CSV file
file_path = r"../../data/model_linear.csv"
# Load the CSV file into a pandas DataFrame
dat = pd.read_csv(file_path, sep=" ")
Here is a template for the log-likelihood implementation – you still need to fill in the important bits!
def loglikelihood_linear(x, y, par):
"""
Calculate the log-likelihood for a linear model.
Parameters:
x (array-like): Independent variable.
y (array-like): Dependent variable.
par (list or array-like): Parameters of the model [slope, intercept, standard deviation].
Returns:
float: The log-likelihood value.
"""
# Deterministic part
y_det = # put your expression for y_det here
# Calculate log-likelihood
log_likelihood = # put your expression for log-likelihood here
return log_likelihood
Use an optimizer to find the parameter values that maximise the likelihood (the so-called maximum likelihood estimator, MLE). Plot the resulting linear model together with the data. Does the result look reasonable?
You can use, for instance, the optim
function:
# Define starting parameters for the optimisation (e.g. beta_init, gamma_init, and sigma_init),
# then minimise the negative log-likelihood with optim:
par.max <- optim(par = c(beta_init, gamma_init, sigma_init), fn = neg.loglikelihood.linear, x=x, y=y)
# You can look at the result like this:
par.max$par
Note that optim
does minimization by default! You will
have to create a negative-loglikelihood function to minimize.
Use the function minimizer
from the package
Optim.jl
. You will have to create a negative-loglikelihood
function to minimize. This could look like this:
using Optim
param = Optim.minimizer(optimize(???, ???));
best_param = ???
You can use, for instance, the minimize()
function:
# Define starting parameters for the optimisation, e.g. par_init = [beta_init, gamma_init, sigma_init],
# then minimise the negative log-likelihood with minimize:
result = minimize(function_to_minimize, par_init, args=(x, y))
# You can look at the result like this:
result
Note that minimize
does minimization! You will have to
create a negative-loglikelihood function to minimize.
Use the standard linear regression function to estimate the parameters, and compare them to the ones you found through likelihood maximisation in Exercise 1.3.
The function lm
implements linear regression.
## Build a linear regression model
linearModel <- lm(y ~ x, data=dat)
## Inspect the model:
coef(linearModel)
summary(linearModel)
Use the function lm
from the package
GLM.jl
(Generalized Linear Model) and the macro
@formula
.
The function sm.OLS from the statsmodels.api module (imported as sm) implements linear regression.
## Build a linear regression model
# Add a constant term (intercept) to the independent variable
x_with_const = sm.add_constant(x)
# Fit the linear model
model = sm.OLS(y, x_with_const).fit()
For a one-dimensional linear model with an additive error, a model output \(Y\) is given by the deterministic term \(y_{det}\) plus the noise term. For the noise term we choose here independent, identically distributed normal errors, so that for a single observation \(i\) (in a set of \(n\) observations) the full model is given by
\[ Y_i = y_{i,det} + Z_i , \quad \quad Z_i \sim \mathcal{N}(0,\sigma) , \quad \quad i \in \{1,\dots,n\} \;. \]
For the deterministic part, we have an input \(x\) and two model parameters \(\{\beta, \gamma\}\), from which \(y_{det}\) is given by
\[ y_{det}(x,\beta, \gamma) = \beta x + \gamma \;. \]
We therefore have three model parameters in total: \(\beta\) and \(\gamma\) from the deterministic term, and \(\sigma\) from the noise term. For convenience, we collect these into the set of parameters \({\boldsymbol{\theta}} = \{\beta,\gamma,\sigma\}\).
So for a single observation, the model is given by
\[ Y_i = \beta x_i + \gamma +Z_i \;, \]
which is equivalent to
\[ Y_i \sim \mathcal{N}\left(y_{i,det}(x_i,\beta,\gamma),\sigma\right) \;. \]
That is, the model output \(Y_i\) is normally distributed, where the mean is the deterministic model output \(y_{i,det}\) and the standard deviation \(\sigma\) is the standard deviation of the error term \(Z_i\). Note that adding a constant \(\mu\) to a standard normally distributed random variable \(Z\) results in a new normally distributed random variable, the mean of which is simply shifted: \(Y = \mu + Z\), \(Z \sim \mathcal{N}(0,\sigma)\) \(\implies\) \(Y \sim \mathcal{N}(\mu,\sigma)\).
So, the likelihood for a single observation \(y_i\), given input \(x_i\) and parameters \(\boldsymbol{\theta}\), is given by
\[ p(y_i|x_i,\boldsymbol{\theta}) = \frac{1}{\sqrt{2 \pi \sigma^2 }} \exp\left( {-\frac{(y_i - y_{i,det})^2}{2\sigma^2}} \right) \;. \]
Thanks to the assumption of independence, the likelihood of all observations \(\mathbf{y} = \{y_1,y_2,...,y_n\}\), given all inputs \(\mathbf{x} = \{x_1,x_2,...,x_n\}\), is then given by
\[ p(\mathbf{y}|\mathbf{x},\boldsymbol{\theta}) = p(y_1|x_1,\boldsymbol{\theta}) \, p(y_2|x_2,\boldsymbol{\theta}) \, p(y_3|x_3,\boldsymbol{\theta}) \, \dots \, p(y_n|x_n,\boldsymbol{\theta}) = \prod_{i=1}^n p(y_i|x_i,\boldsymbol{\theta}) \;, \]
which is then given in full by
\[ p(\mathbf{y}|\mathbf{x},\boldsymbol{\theta}) = L(\boldsymbol{\theta}) = \prod_{i=1}^n p(y_i|x_i,\boldsymbol{\theta}) = \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2} } \exp\left( {-\frac{(y_i - y_{i,det})^2}{2\sigma^2}} \right) \;. \]
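Taking the logarithm turns this product into a sum, which is what the implementations below return:
\[ \log L(\boldsymbol{\theta}) = \sum_{i=1}^n \log p(y_i|x_i,\boldsymbol{\theta}) = -\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - y_{i,det})^2 \;. \]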
The likelihood can be implemented like this:
loglikelihood.linear <- function(x, y, par){
## deterministic part:
y.det <- par[1] * x + par[2]
## Calculate log-likelihood:
return( sum(dnorm(y, mean=y.det, sd=par[3], log=TRUE)) )
}
You can then compute the likelihood of some parameters, given the data, for instance like this:
## read data
dat <- read.table("../../data/model_linear.csv", header=TRUE)
x <- dat$x
y <- dat$y
## parameters for testing:
par.linear.test = c(2, 1, 1)
## compute likelihood for these parameters:
loglikelihood.linear(x, y, par.linear.test)
## [1] -27.55146
Computing the log-likelihood for the given experimental data and the sets of model parameters, we see that the likelihood of the parameter set \(\{\beta = 2,\, \gamma = 1, \, \sigma = 1\}\) is larger. This implies that these parameters are more likely to have generated the data, assuming that the linear model is an accurate description of the data generating mechanism.
The likelihood can be implemented like this
using Distributions
function loglikelihood_linear(beta, gamma, sigma, x, y)
y_det = beta .* x .+ gamma
return sum(logpdf.(Normal.(y_det, sigma), y))
end
Then read your data so that you can work with it:
using DataFrames
import CSV
linear_data = CSV.read(joinpath("..", "..", "data", "model_linear.csv"),
DataFrame)
#show(linear_data, allrows=true) if you want to see all your data
x = linear_data.x
y = linear_data.y
Compute the log-likelihoods:
#Log likelihood with parameters β = 2, γ = 1, σ = 1
loglikelihood_linear(2,1,1, x, y)
## -27.55146249091466
#Log likelihood with parameters β = 3, γ = 2, σ = 1
loglikelihood_linear(3,2,1, x, y)
## -166.7271070528497
# parameters β = 2, γ = 1, σ = 1 are more likely to have produced the data
The following libraries and modules are required:
from models import model_monod, model_growth
import pandas as pd
import numpy as np
from scipy.stats import norm, multinomial
import matplotlib.pyplot as plt
import seaborn as sns
The likelihood can be implemented like this:
def loglikelihood_linear(x, y, par):
"""
Calculate the log-likelihood for a linear model.
Parameters:
x (array-like): Independent variable.
y (array-like): Dependent variable.
par (list or array-like): Parameters of the model [slope, intercept, standard deviation].
Returns:
float: The log-likelihood value.
"""
# Deterministic part
y_det = par[0] * x + par[1]
# Calculate log-likelihood
log_likelihood = np.sum(norm.logpdf(y, loc=y_det, scale=par[2]))
return log_likelihood
You can then compute the likelihood of some parameters, given the data, for instance like this:
# Loading data
# Specify the path to the CSV file
file_path = r"../../data/model_linear.csv"
# Load the CSV file into a pandas DataFrame
dat = pd.read_csv(file_path, sep=" ")
# Extract x and y columns
x = dat['x']
y = dat['y']
# Parameters for testing
par_linear_test = [2, 1, 1]
# Compute likelihood for these parameters
result = loglikelihood_linear(x, y, par_linear_test)
print(f"Log-likelihood: {result}")
## Log-likelihood: -27.551462490914652
Calling the optim
function then looks like this:
## define starting parameters for optimisation:
par.init <- c(beta = 2,
gamma = 1,
sigma = 1)
## because `optim` is minimizing, we define the negative loglikelihood
neg.loglikelihood.linear <- function(x, y, par) -loglikelihood.linear(x, y, par)
## maximum likelihood estimation
MLEparameters <- optim(par = par.init, fn = neg.loglikelihood.linear, x=x, y=y)
## inspect the result:
MLEparameters$par
## beta gamma sigma
## 2.2534583 0.3842564 0.7990461
We can check that the result looks reasonable by plotting the resulting linear model together with the data:
# model output from x-input and estimated parameters:
y.estimate = MLEparameters$par[1]*x + MLEparameters$par[2]
plot(x, y)
lines(x, y.estimate, col=2)
Implementing the negative log-likelihood that we will minimize. Note that we optimize log_sigma to avoid negative values for sigma.
function neg_loglikelihood_linear(θ::Vector)
beta, gamma, log_sigma = θ
# minus sign to be able to minimize after
- loglikelihood_linear(beta, gamma, exp(log_sigma), x, y)
end
Minimizing
using Optim
θinit = [2.0, 1, log(0.5)]
## 3-element Vector{Float64}:
## 2.0
## 1.0
## -0.6931471805599453
param = Optim.minimizer(optimize(neg_loglikelihood_linear, θinit));
param[3] = exp(param[3]) # back-transform to sigma
## 0.7991010329384892
param
## 3-element Vector{Float64}:
## 2.25342685673029
## 0.3843207189590402
## 0.7991010329384892
Plotting
using Plots
scatter(x, y,
labels=false,
xlabel = "x",
ylabel = "y");
plot!(x -> param[1]*x + param[2],
labels = false)
Calling the minimize function then looks like this:
from scipy.optimize import minimize
# Define the negative log-likelihood function
def neg_loglikelihood_linear(par, x, y):
return -loglikelihood_linear(x, y, par)
# Define starting parameters for optimization
par_init = [2, 1, 1] # [beta, gamma, sigma]
# Perform maximum likelihood estimation
result = minimize(neg_loglikelihood_linear, par_init, args=(x, y))
# Inspect the result
MLEparameters = result.x
print(f"MLE Parameters: {MLEparameters}")
## MLE Parameters: [2.25342197 0.38434878 0.79910834]
We can check that the result looks reasonable by plotting the resulting linear model together with the data:
# Model output from x-input and estimated parameters
y_estimate = MLEparameters[0] * x + MLEparameters[1]
# Plot the original data
plt.figure(figsize=(14, 6))
plt.scatter(x, y, label='Observed Data')
# Plot the estimated values
plt.plot(x, y_estimate, color='red', label='Estimated Model')
# Add labels and legend
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
The linear regression function works differently (it uses a method called “QR decomposition”), but should give very similar results to our likelihood optimisation.
mod <- lm(y~x)
mod
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 0.3843 2.2534
Note that (Intercept) corresponds to \(\gamma\) and x to \(\beta\).
using GLM
linear_mod = GLM.lm(@formula(y ~ x), linear_data)
## StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
##
## y ~ 1 + x
##
## Coefficients:
## ────────────────────────────────────────────────────────────────────────
## Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
## ────────────────────────────────────────────────────────────────────────
## (Intercept) 0.384349 0.353936 1.09 0.2911 -0.356447 1.12514
## x 2.25342 0.121103 18.61 <1e-12 1.99995 2.50689
## ────────────────────────────────────────────────────────────────────────
Note that (Intercept) corresponds to \(\gamma\) and x to \(\beta\).
import statsmodels.api as sm
# Add a constant term (intercept) to the independent variable
x_with_const = sm.add_constant(x)
# Fit the linear model
model = sm.OLS(y, x_with_const).fit()
# Print the summary of the model
print(model.summary())
## OLS Regression Results
## ==============================================================================
## Dep. Variable: y R-squared: 0.948
## Model: OLS Adj. R-squared: 0.945
## Method: Least Squares F-statistic: 346.2
## Date: Wed, 18 Jun 2025 Prob (F-statistic): 1.18e-13
## Time: 11:37:10 Log-Likelihood: -25.088
## No. Observations: 21 AIC: 54.18
## Df Residuals: 19 BIC: 56.27
## Df Model: 1
## Covariance Type: nonrobust
## ==============================================================================
## coef std err t P>|t| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 0.3843 0.354 1.086 0.291 -0.356 1.125
## x 2.2534 0.121 18.608 0.000 2.000 2.507
## ==============================================================================
## Omnibus: 1.962 Durbin-Watson: 1.783
## Prob(Omnibus): 0.375 Jarque-Bera (JB): 1.122
## Skew: 0.203 Prob(JB): 0.570
## Kurtosis: 1.943 Cond. No. 6.14
## ==============================================================================
##
## Notes:
## [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Note that (Intercept) corresponds to \(\gamma\) and x to \(\beta\).
Construct the likelihood function for the model “Monod”. Use the deterministic part of the model given in the introduction to the exercises and assume i.i.d. normal errors on the deterministic model output. The result will look very similar to the solution of Exercise 1.1!
Implement a function that returns the logarithm of the likelihood,
for given model parameters and measurement data. Read the data provided
in the file model_monod_stoch.csv
and use them as
“observation data” to compute the log-likelihood for the parameter sets
\(\{r_{max} = 5,\, K = 3, \, \sigma =
0.2\}\) and \(\{r_{max} = 10,\, K = 4,
\, \sigma = 0.2\}\). Which parameters are more likely to have
generated this data?
For the model “Monod” with additive normal i.i.d. noise, constructing the likelihood follows exactly the same procedure as in the linear case (Exercise 1.1), only with a simple change in the deterministic part of the model. So again, our complete probabilistic model is given by the deterministic part plus a Gaussian noise term,
\[ Y_i = y_{i,det} + Z_i , \quad \quad Z_i \sim \mathcal{N}(0,\sigma) \;, \]
but now the deterministic part is given by
\[ y_{det}(x,\boldsymbol{\theta}_{monod}) = r(C,r_{max}, K) = \frac{r_{max} C}{K+C} \;. \]
We therefore again have three model parameters in total: \(\boldsymbol{\theta} = \{ \boldsymbol{\theta}_{monod},\sigma\} = \{r_{max}, K,\sigma\}\).
The expression for the likelihood is then similar to that of the linear model:
\[ L(\boldsymbol{\theta}) = p(\mathbf{y}|\mathbf{C},r_{max}, K,\sigma) = \frac{1}{(2 \pi)^{n/2} \sigma^n } \exp\left( {-\frac{1}{2\sigma^2}} \sum_{i=1}^n (y_i - r(C_i,r_{max},K))^2 \right) \;. \]
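Its logarithm, which the implementations below return, is
\[ \log L(\boldsymbol{\theta}) = -\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n \big(y_i - r(C_i,r_{max},K)\big)^2 \;. \]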
The implementation then looks like this:
source("../../models/models.r") # load the deterministic model
loglikelihood.monod <- function(yobs, C, par){
## deterministic part:
y.det <- model.monod(par, C)
## Calculate loglikelihood:
return( sum(dnorm(yobs, mean=y.det, sd=par['sigma'], log=TRUE)) )
}
The parameters with higher likelihood are more likely to have generated the data. In our case, this should be the parameter set \(\{r_{\mathrm{max}} = 5,\,K = 3,\,\sigma=0.2\}\).
Plotting the data together with the model output given those parameters looks ok too:
## read data
dat <- read.table("../../data/model_monod_stoch.csv", header=TRUE)
C <- dat$C
r <- dat$r
## compare the two parameter vectors
par1 <- c(r_max = 5, K = 3, sigma=0.2)
par2 <- c(r_max = 10, K = 4, sigma=0.2)
loglikelihood.monod(r, C, par1)
## [1] -53.14229
loglikelihood.monod(r, C, par2)
## [1] -1346.384
## prediction
par <- c(r_max = 5, K = 3, sigma=0.2)
r.det <- model.monod(par, C)
## plot
plot(C, r)
lines(C, r.det, col=2)
Implementing the loglikelihood
using ComponentArrays
using Distributions
include("../../models/models.jl") # load the deterministic model
function loglikelihood_monod(yobs, C, par)
## deterministic part:
ydet = model_monod(C, par)
## Calculate loglikelihood:
return sum(logpdf.(Normal.(ydet, par.sigma), yobs))
end
Read the data
monod_data = CSV.read(joinpath("..", "..", "data", "model_monod_stoch.csv"),
DataFrame)
C = monod_data.C
yobs = monod_data.r
We can compute
## compare the two parameter vectors
par1 = ComponentVector(r_max = 5, K = 3, sigma=0.2)
## ComponentVector{Float64}(r_max = 5.0, K = 3.0, sigma = 0.2)
par2 = ComponentVector(r_max = 10, K = 4, sigma=0.2)
## ComponentVector{Float64}(r_max = 10.0, K = 4.0, sigma = 0.2)
loglikelihood_monod(yobs, C, par1)
## -53.142293490516785
loglikelihood_monod(yobs, C, par2)
## -1346.3842113894802
Plotting
# use the better parameters to compute the deterministic predictions
y_det = model_monod(C, par1)
scatter(C, yobs,
labels = false,
xlabel = "C",
ylabel = "r");
plot!(C, y_det,
labels = false)
The implementation then looks like this:
def loglikelihood_monod(yobs, C, par):
"""
Calculate log-likelihood of observations yobs given parameters par and substrate concentration C.
Arguments:
----------
- yobs: array-like
Observed data.
- C: array-like
Substrate concentration.
- par: array with the parameters
Parameters including (in this order):
- 'r_max': maximum growth rate
- 'K': half-saturation concentration
- 'sigma': standard deviation of the normal distribution for the likelihood
Returns:
-------
log_likelihood: float
The calculated log-likelihood of the observed data.
"""
# Deterministic part: calculate y.det using model_monod function
y_det = model_monod(par, C)
sigma = par[-1]
# Calculate log-likelihood: sum of log probabilities of yobs given y_det and sigma
log_likelihood = np.sum(norm.logpdf(yobs, loc=y_det, scale=sigma))
return log_likelihood
The parameters with higher likelihood are more likely to have generated the data. In our case, this should be the parameter set \(\{r_{\mathrm{max}} = 5,\,K = 3,\,\sigma=0.2\}\).
Plotting the data together with the model output given those parameters looks ok too:
# Loading data
# Specify the path to the CSV file
file_path = r"../../data/model_monod_stoch.csv"
# Load the CSV file into a pandas DataFrame
dat = pd.read_csv(file_path, sep=" ")
C = dat['C']
r = dat['r']
# Define the two parameter vectors
# 'r_max': 5, 'K': 3, 'sigma': 0.2
# 'r_max': 10, 'K': 4, 'sigma': 0.2
par1 = [5, 3, 0.2]
par2 = [10, 4, 0.2]
# Define the two parameter vectors
#par1 = {'r_max': 5, 'K': 3, 'sigma': 0.2}
#par2 = {'r_max': 10, 'K': 4, 'sigma': 0.2}
# Calculate the log-likelihood using the loglikelihood_monod function
loglikelihood_par1 = loglikelihood_monod(r, C, par1)
loglikelihood_par2 = loglikelihood_monod(r, C, par2)
# Output the log-likelihood results
print("Log-likelihood for par1:", loglikelihood_par1)
## Log-likelihood for par1: -53.14229349051678
print("Log-likelihood for par2:", loglikelihood_par2)
## Log-likelihood for par2: -1346.3842113894802
# Assume C and r are already imported and available
# par is a vector with parameters 'r_max', 'K' and 'sigma'
par = [5, 3, 0.2]
# Predict the deterministic growth rate using model_monod function
r_det = model_monod(par, C)
# Plot the observed data
plt.figure(figsize=(14, 6))
sns.scatterplot(x=C, y=r, label='Observed data')
# Plot the deterministic growth rate as a line
sns.lineplot(x=C, y=r_det, color='red', label='Deterministic model')
# Add labels and a legend
plt.xlabel('Substrate concentration (C)')
plt.ylabel('Growth rate (r)')
plt.legend()
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
Produce deterministic model outputs. Do this for the two example models:
Model “Monod” over a concentration (\(C\)) range from 0 to 10 with the default parameter values \(r_{\mathrm{max}} = 5\) and \(K = 3\).
Model “Growth” over a time interval from 0 to 2 with default parameter values \(\mu = 4\), \(K = 10\), \(b = 1\), \(Y = 0.6\), \(C_{\mathrm{M,ini}} = 10\), and \(C_{\mathrm{S,ini}} = 50\).
Plot and interpret the results.
Both deterministic models are already implemented as
model.monod
and model.growth
in the file
models.R
.
Both deterministic models are already implemented as
model_monod
and model_growth
in the file
models.jl
.
Both deterministic models are already implemented as
model_monod
and model_growth
in the file
models.py
.
For the model “Monod”, write a function that produces model outputs (i.e. samples from the probabilistic model). In other words, we want to simulate new observation data including observation noise.
Make the assumption that the noise is i.i.d. normal with standard deviation \(\sigma = 0.2\). For the deterministic model use \(r_{\mathrm{max}} = 5\) and \(K = 3\).
Use your function to simulate several thousand probabilistic model realisations (hypothetical data sets), for fixed model parameters, and plot the 10% and 90% quantiles as continuous prediction bands.
For the computation of quantiles you can use the R function
quantile
.
For the computation of quantiles, you can use the function
quantile
.
For the computation of quantiles, you can use the function
np.quantile
.
source("../../models/models.r") # load the model
## parameters
C.monod <- seq(0, 10, 0.1) # concentrations at which model output should be calculated
par.monod <- c(r_max=5, K=3) # define parameter vector for the model "Monod"
## run model
r.det <- model.monod(par.monod, C.monod)
## plot result
plot(C.monod, r.det, col=2, type="l", ylab="r")
source("../../models/models.r") # load the model
## parameters
par.growth <- c(mu=4,
K=10,
b=1,
Y=0.6,
C_M_ini=10,
C_S_ini=50)
times <- seq(0, 2, length=101) # time resolution of solution
## run model
out <- model.growth(par.growth, times)
## plot results
plot(out$time, out$C_M, col=3, type="l", ylim=c(0, 51),
xlab="time", ylab="concentration")
lines(out$time, out$C_S, col=4)
legend("topright", legend=c("microorganisms", "substrate"),
col=c(3,4), lty=1)
using ComponentArrays
include("../../models/models.jl") # load the model
#set parameters
par = ComponentVector(r_max = 5, K = 3)
C_monod = 0:0.1:10
# run deterministic model monod
Y_monod = model_monod(C_monod, par)
Plotting
plot(C_monod, Y_monod,
labels = false,
xlabel = "C",
ylabel = "r")
using ComponentArrays
include("../../models/models.jl") # load the model
# set parameters
par = ComponentVector(mu=4, K=10, b=1, Y=0.6,
inits = (C_M=10.0, C_S=50.0))
times = 0:0.01:3
# run model
res = model_growth(times, par)
Plotting
plot(res.time, res.C_M,
label = "C_M",
xlabel = "time",
ylabel = "concentrations",
title = "Deterministic model growth");
plot!(res.time, res.C_S, label="C_S")
# Parameters
C_monod = np.arange(0, 10.1, 0.1) # concentrations at which model output should be calculated
par_monod = [5, 3] # define parameter vector for the model "Monod"
# Run model
r_det = model_monod(par_monod, C_monod)
# Plot result
plt.figure(figsize=(14, 6))
sns.lineplot(x=C_monod, y=r_det, color='red')
plt.ylabel('r')
plt.xlabel('C')
plt.title('Model Monod output')
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
# Parameters
# 'mu', 'K', 'b','Y','C_M_ini','C_S_ini'
par_growth = [4, 10, 1, 0.6, 10, 50]
times = np.linspace(0, 2, 101) # time resolution of solution
# Run model
out = model_growth(par_growth, times)
# Plot results
plt.figure(figsize=(14, 6))
sns.lineplot(x=out['time'], y=out['C_M'], color='green', label='microorganisms')
sns.lineplot(x=out['time'], y=out['C_S'], color='blue', label='substrate')
plt.ylim(0, 51)
## (0.0, 51.0)
plt.xlabel('time')
plt.ylabel('concentration')
plt.legend(loc='upper right')
plt.title('Growth of microorganisms and substrate over time')
plt.show()
The model “Monod” takes the input variables \(\mathbf{C} = (C_1,...,C_{n})\) and returns the deterministic growth rates \(\mathbf{r}_{\mathrm{det}} = \mathbf{r}(\mathbf{C},r_{\mathrm{max}},K) = \{r(C_1,r_{\mathrm{max}},K),\dots,r(C_n,r_{\mathrm{max}},K)\}\). We simulate the measurement errors by generating \(n\) random normal values with mean \(0\) and standard deviation \(\sigma\) and adding them to the deterministic model output.
source("../../models/models.r") # load the model
simulate.monod.stoch <- function(par, C){
## run model
r.det <- model.monod(par, C)
## generate noise
z <- rnorm(length(C), 0, par["sigma"])
return(r.det + z)
}
## parameters
C.monod <- seq(0, 10, 0.1) # concentrations at which model output should be calculated
par.monod <- c(r_max=5, K=3, sigma=0.2) # define parameter vector for the model "Monod"
n.rep <- 1000
## run `n.rep` simulations
r.sims <- replicate(n=n.rep, simulate.monod.stoch(par.monod, C.monod))
dim(r.sims)
## [1] 101 1000
## compute quantiles
quant <- apply(r.sims, MARGIN=1, FUN=quantile, probs=c(0.1, 0.9))
dim(quant)
## [1] 2 101
## plot result
plot(C.monod, r.det, type="n", ylab="r")
polygon(c(C.monod,rev(C.monod)), c(quant[1,],rev(quant[2,])), col = "grey85")
lines(C.monod, r.det, col=2, lwd=2, ylab="r")
using ComponentArrays
include("../../models/models.jl") # load the deterministic model
# function to simulate stochastic realisations
function simulate_monod_stoch(C, par)
Ydet = model_monod(C, par)
z = rand(Normal(0, par.sigma), length(Ydet)) # adding noise
Ydet .+ z
end
# parameters
par = ComponentVector(r_max = 5, K = 3, sigma=0.2)
C_monod = 0:0.1:10
n_sample = 1000
monod_stoch = hcat((simulate_monod_stoch(C_monod, par) for _ in 1:n_sample)...)
# compute the 10% and 90% quantiles (and the median) for each concentration
low_quantile = [quantile(monod_stoch[i,:], 0.1) for i in 1:length(C_monod)]
med_quantile = [quantile(monod_stoch[i,:], 0.5) for i in 1:length(C_monod)]
upper_quantile = [quantile(monod_stoch[i,:], 0.9) for i in 1:length(C_monod)]
Plotting
plot(C_monod, upper_quantile,
fillrange = low_quantile,
labels = false,
xlabel = "C",
ylabel = "r");
plot!(C_monod, med_quantile,
labels = false)
def simulate_monod_stoch(par, C):
"""
Simulate the Monod model with stochastic noise.
Arguments:
----------
- par: Array containing the following parameters:
- r_max: maximum growth rate
- K: half-saturation concentration
- sigma: standard deviation of noise
- C: numpy array containing substrate concentrations
Value:
------
A numpy array representing the growth rate with stochastic noise added.
"""
# Run deterministic model
r_det = model_monod(par, C)
# Generate noise using a normal distribution with mean 0 and standard deviation `sigma`
sigma = par[-1]
z = np.random.normal(0, sigma, size=len(C))
# Add noise to the deterministic model results
return r_det + z
# Parameters
C_monod = np.arange(0, 10.1, 0.1) # concentrations at which model output should be calculated
par_monod = [5, 3, 0.2] # define parameter array for the model "Monod"
n_rep = 1000 # number of simulations
# Run `n_rep` simulations
r_sims = np.array([simulate_monod_stoch(par_monod, C_monod) for _ in range(n_rep)])
# The dimensions of `r_sims` array will be (n_rep, len(C_monod))
print(r_sims.shape) # Should print (1000, 101)
## (1000, 101)
# Compute quantiles
# r_sims is a numpy array where each row is a simulation and each column represents a concentration
quant = np.quantile(r_sims, q=[0.1, 0.9], axis=0)
# The resulting `quant` is a 2D array where the first row contains the 10th percentile,
# and the second row contains the 90th percentile for each concentration.
print(quant.shape) # Prints the shape of the resulting quantile array
## (2, 101)
# Define variables for plotting
C_monod = np.linspace(0, 10, 101) # Concentrations at which model output is calculated
# Plot result
plt.figure(figsize=(14, 6))
# Plot deterministic model output using seaborn
sns.lineplot(x=C_monod, y=r_det, color='red', label='Deterministic model', linewidth=2)
# Create a shaded region between the quantiles (0.1 and 0.9)
plt.fill_between(C_monod, quant[0, :], quant[1, :], color='gray', alpha=0.5, label='10th-90th percentile')
# Set labels for the x and y axes
plt.xlabel('C')
plt.ylabel('r')
# Add a legend
plt.legend()
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
Construct the likelihood for the model “Survival”. The mortality rate \(\lambda\), upon multiplication with an infinitesimal time interval \(\Delta\), denotes the probability to die within this time interval, given that one is still alive at the beginning of it. If \(S(t)\) denotes the probability of an individual to be still alive at time point \(t\), this reads as \[ \text{Prob}\big(\text{die in } [t, t+\Delta) \mid \text{survived until } t\big) = \frac{S(t)-S(t+\Delta)}{S(t)} = \lambda \Delta \,. \] After some rearrangement, we let \(\Delta \rightarrow 0\) and obtain the differential equation \[ \dot S(t)=-\lambda S(t)\,. \] Solve this equation to find the time-dependence of \(S(t)\) (Hint: try an exponential ansatz or simply wolframalpha.com). From this solution, derive the probability for an individual to die within the time interval \([t_{i-1},t_i]\). Now, consider \(N\) independent individuals, each with the same mortality rate \(\lambda\). Derive the likelihood function for a vector of death counts \({\bf y}=(y_1,\dots,y_n)\), where \(y_i\) denotes the number of deaths occurring within time interval \([t_{i-1},t_i]\), for \(0=t_0<t_1<\dots<t_n\).
Hint: Look up the definition of the multinomial distribution!
Write a function that simulates output from this probabilistic model, for given parameter values. Use this function to simulate several thousand model realisations, for fixed model parameters (use \(N=30\) individuals with mortality rate \(\lambda = 0.2d^{-1}\) and \(5\) subsequent observation windows of one day each). Use a boxplot to visualize the results.
Implement a function that returns the logarithm of the likelihood, for given parameter values and measurement data. Check that your log-likelihood is implemented correctly by generating model outputs and computing the likelihood for several parameters, including the one that you generated the data with.
The probability for an individual with mortality rate \(\lambda\) to survive until time \(t\) is given by the exponential \[ S(t,\lambda)=e^{-\lambda t}\,, \] which indeed satisfies \(\dot S = -\lambda S\) and the initial condition \(S(0)=1\).
Thus, the probability of the individual to die within time interval \([t_{i-1},t_i]\) is given by \[ p_{i,\lambda}=S(t_{i-1},\lambda) - S(t_i,\lambda)\,. \]
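Inserting the exponential form of \(S\), this becomes \[ p_{i,\lambda} = e^{-\lambda t_{i-1}} - e^{-\lambda t_i} \,. \]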
Assume that we have \(N\) individuals, each with the same mortality rate. We want to calculate the probability that \(y_1\) individuals die in time interval \([t_0,t_1]\), \(y_2\) individuals die in time interval \([t_1,t_2]\), etc. Those individuals who survive the whole experiment die in the time interval \([t_n,\infty)\), and this happens with probability \(p_{n+1,\lambda}=S(t_n,\lambda)\). For the time being, assume that we distinguish the individuals and want to calculate the probability that individual 1 dies in, say, time interval \([t_3,t_4]\), individual 2 in, say, time interval \([t_1,t_2]\), etc., in such a way that the counts \(\bf y\) are realised. Due to the independence of the individuals, the probability of such an event is simply given by the product \[ p_{1,\lambda}^{y_1} \cdot p_{2,\lambda}^{y_2} \cdot \ldots \cdot \, p_{n+1,\lambda}^{y_{n+1}}\,, \] where \(y_{n+1} = N - \sum_{i=1}^n y_i\) is the number of survivors.
Since we do not distinguish individuals, we have to multiply this product by a so-called multinomial coefficient, which counts the number of ways to put \(N\) individuals into \(n+1\) buckets, with \(y_1\) ending up in the first bucket etc. Thus, the likelihood function, for model parameter \(\lambda\), given an output \(\bf y\) of our survival model is described by the so-called multinomial distribution \[ p({\bf y} \mid \lambda) = L_{\text{survival}}(\lambda) = \frac{N!}{y_1!\cdots y_{n+1}!} p_{1,\lambda}^{y_1} \cdot p_{2,\lambda}^{y_2} \cdot \ldots \cdot p_{n+1,\lambda}^{y_{n+1}}\,. \]
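For numerical work it is again convenient to evaluate the logarithm of this expression,
\[ \log L_{\text{survival}}(\lambda) = \log N! - \sum_{i=1}^{n+1} \log y_i! + \sum_{i=1}^{n+1} y_i \log p_{i,\lambda} \,, \]
which is what the log-likelihood implementations below compute (e.g. via dmultinom(..., log=TRUE), logpdf(Multinomial(...)), or multinomial.logpmf).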
The simplest way to simulate a model output is to use the R function
rmultinom
. Alternatively, we can simulate the death time,
for each individual, drawing from the exponential distribution \[
f(t|\lambda)=\lambda e^{-\lambda t}\,,
\] which is the negative derivative of the survival probability
\(S(t)\). After that, we count how many
individuals die within the different time intervals.
## parameters for model "Survival":
N <- 30 # Number of individuals
t <- 1:5 # Times at which the numbers of deaths are counted
par.survival <- c(lambda=0.2) # Parameter vector of the model "Survival"
## Simulation from model "Survival".
## Here we sample from the multinomial distribution implemented in R:
model.survival <- function(N, t, par){
## Calculate survival probabilities at measurement points
## and add zero "at time-point infinity":
S <- exp(-par*c(0,t))
S <- c(S,0)
## Calculate probabilities to die within observation windows:
p <- -diff(S)
## Simulate from multinomial distribution:
y <- rmultinom(1, N, p)[-length(p)]
return(y)
}
## Alternatively, we generate the exponentially distributed
## times of death and bin them
model.survival.alt <- function(N, t, par) {
## Generate deaths times:
T <- rexp(N, par)
## Count number of deaths within observation windows:
t2 <- c(t,6) # construct the bins
y <- table(cut(T+1, breaks = t2)) # do the data binning
return(y)
}
## Generate n.sample simulations from the probabilistic model
n.sample <- 1000
pred.survival <- matrix(nrow=n.sample, ncol=length(t))
for (i in 1:n.sample){
pred.survival[i,] <- model.survival(N, t, par.survival)
}
## Make a boxplot:
boxplot(pred.survival, xlab="time", ylab="# death")
Create a function survival whose arguments are:
N, the number of individuals,
t, a vector of the form 1:k,
λ, the mortality rate.
It returns a random vector y with the number of deaths per bin.
function survival(N::Int64, t, λ)
# S is a vector whose first component is 1 and last is 0,
# because the probability to be alive at time t=0 is 1,
# and the probability to be alive at time t=infinity is 0
S = exp.(-λ .* t)
S = [1; S; 0]
p = -diff(S)
y = rand(Multinomial(N, p))[1:(end-1)]
return y
end
# setting parameter
N = 30
t = 1:5
λ = 0.2
n_sample = 1000
pred_survival = Matrix{Int}(undef, n_sample, length(t))
for i in 1:n_sample
pred_survival[i,:] = survival(N, t, λ)
end
Plotting
using StatsPlots
boxplot(pred_survival,
labels = false, xlab="time", ylab="# deaths")
Create a function model_survival whose arguments are:
N, the number of individuals,
t, a vector of observation times,
par, a parameter vector containing the mortality rate lambda.
It returns a random vector y with the number of deaths per bin.
# Parameters for the survival model
N = 30 # Number of individuals
t = np.array([1, 2, 3, 4, 5]) # Times at which the numbers of deaths are counted
par_survival = [0.2] # Parameter vector of the model "Survival": 'lambda'
# Function for the survival model
def model_survival(N, t, par):
lambda_ = par[0]
# Calculate survival probabilities at measurement points and add zero "at time-point infinity"
S = np.exp(-lambda_ * np.concatenate(([0], t)))
S = np.append(S, 0) # Add zero "at time-point infinity"
# Calculate probabilities to die within observation windows
p = -np.diff(S)
# Simulate from multinomial distribution
y = multinomial.rvs(N, p)[:-1]
return y
# Generate simulations from the probabilistic model
n_sample = 1000
pred_survival = np.zeros((n_sample, len(t)))
for i in range(n_sample):
pred_survival[i, :] = model_survival(N, t, par_survival)
# Convert the NumPy array to a pandas DataFrame (as input to seaborn)
pred_survival_df = pd.DataFrame(pred_survival, columns=t)
# Plotting a boxplot
plt.figure(figsize=(8, 6))
sns.boxplot(data=pred_survival_df)
plt.xlabel("Time")
plt.ylabel("# Deaths")
plt.title("Boxplot of # Deaths at Different Times")
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
loglikelihood.survival <- function(y, N, t, par){
## Calculate survival probabilities at measurement points
## and add zero "at time-point infinity":
S <- exp(-par*c(0,t))
S <- c(S,0)
## Calculate probabilities to die within observation windows:
p <- -diff(S)
## Add number of survivors at the end of the experiment:
y <- c(y, N-sum(y))
## Calculate log-likelihood of multinomial distribution at point y:
LL <- dmultinom(y, prob=p, log=TRUE)
return(LL)
}
## simulate data
y.sim <- model.survival(N, t, par.survival)
## evaluation likelihood
loglikelihood.survival(y.sim, N, t, par.survival)
## [1] -8.791774
function loglikelihood_survival(N::Int, t, y, λ)
S = exp.(-λ .* t)
S = [1; S; 0]
p = -diff(S)
## Add number of survivors at the end of the experiment
## (vcat avoids mutating the caller's vector y):
y = vcat(y, N - sum(y))
logpdf(Multinomial(N, p), y)
end
# Generating and checking
N = 30
## 30
t = 1:5
## 1:5
λ = 0.2
## 0.2
y = survival(N, t, λ)
## 5-element Vector{Int64}:
## 5
## 5
## 2
## 3
## 5
loglikelihood_survival(N, t, y, λ)
## -9.048994121789905
def loglikelihood_survival(y, N, t, par):
"""
Calculate the log-likelihood of the observed data y given the parameters par
and the times t using the multinomial distribution.
Arguments:
- y: observed data (number of deaths at each time point).
- N: total number of individuals in the experiment.
- t: time points at which the number of deaths are counted.
- par: list or array containing the lambda parameter for the model.
Returns:
- LL: log-likelihood value.
"""
lambda_ = par[0]
# Calculate survival probabilities at measurement points and add zero "at time-point infinity"
S = np.exp(-lambda_ * np.concatenate(([0], t)))
S = np.append(S, 0)
# Calculate probabilities to die within observation windows
p = -np.diff(S)
# Add number of survivors at the end of the experiment
y = np.append(y, N - np.sum(y))
# Calculate log-likelihood of multinomial distribution at point y
LL = multinomial.logpmf(y, N, p)
return LL
# Simulate data using the model_survival function
def model_survival(N, t, par):
lambda_ = par[0]
# Calculate survival probabilities at measurement points and add zero "at time-point infinity"
S = np.exp(-lambda_ * np.concatenate(([0], t)))
S = np.append(S, 0)
# Calculate probabilities to die within observation windows
p = -np.diff(S)
# Simulate from multinomial distribution
y = multinomial.rvs(N, p)[:-1]
return y
# Parameters for the survival model
N = 30 # Number of individuals
t = np.array([1, 2, 3, 4, 5]) # Times at which the numbers of deaths are counted
par_survival = [0.2] # Parameter vector of the model "Survival"
# Set seed for reproducibility
np.random.seed(1)
# Simulate data
y_sim = model_survival(N, t, par_survival)
# Evaluate log-likelihood
LL = loglikelihood_survival(y_sim, N, t, par_survival)
print(f"Log-likelihood: {LL}")
## Log-likelihood: -11.182319106189894
For the “Monod” model, try different sensitivity analysis approaches and compare them.
Increase the model parameters by \(10 \%\) and by \(50 \%\), redo the forward simulations, and try to understand the effect of the parameter change on the model results.
A simple metric that measures the sensitivity of the model output \(Y\) to a model parameter \(\theta\) is defined as \[ s_{loc} = \frac{\Delta Y}{\Delta \theta} \,. \] Often a relative metric is easier to interpret: \[ s_{loc,rel} = \frac{\theta}{Y} \frac{\Delta Y}{\Delta \theta} \,. \]
Set the range \(\Delta \theta\) to \(10 \%\) and to \(50 \%\) of the parameter values and compute the metrics for different concentrations (inputs). What is your interpretation?
Conduct a variance-based regional (global) sensitivity analyses with uniform parameter distributions.
Use the fast99
function implemented in the package sensitivity
.
The template below should get you started. For details see also the
documentation of fast99
.
library(sensitivity)
## 1) get sampling points in parameter space
lower = ...
upper = ...
res.fast <- fast99(factors = names(par.monod),
n = 100,
q = "qunif",
q.arg = list(list(min=lower[1], max=upper[1]),
list(min=lower[2], max=upper[2]))
)
## all parameter combinations to run the model with
head(res.fast$X)
## 2) run the model for all parameter combinations:
res.SA <- matrix(NA, nrow=nrow(res.fast$X), ncol=...)
for(i in 1:nrow(res.fast$X)){
res.SA[i,] <- model(unlist(res.fast$X[i,]), ...)
}
## 3) compute the sensitivity indices for one concentration (pick a column index j)
S <- tell(res.fast, y = res.SA[, j])
S
plot(S)
Use the package GlobalSensitivity.jl
and the method eFAST
. Try to visualize the sensitivities
with a plot.
using GlobalSensitivity
using Distributions
# 1) Define a vector of tuples that give the lower and
# upper bound for each parameter:
# Define a `Distribution` for each parameter
param_dists = [Uniform(...), # r_max = par[1]
Uniform(...)] # K = par[2]
# 2) Run SA. `gsa` can handle models with multiple outputs (i.e. concentrations)
res = gsa(θ -> model_monod(C_monod, ComponentVector(r_max = θ[1], K = θ[2])),
eFAST(num_harmonics=6),
param_dists,
samples = 200)
res.S1 # direct effect
res.ST # total effect
Use the SALib package. Try to visualize the sensitivity with a plot. Here is one example of how to set up a function for the sensitivity analysis.
def local_sa(model, par_vector, par, delta_rel=0.1, *args, **kwargs):
"""
Perform local sensitivity analysis on the model function.
Parameters:
- model: Function representing the model.
- par_vector: Array of model parameters.
- par: Name of the parameter to perform sensitivity analysis on.
- delta_rel: Relative change in the parameter value.
- *args, **kwargs: Additional arguments for the model function.
Returns:
Dictionary containing absolute and relative sensitivity indices.
"""
# Copy the parameter array to adjust the parameter of interest
par_vector2 = par_vector.copy()
# Adjust the specified parameter by the relative delta
par_vector2[par] *= (1 + delta_rel)
# Run the model with the original and adjusted parameter array
Y = model(par_vector, *args, **kwargs)
Y2 = model(par_vector2, *args, **kwargs)
# Compute sensitivity indices
S_abs = (Y2 - Y) / (par_vector2[par] - par_vector[par])
S_rel = (par_vector[par] / Y) * S_abs
# Return sensitivity indices as a dictionary
return {'S_abs': S_abs, 'S_rel': S_rel}
Because we have to do the same calculations for every parameter, it is a good idea to write a little function that computes the absolute and relative local sensitivity indices. Note that we use the three dots (...) to pass additional arguments to the model.
local.SA <- function(model, par.vector, par, delta.rel=0.1, ...){
## change `par` by delta.rel
par.vector2 <- par.vector
par.vector2[par] <- par.vector2[par] * (1 + delta.rel)
## run model
Y <- model(par.vector, ...)
Y2 <- model(par.vector2, ...)
## compute sensitivity indices
S.abs <- (Y - Y2) / (par.vector[par] - par.vector2[par])
S.rel <- par.vector[par] / Y * S.abs
list(S.abs = S.abs, S.rel=S.rel)
}
We apply our function local.SA for both parameters with different delta.rel:
source("../../models/models.r") # load the model
## define inputs
C.monod <- seq(0, 10, 0.1) # concentrations at which model output should be calculated
## define parameters, the reference point
par.monod <- c(r_max=5, K=3)
## compute sensitivity indices
S.rmax.10 <- local.SA(model.monod, par.monod, "r_max", delta.rel=0.1, C=C.monod)
S.rmax.50 <- local.SA(model.monod, par.monod, "r_max", delta.rel=0.5, C=C.monod)
S.K.10 <- local.SA(model.monod, par.monod, "K", delta.rel=0.1, C=C.monod)
S.K.50 <- local.SA(model.monod, par.monod, "K", delta.rel=0.5, C=C.monod)
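As a check, the local sensitivities of the Monod model can also be written down analytically from \(r(C,r_{max},K) = \frac{r_{max} C}{K+C}\); the finite-difference estimates above should approximate
\[ \frac{\partial r}{\partial r_{max}} = \frac{C}{K+C} \,, \qquad \frac{\partial r}{\partial K} = -\frac{r_{max}\, C}{(K+C)^2} \,, \]
and the relative indices follow by multiplying with \(\theta/Y\).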
Plotting the absolute indices:
plot(C.monod, S.K.10$S.abs, col=2, type="l", ylim=c(-0.5, 1),
ylab="absolute sensitivity",
main="absolute sensitivities")
lines(C.monod, S.K.50$S.abs, col=2, lty = 2)
lines(C.monod, S.rmax.10$S.abs, col=4, lty = 1)
lines(C.monod, S.rmax.50$S.abs, col=4, lty = 2)
legend("topleft", col=c(2,2,4,4), lty=c(1,2),
legend=c("S_K 10%", "S_K 50%", "S_r.max 10%", "S_r.max 50%"))
Plotting the relative indices:
plot(C.monod, S.K.10$S.rel, col=2, type="l", ylim=c(-1, 1),
ylab="relative sensitivity",
main="relative sensitivities")
lines(C.monod, S.K.50$S.rel, col=2, lty = 2)
lines(C.monod, S.rmax.10$S.rel, col=4, lty = 1)
lines(C.monod, S.rmax.50$S.rel, col=4, lty = 2)
legend("bottomright", col=c(2,2,4,4), lty=c(1,2),
legend=c("S_K 10%", "S_K 50%", "S_r.max 10%", "S_r.max 50%"))
We use the function fast99
to compute 2*n
parameter combinations for which we will evaluate the model:
library(sensitivity)
## sensitivity sampling points
lower = par.monod * 0.7
upper = par.monod * 1.3
res.fast <- fast99(factors = names(par.monod),
n = 100,
q = "qunif",
q.arg = list(list(min=lower[1], max=upper[1]),
list(min=lower[2], max=upper[2]))
)
## all parameter combinations to run the model with
head(res.fast$X)
## r_max K
## 1 5.00 3.000
## 2 5.72 3.036
## 3 6.44 3.072
## 4 5.84 3.108
## 5 5.12 3.144
## 6 4.40 3.180
We then run the model with all combinations in a loop. For slow models, this could be done in parallel.
## calculate results for sample for all concentrations:
res.SA.monod <- matrix(NA, nrow=nrow(res.fast$X), ncol=length(C.monod))
for(i in 1:nrow(res.fast$X)){
res.SA.monod[i,] <- model.monod(unlist(res.fast$X[i,]), C.monod)
}
After picking a concentration, we can then compute the sensitivity indices
j <- 5 # chose a concentration
C.monod[j]
## [1] 0.4
S <- tell(res.fast, y = res.SA.monod[, j])
S
##
## Call:
## fast99(factors = names(par.monod), n = 100, q = "qunif", q.arg = list(list(min = lower[1], max = upper[1]), list(min = lower[2], max = upper[2])))
##
## Model runs: 200
##
## Estimations of the indices:
## first order total order
## r_max 0.5386887 0.5539790
## K 0.4437621 0.4589948
plot(S)
Because we have to do the same calculations for every parameter, it is a good idea to write a little function that computes the absolute and relative local sensitivity indices:
function local_SA(model::Function, θ::ComponentVector,
parameter::Symbol; Δrel=0.1)
# change `parameter` by `Δrel`:
θ2 = copy(θ*1.0)
θ2[parameter] = θ2[parameter] * (1 + Δrel)
# run model
Y = model(θ)
Y2 = model(θ2)
# compute sensitivity indices
S_abs = (Y .- Y2) ./ (θ[parameter] - θ2[parameter])
S_rel = θ[parameter] ./ Y .* S_abs
(S_abs = S_abs, S_rel = S_rel)
end
## local_SA (generic function with 1 method)
We apply our function local_SA for both parameters. Note the use of anonymous functions to pass the additional argument C_monod. This is a pattern commonly used in Julia.
include("../../models/models.jl"); # load the model
## model_growth
# set concentrations
C_monod = 0:0.1:10;
# compute indices
S_rmax_10 = local_SA(θ -> model_monod(C_monod, θ), par, :r_max, Δrel=0.1);
S_rmax_50 = local_SA(θ -> model_monod(C_monod, θ), par, :r_max, Δrel=0.5);
S_K_10 = local_SA(θ -> model_monod(C_monod, θ), par, :K, Δrel=0.1);
S_K_50 = local_SA(θ -> model_monod(C_monod, θ), par, :K, Δrel=0.5);
Plotting the results
# plot absolute sensitivities
plot(C_monod, [S_rmax_10.S_abs S_rmax_50.S_abs S_K_10.S_abs S_K_50.S_abs],
labels = ["S_rmax_10" "S_rmax_50" "S_K_10" "S_K_50"],
xlab = "concentration", ylab="absolute sensitiviy")
# plot absolute sensitivities
plot(C_monod, [S_rmax_10.S_rel S_rmax_50.S_rel S_K_10.S_rel S_K_50.S_rel],
labels = ["S_rmax_10" "S_rmax_50" "S_K_10" "S_K_50"],
xlab = "concentration", ylab="relative sensitiviy")
using GlobalSensitivity
using Distributions
# Define a distribution for each parameter
param_dists = [Uniform(0.7*par[1], 1.3*par[1]),
Uniform(0.7*par[2], 1.3*par[2])]
## 2-element Vector{Uniform{Float64}}:
## Uniform{Float64}(a=3.5, b=6.5)
## Uniform{Float64}(a=2.0999999999999996, b=3.9000000000000004)
# We can handle multiple outputs (i.e. different concentrations) at once
res = gsa(θ -> model_monod(C_monod, ComponentVector(r_max = θ[1], K = θ[2])),
eFAST(num_harmonics=6),
param_dists,
samples = 200)
## GlobalSensitivity.eFASTResult{Matrix{Float64}}([NaN NaN; 0.4911116923546162 0.49328757009808516; … ; 0.9465700186771833 0.05135616317856734; 0.9473401174137372 0.050607176694682135], [NaN NaN; 0.50523026854676 0.5077289452235891; … ; 0.9485992866581445 0.05280455417864782; 0.9493486132310703 0.052035735620521995])
# res.S1 # direct effects
# res.ST # total effects
p = plot(C_monod, res.ST[:,1], label="r_max, total effect",
xlab="concentration", ylab="sensitivity index");
plot!(p, C_monod, res.S1[:,1], label="r_max, main effect");
plot!(p, C_monod, res.ST[:,2], label="K, total effect");
plot!(p, C_monod, res.S1[:,2], label="K, main effect");
p
Because we have to do the same calculations for every parameter, it is a good idea to write a little function that computes the absolute and relative local sensitivity indices.
from SALib.sample import fast_sampler
from SALib.analyze import fast
def local_sa(model, par_vector, par, delta_rel=0.1, *args, **kwargs):
"""
Perform local sensitivity analysis on the model function.
Parameters:
- model: Function representing the model.
- par_vector: Array of model parameters.
- par: Name of the parameter to perform sensitivity analysis on.
- delta_rel: Relative change in the parameter value.
- *args, **kwargs: Additional arguments for the model function.
Returns:
Dictionary containing absolute and relative sensitivity indices.
"""
# Copy the parameter array to adjust the parameter of interest
par_vector2 = par_vector.copy()
# Adjust the specified parameter by the relative delta
par_vector2[par] *= (1 + delta_rel)
# Run the model with the original and adjusted parameter array
Y = model(par_vector, *args, **kwargs)
Y2 = model(par_vector2, *args, **kwargs)
# Compute sensitivity indices
S_abs = (Y2 - Y) / (par_vector2[par] - par_vector[par])
S_rel = (par_vector[par] / Y) * S_abs
# Return sensitivity indices as a dictionary
return {'S_abs': S_abs, 'S_rel': S_rel}
We apply our function local_sa for both parameters with different delta_rel:
# Define inputs: concentrations at which model output should be calculated
C_monod = np.arange(0, 10.1, 0.1)
# Define parameters, the reference point
par_monod = [5, 3]
# Compute sensitivity indices using local sensitivity analysis
S_rmax_10 = local_sa(model_monod, par_monod, 0, delta_rel=0.1, C=C_monod)
## <string>:28: RuntimeWarning: divide by zero encountered in divide
## <string>:28: RuntimeWarning: invalid value encountered in multiply
S_rmax_50 = local_sa(model_monod, par_monod, 0, delta_rel=0.5, C=C_monod)
S_K_10 = local_sa(model_monod, par_monod, 1, delta_rel=0.1, C=C_monod)
S_K_50 = local_sa(model_monod, par_monod, 1, delta_rel=0.5, C=C_monod)
Plotting the absolute indices:
# Create a Seaborn style figure
sns.set(style="whitegrid")
# Create a figure
plt.figure(figsize=(10, 6))
# Plot S.K.10 absolute sensitivity
sns.lineplot(x=C_monod, y=S_K_10['S_abs'], color='red', linestyle='-', label='S_K 10%')
# Plot S.K.50 absolute sensitivity
sns.lineplot(x=C_monod, y=S_K_50['S_abs'], color='red', linestyle='--', label='S_K 50%')
# Plot S.rmax.10 absolute sensitivity
sns.lineplot(x=C_monod, y=S_rmax_10['S_abs'], color='blue', linestyle='-', label='S_r.max 10%')
# Plot S.rmax.50 absolute sensitivity
sns.lineplot(x=C_monod, y=S_rmax_50['S_abs'], color='blue', linestyle='--', label='S_r.max 50%')
# Set labels and title
plt.ylabel('Absolute Sensitivity')
plt.title('Absolute Sensitivities')
# Add a legend
plt.legend(loc='upper left')
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
Plotting the relative indices:
# Create a Seaborn style figure
sns.set(style="whitegrid")
# Create a figure
plt.figure(figsize=(10, 6))
# Plot S.K.10 relative sensitivity
sns.lineplot(x=C_monod, y=S_K_10['S_rel'], color='red', linestyle='-', label='S_K 10%')
# Plot S.K.50 relative sensitivity
sns.lineplot(x=C_monod, y=S_K_50['S_rel'], color='red', linestyle='--', label='S_K 50%')
# Plot S.rmax.10 relative sensitivity
sns.lineplot(x=C_monod, y=S_rmax_10['S_rel'], color='blue', linestyle='-', label='S_r.max 10%')
# Plot S.rmax.50 relative sensitivity
sns.lineplot(x=C_monod, y=S_rmax_50['S_rel'], color='blue', linestyle='--', label='S_r.max 50%')
# Set limits, labels, and title
plt.ylim([-1, 1])
## (-1.0, 1.0)
plt.ylabel('Relative Sensitivity')
plt.title('Relative Sensitivities')
# Add a legend
plt.legend(loc='lower right')
# Display the plot
plt.show()
We use the function fast_sampler.sample to compute the parameter combinations for which we will evaluate the model:
# Define par_monod array
par_monod = [5, 3] # Example values, adjust as needed
# Calculate lower and upper bounds
lower = np.array(list(par_monod)) * 0.7
upper = np.array(list(par_monod)) * 1.3
# Define the problem dictionary
problem = {
'num_vars': len(par_monod),
'names': list(["r_max", "K"]),
'bounds': list(zip(lower, upper))
}
# Perform sensitivity sampling using FAST99
res_fast = fast_sampler.sample(problem, N=100)
# Display the first 5 rows of the parameter combinations
print(res_fast[:5])
## [[5.55403157 3.33241894]
## [6.27403157 3.36841894]
## [6.00596843 3.40441894]
## [5.28596843 3.44041894]
## [4.56596843 3.47641894]]
We then run the model with all combinations in a loop. For slow models, this could be done in parallel.
# Assume model_monod function is already defined
# Assume res_fast and C_monod are also defined as per the earlier code
# Initialize an array to hold the results
res_SA_monod = np.empty((res_fast.shape[0], len(C_monod)))
# Loop through each sample
for i in range(res_fast.shape[0]):
# Get the parameters for the current sample
parameters = [
res_fast[i, 0],
res_fast[i, 1]
]
# Calculate the model_monod results for the given parameters and concentrations
res_SA_monod[i, :] = model_monod(parameters, C_monod)
After picking a concentration, we can then compute the sensitivity indices
# Choose a concentration
j = 4
selected_concentration = C_monod[j]
print(selected_concentration)
## 0.4
# Assume problem, res_fast, and res_SA_monod are defined as per earlier code
# Assume j is an integer index representing the concentration to analyze
# Perform sensitivity analysis for the specified concentration
# We expect `res_SA_monod[:, j]` to be a one-dimensional array of outputs
# Ensure that `j` is within the range of columns in `res_SA_monod`
S = fast.analyze(problem, Y=res_SA_monod[:, j], print_to_console=True)
## S1 ST S1_conf ST_conf
## r_max 0.539225 0.554401 0.136408 0.131977
## K 0.444932 0.460929 0.138345 0.134694
# Assume S is the sensitivity analysis result obtained from fast.analyze()
# Create a pandas DataFrame to hold the sensitivity indices and their names
data = {
'Factor': S['names'],
'First-Order': S['S1'],
'Total-Order': S['ST']
}
df = pd.DataFrame(data)
# Melt the DataFrame to long format for easier plotting with Seaborn
df_long = df.melt(id_vars='Factor', var_name='Sensitivity Type', value_name='Index')
# Create a barplot with Seaborn
sns.barplot(data=df_long, x='Factor', y='Index', hue='Sensitivity Type')
# Add labels and title
plt.ylabel('Sensitivity Index')
plt.title('Sensitivity Analysis Results')
# Show the plot
plt.show()