Objectives:
Toy Models:
Some exercises will use the toy models “Monod”, “Growth”, or “Survival”, which were introduced in the previous lectures. You can see their definitions for instance in the slides “Mathematical Representation and Construction of Models”.
Construct the likelihood function for a simple one-dimensional linear model with additive, independent, identically distributed (i.i.d.) normal errors. That is, the deterministic part of the model is given by \[ y_{det} (x,\beta,\gamma) = \beta x + \gamma \,. \]
For every observation we have the probabilistic model \[ Y_i = y_{det} (x_i,\beta,\gamma) + \epsilon_i \] where \(\epsilon_i \sim N(0, \sigma^2)\), i.e. we assume independent Gaussian noise terms.
Hint: Have a close look at the slides “Formulation of model likelihood functions” if you need inspiration.
For the linear model, implement a function that returns the logarithm
of the likelihood, for given model parameters and measurement data. Read
the data provided in the file model_linear.csv
and use them
as “observation data” to compute the log-likelihood for two parameter
sets: \(\{\beta = 2,\, \gamma = 1, \, \sigma =
1\}\) and \(\{ \beta = 3,\, \gamma = 2,
\, \sigma = 1 \}\). Which parameters are more likely to have
generated this data?
You can read the external data using the following piece of code:
## read the data:
dat <- read.table("data/model_linear.csv", header=TRUE)
dat$x
dat$y
## plot the data:
plot(dat$x, dat$y, xlab='x', ylab='y')
Here is a template for the log-likelihood implementation – you still need to fill in the important bits!
loglikelihood.linear <- function(par, x, y){
## deterministic part:
y.det <- ... # first evaluate the deterministic model output, using the parameters par = c(beta, gamma, sigma)
## Calculate loglikelihood:
# put your expression for log-likelihood here
# Hint: use function dnorm()
}
Read the data into a DataFrame:
using DataFrames
import CSV
dat = DataFrame(CSV.File("data/model_linear.csv", delim=' '))
dat.x
dat.y
Your loglikelihood function should look like this
using Distributions
function loglikelihood_linear(theta, x, y)
y_det = ... # first evaluate the deterministic model output,
# using the parameters theta = [beta, gamma, sigma]
## Calculate the loglikelihood, e.g. using `logpdf.(Normal.(...))`
end
You can read the external data using the following piece of code:
# Specify the path to the CSV file
file_path = r"../../data/model_linear.csv"
# Load the CSV file into a pandas DataFrame
dat = pd.read_csv(file_path, sep=" ")
Here is a template for the log-likelihood implementation – you still need to fill in the important bits!
def loglikelihood_linear(x, y, par):
"""
Calculate the log-likelihood for a linear model.
Parameters:
x (array-like): Independent variable.
y (array-like): Dependent variable.
par (list or array-like): Parameters of the model [slope, intercept, standard deviation].
Returns:
float: The log-likelihood value.
"""
# Deterministic part
y_det = # put your expression for y_det here
# Calculate log-likelihood
log_likelihood = # put your expression for log-likelihood here
return log_likelihood
Use an optimizer to find the parameter values that maximise the likelihood (the so-called maximum likelihood estimator, MLE). Plot the resulting linear model together with the data. Does the result look reasonable?
You can use, for instance, the optim
function:
# Define starting parameters for the optimisation (e.g. beta_init, gamma_init, and sigma_init),
# then minimise the negative log-likelihood with optim:
par.max <- optim(par = c(beta_init, gamma_init, sigma_init), fn = neg.loglikelihood.linear, x=x, y=y)
# You can look at the result like this:
par.max$par
Note that optim
does minimization by default! You will
have to create a negative-loglikelihood function to minimize.
Use the function minimizer
from the package
Optim.jl
. You will have to create a negative-loglikelihood
function to minimize. This could look like this:
using Optim
param = Optim.minimizer(optimize(???, ???));
best_param = ???
You can use, for instance, the minimize()
function:
# Define starting parameters for the optimisation, e.g. par_init = [beta_init, gamma_init, sigma_init],
# then minimise the negative log-likelihood with minimize:
result = minimize(function_to_minimize, par_init, args=(x, y))
# You can look at the result like this:
result
Note that minimize
does minimization! You will have to
create a negative-loglikelihood function to minimize.
Use the standard linear regression function to estimate the parameters, and compare them to the ones you found through likelihood maximisation in Exercise 1.3.
The function lm
implements linear regression.
## Build a linear regression model
linearModel <- lm(y ~ x, data=dat)
## Inspect the model:
coef(linearModel)
summary(linearModel)
Use the function lm
from the package
GLM.jl
(Generalized Linear Model) and the macro
@formula
.
The function sm.OLS from the statsmodels.api module (imported as sm) implements linear regression.
## Build a linear regression model
# Add a constant term (intercept) to the independent variable
x_with_const = sm.add_constant(x)
# Fit the linear model
model = sm.OLS(y, x_with_const).fit()
For a one-dimensional linear model with an additive error, a model output \(Y\) is given by the deterministic term \(y_{det}\) plus the noise term. For the noise term we choose here independent, identically distributed normal errors, so that for a single observation \(i\) (in a set of \(n\) observations) the full model is given by
\[ Y_i = y_{i,det} + Z_i , \quad \quad Z_i \sim \mathcal{N}(0,\sigma) , \quad \quad i \in \{1,\dots,n\} \;. \]
For the deterministic part, we have an input \(x\) and two model parameters \(\{\beta, \gamma\}\), from which \(y_{det}\) is given by
\[ y_{det}(x,\beta, \gamma) = \beta x + \gamma \;. \]
We therefore have three model parameters in total: \(\beta\) and \(\gamma\) from the deterministic term, and \(\sigma\) from the noise term. For convenience, we collect these into the set of parameters \({\boldsymbol{\theta}} = \{\beta,\gamma,\sigma\}\).
So for a single observation, the model is given by
\[ Y_i = \beta x_i + \gamma +Z_i \;, \]
which is equivalent to
\[ Y_i \sim \mathcal{N}\left(y_{i,det}(x_i,\beta,\gamma),\sigma\right) \;. \]
That is, the model output \(Y_i\) is normally distributed, where the mean is the deterministic model output \(y_{i,det}\) and the standard deviation \(\sigma\) is the standard deviation of the error term \(Z_i\). Note that adding a constant \(\mu\) to a standard normally distributed random variable \(Z\) results in a new normally distributed random variable, the mean of which is simply shifted: \(Y = \mu + Z\), \(Z \sim \mathcal{N}(0,\sigma)\) \(\implies\) \(Y \sim \mathcal{N}(\mu,\sigma)\).
So, the likelihood for a single observation \(y_i\), given input \(x_i\) and parameters \(\boldsymbol{\theta}\), is given by
\[ p(y_i|x_i,\boldsymbol{\theta}) = \frac{1}{\sqrt{2 \pi \sigma^2 }} \exp\left( {-\frac{(y_i - y_{i,det})^2}{2\sigma^2}} \right) \;. \]
Thanks to the assumption of independence, the likelihood of all observations \(\mathbf{y} = \{y_1,y_2,...,y_n\}\), given all inputs \(\mathbf{x} = \{x_1,x_2,...,x_n\}\), is then given by
\[ p(\mathbf{y}|\mathbf{x},\boldsymbol{\theta}) = p(y_1|x_1,\boldsymbol{\theta}) \, p(y_2|x_2,\boldsymbol{\theta}) \, p(y_3|x_3,\boldsymbol{\theta}) \, \dots \, p(y_n|x_n,\boldsymbol{\theta}) = \prod_{i=1}^n p(y_i|x_i,\boldsymbol{\theta}) \;, \]
which is then given in full by
\[ p(\mathbf{y}|\mathbf{x},\boldsymbol{\theta}) = L(\boldsymbol{\theta}) = \prod_{i=1}^n p(y_i|x_i,\boldsymbol{\theta}) = \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2} } \exp\left( {-\frac{(y_i - y_{i,det})^2}{2\sigma^2}} \right) \;. \]
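Taking the logarithm turns this product into a sum, which is what the implementations below return:
\[ \log L(\boldsymbol{\theta}) = \sum_{i=1}^n \log p(y_i|x_i,\boldsymbol{\theta}) = -\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - y_{i,det})^2 \;. \]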
The likelihood can be implemented like this:
loglikelihood.linear <- function(x, y, par){
## deterministic part:
y.det <- par[1] * x + par[2]
## Calculate log-likelihood:
return( sum(dnorm(y, mean=y.det, sd=par[3], log=TRUE)) )
}
You can then compute the likelihood of some parameters, given the data, for instance like this:
## read data
dat <- read.table("../../data/model_linear.csv", header=TRUE)
x <- dat$x
y <- dat$y
## parameters for testing:
par.linear.test = c(2, 1, 1)
## compute likelihood for these parameters:
loglikelihood.linear(x, y, par.linear.test)
## [1] -27.55146
Computing the log-likelihood for the given experimental data and the sets of model parameters, we see that the likelihood of the parameter set \(\{\beta = 2,\, \gamma = 1, \, \sigma = 1\}\) is larger. This implies that these parameters are more likely to have generated the data, assuming that the linear model is an accurate description of the data generating mechanism.
The likelihood can be implemented like this
using Distributions
function loglikelihood_linear(beta, gamma, sigma, x, y)
y_det = beta .* x .+ gamma
return sum(logpdf.(Normal.(y_det, sigma), y))
end
Then read your data so that you can work with it:
using DataFrames
import CSV
linear_data = CSV.read(joinpath("..", "..", "data", "model_linear.csv"),
DataFrame)
#show(linear_data, allrows=true) if you want to see all your data
x = linear_data.x
y = linear_data.y
Compute the log-likelihoods:
#Log likelihood with parameters β = 2, γ = 1, σ = 1
loglikelihood_linear(2,1,1, x, y)
## -27.55146249091466
#Log likelihood with parameters β = 3, γ = 2, σ = 1
loglikelihood_linear(3,2,1, x, y)
## -166.7271070528497
# parameters β = 2, γ = 1, σ = 1 are more likely to have produced the data
The following libraries and modules are required:
from models import model_monod, model_growth
import pandas as pd
import numpy as np
from scipy.stats import norm, multinomial
import matplotlib.pyplot as plt
import seaborn as sns
The likelihood can be implemented like this:
def loglikelihood_linear(x, y, par):
"""
Calculate the log-likelihood for a linear model.
Parameters:
x (array-like): Independent variable.
y (array-like): Dependent variable.
par (list or array-like): Parameters of the model [slope, intercept, standard deviation].
Returns:
float: The log-likelihood value.
"""
# Deterministic part
y_det = par[0] * x + par[1]
# Calculate log-likelihood
log_likelihood = np.sum(norm.logpdf(y, loc=y_det, scale=par[2]))
return log_likelihood
You can then compute the likelihood of some parameters, given the data, for instance like this:
# Loading data
# Specify the path to the CSV file
file_path = r"../../data/model_linear.csv"
# Load the CSV file into a pandas DataFrame
dat = pd.read_csv(file_path, sep=" ")
# Extract x and y columns
x = dat['x']
y = dat['y']
# Parameters for testing
par_linear_test = [2, 1, 1]
# Compute likelihood for these parameters
result = loglikelihood_linear(x, y, par_linear_test)
print(f"Log-likelihood: {result}")
## Log-likelihood: -27.551462490914652
Calling the optim
function then looks like this:
## define starting parameters for optimisation:
par.init <- c(beta = 2,
gamma = 1,
sigma = 1)
## because `optim` is minimizing, we define the negative loglikelihood
neg.loglikelihood.linear <- function(x, y, par) -loglikelihood.linear(x, y, par)
## maximum likelihood estimation
MLEparameters <- optim(par = par.init, fn = neg.loglikelihood.linear, x=x, y=y)
## inspect the result:
MLEparameters$par
## beta gamma sigma
## 2.2534583 0.3842564 0.7990461
We can check that the result looks reasonable by plotting the resulting linear model together with the data:
# model output from x-input and estimated parameters:
y.estimate = MLEparameters$par[1]*x + MLEparameters$par[2]
plot(x, y)
lines(x, y.estimate, col=2)
Implementing the negative log-likelihood that we will minimize. Note that we optimize log_sigma to avoid negative values for sigma.
function neg_loglikelihood_linear(θ::Vector)
beta, gamma, log_sigma = θ
# minus sign to be able to minimize after
- loglikelihood_linear(beta, gamma, exp(log_sigma), x, y)
end
Minimizing
using Optim
θinit = [2.0, 1, log(0.5)]
## 3-element Vector{Float64}:
## 2.0
## 1.0
## -0.6931471805599453
param = Optim.minimizer(optimize(neg_loglikelihood_linear, θinit));
param[3] = exp(param[3]) # back-transform to sigma
## 0.7991010329384892
param
## 3-element Vector{Float64}:
## 2.25342685673029
## 0.3843207189590402
## 0.7991010329384892
Plotting
using Plots
scatter(x, y,
labels=false,
xlabel = "x",
ylabel = "y");
plot!(x -> param[1]*x + param[2],
labels = false)
Calling the minimize function then looks like this:
from scipy.optimize import minimize
# Define the negative log-likelihood function
def neg_loglikelihood_linear(par, x, y):
return -loglikelihood_linear(x, y, par)
# Define starting parameters for optimization
par_init = [2, 1, 1] # [beta, gamma, sigma]
# Perform maximum likelihood estimation
result = minimize(neg_loglikelihood_linear, par_init, args=(x, y))
# Inspect the result
MLEparameters = result.x
print(f"MLE Parameters: {MLEparameters}")
## MLE Parameters: [2.25342197 0.38434878 0.79910834]
We can check that the result looks reasonable by plotting the resulting linear model together with the data:
# Model output from x-input and estimated parameters
y_estimate = MLEparameters[0] * x + MLEparameters[1]
# Plot the original data
plt.figure(figsize=(14, 6))
plt.scatter(x, y, label='Observed Data')
# Plot the estimated values
plt.plot(x, y_estimate, color='red', label='Estimated Model')
# Add labels and legend
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
The linear regression function works differently (it uses a method called “QR decomposition”), but should give very similar results to our likelihood optimisation.
mod <- lm(y~x)
mod
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 0.3843 2.2534
Note that (Intercept) corresponds to \(\gamma\) and x to \(\beta\).
using GLM
linear_mod = GLM.lm(@formula(y ~ x), linear_data)
## StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
##
## y ~ 1 + x
##
## Coefficients:
## ────────────────────────────────────────────────────────────────────────
## Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
## ────────────────────────────────────────────────────────────────────────
## (Intercept) 0.384349 0.353936 1.09 0.2911 -0.356447 1.12514
## x 2.25342 0.121103 18.61 <1e-12 1.99995 2.50689
## ────────────────────────────────────────────────────────────────────────
Note that (Intercept) corresponds to \(\gamma\) and x to \(\beta\).
import statsmodels.api as sm
# Add a constant term (intercept) to the independent variable
x_with_const = sm.add_constant(x)
# Fit the linear model
model = sm.OLS(y, x_with_const).fit()
# Print the summary of the model
print(model.summary())
## OLS Regression Results
## ==============================================================================
## Dep. Variable: y R-squared: 0.948
## Model: OLS Adj. R-squared: 0.945
## Method: Least Squares F-statistic: 346.2
## Date: Wed, 18 Jun 2025 Prob (F-statistic): 1.18e-13
## Time: 11:37:10 Log-Likelihood: -25.088
## No. Observations: 21 AIC: 54.18
## Df Residuals: 19 BIC: 56.27
## Df Model: 1
## Covariance Type: nonrobust
## ==============================================================================
## coef std err t P>|t| [0.025 0.975]
## ------------------------------------------------------------------------------
## const 0.3843 0.354 1.086 0.291 -0.356 1.125
## x 2.2534 0.121 18.608 0.000 2.000 2.507
## ==============================================================================
## Omnibus: 1.962 Durbin-Watson: 1.783
## Prob(Omnibus): 0.375 Jarque-Bera (JB): 1.122
## Skew: 0.203 Prob(JB): 0.570
## Kurtosis: 1.943 Cond. No. 6.14
## ==============================================================================
##
## Notes:
## [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Note that (Intercept) corresponds to \(\gamma\) and x to \(\beta\).
Construct the likelihood function for the model “Monod”. Use the deterministic part of the model given in the introduction to the exercises and assume i.i.d. normal errors on the deterministic model output. The result will look very similar to the solution of Exercise 1.1!
Implement a function that returns the logarithm of the likelihood,
for given model parameters and measurement data. Read the data provided
in the file model_monod_stoch.csv
and use them as
“observation data” to compute the log-likelihood for the parameter sets
\(\{r_{max} = 5,\, K = 3, \, \sigma =
0.2\}\) and \(\{r_{max} = 10,\, K = 4,
\, \sigma = 0.2\}\). Which parameters are more likely to have
generated this data?
For the model “Monod” with additive normal i.i.d. noise, constructing the likelihood follows exactly the same procedure as in the linear case (Exercise 1.1), only with a simple change in the deterministic part of the model. So again, our complete probabilistic model is given by the deterministic part plus a Gaussian noise term,
\[ Y_i = y_{i,det} + Z_i , \quad \quad Z_i \sim \mathcal{N}(0,\sigma) \;, \]
but now the deterministic part is given by
\[ y_{det}(x,\boldsymbol{\theta}_{monod}) = r(C,r_{max}, K) = \frac{r_{max} C}{K+C} \;. \]
We therefore again have three model parameters in total: \(\boldsymbol{\theta} = \{ \boldsymbol{\theta}_{monod},\sigma\} = \{r_{max}, K,\sigma\}\).
The expression for the likelihood is then similar to that of the linear model:
\[ L(\boldsymbol{\theta}) = p(\mathbf{y}|\mathbf{C},r_{max}, K,\sigma) = \frac{1}{(2 \pi)^{n/2} \sigma^n } \exp\left( {-\frac{1}{2\sigma^2}} \sum_{i=1}^n (y_i - r(C_i,r_{max},K))^2 \right) \;. \]
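Its logarithm, which the implementations below return, is
\[ \log L(\boldsymbol{\theta}) = -\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n \big(y_i - r(C_i,r_{max},K)\big)^2 \;. \]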
The implementation then looks like this:
source("../../models/models.r") # load the deterministic model
loglikelihood.monod <- function(yobs, C, par){
## deterministic part:
y.det <- model.monod(par, C)
## Calculate loglikelihood:
return( sum(dnorm(yobs, mean=y.det, sd=par['sigma'], log=TRUE)) )
}
The parameters with higher likelihood are more likely to have generated the data. In our case, this should be the parameter set \(\{r_{\mathrm{max}} = 5,\,K = 3,\,\sigma=0.2\}\).
Plotting the data together with the model output given those parameters looks ok too:
## read data
dat <- read.table("../../data/model_monod_stoch.csv", header=TRUE)
C <- dat$C
r <- dat$r
## compare the two parameter vectors
par1 <- c(r_max = 5, K = 3, sigma=0.2)
par2 <- c(r_max = 10, K = 4, sigma=0.2)
loglikelihood.monod(r, C, par1)
## [1] -53.14229
loglikelihood.monod(r, C, par2)
## [1] -1346.384
## prediction
par <- c(r_max = 5, K = 3, sigma=0.2)
r.det <- model.monod(par, C)
## plot
plot(C, r)
lines(C, r.det, col=2)
Implementing the loglikelihood
using ComponentArrays
using Distributions
include("../../models/models.jl") # load the deterministic model
function loglikelihood_monod(yobs, C, par)
## deterministic part:
ydet = model_monod(C, par)
## Calculate loglikelihood:
return sum(logpdf.(Normal.(ydet, par.sigma), yobs))
end
Read the data
monod_data = CSV.read(joinpath("..", "..", "data", "model_monod_stoch.csv"),
DataFrame)
C = monod_data.C
yobs = monod_data.r
We can compute
## compare the two parameter vectors
par1 = ComponentVector(r_max = 5, K = 3, sigma=0.2)
## ComponentVector{Float64}(r_max = 5.0, K = 3.0, sigma = 0.2)
par2 = ComponentVector(r_max = 10, K = 4, sigma=0.2)
## ComponentVector{Float64}(r_max = 10.0, K = 4.0, sigma = 0.2)
loglikelihood_monod(yobs, C, par1)
## -53.142293490516785
loglikelihood_monod(yobs, C, par2)
## -1346.3842113894802
Plotting
# use the better parameters to compute the deterministic predictions
y_det = model_monod(C, par1)
scatter(C, yobs,
labels = false,
xlabel = "C",
ylabel = "r");
plot!(C, y_det,
labels = false)
The implementation then looks like this:
def loglikelihood_monod(yobs, C, par):
"""
Calculate log-likelihood of observations yobs given parameters par and substrate concentration C.
Arguments:
----------
- yobs: array-like
Observed data.
- C: array-like
Substrate concentration.
- par: array with the parameters
Parameters including (in this order):
- 'r_max': maximum growth rate
- 'K': half-saturation concentration
- 'sigma': standard deviation of the normal distribution for the likelihood
Returns:
-------
log_likelihood: float
The calculated log-likelihood of the observed data.
"""
# Deterministic part: calculate y.det using model_monod function
y_det = model_monod(par, C)
sigma = par[-1]
# Calculate log-likelihood: sum of log probabilities of yobs given y_det and sigma
log_likelihood = np.sum(norm.logpdf(yobs, loc=y_det, scale=sigma))
return log_likelihood
The parameters with higher likelihood are more likely to have generated the data. In our case, this should be the parameter set \(\{r_{\mathrm{max}} = 5,\,K = 3,\,\sigma=0.2\}\).
Plotting the data together with the model output given those parameters looks ok too:
# Loading data
# Specify the path to the CSV file
file_path = r"../../data/model_monod_stoch.csv"
# Load the CSV file into a pandas DataFrame
dat = pd.read_csv(file_path, sep=" ")
C = dat['C']
r = dat['r']
# Define the two parameter vectors
# 'r_max': 5, 'K': 3, 'sigma': 0.2
# 'r_max': 10, 'K': 4, 'sigma': 0.2
par1 = [5, 3, 0.2]
par2 = [10, 4, 0.2]
# Define the two parameter vectors
#par1 = {'r_max': 5, 'K': 3, 'sigma': 0.2}
#par2 = {'r_max': 10, 'K': 4, 'sigma': 0.2}
# Calculate the log-likelihood using the loglikelihood_monod function
loglikelihood_par1 = loglikelihood_monod(r, C, par1)
loglikelihood_par2 = loglikelihood_monod(r, C, par2)
# Output the log-likelihood results
print("Log-likelihood for par1:", loglikelihood_par1)
## Log-likelihood for par1: -53.14229349051678
print("Log-likelihood for par2:", loglikelihood_par2)
## Log-likelihood for par2: -1346.3842113894802
# Assume C and r are already imported and available
# par is a vector with parameters 'r_max', 'K' and 'sigma'
par = [5, 3, 0.2]
# Predict the deterministic growth rate using model_monod function
r_det = model_monod(par, C)
# Plot the observed data
plt.figure(figsize=(14, 6))
sns.scatterplot(x=C, y=r, label='Observed data')
# Plot the deterministic growth rate as a line
sns.lineplot(x=C, y=r_det, color='red', label='Deterministic model')
# Add labels and a legend
plt.xlabel('Substrate concentration (C)')
plt.ylabel('Growth rate (r)')
plt.legend()
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
Produce deterministic model outputs. Do this for the two example models:
Model “Monod” over a concentration (\(C\)) range from 0 to 10 with the default parameter values \(r_{\mathrm{max}} = 5\) and \(K = 3\).
Model “Growth” over a time interval from 0 to 2 with default parameter values \(\mu = 4\), \(K = 10\), \(b = 1\), \(Y = 0.6\), \(C_{\mathrm{M,ini}} = 10\), and \(C_{\mathrm{S,ini}} = 50\).
Plot and interpret the results.
Both deterministic models are already implemented as
model.monod
and model.growth
in the file
models.R
.
Both deterministic models are already implemented as
model_monod
and model_growth
in the file
models.jl
.
Both deterministic models are already implemented as
model_monod
and model_growth
in the file
models.py
.
For the model “Monod”, write a function that produces model outputs (i.e. samples from the probabilistic model). In other words, we want to simulate new observation data including observation noise.
Make the assumption that the noise is i.i.d. normal with standard deviation \(\sigma = 0.2\). For the deterministic model use \(r_{\mathrm{max}} = 5\) and \(K = 3\).
Use your function to simulate several thousand probabilistic model realisations (hypothetical data sets), for fixed model parameters, and plot the 10% and 90% quantiles as continuous prediction bands.
For the computation of quantiles you can use the R function
quantile
.
For the computation of quantiles, you can use the function
quantile
.
For the computation of quantiles, you can use the function
np.quantile
.
source("../../models/models.r") # load the model
## parameters
C.monod <- seq(0, 10, 0.1) # concentrations at which model output should be calculated
par.monod <- c(r_max=5, K=3) # define parameter vector for the model "Monod"
## run model
r.det <- model.monod(par.monod, C.monod)
## plot result
plot(C.monod, r.det, col=2, type="l", ylab="r")
source("../../models/models.r") # load the model
## parameters
par.growth <- c(mu=4,
K=10,
b=1,
Y=0.6,
C_M_ini=10,
C_S_ini=50)
times <- seq(0, 2, length=101) # time resolution of solution
## run model
out <- model.growth(par.growth, times)
## plot results
plot(out$time, out$C_M, col=3, type="l", ylim=c(0, 51),
xlab="time", ylab="concentration")
lines(out$time, out$C_S, col=4)
legend("topright", legend=c("microorganisms", "substrate"),
col=c(3,4), lty=1)
using ComponentArrays
include("../../models/models.jl") # load the model
#set parameters
par = ComponentVector(r_max = 5, K = 3)
C_monod = 0:0.1:10
# run deterministic model monod
Y_monod = model_monod(C_monod, par)
Plotting
plot(C_monod, Y_monod,
labels = false,
xlabel = "C",
ylabel = "r")
using ComponentArrays
include("../../models/models.jl") # load the model
# set parameters
par = ComponentVector(mu=4, K=10, b=1, Y=0.6,
inits = (C_M=10.0, C_S=50.0))
times = 0:0.01:3
# run model
res = model_growth(times, par)
Plotting
plot(res.time, res.C_M,
label = "C_M",
xlabel = "time",
ylabel = "concentrations",
title = "Deterministic model growth");
plot!(res.time, res.C_S, label="C_S")
# Parameters
C_monod = np.arange(0, 10.1, 0.1) # concentrations at which model output should be calculated
par_monod = [5, 3] # define parameter vector for the model "Monod"
# Run model
r_det = model_monod(par_monod, C_monod)
# Plot result
plt.figure(figsize=(14, 6))
sns.lineplot(x=C_monod, y=r_det, color='red')
plt.ylabel('r')
plt.xlabel('C')
plt.title('Model Monod output')
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
# Parameters
# 'mu', 'K', 'b','Y','C_M_ini','C_S_ini'
par_growth = [4, 10, 1, 0.6, 10, 50]
times = np.linspace(0, 2, 101) # time resolution of solution
# Run model
out = model_growth(par_growth, times)
# Plot results
plt.figure(figsize=(14, 6))
sns.lineplot(x=out['time'], y=out['C_M'], color='green', label='microorganisms')
sns.lineplot(x=out['time'], y=out['C_S'], color='blue', label='substrate')
plt.ylim(0, 51)
## (0.0, 51.0)
plt.xlabel('time')
plt.ylabel('concentration')
plt.legend(loc='upper right')
plt.title('Growth of microorganisms and substrate over time')
plt.show()
The model “Monod” takes the input variables \(\mathbf{C} = (C_1,...,C_{n})\) and returns the deterministic growth rates \(\mathbf{r}_{\mathrm{det}} = \mathbf{r}(\mathbf{C},r_{\mathrm{max}},K) = \{r(C_1,r_{\mathrm{max}},K),\dots,r(C_n,r_{\mathrm{max}},K)\}\). We simulate the measurement errors by generating \(n\) random normal values with mean \(0\) and standard deviation \(\sigma\) and adding them to the deterministic model output.
source("../../models/models.r") # load the model
simulate.monod.stoch <- function(par, C){
## run model
r.det <- model.monod(par, C)
## generate noise
z <- rnorm(length(C), 0, par["sigma"])
return(r.det + z)
}
## parameters
C.monod <- seq(0, 10, 0.1) # concentrations at which model output should be calculated
par.monod <- c(r_max=5, K=3, sigma=0.2) # define parameter vector for the model "Monod"
n.rep <- 1000
## run `n.rep` simulations
r.sims <- replicate(n=n.rep, simulate.monod.stoch(par.monod, C.monod))
dim(r.sims)
## [1] 101 1000
## compute quantiles
quant <- apply(r.sims, MARGIN=1, FUN=quantile, probs=c(0.1, 0.9))
dim(quant)
## [1] 2 101
## plot result
plot(C.monod, r.det, type="n", ylab="r")
polygon(c(C.monod,rev(C.monod)), c(quant[1,],rev(quant[2,])), col = "grey85")
lines(C.monod, r.det, col=2, lwd=2, ylab="r")
using ComponentArrays
include("../../models/models.jl") # load the deterministic model
# function to simulate stochastic realisations
function simulate_monod_stoch(C, par)
Ydet = model_monod(C, par)
z = rand(Normal(0, par.sigma), length(Ydet)) # adding noise
Ydet .+ z
end
# parameters
par = ComponentVector(r_max = 5, K = 3, sigma=0.2)
C_monod = 0:0.1:10
n_sample = 1000
monod_stoch = hcat((simulate_monod_stoch(C_monod, par) for _ in 1:n_sample)...)
# compute the 10% and 90% quantiles (and the median) for each concentration
low_quantile = [quantile(monod_stoch[i,:], 0.1) for i in 1:length(C_monod)]
med_quantile = [quantile(monod_stoch[i,:], 0.5) for i in 1:length(C_monod)]
upper_quantile = [quantile(monod_stoch[i,:], 0.9) for i in 1:length(C_monod)]
Plotting
plot(C_monod, upper_quantile,
fillrange = low_quantile,
labels = false,
xlabel = "C",
ylabel = "r");
plot!(C_monod, med_quantile,
labels = false)
def simulate_monod_stoch(par, C):
"""
Simulate the Monod model with stochastic noise.
Arguments:
----------
- par: Array containing the following parameters:
- r_max: maximum growth rate
- K: half-saturation concentration
- sigma: standard deviation of noise
- C: numpy array containing substrate concentrations
Value:
------
A numpy array representing the growth rate with stochastic noise added.
"""
# Run deterministic model
r_det = model_monod(par, C)
# Generate noise using a normal distribution with mean 0 and standard deviation `sigma`
sigma = par[-1]
z = np.random.normal(0, sigma, size=len(C))
# Add noise to the deterministic model results
return r_det + z
# Parameters
C_monod = np.arange(0, 10.1, 0.1) # concentrations at which model output should be calculated
par_monod = [5, 3, 0.2] # define parameter array for the model "Monod"
n_rep = 1000 # number of simulations
# Run `n_rep` simulations
r_sims = np.array([simulate_monod_stoch(par_monod, C_monod) for _ in range(n_rep)])
# The dimensions of `r_sims` array will be (n_rep, len(C_monod))
print(r_sims.shape) # Should print (1000, 101)
## (1000, 101)
# Compute quantiles
# r_sims is a numpy array where each row is a simulation and each column represents a concentration
quant = np.quantile(r_sims, q=[0.1, 0.9], axis=0)
# The resulting `quant` is a 2D array where the first row contains the 10th percentile,
# and the second row contains the 90th percentile for each concentration.
print(quant.shape) # Prints the shape of the resulting quantile array
## (2, 101)
# Define variables for plotting
C_monod = np.linspace(0, 10, 101) # Concentrations at which model output is calculated
# Plot result
plt.figure(figsize=(14, 6))
# Plot deterministic model output using seaborn
sns.lineplot(x=C_monod, y=r_det, color='red', label='Deterministic model', linewidth=2)
# Create a shaded region between the quantiles (0.1 and 0.9)
plt.fill_between(C_monod, quant[0, :], quant[1, :], color='gray', alpha=0.5, label='10th-90th percentile')
# Set labels for the x and y axes
plt.xlabel('C')
plt.ylabel('r')
# Add a legend
plt.legend()
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
Construct the likelihood for the model “Survival”. The mortality rate \(\lambda\), upon multiplication with an infinitesimal time interval \(\Delta\), denotes the probability to die within this time interval, given that one is still alive at the beginning of it. If \(S(t)\) denotes the probability of an individual to be still alive at time point \(t\), this reads as \[ \text{Prob}\big(\text{die in } [t, t+\Delta) \mid \text{survived until } t\big) = \frac{S(t)-S(t+\Delta)}{S(t)} = \lambda \Delta \,. \] After some rearrangement, we let \(\Delta \rightarrow 0\) and obtain the differential equation \[ \dot S(t)=-\lambda S(t)\,. \] Solve this equation to find the time-dependence of \(S(t)\) (Hint: try an exponential ansatz or simply wolframalpha.com). From this solution, derive the probability for an individual to die within the time interval \([t_{i-1},t_i]\). Now, consider \(N\) independent individuals, each with the same mortality rate \(\lambda\). Derive the likelihood function for a vector of death counts \({\bf y}=(y_1,\dots,y_n)\), where \(y_i\) denotes the number of deaths occurring within time interval \([t_{i-1},t_i]\), for \(0=t_0<t_1<\dots<t_n\).
Hint: Look up the definition of the multinomial distribution!
Write a function that simulates output from this probabilistic model, for given parameter values. Use this function to simulate several thousand model realisations, for fixed model parameters (use \(N=30\) individuals with mortality rate \(\lambda = 0.2d^{-1}\) and \(5\) subsequent observation windows of one day each). Use a boxplot to visualize the results.
Implement a function that returns the logarithm of the likelihood, for given parameter values and measurement data. Check that your log-likelihood is implemented correctly by generating model outputs and computing the likelihood for several parameters, including the one that you generated the data with.
The probability for an individual with mortality rate \(\lambda\) to survive until time \(t\) is given by the exponential \[ S(t,\lambda)=e^{-\lambda t}\,, \] which indeed satisfies \(\dot S = -\lambda S\) and the initial condition \(S(0)=1\).
Thus, the probability of the individual to die within time interval \([t_{i-1},t_i]\) is given by \[ p_{i,\lambda}=S(t_{i-1},\lambda) - S(t_i,\lambda)\,. \]
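Inserting the exponential form of \(S\), this becomes \[ p_{i,\lambda} = e^{-\lambda t_{i-1}} - e^{-\lambda t_i} \,. \]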
Assume that we have \(N\) individuals, each with the same mortality rate. We want to calculate the probability that \(y_1\) individuals die in time interval \([t_0,t_1]\), \(y_2\) individuals die in time interval \([t_1,t_2]\), etc. Those individuals who survive the whole experiment die in the time interval \([t_n,\infty)\), and this happens with probability \(p_{n+1,\lambda}=S(t_n,\lambda)\). For the time being, assume that we distinguish the individuals and want to calculate the probability that individual 1 dies in, say, time interval \([t_3,t_4]\), individual 2 in, say, time interval \([t_1,t_2]\), etc., in such a way that the counts \(\bf y\) are realised. Due to the independence of the individuals, the probability of such an event is simply given by the product \[ p_{1,\lambda}^{y_1} \cdot p_{2,\lambda}^{y_2} \cdot \ldots \cdot \, p_{n+1,\lambda}^{y_{n+1}}\,, \] where \(y_{n+1} = N - \sum_{i=1}^n y_i\) is the number of survivors.
Since we do not distinguish individuals, we have to multiply this product by a so-called multinomial coefficient, which counts the number of ways to put \(N\) individuals into \(n+1\) buckets, with \(y_1\) ending up in the first bucket etc. Thus, the likelihood function, for model parameter \(\lambda\), given an output \(\bf y\) of our survival model is described by the so-called multinomial distribution \[ p({\bf y} \mid \lambda) = L_{\text{survival}}(\lambda) = \frac{N!}{y_1!\cdots y_{n+1}!} p_{1,\lambda}^{y_1} \cdot p_{2,\lambda}^{y_2} \cdot \ldots \cdot p_{n+1,\lambda}^{y_{n+1}}\,. \]
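For numerical work it is again convenient to evaluate the logarithm of this expression,
\[ \log L_{\text{survival}}(\lambda) = \log N! - \sum_{i=1}^{n+1} \log y_i! + \sum_{i=1}^{n+1} y_i \log p_{i,\lambda} \,, \]
which is what the log-likelihood implementations below compute (e.g. via dmultinom(..., log=TRUE), logpdf(Multinomial(...)), or multinomial.logpmf).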
The simplest way to simulate a model output is to use the R function
rmultinom
. Alternatively, we can simulate the death time,
for each individual, drawing from the exponential distribution \[
f(t|\lambda)=\lambda e^{-\lambda t}\,,
\] which is the negative derivative of the survival probability
\(S(t)\). After that, we count how many
individuals die within the different time intervals.
## parameters for model "Survival":
N <- 30 # Number of individuals
t <- 1:5 # Times at which the numbers of deaths are counted
par.survival <- c(lambda=0.2) # Parameter vector of the model "Survival"
## Simulation from model "Survival".
## Here we sample from the multinomial distribution implemented in R:
model.survival <- function(N, t, par){
## Calculate survival probabilities at measurement points
## and add zero "at time-point infinity":
S <- exp(-par*c(0,t))
S <- c(S,0)
## Calculate probabilities to die within observation windows:
p <- -diff(S)
## Simulate from multinomial distribution:
y <- rmultinom(1, N, p)[-length(p)]
return(y)
}
## Alternatively, we generate the exponentially distributed
## times of death and bin them
model.survival.alt <- function(N, t, par) {
## Generate deaths times:
T <- rexp(N, par)
## Count number of deaths within observation windows:
t2 <- c(t,6) # construct the bins
y <- table(cut(T+1, breaks = t2)) # do the data binning
return(y)
}
## Generate n.sample simulations from the probabilistic model
n.sample <- 1000
pred.survival <- matrix(nrow=n.sample, ncol=length(t))
for (i in 1:n.sample){
pred.survival[i,] <- model.survival(N, t, par.survival)
}
## Make a boxplot:
boxplot(pred.survival, xlab="time", ylab="# death")
Create a function survival whose arguments are:
N, the number of individuals,
t, a vector of the form 1:k,
λ, the mortality rate.
It returns a random vector y with the number of deaths per bin.
function survival(N::Int64, t, λ)
# S is a vector whose first component is 1 and last is 0,
# because the probability to be alive at time t=0 is 1,
# and the probability to be alive at time t=infinity is 0
S = exp.(-λ .* t)
S = [1; S; 0]
p = -diff(S)
y = rand(Multinomial(N, p))[1:(end-1)]
return y
end
# setting parameter
N = 30
t = 1:5
λ = 0.2
n_sample = 1000
pred_survival = Matrix{Int}(undef, n_sample, length(t))
for i in 1:n_sample
pred_survival[i,:] = survival(N, t, λ)
end
Plotting
using StatsPlots
boxplot(pred_survival,
labels = false, xlab="time", ylab="# deaths")
Create a function model_survival whose arguments are:
N, the number of individuals,
t, a vector of observation times,
par, a parameter vector containing the mortality rate lambda.
It returns a random vector y with the number of deaths per bin.
# Parameters for the survival model
N = 30 # Number of individuals
t = np.array([1, 2, 3, 4, 5]) # Times at which the numbers of deaths are counted
par_survival = [0.2] # Parameter vector of the model "Survival": 'lambda'
# Function for the survival model
def model_survival(N, t, par):
lambda_ = par[0]
# Calculate survival probabilities at measurement points and add zero "at time-point infinity"
S = np.exp(-lambda_ * np.concatenate(([0], t)))
S = np.append(S, 0) # Add zero "at time-point infinity"
# Calculate probabilities to die within observation windows
p = -np.diff(S)
# Simulate from multinomial distribution
y = multinomial.rvs(N, p)[:-1]
return y
# Generate simulations from the probabilistic model
n_sample = 1000
pred_survival = np.zeros((n_sample, len(t)))
for i in range(n_sample):
pred_survival[i, :] = model_survival(N, t, par_survival)
# Convert the NumPy array to a pandas DataFrame (as input to seaborn)
pred_survival_df = pd.DataFrame(pred_survival, columns=t)
# Plotting a boxplot
plt.figure(figsize=(8, 6))
sns.boxplot(data=pred_survival_df)
plt.xlabel("Time")
plt.ylabel("# Deaths")
plt.title("Boxplot of # Deaths at Different Times")
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
loglikelihood.survival <- function(y, N, t, par){
## Calculate survival probabilities at measurement points
## and add zero "at time-point infinity":
S <- exp(-par*c(0,t))
S <- c(S,0)
## Calculate probabilities to die within observation windows:
p <- -diff(S)
## Add number of survivors at the end of the experiment:
y <- c(y, N-sum(y))
## Calculate log-likelihood of multinomial distribution at point y:
LL <- dmultinom(y, prob=p, log=TRUE)
return(LL)
}
## simulate data
y.sim <- model.survival(N, t, par.survival)
## evaluation likelihood
loglikelihood.survival(y.sim, N, t, par.survival)
## [1] -8.791774
function loglikelihood_survival(N::Int, t, y, λ)
S = exp.(-λ .* t)
S = [1; S; 0]
p = -diff(S)
## Add number of survivors at the end of the experiment
## (vcat avoids mutating the caller's vector y):
y = vcat(y, N - sum(y))
logpdf(Multinomial(N, p), y)
end
# Generating and checking
N = 30
## 30
t = 1:5
## 1:5
λ = 0.2
## 0.2
y = survival(N, t, λ)
## 5-element Vector{Int64}:
## 5
## 5
## 2
## 3
## 5
loglikelihood_survival(N, t, y, λ)
## -9.048994121789905
def loglikelihood_survival(y, N, t, par):
"""
Calculate the log-likelihood of the observed data y given the parameters par
and the times t using the multinomial distribution.
Arguments:
- y: observed data (number of deaths at each time point).
- N: total number of individuals in the experiment.
- t: time points at which the number of deaths are counted.
- par: list or array containing the lambda parameter for the model.
Returns:
- LL: log-likelihood value.
"""
lambda_ = par[0]
# Calculate survival probabilities at measurement points and add zero "at time-point infinity"
S = np.exp(-lambda_ * np.concatenate(([0], t)))
S = np.append(S, 0)
# Calculate probabilities to die within observation windows
p = -np.diff(S)
# Add number of survivors at the end of the experiment
y = np.append(y, N - np.sum(y))
# Calculate log-likelihood of multinomial distribution at point y
LL = multinomial.logpmf(y, N, p)
return LL
# Simulate data using the model_survival function
def model_survival(N, t, par):
lambda_ = par[0]
# Calculate survival probabilities at measurement points and add zero "at time-point infinity"
S = np.exp(-lambda_ * np.concatenate(([0], t)))
S = np.append(S, 0)
# Calculate probabilities to die within observation windows
p = -np.diff(S)
# Simulate from multinomial distribution
y = multinomial.rvs(N, p)[:-1]
return y
# Parameters for the survival model
N = 30 # Number of individuals
t = np.array([1, 2, 3, 4, 5]) # Times at which the numbers of deaths are counted
par_survival = [0.2] # Parameter vector of the model "Survival"
# Set seed for reproducibility
np.random.seed(1)
# Simulate data
y_sim = model_survival(N, t, par_survival)
# Evaluate log-likelihood
LL = loglikelihood_survival(y_sim, N, t, par_survival)
print(f"Log-likelihood: {LL}")
## Log-likelihood: -11.182319106189894
For the “Monod” model, try different sensitivity analysis approaches and compare them.
Increase the model parameters by \(10 \%\) and by \(50 \%\), redo the forward simulations, and try to understand the effect of the parameter change on the model results.
A simple metric that measures the sensitivity of the model output \(Y\) to a model parameter \(\theta\) is defined as \[ s_{loc} = \frac{\Delta Y}{\Delta \theta} \,. \] Often a relative metric is easier to interpret: \[ s_{loc,rel} = \frac{\theta}{Y} \frac{\Delta Y}{\Delta \theta} \,. \]
Set the range \(\Delta \theta\) to \(10 \%\) and to \(50 \%\) of the parameter values and compute the metrics for different concentrations (inputs). What is your interpretation?
Conduct a variance-based regional (global) sensitivity analyses with uniform parameter distributions.
Use the fast99
function implemented in the package sensitivity
.
The template below should get you started. For details see also the
documentation of fast99
.
library(sensitivity)
## 1) get sampling points in parameter space
lower = ...
upper = ...
res.fast <- fast99(factors = names(par.monod),
n = 100,
q = "qunif",
q.arg = list(list(min=lower[1], max=upper[1]),
list(min=lower[2], max=upper[2]))
)
## all parameter combinations to run the model with
head(res.fast$X)
## 2) run the model for all parameter combinations:
res.SA <- matrix(NA, nrow=nrow(res.fast$X), ncol=...)
for(i in 1:nrow(res.fast$X)){
res.SA[i,] <- model(unlist(res.fast$X[i,]), ...)
}
## 3) compute the sensitivity indices for one concentration (pick a column index j)
S <- tell(res.fast, y = res.SA[, j])
S
plot(S)
Use the package GlobalSensitivity.jl
and the method eFAST
. Try to visualize the sensitivities
with a plot.
using GlobalSensitivity
using Distributions
# 1) Define a vector of tuples that give the lower and
# upper bound for each parameter:
# Define a `Distribution` for each parameter
param_dists = [Uniform(...), # r_max = par[1]
Uniform(...)] # K = par[2]
# 2) Run SA. `gsa` can handle models with multiple outputs (i.e. concentrations)
res = gsa(θ -> model_monod(C_monod, ComponentVector(r_max = θ[1], K = θ[2])),
eFAST(num_harmonics=6),
param_dists,
samples = 200)
res.S1 # direct effect
res.ST # total effect
Use the SALib package. Try to visualize the sensitivity with a plot. Here is one example of how to set up a function for the sensitivity analysis.
def local_sa(model, par_vector, par, delta_rel=0.1, *args, **kwargs):
"""
Perform local sensitivity analysis on the model function.
Parameters:
- model: Function representing the model.
- par_vector: Array of model parameters.
- par: Name of the parameter to perform sensitivity analysis on.
- delta_rel: Relative change in the parameter value.
- *args, **kwargs: Additional arguments for the model function.
Returns:
Dictionary containing absolute and relative sensitivity indices.
"""
# Copy the parameter array to adjust the parameter of interest
par_vector2 = par_vector.copy()
# Adjust the specified parameter by the relative delta
par_vector2[par] *= (1 + delta_rel)
# Run the model with the original and adjusted parameter array
Y = model(par_vector, *args, **kwargs)
Y2 = model(par_vector2, *args, **kwargs)
# Compute sensitivity indices
S_abs = (Y2 - Y) / (par_vector2[par] - par_vector[par])
S_rel = (par_vector[par] / Y) * S_abs
# Return sensitivity indices as a dictionary
return {'S_abs': S_abs, 'S_rel': S_rel}
Because we have to do the same calculations for every parameter, it is a good idea to write a little function that computes the absolute and relative local sensitivity indices. Note that we use the three dots (...) to pass additional arguments to the model.
local.SA <- function(model, par.vector, par, delta.rel=0.1, ...){
## change `par` by delta.rel
par.vector2 <- par.vector
par.vector2[par] <- par.vector2[par] * (1 + delta.rel)
## run model
Y <- model(par.vector, ...)
Y2 <- model(par.vector2, ...)
## compute sensitivity indices
S.abs <- (Y - Y2) / (par.vector[par] - par.vector2[par])
S.rel <- par.vector[par] / Y * S.abs
list(S.abs = S.abs, S.rel=S.rel)
}
We apply our function local.SA for both parameters with different delta.rel:
source("../../models/models.r") # load the model
## define inputs
C.monod <- seq(0, 10, 0.1) # concentrations at which model output should be calculated
## define parameters, the reference point
par.monod <- c(r_max=5, K=3)
## compute sensitivity indices
S.rmax.10 <- local.SA(model.monod, par.monod, "r_max", delta.rel=0.1, C=C.monod)
S.rmax.50 <- local.SA(model.monod, par.monod, "r_max", delta.rel=0.5, C=C.monod)
S.K.10 <- local.SA(model.monod, par.monod, "K", delta.rel=0.1, C=C.monod)
S.K.50 <- local.SA(model.monod, par.monod, "K", delta.rel=0.5, C=C.monod)
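As a check, the local sensitivities of the Monod model can also be written down analytically from \(r(C,r_{max},K) = \frac{r_{max} C}{K+C}\); the finite-difference estimates above should approximate
\[ \frac{\partial r}{\partial r_{max}} = \frac{C}{K+C} \,, \qquad \frac{\partial r}{\partial K} = -\frac{r_{max}\, C}{(K+C)^2} \,, \]
and the relative indices follow by multiplying with \(\theta/Y\).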
Plotting the absolute indices:
plot(C.monod, S.K.10$S.abs, col=2, type="l", ylim=c(-0.5, 1),
ylab="absolute sensitivity",
main="absolute sensitivities")
lines(C.monod, S.K.50$S.abs, col=2, lty = 2)
lines(C.monod, S.rmax.10$S.abs, col=4, lty = 1)
lines(C.monod, S.rmax.50$S.abs, col=4, lty = 2)
legend("topleft", col=c(2,2,4,4), lty=c(1,2),
legend=c("S_K 10%", "S_K 50%", "S_r.max 10%", "S_r.max 50%"))
Plotting the relative indices:
plot(C.monod, S.K.10$S.rel, col=2, type="l", ylim=c(-1, 1),
ylab="relative sensitivity",
main="relative sensitivities")
lines(C.monod, S.K.50$S.rel, col=2, lty = 2)
lines(C.monod, S.rmax.10$S.rel, col=4, lty = 1)
lines(C.monod, S.rmax.50$S.rel, col=4, lty = 2)
legend("bottomright", col=c(2,2,4,4), lty=c(1,2),
legend=c("S_K 10%", "S_K 50%", "S_r.max 10%", "S_r.max 50%"))
We use the function fast99
to compute 2*n
parameter combinations for which we will evaluate the model:
library(sensitivity)
## sensitivity sampling points
lower = par.monod * 0.7
upper = par.monod * 1.3
res.fast <- fast99(factors = names(par.monod),
n = 100,
q = "qunif",
q.arg = list(list(min=lower[1], max=upper[1]),
list(min=lower[2], max=upper[2]))
)
## all parameter combinations to run the model with
head(res.fast$X)
## r_max K
## 1 5.00 3.000
## 2 5.72 3.036
## 3 6.44 3.072
## 4 5.84 3.108
## 5 5.12 3.144
## 6 4.40 3.180
We then run the model with all combinations in a loop. For slow models, this could be done in parallel.
## calculate results for sample for all concentrations:
res.SA.monod <- matrix(NA, nrow=nrow(res.fast$X), ncol=length(C.monod))
for(i in 1:nrow(res.fast$X)){
res.SA.monod[i,] <- model.monod(unlist(res.fast$X[i,]), C.monod)
}
After picking a concentration, we can then compute the sensitivity indices
j <- 5 # chose a concentration
C.monod[j]
## [1] 0.4
S <- tell(res.fast, y = res.SA.monod[, j])
S
##
## Call:
## fast99(factors = names(par.monod), n = 100, q = "qunif", q.arg = list(list(min = lower[1], max = upper[1]), list(min = lower[2], max = upper[2])))
##
## Model runs: 200
##
## Estimations of the indices:
## first order total order
## r_max 0.5386887 0.5539790
## K 0.4437621 0.4589948
plot(S)
Because we have to do the same calculations for every parameter, it is a good idea to write a little function that computes the absolute and relative local sensitivity indices:
function local_SA(model::Function, θ::ComponentVector,
parameter::Symbol; Δrel=0.1)
# change `parameter` by `Δrel`:
θ2 = copy(θ*1.0)
θ2[parameter] = θ2[parameter] * (1 + Δrel)
# run model
Y = model(θ)
Y2 = model(θ2)
# compute sensitivity indices
S_abs = (Y .- Y2) ./ (θ[parameter] - θ2[parameter])
S_rel = θ[parameter] ./ Y .* S_abs
(S_abs = S_abs, S_rel = S_rel)
end
## local_SA (generic function with 1 method)
We apply our function local_SA for both parameters. Note the use of anonymous functions to pass the additional argument C_monod. This is a pattern commonly used in Julia.
include("../../models/models.jl"); # load the model
## model_growth
# set concentrations
C_monod = 0:0.1:10;
# compute indices
S_rmax_10 = local_SA(θ -> model_monod(C_monod, θ), par, :r_max, Δrel=0.1);
S_rmax_50 = local_SA(θ -> model_monod(C_monod, θ), par, :r_max, Δrel=0.5);
S_K_10 = local_SA(θ -> model_monod(C_monod, θ), par, :K, Δrel=0.1);
S_K_50 = local_SA(θ -> model_monod(C_monod, θ), par, :K, Δrel=0.5);
Plotting the results
# plot absolute sensitivities
plot(C_monod, [S_rmax_10.S_abs S_rmax_50.S_abs S_K_10.S_abs S_K_50.S_abs],
labels = ["S_rmax_10" "S_rmax_50" "S_K_10" "S_K_50"],
xlab = "concentration", ylab="absolute sensitiviy")
# plot absolute sensitivities
plot(C_monod, [S_rmax_10.S_rel S_rmax_50.S_rel S_K_10.S_rel S_K_50.S_rel],
labels = ["S_rmax_10" "S_rmax_50" "S_K_10" "S_K_50"],
xlab = "concentration", ylab="relative sensitiviy")
using GlobalSensitivity
using Distributions
# Define a distribution for each parameter
param_dists = [Uniform(0.7*par[1], 1.3*par[1]),
Uniform(0.7*par[2], 1.3*par[2])]
## 2-element Vector{Uniform{Float64}}:
## Uniform{Float64}(a=3.5, b=6.5)
## Uniform{Float64}(a=2.0999999999999996, b=3.9000000000000004)
# We can handle multiple outputs (i.e. different concentrations) at once
res = gsa(θ -> model_monod(C_monod, ComponentVector(r_max = θ[1], K = θ[2])),
eFAST(num_harmonics=6),
param_dists,
samples = 200)
## GlobalSensitivity.eFASTResult{Matrix{Float64}}([NaN NaN; 0.4911116923546162 0.49328757009808516; … ; 0.9465700186771833 0.05135616317856734; 0.9473401174137372 0.050607176694682135], [NaN NaN; 0.50523026854676 0.5077289452235891; … ; 0.9485992866581445 0.05280455417864782; 0.9493486132310703 0.052035735620521995])
# res.S1 # direct effects
# res.ST # total effects
p = plot(C_monod, res.ST[:,1], label="r_max, total effect",
xlab="concentration", ylab="sensitivity index");
plot!(p, C_monod, res.S1[:,1], label="r_max, main effect");
plot!(p, C_monod, res.ST[:,2], label="K, total effect");
plot!(p, C_monod, res.S1[:,2], label="K, main effect");
p
Because we have to do the same calculations for every parameter, it is a good idea to write a little function that computes the absolute and relative local sensitivity indices.
from SALib.sample import fast_sampler
from SALib.analyze import fast
def local_sa(model, par_vector, par, delta_rel=0.1, *args, **kwargs):
"""
Perform local sensitivity analysis on the model function.
Parameters:
- model: Function representing the model.
- par_vector: Array of model parameters.
- par: Name of the parameter to perform sensitivity analysis on.
- delta_rel: Relative change in the parameter value.
- *args, **kwargs: Additional arguments for the model function.
Returns:
Dictionary containing absolute and relative sensitivity indices.
"""
# Copy the parameter array to adjust the parameter of interest
par_vector2 = par_vector.copy()
# Adjust the specified parameter by the relative delta
par_vector2[par] *= (1 + delta_rel)
# Run the model with the original and adjusted parameter array
Y = model(par_vector, *args, **kwargs)
Y2 = model(par_vector2, *args, **kwargs)
# Compute sensitivity indices
S_abs = (Y2 - Y) / (par_vector2[par] - par_vector[par])
S_rel = (par_vector[par] / Y) * S_abs
# Return sensitivity indices as a dictionary
return {'S_abs': S_abs, 'S_rel': S_rel}
We apply our function local_sa for both parameters with different delta_rel:
# Define inputs: concentrations at which model output should be calculated
C_monod = np.arange(0, 10.1, 0.1)
# Define parameters, the reference point
par_monod = [5, 3]
# Compute sensitivity indices using local sensitivity analysis
S_rmax_10 = local_sa(model_monod, par_monod, 0, delta_rel=0.1, C=C_monod)
## <string>:28: RuntimeWarning: divide by zero encountered in divide
## <string>:28: RuntimeWarning: invalid value encountered in multiply
S_rmax_50 = local_sa(model_monod, par_monod, 0, delta_rel=0.5, C=C_monod)
S_K_10 = local_sa(model_monod, par_monod, 1, delta_rel=0.1, C=C_monod)
S_K_50 = local_sa(model_monod, par_monod, 1, delta_rel=0.5, C=C_monod)
Plotting the absolute indices:
# Create a Seaborn style figure
sns.set(style="whitegrid")
# Create a figure
plt.figure(figsize=(10, 6))
# Plot S.K.10 absolute sensitivity
sns.lineplot(x=C_monod, y=S_K_10['S_abs'], color='red', linestyle='-', label='S_K 10%')
# Plot S.K.50 absolute sensitivity
sns.lineplot(x=C_monod, y=S_K_50['S_abs'], color='red', linestyle='--', label='S_K 50%')
# Plot S.rmax.10 absolute sensitivity
sns.lineplot(x=C_monod, y=S_rmax_10['S_abs'], color='blue', linestyle='-', label='S_r.max 10%')
# Plot S.rmax.50 absolute sensitivity
sns.lineplot(x=C_monod, y=S_rmax_50['S_abs'], color='blue', linestyle='--', label='S_r.max 50%')
# Set labels and title
plt.ylabel('Absolute Sensitivity')
plt.title('Absolute Sensitivities')
# Add a legend
plt.legend(loc='upper left')
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
Plotting the relative indices:
# Create a Seaborn style figure
sns.set(style="whitegrid")
# Create a figure
plt.figure(figsize=(10, 6))
# Plot S.K.10 relative sensitivity
sns.lineplot(x=C_monod, y=S_K_10['S_rel'], color='red', linestyle='-', label='S_K 10%')
# Plot S.K.50 relative sensitivity
sns.lineplot(x=C_monod, y=S_K_50['S_rel'], color='red', linestyle='--', label='S_K 50%')
# Plot S.rmax.10 relative sensitivity
sns.lineplot(x=C_monod, y=S_rmax_10['S_rel'], color='blue', linestyle='-', label='S_r.max 10%')
# Plot S.rmax.50 relative sensitivity
sns.lineplot(x=C_monod, y=S_rmax_50['S_rel'], color='blue', linestyle='--', label='S_r.max 50%')
# Set limits, labels, and title
plt.ylim([-1, 1])
## (-1.0, 1.0)
plt.ylabel('Relative Sensitivity')
plt.title('Relative Sensitivities')
# Add a legend
plt.legend(loc='lower right')
# Display the plot
plt.show()
We use the function fast_sampler.sample to compute the parameter combinations for which we will evaluate the model:
# Define par_monod array
par_monod = [5, 3] # Example values, adjust as needed
# Calculate lower and upper bounds
lower = np.array(list(par_monod)) * 0.7
upper = np.array(list(par_monod)) * 1.3
# Define the problem dictionary
problem = {
'num_vars': len(par_monod),
'names': list(["r_max", "K"]),
'bounds': list(zip(lower, upper))
}
# Perform sensitivity sampling using FAST99
res_fast = fast_sampler.sample(problem, N=100)
# Display the first 5 rows of the parameter combinations
print(res_fast[:5])
## [[5.55403157 3.33241894]
## [6.27403157 3.36841894]
## [6.00596843 3.40441894]
## [5.28596843 3.44041894]
## [4.56596843 3.47641894]]
We then run the model with all combinations in a loop. For slow models, this could be done in parallel.
# Assume model_monod function is already defined
# Assume res_fast and C_monod are also defined as per the earlier code
# Initialize an array to hold the results
res_SA_monod = np.empty((res_fast.shape[0], len(C_monod)))
# Loop through each sample
for i in range(res_fast.shape[0]):
# Get the parameters for the current sample
parameters = [
res_fast[i, 0],
res_fast[i, 1]
]
# Calculate the model_monod results for the given parameters and concentrations
res_SA_monod[i, :] = model_monod(parameters, C_monod)
After picking a concentration, we can then compute the sensitivity indices
# Choose a concentration
j = 4
selected_concentration = C_monod[j]
print(selected_concentration)
## 0.4
# Assume problem, res_fast, and res_SA_monod are defined as per earlier code
# Assume j is an integer index representing the concentration to analyze
# Perform sensitivity analysis for the specified concentration
# We expect `res_SA_monod[:, j]` to be a one-dimensional array of outputs
# Ensure that `j` is within the range of columns in `res_SA_monod`
S = fast.analyze(problem, Y=res_SA_monod[:, j], print_to_console=True)
## S1 ST S1_conf ST_conf
## r_max 0.539225 0.554401 0.136408 0.131977
## K 0.444932 0.460929 0.138345 0.134694
# Assume S is the sensitivity analysis result obtained from fast.analyze()
# Create a pandas DataFrame to hold the sensitivity indices and their names
data = {
'Factor': S['names'],
'First-Order': S['S1'],
'Total-Order': S['ST']
}
df = pd.DataFrame(data)
# Melt the DataFrame to long format for easier plotting with Seaborn
df_long = df.melt(id_vars='Factor', var_name='Sensitivity Type', value_name='Index')
# Create a barplot with Seaborn
sns.barplot(data=df_long, x='Factor', y='Index', hue='Sensitivity Type')
# Add labels and title
plt.ylabel('Sensitivity Index')
plt.title('Sensitivity Analysis Results')
# Show the plot
plt.show()