Note, for the first three exercises you need only pencil and paper.
The joint discrete probability table of \(P_{A,B}(a,b)\) is given below:
B.1 | B.2 | B.3 | |
---|---|---|---|
A.1 | 0.2 | 0.1 | 0.3 |
A.2 | 0.1 | 0.1 | 0.2 |
Derive the following probabilities:
Assume the probability densities \(p(E \mid B)\), \(p(B)\), \(p(A, D \mid E)\), and \(p(C \mid B, E)\) are known.
Assume that: \[ \mu \sim f_{\mu}(m) = \begin{cases} 0.1\exp(-0.1m) & m \geq 0 \\ 0 & \text{else} \end{cases} \] and \[ X \sim f_{X|\mu=m}(x \mid m) = \frac{1}{\sqrt{2\pi}} \exp\left(- \frac{(x-m)^2}{2}\right) \]
This means \(X\) is normally distributed with mean \(\mu\), and \(\mu\) itself is exponentially distributed.
Derive and interpret:
It is not the aim to find closed forms for the integrals.
joint distribution: \[ f_{X, \mu}(x, m) = f_{X|\mu}(x| m) \, f_{\mu}(m) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-m)^2}{2}} 0.1e^{-0.1m} \]
compound distribution \[ f_{X}(x) = \int_0^\infty f_{X, \mu}(x, m) \,dm \]
probability that \(\mu\) is larger than 5: \[ P(\mu>5) = \int_5^\infty f_{\mu}(m) \, dm = \exp (-0.1 \cdot 5) \]
The joint distribution under the constraint that \(\mu>5\): \[ f_{X,\mu|\mu>5}(x, m)= \begin{cases} f_{X, \mu}(x, m) / P(\mu>5)& \text{if}\ m>5\\ 0 & \text{otherwise} \end{cases} \]
marginal of \(X\) under constrain that \(\mu>5\): \[ f_{X|\mu>5}(x) = \int_5^\infty f_{X,\mu|\mu>5}(x, m) \, dm \neq f_{X}(x) \]
Take two random variables \(X\) and \(Y\) with the following distributions: \[ X \sim \text{Uniform}(0, 1)\\ Y \sim \text{Normal}(2, 10) \]
Another random variable is defined as a function of \(X\) as follows: \[ Z = \sin(2\pi X) \sqrt{X} \] While it is difficult to derive the probability density \(f_Z(z)\) analytically, sampling from it is easy.
Most important univariate probability distributions are already
implemented in R. Type ?Distributions
to get an overview.
For every distribution __
four functions are defined with
the following naming scheme:
d__(x, ...) # evaluate pdf at x
p__(x, ...) # evaluate cdf at x
q__(p, ...) # evaluate the p-th quantile
r__(n, ...) # sample n random numbers
For example, for the normal distribution the functions are called
dnorm()
, pnorm()
, qnorm()
, and
rnorm()
.
Histograms are generated with the function hist
. You can
adjust the number of bins with the argument breaks
,
e.g. hist(rnorm(10000), breaks=100)
.
The package Distributions.jl
provides access to many
probability distributions, see the documentation.
Here are some useful functions you can apply to all distributions
(not only Normal
):
Ï€ = Normal(0, 10) # define distribution
pdf(Ï€, x) # evaluate pdf at x
logpdf(Ï€, x) # evaluate log pdf at x
cdf(Ï€, x) # evaluate cdf at x
quantile(Ï€, p) # evaluate the p-th quantile
rand(Ï€, n) # sample n random numbers
Histograms are generated with the function histogram
from the package Plots.jl
. You can adjust the number of
bins with the argument nbins
, e.g
histogram(rand(Uniform(0,5), 10000), nbins = 100)
.
This package scipy
provides access to many probability
distributions. The packages matplotlib
and
seaborn
offer good ways to plot the graphs.
Here are some examples of how to call some functions:
uniform.pdf(x, loc, scale) # Evaluate uniform pdf at x
norm.pdf(x, loc, scale) # Evaluate normal pdf at x
dunif(0.8, 0, 1)
## [1] 1
dnorm(0.8, 2, 10)
## [1] 0.03960802
xs <- runif(10000, 0, 1)
ys <- rnorm(10000, 2, 10)
par(mfrow=c(1,2)) # optional, arranges both plots together
hist(xs)
hist(ys)
xs <- runif(10000, 0, 1) # sample from X
zs <- sin(2*pi*xs) * sqrt(xs) # transform samples
hist(zs, breaks=75)
using Distributions
X = Uniform(0,1)
Y = Normal(2, 10)
Use the function pdf
(Probability Density Function)
pdf_X = pdf(X, 0.8)
## 1.0
pdf_Y = pdf(Y, 0.8)
## 0.03960802117936561
X_s = rand(X, 10000)
Y_s = rand(Y, 10000)
using Plots
p = [
histogram(X_s, nbins = 20,
labels=false,
xlabel="X sample",
ylabel="Frequency",
title="Histogram of X_s"),
histogram(Y_s, nbins = 20,
labels=false,
xlabel="Y sample",
ylabel="Frequency",
title="Histogram of Y_s")
];
plot(p...)
Z_s = sin.(2*Ï€*X_s).*sqrt.(X_s) #the dot symbol means that the operation is done component by component
histogram(Z_s, nbins = 75,
labels=false,
xlabel="Z sample",
ylabel="Frequency",
title="Histogram of Z_s")
The following libraries are required:
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import uniform, norm
import math
import numpy as np
# Uniform distribution (X ∼ Uniform(0,1))
pdf_uniform = uniform.pdf(0.8, loc=0, scale=1) # loc and scale for the range [0,1]
# Normal distribution (Y ∼ Normal(2,10))
pdf_normal = norm.pdf(0.8, loc=2, scale=10) # loc=2 is the mean, scale=10 is the standard deviation
# Print the probability densities
print(f"fX(0.8) = {pdf_uniform}")
## fX(0.8) = 1.0
print(f"fY(0.8) = {pdf_normal}")
## fY(0.8) = 0.03960802117936561
# Generate 10,000 samples from X ∼ Uniform(0,1)
xs = uniform.rvs(loc=0, scale=1, size=10000)
# Generate 10,000 samples from Y ∼ Normal(2,10)
ys = norm.rvs(loc=2, scale=10, size=10000)
# Visualize the histograms of the samples
plt.figure(figsize=(14, 6))
# Histogram for Uniform distribution
plt.subplot(1, 2, 1)
sns.histplot(xs, kde=False, bins=30, color='blue', alpha=0.6)
plt.title('Histogram of xs')
plt.xlabel('xs')
plt.ylabel('Frequency')
# Histogram for Normal distribution
plt.subplot(1, 2, 2)
sns.histplot(ys, kde=False, bins=30, color='red', alpha=0.6)
plt.title('Histogram of ys')
plt.xlabel('ys')
plt.ylabel('Frequency')
# Display the histograms
plt.tight_layout()
plt.show()
# Generate 10,000 samples from X ∼ Uniform(0,1)
xs = uniform.rvs(loc=0, scale=1, size=10000)
# Define the transformation function for Z
def transform_X_to_Z(X):
return math.sin(2 * math.pi * X) * math.sqrt(X)
# Apply the transformation function to the samples from X to generate samples for Z
zs = [transform_X_to_Z(x) for x in xs]
# Visualize the histograms of the samples
plt.figure(figsize=(14, 6))
sns.histplot(zs, kde=False, bins=75, color='green', alpha=0.6)
plt.title('Histogram of zs')
plt.xlabel('zs')
plt.ylabel('Frequency')
# Display the histograms
plt.tight_layout()
plt.show()
Generate 1000 samples of two different 2-dimensional normal distributions. The mean of both distribution is \(\boldsymbol{\mu}=(3,8)\).
For the first distribution, \(\boldsymbol{Y}_\mathrm{obs,indep}\), both dimensions are independent and have a standard deviations of \(\boldsymbol{\sigma}_\mathrm{obs,indep}=(2,5)\).
The second distribution, \(\boldsymbol{Y}_\mathrm{obs,dep}\), the two dimensions are correlated. This is defined with the covariance matrix \[ \boldsymbol{\Sigma}_\mathrm{obs,dep}= \begin{pmatrix} 4 & 8\\ 8 & 25 \end{pmatrix} \].
The aim is obtain two matrices(with two columns and 1000 rows) containing the random samples.
You can use rnorm()
to sample form a one-dimensional
normal distribution. With cbind()
you can combine vectors
to a matrix. For the depended normal distribution use
rmvnorm()
from the package mvtnorm
.
Given a mean vector \(\mu\) and a
correlation matrix \(\Sigma\), you can
construct a multivariate normal distribution with
MvNormal(μ, Σ)
from the package
Distribution.jl
. How does the covariance matrix for
independent variables look like?
You can use numpy.random.normal()
and
numpy.random.multivariate_normal()
to sample the required
distributions. Additionally, you can use np.column_stack
to
combine vectors to a matrix.
library(mvtnorm)
y.obs.indep <- cbind(rnorm(1000, mean=3, sd=2),
rnorm(1000, mean=8, sd=5))
Sigma.dep <- matrix(c(4,8,8,25), nrow=2, ncol=2)
y.obs.dep <- rmvnorm(1000, mean=c(3,8), sigma=Sigma.dep)
using Distributions
μs = [3, 8]
Σ_indep = [2^2 0; 0 5^2]
Σ_dep = [4 8; 8 25]
Y_1 = MvNormal(μs, Σ_indep)
y_1_sample = rand(Y_1, 1000)
Y_2 = MvNormal(μs, Σ_dep)
y_2_sample = rand(Y_2, 1000)
from numpy.random import normal, multivariate_normal
# Generate 1000 samples from the independent 2-dimensional normal distribution
y_obs_indep = np.column_stack([
normal(loc=3, scale=2, size=1000),
normal(loc=8, scale=5, size=1000)
])
# Define the covariance matrix for the dependent 2-dimensional normal distribution
sigma_dep = np.array([
[4, 8],
[8, 25]
])
# Generate 1000 samples from the dependent 2-dimensional normal distribution
y_obs_dep = multivariate_normal(mean=[3, 8], cov=sigma_dep, size=1000)
# Print the samples
print(y_obs_dep)
## [[ 2.62139691 5.55823272]
## [ 2.58461922 9.8608951 ]
## [ 6.25447771 16.01117172]
## ...
## [ 4.66336834 7.92106748]
## [-0.30834595 0.3235412 ]
## [ 5.85587315 14.55102434]]
print(y_obs_indep)
## [[ 5.0067899 4.80257306]
## [ 1.1739046 10.51268942]
## [ 5.78557828 10.35711131]
## ...
## [ 0.49380557 8.51252522]
## [ 1.71069324 17.19001855]
## [ 2.03179415 2.48584711]]
Perform some preliminary analysis of the data generated in Task 5:
Which of the above steps reveal a potential correlation structure in your data?
Try to arrange multiple plots in the same window by setting
par(mfrow=c(<nrow>,<ncol>))
. Use
quantile()
to calculate the interquantile range; use
hist()
and plot(density())
to visualize the
data; use cov()
,cor()
for the covariance and
the correlation, respectively.
You can use the function quantile
from the Statistics.jl
module. It provides also functions to compute the covariance and
correlation matrix.
The package Plots.jl
provides basic plotting function such as scatter
and
histogram
. For density plots and other more specialized
plots look at StatsPlots.jl
.
You can to arrange multiple plots in the same window by setting
fig, axs = plt.subplots(nrows, ncols)
and then calling
plt.subplot(nrows, ncols, i)
. Use
np.percentile()
to calculate the percentiles; use
seaborn
and matplotlib
to visualize the data;
use np.cov()
,np.corrcoef()
for the covariance
and the correlation, respectively.
quartiles.indep <- cbind(quantile(y.obs.indep[,1], probs = c(0.25,0.75)),
quantile(y.obs.indep[,2], probs = c(0.25,0.75)))
# Or with `apply`:
quartiles.indep <- apply(y.obs.indep, MARGIN=2, FUN=quantile, probs=c(0.25,0.75))
# then we use apply
quartiles.dep <- apply(y.obs.dep, MARGIN=2, FUN=quantile, probs=c(0.25,0.75))
quant5_95.indep <- apply(y.obs.indep, MARGIN=2, FUN=quantile, probs=c(0.05,0.95))
quant5_95.dep <- apply(y.obs.dep, MARGIN=2, FUN=quantile, probs=c(0.05,0.95))
cat("Interquartile range\nIndep: ", diff(quartiles.indep),
"\nDep: ", diff(quartiles.dep), "\n")
## Interquartile range
## Indep: 2.637886 6.767833
## Dep: 2.678544 6.626577
par(mfrow=c(2,2))
hist(y.obs.indep[,1], main="y1 of y.indep")
hist(y.obs.indep[,2], main="y2 of y.indep")
hist(y.obs.dep[,1], main="y1 of y.dep")
hist(y.obs.dep[,2], main="y2 of y.dep")
par(mfrow=c(2,2))
plot(density(y.obs.indep[,1]), main="y1 of y.indep")
plot(density(y.obs.indep[,2]), main="y2 of y.indep")
plot(density(y.obs.dep[,1]), main="y1 of y.dep")
plot(density(y.obs.dep[,2]), main="y2 of y.dep")
plot(y.obs.indep, main="y.obs.indep")
plot(y.obs.dep, main="y.obs.dep")
covariance.indep <- cov(y.obs.indep)
covariance.dep <- cov(y.obs.dep)
correlation.indep <- cor(y.obs.indep)
correlation.dep <- cor(y.obs.dep)
using Statistics
using StatsBase
interquartiles_indep = [iqr(y_1_sample[1, :]),
iqr(y_1_sample[2, :])]
## 2-element Vector{Float64}:
## 2.571162277898227
## 6.956472629475935
interquartiles_dep = [iqr(y_2_sample[1, :]),
iqr(y_2_sample[2, :])]
## 2-element Vector{Float64}:
## 2.7495032457564967
## 6.6715088843748145
inter_5_95_quantile_indep = [diff(quantile(y_1_sample[1, :], [0.05, 0.95])),
diff(quantile(y_1_sample[2, :], [0.05, 0.95]))]
## 2-element Vector{Vector{Float64}}:
## [6.504044085960378]
## [16.221351987285335]
inter_5_95_quantile_dep = [diff(quantile(y_2_sample[1, :], [0.05, 0.95])),
diff(quantile(y_2_sample[2, :], [0.05, 0.95]))]
## 2-element Vector{Vector{Float64}}:
## [6.470824530683799]
## [16.21167790353646]
hist = [
histogram(y_1_sample[1, :], nbins = 15,
xlabel="y_1_sample[1,:]"),
histogram(y_1_sample[2, :], nbins = 15,
xlabel="y_1_sample[2,:]"),
histogram(y_2_sample[1, :], nbins = 15,
xlabel="y_2_sample[1,:]"),
histogram(y_2_sample[2, :], nbins = 15,
xlabel="y_2_sample[2,:]")
]
plot(hist..., legend = false, ylabel = "frequency")
using StatsPlots
dens = [
density(y_1_sample[1,:],
xlabel="y_1_sample[1,:]"),
density(y_1_sample[2,:],
xlabel="y_1_sample[2,:]"),
density(y_2_sample[1,:],
xlabel="y_2_sample[1,:]"),
density(y_2_sample[2,:],
xlabel="y_2_sample[2,:]")
]
plot(dens..., legend = false, ylabel = "density")
Alternatively, we could also have written a loop for the four plots.
scatter(y_1_sample[1, :], y_1_sample[2, :],
labels=false,
xlabel="y_1_sample 1",
ylabel="y_1_sample 2",
title="y_1_sample")
scatter(y_2_sample[1, :], y_2_sample[2, :],
labels=false,
xlabel="y_2_sample 1",
ylabel="y_2_sample 2",
title="y_2_sample")
using Statistics
Statistics.cov(y_1_sample, dims=2)
## 2×2 Matrix{Float64}:
## 4.10456 0.285337
## 0.285337 26.1503
Statistics.cov(y_2_sample, dims=2)
## 2×2 Matrix{Float64}:
## 3.96535 7.82915
## 7.82915 24.3871
Statistics.cor(y_1_sample, dims=2)
## 2×2 Matrix{Float64}:
## 1.0 0.0275415
## 0.0275415 1.0
Statistics.cor(y_2_sample, dims=2)
## 2×2 Matrix{Float64}:
## 1.0 0.796149
## 0.796149 1.0
# Calculate the interquartile ranges and 90%-interquantile ranges of the samples
quartiles_indep = np.percentile(y_obs_indep, [25, 75], axis=0)
quartiles_dep = np.percentile(y_obs_dep, [25, 75], axis=0)
quant5_95_indep = np.percentile(y_obs_indep, [5, 95], axis=0)
quant5_95_dep = np.percentile(y_obs_dep, [5, 95], axis=0)
print(f"Interquartile range\nIndep: {np.diff(quartiles_indep, axis=0)[0]}")
## Interquartile range
## Indep: [2.62401887 6.29310802]
print(f"Dep: {np.diff(quartiles_dep, axis=0)[0]}")
## Dep: [2.51924248 6.84747748]
# Plot and compare the histograms of all marginals
fig, axs = plt.subplots(2, 2, figsize=(14, 6))
# Subplot 1
plt.subplot(2, 2, 1)
sns.histplot(y_obs_indep[:, 0], kde=False, bins=30, color='blue', alpha=0.6)
plt.title('y1 of y.indep')
plt.xlabel('y_obs_indep[:, 0]')
plt.ylabel('Frequency')
# Subplot 2
plt.subplot(2, 2, 2)
sns.histplot(y_obs_indep[:, 1], kde=False, bins=30, color='blue', alpha=0.6)
plt.title('y2 of y.indep')
plt.xlabel('y_obs_indep[:, 1]')
plt.ylabel('Frequency')
# Subplot 3
plt.subplot(2, 2, 3)
sns.histplot(y_obs_dep[:, 0], kde=False, bins=30, color='blue', alpha=0.6)
plt.title('y1 of y.dep')
plt.xlabel('y_obs_dep[:, 0]')
plt.ylabel('Frequency')
# Subplot 4
plt.subplot(2, 2, 4)
sns.histplot(y_obs_dep[:, 1], kde=False, bins=30, color='blue', alpha=0.6)
plt.title('y2 of y.dep')
plt.xlabel('y_obs_dep[:, 1]')
plt.ylabel('Frequency')
# Display the histograms
plt.tight_layout()
plt.show()
# Plot and compare the densities of all marginals
fig, axs = plt.subplots(2, 2, figsize=(14, 6))
# Density plot for y1 of y.indep
plt.subplot(2, 2, 1)
sns.kdeplot(y_obs_indep[:, 0], color='blue', alpha=0.6)
plt.title('Density of y1 of y.indep')
plt.xlabel('y_obs_indep[:, 0]')
plt.ylabel('Density')
# Density plot for y2 of y.indep
plt.subplot(2, 2, 2)
sns.kdeplot(y_obs_indep[:, 1], color='blue', alpha=0.6)
plt.title('Density of y2 of y.indep')
plt.xlabel('y_obs_indep[:, 1]')
plt.ylabel('Density')
# Density plot for y1 of y.dep
plt.subplot(2, 2, 3)
sns.kdeplot(y_obs_dep[:, 0], color='blue', alpha=0.6)
plt.title('Density of y1 of y.dep')
plt.xlabel('y_obs_dep[:, 0]')
plt.ylabel('Density')
# Density plot for y2 of y.dep
plt.subplot(2, 2, 4)
sns.kdeplot(y_obs_dep[:, 1], color='blue', alpha=0.6)
plt.title('Density of y2 of y.dep')
plt.xlabel('y_obs_dep[:, 1]')
plt.ylabel('Density')
# Adjust layout and display the plots
plt.tight_layout()
plt.show()
# Compare the scatterplots of y_obs_indep and y_obs_dep
fig, axs = plt.subplots(1, 2, figsize=(14, 6))
# Scatterplot for y_obs_indep
sns.scatterplot(x=y_obs_indep[:, 0], y=y_obs_indep[:, 1], alpha=0.7, ax=axs[0])
axs[0].set_title("y.obs.indep")
axs[0].set_xlabel('y_obs_indep[:, 0]')
axs[0].set_ylabel('y_obs_indep[:, 1]')
# Scatterplot for y_obs_dep
sns.scatterplot(x=y_obs_dep[:, 0], y=y_obs_dep[:, 1], alpha=0.7, ax=axs[1])
axs[1].set_title("y.obs.dep")
axs[1].set_xlabel('y_obs_dep[:, 0]')
axs[1].set_ylabel('y_obs_dep[:, 1]')
# Display the scatterplots
plt.tight_layout()
plt.show()
# Compute the covariance and correlation matrices of y.obs.indep and y.obs.dep
covariance_indep = np.cov(y_obs_indep, rowvar=False)
covariance_dep = np.cov(y_obs_dep, rowvar=False)
correlation_indep = np.corrcoef(y_obs_indep, rowvar=False)
correlation_dep = np.corrcoef(y_obs_dep, rowvar=False)
print("Covariance matrix of y.obs.indep:", covariance_indep)
## Covariance matrix of y.obs.indep: [[ 3.97555151 -0.22370194]
## [-0.22370194 24.86845039]]
print("\nCovariance matrix of y.obs.dep:", covariance_dep)
##
## Covariance matrix of y.obs.dep: [[ 3.80530384 7.68472437]
## [ 7.68472437 24.58583812]]
print("\nCorrelation matrix of y.obs.indep:", correlation_indep)
##
## Correlation matrix of y.obs.indep: [[ 1. -0.02249814]
## [-0.02249814 1. ]]
print("\nCorrelation matrix of y.obs.dep:", correlation_dep)
##
## Correlation matrix of y.obs.dep: [[1. 0.79449491]
## [0.79449491 1. ]]
Real data often contain columns of different data types (e.g. numbers and strings). Dataframes are designed to work with this kind of data conveniently.
Import the file ./data/model_growth.csv
as
dataframe. Perform some analyses similar to the ones in Task 6.
Add a new column (say C_new
) to the growth data
frame that contains some random values .
Read the data using
read.table("</path/to/somefile.txt>", header=TRUE)
to
indicate that the first row are the column names (use file
../data/model_growth.csv
). To select the column
C_M
from a dataframe, say data
, you can use
data$C_M
or data[,"C_M"]
.
To add columns, you can use cbind
to column-bind the new
data to the available matrix and convert everything to a
data.frame
. Then, you can rename the columns by assigning
the desired names with function colnames()
applied to the
newly created data.frame
.
Use the package DataFrames.jl
to read and manipulate
data. To import csv (comma separated values) files, we also use
CSV.jl
. You can use pwd()
the find your
currently working directory.
using DataFrames
import CSV
path = joinpath("data", "myfile.csv")
data = DataFrame(CSV.File(path; delim=";"))
# select columns "x"
data.x
Use the pandas
library to work with data frames. Read
the data using pd.read_csv(file_path, sep=" ")
. To select a
specific column, user can simply type:
dataframe['columnname']
## Loading data
## ----------------------------------------------
growth.dat <- read.table("../../data/model_growth.csv", header=TRUE)
C_M <- growth.dat$C_M
C_S <- growth.dat$C_S
## Plotting data
## ----------------------------------------------
plot(growth.dat$t, growth.dat$C_S, type="b", xlab="Time", ylab="Concentration")
lines(growth.dat$t, growth.dat$C_M, type="b", pch=17, col=2)
legend("topright", legend=c("Substrate", "Microorganisms"), col=c(1,2), pch=c(1,17), lty=1)
## Combining Data
## ----------------------------------------------
## Data for new row
y <- rnorm(nrow(growth.dat))
## Add the row to the dataframe
growth.dat.2 <- cbind(growth.dat, C_new = y)
head(growth.dat.2)
## t C_M C_S C_new
## 1 0.0 13.27717 48.10293 -0.4242892
## 2 0.1 13.19659 45.11676 -0.7412177
## 3 0.2 15.62860 36.28523 -0.7845806
## 4 0.3 19.44555 32.06841 0.4977962
## 5 0.4 20.05754 19.68459 -0.6675241
## 6 0.5 23.17631 10.07305 1.6641594
using CSV
using DataFrames
pwd()
Given your working directory, go to the desired file with a relative
path. Remember that ..
allows to come one step back.
growth_dat = CSV.read(joinpath("..", "..", "data", "model_growth.csv"),
DataFrame)
# show(growth_dat, allrows=true) if you want to see all you data
Extraction of a column can be done with the following syntax.
C_M = growth_dat[:, "C_M"]
C_S = growth_dat[:, "C_S"]
time = growth_dat[:, "t"]
## or alternatively
time = growth_dat.t
Plotting
plot(time, C_M,
labels="Microorganisms",
xlabel="Time",
ylabel="Concentration");
plot!(time, C_S,
labels="Substrat")
b.
# generate some data
C_new = rand(nrow(growth_dat))
# add new column
growth_dat.Cnew = C_new
import pandas as pd
## Loading data
# Specify the path to the CSV file
file_path = r"../../data/model_growth.csv"
# Load the CSV file into a pandas DataFrame
growth_dat = pd.read_csv(file_path, sep=" ")
# Access the 'C_M' and 'C_S' columns from the DataFrame
t = growth_dat['t']
C_M = growth_dat['C_M']
C_S = growth_dat['C_S']
## Plotting data
# Set the figure size (width, height) in inches
plt.figure(figsize=(14, 6))
# Create a line plot for 'C_S' against 't'
sns.lineplot(x=t, y=C_S, marker='o', label='Substrate')
# Create a line plot for 'C_M' against 't'
sns.lineplot(x=t, y=C_M, marker='^', color='red', label='Microorganisms')
# Add x-axis and y-axis labels
plt.xlabel('Time')
plt.ylabel('Concentration')
# Add a legend to the plot
plt.legend(loc='upper right')
# Optimize the plot space
plt.tight_layout()
# Display the plot
plt.show()
# Number of rows in the growth_dat dataframe
n = len(growth_dat)
# Generate an array of normally distributed random numbers with the same number of rows as growth_dat
y = np.random.normal(size=n)
# Add the new row to the dataframe as a new column named 'C_new'
growth_dat['C_new'] = y
# Display the first few rows of the updated dataframe
print(growth_dat.head())
## t C_M C_S C_new
## 0 0.0 13.277165 48.102929 -0.083110
## 1 0.1 13.196591 45.116760 1.228254
## 2 0.2 15.628600 36.285227 -0.627723
## 3 0.3 19.445545 32.068411 0.307233
## 4 0.4 20.057542 19.684587 0.583476
If \(f()\) is non-linear, generally, \[ f(E[X]) \neq E[f(X)] \] where \(E\) is the expected value and \(X\) is a random variable. Implement the non-linear function \(f(x) = \sin\sqrt{x}\) and assume that \(X\) is log-normally distribute. Use samples to estimate \(f(E[X])\), \(E[f(X)]\), \(\mathrm{Var}[X]\), \(\mathrm{Var}[f(X)]\) and compare them.
Use rlnorm
to sample from a log-normal distribution,
which accepts the mean and the standard deviation on the log-scale, not
on the original scale. Additionally, keep in mind that in R a general
function can be defined as:
function.name <- function(arg1,arg2){
result <- arg1 + arg2 # or any other operation
return(result)
}
Most basic functions are already available, and those include both
sin
and sqrt
. Try ?sin
in the R
console to get access to the manual of the harmonic functions. These can
be used anywhere in the code, including inside a custom function.
Use LogNormal()
from the package
Distributions.jl
for a log-normal distribution. You can
create you own Julia functions following the syntax
function choose_your_name(arg_1, arg_2)
res = arg_1 + arg_2
return res
end
Very short functions can be defined as
foo(a, b) = a + b
In python, one can define a function as follows:
def fun_square(number):
"""
This part is optional and is preferable used to describe what the function
does to the users.
This function returns the square of the given number.
Arguments:
----------
- number: number to be processed.
Returns:
-------
- result: the squared given number.
"""
result = number * number
return result
# Example usage
num = 4
result = fun_square(num)
print(results) # The output will be 16
fun <- function(x){
return(sin(sqrt(x)))
}
X <- rlnorm(1000, meanlog=0, sdlog=1)
## mean of X: 1.532428
## mean of f(X): 0.7507829
## f(mean of X): 0.945104
## variance of X: 3.879901
## variance of f(X): 0.05379731
plot(X,fun(X))
hist(X)
hist(fun(X))
f(x) = sin(sqrt(x))
x = rand(LogNormal(), 1000)
using Statistics
# mean of X
mean(x)
## 1.6425246789152488
# mean of f(X)
mean(f.(x))
## 0.75761771528133
# f(mean of X)
f(mean(x))
## 0.9584762841019947
# variance of x
var(x)
## 3.8578568723913795
# variance of f(X)
var(f.(x))
## 0.05356226759233008
Plotting
using Plots
scatter(x, f.(x),
labels = false,
xlabel = "X",
ylabel = "f(X)")
histogram(x, nbins=20,
labels = false,
xlabel = "X",
ylabel = "Frequency",
title = "Histogram of X")
histogram(f.(x), nbins=20,
labels = false,
xlabel = "f(X)",
ylabel = "Frequency",
title = "Histogram of f(X)")
# Define the function in Python
def fun(x):
return np.sin(np.sqrt(x))
# Generate 1000 samples from a log-normal distribution with meanlog=0 and sdlog=1
X = np.random.lognormal(mean=0, sigma=1, size=1000)
# Calculate the mean of X
mean_X = np.mean(X)
# Calculate the mean of f(X)
mean_fX = np.mean(fun(X))
# Calculate the function value of the mean of X
f_mean_X = fun(mean_X)
# Calculate the variance of X
variance_X = np.var(X)
# Calculate the variance of f(X)
variance_fX = np.var(fun(X))
# Print the results
print("Mean of X:", mean_X)
## Mean of X: 1.6110755522977038
print("Mean of f(X):", mean_fX)
## Mean of f(X): 0.7533362151460489
print("f(mean of X):", f_mean_X)
## f(mean of X): 0.9548877343121427
print("Variance of X:", variance_X)
## Variance of X: 4.759319081961184
print("Variance of f(X):", variance_fX)
## Variance of f(X): 0.05099708082022862
d.Plot the results
# Plot X against fun(X) using seaborn
plt.figure(figsize=(14, 6))
sns.scatterplot(x=X, y=fun(X), alpha=0.5, color='blue')
plt.title('Scatter Plot of X vs. fun(X)')
plt.xlabel('X')
plt.ylabel('fun(X)')
plt.grid(True)
plt.show()
# Create a figure with two subplots
plt.figure(figsize=(14, 6))
# Plot the histogram of X
plt.subplot(1, 2, 1)
sns.histplot(X, bins=30, color='blue', alpha=0.6)
plt.title('Histogram of X')
plt.xlabel('X')
plt.ylabel('Frequency')
# Plot the histogram of fun(X)
plt.subplot(1, 2, 2)
sns.histplot(fun(X), bins=30, color='red', alpha=0.6)
plt.title('Histogram of fun(X)')
plt.xlabel('fun(X)')
plt.ylabel('Frequency')
# Optimize the plot space
plt.tight_layout()
# Display the plots
plt.show()