Note: for the first three exercises you only need pencil and paper.
The joint discrete probability table of \(P_{A,B}(a,b)\) is given below:
|     | B.1 | B.2 | B.3 |
|-----|-----|-----|-----|
| A.1 | 0.2 | 0.1 | 0.3 |
| A.2 | 0.1 | 0.1 | 0.2 |
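Although this exercise is meant for pencil and paper, the marginals and conditionals can be checked numerically from the joint table; a minimal sketch using numpy (which probabilities the exercise asks for is assumed here to include marginals and a conditional):

```python
import numpy as np

# Joint probability table P_{A,B}(a, b); rows = A, columns = B
P = np.array([[0.2, 0.1, 0.3],
              [0.1, 0.1, 0.2]])

P_A = P.sum(axis=1)            # marginal P_A(a): sum over b
P_B = P.sum(axis=0)            # marginal P_B(b): sum over a
P_B_given_A1 = P[0] / P_A[0]   # conditional P_{B|A}(b | a = A.1)

print(P_A)           # [0.6 0.4]
print(P_B)           # [0.3 0.2 0.5]
print(P_B_given_A1)
```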
Derive the following probabilities:
Assume the probability densities \(p(E \mid B)\), \(p(B)\), \(p(A, D \mid E)\), and \(p(C \mid B, E)\) are known.
Assume that: \[ \mu \sim f_{\mu}(m) = \begin{cases} 0.1\exp(-0.1m) & m \geq 0 \\ 0 & \text{else} \end{cases} \] and \[ X \sim f_{X|\mu=m}(x \mid m) = \frac{1}{\sqrt{2\pi}} \exp\left(- \frac{(x-m)^2}{2}\right) \]
This means that \(X\) is normally distributed with mean \(\mu\) and standard deviation 1, while \(\mu\) itself is exponentially distributed with rate 0.1.
Derive and interpret:
It is not the aim to find closed forms for the integrals.
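Closed forms are not required, but the hierarchical model above is easy to simulate, which helps with interpreting the results; a sketch using numpy (the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# mu ~ Exponential with rate 0.1, i.e. numpy's scale parameter 1/0.1 = 10
mu = rng.exponential(scale=10.0, size=n)
# X | mu ~ Normal(mu, 1)
x = rng.normal(loc=mu, scale=1.0)

# Monte Carlo estimates of the marginal mean and variance of X:
# E[X] = E[mu] = 10 and Var[X] = Var[mu] + 1 = 101
print(x.mean())
print(x.var())
```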
Take two random variables \(X\) and \(Y\) with the following distributions: \[ X \sim \text{Uniform}(0, 1)\\ Y \sim \text{Normal}(2, 10) \]
Another random variable is defined as a function of \(X\) as follows: \[ Z = \sin(2\pi X) \sqrt{X} \] While it is difficult to derive the probability density \(f_Z(z)\) analytically, sampling from it is easy.
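The sampling approach can be sketched directly from the definition of \(Z\); a histogram of the draws then approximates \(f_Z(z)\) (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=100_000)  # X ~ Uniform(0, 1)
z = np.sin(2 * np.pi * x) * np.sqrt(x)   # Z = sin(2*pi*X) * sqrt(X)

# Monte Carlo estimates of the moments of Z; np.histogram(z, bins=100)
# (or any plotting library) approximates the density f_Z(z)
print(z.mean(), z.std())
```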
The most important univariate probability distributions are already implemented in R. Type `?Distributions` to get an overview. For every distribution `__`, four functions are defined with the following naming scheme:
d__(x, ...) # evaluate pdf at x
p__(x, ...) # evaluate cdf at x
q__(p, ...) # evaluate the p-th quantile
r__(n, ...) # sample n random numbers
For example, for the normal distribution the functions are called `dnorm()`, `pnorm()`, `qnorm()`, and `rnorm()`.
Histograms are generated with the function `hist()`. You can adjust the number of bins with the argument `breaks`, e.g. `hist(rnorm(10000), breaks=100)`.
The package Distributions.jl provides access to many probability distributions; see the documentation. Here are some useful functions you can apply to all distributions (not only `Normal`):
π = Normal(0, 10) # define distribution
pdf(π, x) # evaluate pdf at x
logpdf(π, x) # evaluate log pdf at x
cdf(π, x) # evaluate cdf at x
quantile(π, p) # evaluate the p-th quantile
rand(π, n) # sample n random numbers
Histograms are generated with the function `histogram` from the package Plots.jl. You can adjust the number of bins with the argument `nbins`, e.g. `histogram(rand(Uniform(0,5), 10000), nbins = 100)`.
The package `scipy` provides access to many probability distributions. The packages `matplotlib` and `seaborn` offer good ways to plot the graphs.
Here are some examples of how to call some functions:
from scipy.stats import uniform, norm

uniform.pdf(x, loc, scale)  # evaluate uniform pdf at x
norm.pdf(x, loc, scale)     # evaluate normal pdf at x
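In analogy to the four R functions, `scipy.stats` distributions expose `pdf`, `cdf`, `ppf` (quantile function), and `rvs` (sampling); a small sketch for the standard normal:

```python
from scipy.stats import norm

dist = norm(loc=0, scale=1)    # "frozen" standard normal distribution

print(dist.pdf(0.0))           # density at 0, 1/sqrt(2*pi) ~ 0.3989
print(dist.cdf(0.0))           # P(X <= 0) = 0.5
print(dist.ppf(0.975))         # 97.5% quantile, ~ 1.96
samples = dist.rvs(size=1000)  # 1000 random draws
```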
Generate 1000 samples from each of two different 2-dimensional normal distributions. The mean of both distributions is \(\boldsymbol{\mu}=(3,8)\).
For the first distribution, \(\boldsymbol{Y}_\mathrm{obs,indep}\), the two dimensions are independent and have standard deviations \(\boldsymbol{\sigma}_\mathrm{obs,indep}=(2,5)\).
For the second distribution, \(\boldsymbol{Y}_\mathrm{obs,dep}\), the two dimensions are correlated. This is defined with the covariance matrix \[ \boldsymbol{\Sigma}_\mathrm{obs,dep}= \begin{pmatrix} 4 & 8\\ 8 & 25 \end{pmatrix}. \]
The aim is to obtain two matrices (with two columns and 1000 rows) containing the random samples.
You can use `rnorm()` to sample from a one-dimensional normal distribution. With `cbind()` you can combine vectors into a matrix. For the dependent normal distribution use `rmvnorm()` from the package mvtnorm.
Given a mean vector \(\mu\) and a covariance matrix \(\Sigma\), you can construct a multivariate normal distribution with `MvNormal(μ, Σ)` from the package Distributions.jl. What does the covariance matrix for independent variables look like?
You can use `numpy.random.normal()` and `numpy.random.multivariate_normal()` to sample from the required distributions. Additionally, you can use `np.column_stack` to combine vectors into a matrix.
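The two sampling steps can be sketched as follows, using the means, standard deviations, and covariance matrix from the task (the modern `numpy.random.Generator` interface is used instead of the legacy `numpy.random.*` functions):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
mu = np.array([3.0, 8.0])

# Independent case: sample each dimension separately and combine columns
y1 = rng.normal(mu[0], 2.0, size=n)
y2 = rng.normal(mu[1], 5.0, size=n)
Y_indep = np.column_stack([y1, y2])  # shape (1000, 2)

# Correlated case: use the covariance matrix from the task
Sigma = np.array([[4.0, 8.0],
                  [8.0, 25.0]])
Y_dep = rng.multivariate_normal(mu, Sigma, size=n)

print(Y_indep.shape, Y_dep.shape)  # (1000, 2) (1000, 2)
```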
Perform some preliminary analysis of the data generated in Task 5:
Which of the above steps reveal a potential correlation structure in your data?
Try to arrange multiple plots in the same window by setting `par(mfrow=c(<nrow>,<ncol>))`. Use `quantile()` to calculate the interquartile range; use `hist()` and `plot(density())` to visualize the data; use `cov()` and `cor()` for the covariance and the correlation, respectively.
You can use the function `quantile` from the Statistics.jl module, which also provides functions to compute the covariance and correlation matrices. The package Plots.jl provides basic plotting functions such as `scatter` and `histogram`. For density plots and other more specialized plots, look at StatsPlots.jl.
You can arrange multiple plots in the same window by creating a grid with `fig, axs = plt.subplots(nrows, ncols)` and plotting on the individual axes, or by calling `plt.subplot(nrows, ncols, i)`. Use `np.percentile()` to calculate the percentiles; use `seaborn` and `matplotlib` to visualize the data; use `np.cov()` and `np.corrcoef()` for the covariance and the correlation, respectively.
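The analysis steps above can be sketched in a few lines; here freshly generated correlated samples (as in Task 5) stand in for your data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the correlated samples from Task 5
Y = rng.multivariate_normal([3.0, 8.0],
                            [[4.0, 8.0], [8.0, 25.0]],
                            size=1000)

# Interquartile range of each column
q25, q75 = np.percentile(Y, [25, 75], axis=0)
iqr = q75 - q25

# Covariance and correlation matrices (np.cov/np.corrcoef expect
# variables in rows, hence the transpose)
C = np.cov(Y.T)
R = np.corrcoef(Y.T)

print(iqr)
print(C)
print(R)  # the off-diagonal entry reveals the correlation structure
```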
Real data often contain columns of different data types (e.g. numbers and strings). Dataframes are designed to work with this kind of data conveniently.
Import the file `./data/model_growth.csv` as a dataframe. Perform some analyses similar to the ones in Task 6. Add a new column (say `C_new`) to the growth dataframe that contains some random values.
Read the data using `read.table("</path/to/somefile.txt>", header=TRUE)` to indicate that the first row contains the column names (use the file `../data/model_growth.csv`). To select the column `C_M` from a dataframe, say `data`, you can use `data$C_M` or `data[,"C_M"]`.
To add columns, you can use `cbind` to column-bind the new data to the existing matrix and convert everything to a `data.frame`. Then you can rename the columns by assigning the desired names with the function `colnames()` applied to the newly created `data.frame`.
Use the package DataFrames.jl to read and manipulate data. To import csv (comma-separated values) files, we also use CSV.jl. You can use `pwd()` to find your current working directory.
using DataFrames
import CSV
path = joinpath("data", "myfile.csv")
data = DataFrame(CSV.File(path; delim=";"))
# select column "x"
data.x
Use the `pandas` library to work with data frames. Read the data using `pd.read_csv(file_path, sep=" ")`. To select a specific column, you can simply type `dataframe['columnname']`.
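A minimal, self-contained sketch of these dataframe operations; an in-memory CSV stands in for `./data/model_growth.csv`, and its contents here (a `time` column alongside `C_M`) are made up for illustration:

```python
import io
import numpy as np
import pandas as pd

# In-memory stand-in for ./data/model_growth.csv
csv_text = "time,C_M\n0,0.1\n1,0.5\n2,1.2\n"
data = pd.read_csv(io.StringIO(csv_text))

print(data["C_M"])                         # select a column
data["C_new"] = np.random.rand(len(data))  # add a column of random values
print(data.describe())                     # quick summary statistics
```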
If \(f()\) is non-linear, then in general \[ f(E[X]) \neq E[f(X)], \] where \(E\) is the expected value and \(X\) is a random variable. Implement the non-linear function \(f(x) = \sin\sqrt{x}\) and assume that \(X\) is log-normally distributed. Use samples to estimate \(f(E[X])\), \(E[f(X)]\), \(\mathrm{Var}[X]\), \(\mathrm{Var}[f(X)]\) and compare them.
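A Monte Carlo sketch of the comparison; the log-normal parameters (mean 0 and standard deviation 1 on the log scale) are an arbitrary example choice, since the task does not fix them:

```python
import numpy as np

rng = np.random.default_rng(7)

def f(x):
    return np.sin(np.sqrt(x))

# X log-normal; meanlog = 0, sdlog = 1 chosen as an example
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

f_of_mean = f(x.mean())  # f(E[X])
mean_of_f = f(x).mean()  # E[f(X)]

print(f_of_mean, mean_of_f)  # generally not equal
print(x.var(), f(x).var())   # Var[X] vs Var[f(X)]
```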
Use `rlnorm` to sample from a log-normal distribution; note that it accepts the mean and the standard deviation on the log scale, not on the original scale. Additionally, keep in mind that in R a general function can be defined as:
function.name <- function(arg1,arg2){
result <- arg1 + arg2 # or any other operation
return(result)
}
Most basic functions are already available, including both `sin` and `sqrt`. Try `?sin` in the R console to access the manual of the trigonometric functions. These can be used anywhere in the code, including inside a custom function.
Use `LogNormal()` from the package Distributions.jl for a log-normal distribution. You can create your own Julia functions following the syntax
function choose_your_name(arg_1, arg_2)
res = arg_1 + arg_2
return res
end
Very short functions can be defined as
foo(a, b) = a + b
In python, one can define a function as follows:
def fun_square(number):
    """
    This part is optional and is preferably used to describe to users what
    the function does.
    This function returns the square of the given number.
    Arguments:
    ----------
    - number: number to be processed.
    Returns:
    --------
    - result: the squared given number.
    """
    result = number * number
    return result

# Example usage
num = 4
result = fun_square(num)
print(result)  # The output will be 16