Correlating samples from Gaussian distributions

Let’s assume that we want generate samples from a multivariate normal distribution.

First we assume we know how to generate samples from a univariate gaussian distribution.

So we have n independent random variables $x_1, x_2,…, x_d$ where $x_i \sim \mathcal{N(0,1)}$

$\langle \vec{x} \rangle =0$ and $\langle \vec{x} \vec{x}^T \rangle = \mathbb{1}$.

and what we aim to find is a linear mapping $M: \vec{x} \mapsto \vec{y}$.

Where the new variable $y$ has a covariance matrix $C$, where $C=\langle \vec{y} \vec{y}^T\rangle$ which we can specify as we wish, remember though that a covariance matrix has to be positive definite.

So we want to find a transformation matrix $M$ where $M \vec{x} = \vec{y}$.

$C = \langle \vec{y} \vec{y}^T\rangle = \langle M \vec{x} (M \vec{x})^\intercal\rangle = \langle M \vec{x} \vec{x}^\intercal M^\intercal \rangle = M \langle \vec{x} \vec{x}^\intercal \rangle M^\intercal = M M^\intercal $.

Now let’s remember that the covariance matrix $C$ has to be a symmetric(hermitian) and positive definite matrix. For those kind of matrices we can always decompose it into the product of a lower triangular matrix $L$ and its transpose $L^\intercal$ [https://en.wikipedia.org/wiki/Cholesky_decomposition].

$ C = L \ L^T$

This is called the Cholesky decomposition.

So $L=M$ and we simply have to find the Cholesky decomposition, $L$, of our desired covariance matrix $C$ and apply it to our univariate gaussian samples.

We can easily generate samples from a univariate normal distribution:

import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns


d = 2
n = 100000 

# first we create our samples from the 2 independent normal distributions
x = np.random.normal(loc=0, scale=1, size=d*n).reshape(d, n)



# now we specify our covariance matrix
K_0 = np.array([[2, 1],
                [1, 2]])
# we calculate the Cholesky deomposition - L is the lower diagonal matrix
L = np.linalg.cholesky(K_0) 
y = np.dot(L, x)

# now let's see how samples from our new distribution looks like
sns.jointplot(x=x[0],
              y=x[1], 
              kind="kde", 
              space=0)


# we can compare our generated samples to the samples we get from the numpy function multivariate_normal
mean = [0, 0]
z = np.random.multivariate_normal(mean,cov=K_0, size=n)
y = np.transpose(z)

sns.jointplot(x=y[0],
              y=y[1], 
              kind="kde", 
              space=0);

Here we see the distribution from our code using Cholesky decomposition:

distribution from our code using Cholesky decomposition

Here we see the distribution from the numpy function multivariate_normal

distribution from the numpy function multivariate_normal

And we see that they are identical so this seems to work

##

Written on March 13, 2020