Independence, joint/marginal/conditional probability, covariance and correlation

Statistical independence

Two random variables $X$ and $Y$ are said to be independent when their joint probability (see below) is equal to the product of the probabilities of each:

$$P(X, Y) = P(X) P(Y) \,.$$

This means, in terms of conditional probabilities,

$$P(X \mid Y) = \frac{P(X, Y)}{P(Y)} = \frac{P(X) P(Y)}{P(Y)} = P(X) \,,$$

that is, the probability of $X$ occurring is not affected by the occurrence of $Y$. This is typically how independence is defined in verbal terms: the occurrence of one event does not influence the occurrence of the other.
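As a quick check of this definition, here is a minimal Python sketch (a hypothetical example with two fair coin flips) verifying that the joint probability factorises into the product of the marginals:

```python
from itertools import product
from fractions import Fraction

# Joint distribution of two fair coin flips (hypothetical example):
# each of the four outcomes is equally likely.
outcomes = list(product(["H", "T"], repeat=2))
joint = {o: Fraction(1, 4) for o in outcomes}

# Marginals obtained by summing the joint over the other variable.
p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in "HT"}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in "HT"}

# Independence: the joint factorises for every pair of values.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x, y in outcomes)
print(independent)  # True
```

Exact `Fraction` arithmetic is used so that the equality test is not affected by floating-point rounding.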

IID variables

IID stands for independent and identically distributed, an abbreviation used all over in statistics. IID variables are independent but also distributed in the same way.

The concept is a basic assumption of many foundational results in statistics, such as the law of large numbers and the central limit theorem.
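As a sketch of what the IID assumption buys, the snippet below draws IID samples from a standard normal distribution and checks that the sample mean approaches the true mean of 0, as the law of large numbers guarantees:

```python
import random

random.seed(0)

# IID sample: each draw comes from the same distribution (a standard
# normal here) and does not depend on the previous draws.
sample = [random.gauss(0, 1) for _ in range(10000)]

# With many IID draws the sample mean approaches the true mean (0).
mean = sum(sample) / len(sample)
print(round(mean, 2))
```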

The joint probability

The joint probability of two or more events is the probability that they happen together. If $X$, $Y$, $Z$, ... are the random variables, their joint probability is written as

$$P(X, Y, Z, \ldots)$$

or as

$$P(X \cap Y \cap Z \cap \ldots) \,.$$

The case of independent variables

If the variables are independent, their joint probability reduces to the product of their probabilities: $P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^n P(X_i)$.
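A minimal sketch of this product rule, using the made-up example of three fair dice all showing a six:

```python
from fractions import Fraction
from math import prod

# For independent variables the joint probability is the product of the
# individual probabilities. Hypothetical example: the probability that
# n fair dice all show a six is (1/6)^n.
n = 3
p_six = Fraction(1, 6)
joint = prod([p_six] * n)
print(joint)  # 1/216
```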

The marginal probability

Picture: Wikipedia, Bscan, CC0, via Wikimedia Commons

If we have the joint probability of two or more random variables, the marginal probability of each is the probability related to that variable alone, over its own space of events; it expresses the probability of the variable when the value of the other one is not known. It is calculated by summing the joint probability over the space of events of the other variable. More specifically, given $P(X, Y) = P(X=x, Y=y)$,

$$P(X=x) = \sum_y P(X=x, Y=y) \,.$$

The illustration here (Image by IkamusumeFan (own work, released under CC BY-SA 3.0), via Wikimedia Commons) shows points extracted from a joint probability (the black dots) and the marginal probabilities as well.
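Marginalisation can be sketched in a few lines of Python; the joint table below is a made-up example:

```python
from fractions import Fraction

# Hypothetical joint distribution P(X=x, Y=y) of two discrete variables.
joint = {
    ("a", 0): Fraction(1, 8), ("a", 1): Fraction(3, 8),
    ("b", 0): Fraction(1, 4), ("b", 1): Fraction(1, 4),
}

# Marginal of X: sum the joint probability over all values of Y.
p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, Fraction(0)) + p

print(p_x)  # {'a': Fraction(1, 2), 'b': Fraction(1, 2)}
```

Note that the marginal probabilities sum to 1, as any proper distribution must.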

Covariance and correlation


Given the random variables $X$ and $Y$ with respective means $\mu_x$ and $\mu_y$, their covariance is defined as

$$\text{cov}(X, Y) = \mathbb{E}[(X - \mu_x)(Y - \mu_y)] \,.$$

It is a measure of how the two variables vary jointly: a positive covariance means that when $X$ grows, $Y$ tends to grow as well; a negative covariance means that when $X$ grows, $Y$ tends to decrease.


Correlation is measured by a correlation coefficient, which exists in several definitions depending on what exactly is measured; it is always some sort of normalised covariance. The counterpart of the covariance itself is Pearson's definition, which defines the correlation coefficient as the covariance normalised by the product of the standard deviations of the two variables:

$$\rho_{xy} = \frac{\text{cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\mathbb{E}[(X - \mu_x)(Y - \mu_y)]}{\sigma_x \sigma_y} \,,$$

and, expanding the product in the expectation, it can also be written as

$$\rho_{xy} = \frac{\mathbb{E}[XY] - \mu_x \mu_y}{\sigma_x \sigma_y} \,.$$

The correlation coefficient has these properties:

  • $-1 \leq \rho_{xy} \leq 1$

  • It is symmetric: $\rho_{xy} = \rho_{yx}$

  • If the variables are independent, then $\rho_{xy} = 0$ (but the reverse is not true)
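The Pearson coefficient can be computed directly from the definition; the snippet below is a sketch with made-up data (in practice one would use `numpy.corrcoef` or `statistics.correlation`):

```python
import math
import random

random.seed(1)

def pearson(xs, ys):
    """Pearson correlation computed from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Strong (noisy) linear relation between x and y: rho close to 1.
xs = [random.gauss(0, 1) for _ in range(5000)]
ys = [2 * x + random.gauss(0, 0.1) for x in xs]

r = pearson(xs, ys)
print(round(r, 3))  # close to 1
```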

Independence and correlation

Let's expand on that last point. We said that if two random variables are independent, then the correlation coefficient is zero. This is easy to prove, as it follows directly from the definition above (also bearing in mind Fubini's theorem):

$$\mathbb{E}[XY] = \int_{\Omega_X} \int_{\Omega_Y} \mathrm{d}x \, \mathrm{d}y \ xy \, P(x, y) = \int_{\Omega_X} \int_{\Omega_Y} \mathrm{d}x \, \mathrm{d}y \ xy \, P(x) P(y) = \mu_x \mu_y \,,$$

so that $\mathbb{E}[(X - \mu_x)(Y - \mu_y)] = \mathbb{E}[XY] - \mu_x \mu_y = 0$, and hence $\rho_{xy} = 0$.

The reverse is not true: see this amazing Q&A on Cross Validated for a well-explained counter-example.
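One classic counter-example can also be sketched numerically: take $X$ symmetric around zero and $Y = X^2$. Then $Y$ is completely determined by $X$ (maximal dependence), yet the covariance, and hence the correlation, is zero:

```python
import random

random.seed(2)

# X symmetric around 0, Y = X^2: fully dependent, yet uncorrelated,
# since E[X(X^2 - 1)] = E[X^3] - E[X] = 0 for a standard normal X.
xs = [random.gauss(0, 1) for _ in range(100000)]
ys = [x * x for x in xs]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
print(round(cov, 2))  # close to 0 despite full dependence
```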

Correlation and the relation between variables

Correlation says "how much" it happens that when $x$ grows, $y$ grows as well. It is not a measure of the slope of the linear relation between $x$ and $y$. This is greatly illustrated in the figure above (from Wikipedia's page, under CC0), which reports sets of data points with $x$ and $y$ and their correlation coefficient.

In the central figure, the correlation is undefined because the variance of $y$ is zero. In the bottom row, the relation between the variables is not linear, and the correlation does not capture that.