Testing if datasets are different

Prologue: t-score and z-score

The z-score

Let $x$ be the value taken by a random variable $X$ with a probability distribution whose mean is $\mu$ and whose standard deviation is $\sigma$ (population values). The z-score, also called standard score, is defined as

$$z = \frac{x - \mu}{\sigma} \ ,$$

and tells the number of (signed) standard deviations $x$ is away from the mean. It is basically $x$ standardised.
This figure illustrates the concept of the z-score for a normal distribution. You can see that $34.13\% \cdot 2 = 68.26\%$ of the items are located within a distance of $1\sigma$ from the mean, then $34.13\% \cdot 2 + 13.59\% \cdot 2 = 95.44\%$ are located within $2\sigma$, and so on.

Image from Wikipedia, public domain
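As a quick numerical check of those percentages, and of the z-score formula itself, here is a minimal sketch in Python (the observation, mean and standard deviation are made-up values for illustration):

```python
from scipy.stats import norm

# z-score of a single observation, using illustrative population values
x, mu, sigma = 1.75, 1.70, 0.10          # hypothetical observation, mean, std
z = (x - mu) / sigma
print(f"z-score: {z:.2f}")               # 0.50 standard deviations above the mean

# fraction of a normal population within 1 and 2 standard deviations of the mean
for k in (1, 2):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sigma: {coverage:.2%}")   # ~68.27% and ~95.45%
```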

The t-score

When the population mean and standard deviation are not known, a t-score can be calculated as

$$t = \frac{x - \bar x}{s/\sqrt{n}} \ ,$$

$\bar x$ being the sample mean, $s$ the sample standard deviation and $n$ the number of samples.
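A direct transcription of this formula (a minimal sketch; the sample and the value $x$ to score are invented for illustration):

```python
import numpy as np

sample = np.array([1.2, 0.8, 1.5, 1.1, 0.9, 1.3])   # hypothetical measurements
x = 1.4                                              # value to score

x_bar = sample.mean()       # sample mean
s = sample.std(ddof=1)      # sample standard deviation (ddof=1: unbiased variance)
n = sample.size

t = (x - x_bar) / (s / np.sqrt(n))
print(f"t-score: {t:.2f}")
```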

The t-test

The t-test, in its two-sample variant, tests whether two datasets are significantly different, that is, whether their means really differ, the null hypothesis being that they do not. In the one-sample variant, it tests whether the mean is significantly different from the value specified by the null hypothesis.
It was originally published by W. S. Gosset, known as Student, in Biometrika in 1908. (Proto)typical applications are in medicine, to test whether a treatment is effective in curing an illness, or in education, to test whether girls outperform boys in a school exam, and so on. It is a widely used statistical test.

The gist of it

The test statistic used in this test follows Student's t distribution under the null hypothesis; as the sample size grows, this distribution approaches the normal distribution. When the data cannot be assumed to be normally distributed, the t-test can't be used, but the Mann-Whitney U test covers this case.
The t-test evaluates the difference between the means of the distributions with respect to their spread (variability). In the figure, the distributions have the same difference between means but very different variabilities.
In the following, the null hypothesis will be indicated with $H_0$.

How it works

Two-sample t-test

Given two sets of data indicated by indices 1 and 2, whose means are respectively $m_1$ and $m_2$ and whose standard deviations $s_1$ and $s_2$ (we use $m$ and $s$ to stress these are sample and not population values), the $t$ statistic is calculated as

$$t = \frac{m_1 - m_2}{s_{m_1 - m_2}} \ ,$$

where the denominator is the standard error of the difference of the means:

$$s_{m_1 - m_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \ ,$$

$s_i^2$ being the unbiased estimator of the variance of sample $i$ and $n_i$ its number of points.
The $t$ statistic, so calculated, has to be checked against the table of values of the Student's t distribution to get the $p$-value; if said $p$-value falls below the chosen threshold for significance, $H_0$ gets rejected.
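A minimal sketch of this test in Python with scipy (the two groups are invented for illustration; `equal_var=False` selects Welch's variant, which matches the standard error defined above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# two hypothetical datasets with (slightly) different means and variances
group1 = rng.normal(loc=5.0, scale=1.0, size=40)
group2 = rng.normal(loc=5.6, scale=1.5, size=35)

# two-sample t-test without assuming equal variances (Welch)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the means are significantly different")
else:
    print("Cannot reject H0")
```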

One-sample t-test

In the one-sample t-test, we test the null hypothesis that the mean $m$ is equal to a specified value $\mu_0$. In this case the t statistic to use is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \ ,$$

with $s$ being the standard deviation of the sample and $n$ the sample size.
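The same test with scipy (sample and reference value $\mu_0$ are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

sample = rng.normal(loc=2.3, scale=0.5, size=25)   # hypothetical measurements
mu_0 = 2.0                                         # mean under H0

t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")      # small p-value -> reject H0
```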

Paired t-test

In the paired t-test, we compare two population means where we have two samples and the observations in them are paired. For example, we have observations before and after performing some action on the same individuals (students' results before and after a course, or two medical treatments applied to the same individual). The observations are then not independent, so a two-sample t-test is not appropriate.
A paired t-test is performed by testing the differences of the two measurements with a one-sample t-test. When the differences of the pairs cannot be assumed to follow a normal distribution around 0, the non-parametric Wilcoxon signed-rank test can be used instead; its steps are the following (a code sketch follows the list):
1. $\forall i$, we calculate $|x_{1,i} - x_{2,i}|$ and $\mathrm{sgn}(x_{1,i} - x_{2,i})$
2. we exclude pairs whose difference is 0, so we have the reduced sample size $N_r$
3. we order the $N_r$ pairs by ascending absolute difference
4. we rank the pairs so that the smallest gets rank 1; ties are ranked with rank equal to the average of the ranks spanned
5. we calculate the test statistic $w = \sum_{i=1}^{N_r} \mathrm{sgn}(x_{1,i} - x_{2,i}) \, R_i$, where $R_i$ is the rank of the pair
6. under the null hypothesis $H_0$, $w$ follows a specific distribution (there is no simple expression) with expected value 0 and variance $\frac{N_r (N_r + 1)(2 N_r + 1)}{6}$, so $w$ can be compared to table values and $H_0$ gets rejected if $|w| \geq W_{critical, N_r}$
7. as $N_r$ increases, the distribution of $w$ converges to a Gaussian, thus a z-score can be calculated as $z = \frac{w}{\sigma_w}$, where $\sigma_w$ is the standard deviation, so if $|z| \geq z_{critical}$ we reject $H_0$
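A minimal sketch of both procedures with scipy (the before/after measurements are invented; note that `stats.wilcoxon` reports a rank-sum formulation of the statistic rather than the signed sum $w$ above, but it implements the same test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# hypothetical paired observations: the same individuals before and after a treatment
before = rng.normal(loc=70.0, scale=8.0, size=30)
after = before - rng.normal(loc=2.0, scale=3.0, size=30)

# paired t-test: equivalent to a one-sample t-test on the differences
t_stat, p_t = stats.ttest_rel(before, after)
print(f"paired t-test:        t = {t_stat:.2f}, p = {p_t:.4f}")

# Wilcoxon signed-rank test: the rank-based procedure described above
w_stat, p_w = stats.wilcoxon(before, after)
print(f"Wilcoxon signed-rank: W = {w_stat:.1f}, p = {p_w:.4f}")
```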

The z-test

The z-test uses the z-score in much the same way as the t-test uses the t-score. In fact, it is the analog of the t-test for the situation when the parameters of the underlying population are known, rather than estimated from a sample.
It is a test in which the statistic follows a normal distribution:
  • in the one-sample z-test, one tests the null hypothesis that the population mean is equal to $\mu_0$
  • in the two-sample z-test, two means are compared
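The one-sample case is simple enough to write directly (a minimal sketch; the data, the hypothesised mean $\mu_0$ and the known population standard deviation $\sigma$ are all made up):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

sample = rng.normal(loc=100.5, scale=15.0, size=50)   # hypothetical data
mu_0 = 100.0      # population mean under H0
sigma = 15.0      # population standard deviation, assumed known

z = (sample.mean() - mu_0) / (sigma / np.sqrt(sample.size))
p_value = 2 * norm.sf(abs(z))     # two-sided p-value from the standard normal

print(f"z = {z:.2f}, p = {p_value:.4f}")
```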

References

1. The Wikipedia page on the standard score
2. Student, "The probable error of a mean", Biometrika, 6(1), 1908