On sampling and testing
You have some data (a sample, which you have built with sweat and blood) and a hypothesis, and the data is meant to help you prove or disprove it.

How to sample

In general, you have a sample of a population; if you had the full population, you would know everything about it. You need to test precisely because you only have partial information. But is your sample always good, that is, representative? Sampling randomly (that is, uniformly) isn't always the best idea.

Stratified sampling

Stratified sampling is a way to sample data from a population that is especially useful when the population isn't homogeneous. Sampling "randomly" (every point drawn with the same probability) risks producing a sample that doesn't reflect that heterogeneity: a small subgroup, for instance, may end up under-represented or missing altogether.
Stratification is the process of dividing the population, before sampling, into homogeneous subgroups (strata) such that each element belongs to exactly one stratum; random sampling is then applied within each stratum.
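As a minimal sketch in Python, stratification followed by uniform sampling within each stratum could look like the block below. The helper names `stratify` and `stratified_sample`, the `sizes` argument and the toy population are hypothetical, chosen for illustration. How many items to take from each stratum is left to the caller, since that is exactly what an allocation strategy decides.

```python
import random
from collections import defaultdict


# Hypothetical helpers, written for illustration only.
def stratify(population, key):
    """Split the population into strata; each item lands in exactly one stratum."""
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    return dict(strata)


def stratified_sample(population, key, sizes, seed=None):
    """Draw sizes[label] items uniformly at random, without replacement, from each stratum."""
    rng = random.Random(seed)
    sample = []
    for label, members in stratify(population, key).items():
        sample.extend(rng.sample(members, sizes.get(label, 0)))
    return sample


# Toy population: 950 "urban" records and 50 "rural" ones.
population = [("urban", i) for i in range(950)] + [("rural", i) for i in range(50)]
sample = stratified_sample(population, key=lambda item: item[0],
                           sizes={"urban": 19, "rural": 1}, seed=42)
print(len(sample), sum(1 for region, _ in sample if region == "rural"))  # prints: 20 1
```

Unlike a plain uniform sample of 20 items, which could easily contain no rural record at all, this guarantees the small stratum is represented. Note that `random.Random.sample` raises `ValueError` if a stratum is asked for more items than it contains.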

Proportional allocation

In this strategy, every stratum is sampled with the same sampling fraction: if $n$ is the desired sample size, $N$ the total number of items and $N_s$ the number of items in stratum $s$, the number of items drawn from stratum $s$ is $n_s = \frac{n N_s}{N}$, so that $n_s / N_s = n / N$ for every stratum.
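A minimal sketch of proportional allocation in Python follows; the function name `proportional_allocation` is hypothetical. Since $n N_s / N$ is generally not an integer, the sketch floors each share and hands the leftover units to the strata with the largest fractional parts (largest-remainder rounding). That rounding rule is an assumption of this sketch, not part of the formula.

```python
import math


def proportional_allocation(stratum_sizes, n):
    """Allocate n samples across strata with n_s = n * N_s / N.

    stratum_sizes maps each stratum label to N_s. Largest-remainder rounding
    (an assumption of this sketch) makes the integer allocations sum to n.
    """
    N = sum(stratum_sizes.values())
    exact = {s: n * N_s / N for s, N_s in stratum_sizes.items()}
    alloc = {s: math.floor(x) for s, x in exact.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata whose exact share was rounded down the most.
    for s in sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


print(proportional_allocation({"urban": 950, "rural": 50}, 20))  # {'urban': 19, 'rural': 1}
```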

Optimal allocation

In this strategy, the standard deviation $\sigma_s$ of the distribution within each stratum is also taken into account, so that the number of items drawn from stratum $s$ is $n_s = \frac{n N_s \sigma_s}{\sum_{k=1}^{S} N_k \sigma_k}$, where $S$ is the number of strata. What this means is that strata are weighted by their variability as well as their size: of two strata of equal size, the more variable one gets more samples.
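A minimal Python sketch of this formula (with equal sampling costs across strata it is also known as Neyman allocation) is below. The function name `neyman_allocation` is hypothetical, the standard deviations are taken as given (in practice they usually have to be estimated, e.g. from a pilot sample), and the same largest-remainder rounding is assumed to turn fractional shares into integers.

```python
import math


def neyman_allocation(stratum_sizes, stratum_stds, n):
    """Allocate n samples with n_s = n * N_s * sigma_s / sum_k(N_k * sigma_k).

    stratum_stds maps each stratum label to sigma_s (assumed known here).
    Fractional shares are rounded with largest-remainder rounding so they sum to n.
    """
    weights = {s: stratum_sizes[s] * stratum_stds[s] for s in stratum_sizes}
    total = sum(weights.values())  # must be > 0, i.e. at least one stratum varies
    exact = {s: n * w / total for s, w in weights.items()}
    alloc = {s: math.floor(x) for s, x in exact.items()}
    leftover = n - sum(alloc.values())
    for s in sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


# A large, homogeneous stratum and a small one that is 10x more variable.
print(neyman_allocation({"urban": 950, "rural": 50}, {"urban": 1.0, "rural": 10.0}, 20))
# {'urban': 13, 'rural': 7}
```

When all $\sigma_s$ are equal they cancel out and the formula reduces to proportional allocation, $n_s = n N_s / N$.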

Testing

The null hypothesis

The null hypothesis is the one a statistical test checks against, that is, the one we want to find out whether we can reject; it is assumed to be true until the evidence contradicts it strongly enough. It is typically denoted $\mathcal{H}_0$.
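As a small illustration, here is a sketch of an exact two-sided binomial test in Python for $\mathcal{H}_0$: "the coin is fair". The function name `binomial_test_fair_coin` and the significance level of 0.05 are assumptions chosen for illustration. Under $\mathcal{H}_0$ each outcome's probability is known exactly, so the p-value is the probability, assuming $\mathcal{H}_0$ is true, of a result at least as extreme as the one observed.

```python
from math import comb


def binomial_test_fair_coin(heads, flips):
    """Two-sided exact p-value for H0: the coin is fair (p = 0.5).

    Under H0, outcome i has probability C(flips, i) / 2**flips. The p-value sums
    the probabilities of all outcomes at least as far from flips/2 as the observed one.
    """
    observed_distance = abs(heads - flips / 2)
    extreme = sum(
        comb(flips, i) for i in range(flips + 1) if abs(i - flips / 2) >= observed_distance
    )
    return extreme / 2**flips


alpha = 0.05  # significance level: a conventional, hypothetical choice
p = binomial_test_fair_coin(heads=9, flips=10)
print(f"p = {p:.4f}", "reject H0" if p < alpha else "don't reject H0")  # p = 0.0215 reject H0
```

With 6 heads out of 10 the p-value is about 0.75, so $\mathcal{H}_0$ is not rejected; that does not prove the coin fair, it only means the data gives no strong evidence against it.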

Hypothesis testing and types of error

Type I error and Type II error

A Type I error is a false positive, that is, it occurs when $\mathcal{H}_0$ is erroneously rejected.
A Type II error is a false negative, that is, it occurs when $\mathcal{H}_0$ is not rejected even though it is false.
Here's a handy table:
Null hypothesis and types of errors

|  | $\mathcal{H}_0$ true | $\mathcal{H}_0$ false |
|---|---|---|
| Reject $\mathcal{H}_0$ | Type I | true positive |
| Don't reject $\mathcal{H}_0$ | true negative | Type II |
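The two error types can be made concrete with a small simulation, sketched below in Python. It assumes a two-sided z-test with known standard deviation (1.0), samples of 30 points drawn from a normal distribution, and $\mathcal{H}_0$: "the mean is 0"; the helper names `z_test_rejects` and `rejection_rate` are hypothetical. When $\mathcal{H}_0$ is true every rejection is a Type I error; when it is false every non-rejection is a Type II error.

```python
import random
import statistics


def z_test_rejects(sample, mu0, sigma, z_crit=1.96):
    """Two-sided z-test with known sigma: reject H0 (mean == mu0) if |z| > z_crit."""
    n = len(sample)
    z = (statistics.fmean(sample) - mu0) / (sigma / n ** 0.5)
    return abs(z) > z_crit


def rejection_rate(true_mean, trials=10_000, n=30, seed=0):
    """Fraction of simulated samples from N(true_mean, 1) in which H0: mean == 0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        rejections += z_test_rejects(sample, mu0=0.0, sigma=1.0)
    return rejections / trials


# H0 true (mean 0): rejections are Type I errors; the rate should be close to 0.05.
type_1_rate = rejection_rate(true_mean=0.0)
# H0 false (mean 0.5): non-rejections are Type II errors.
type_2_rate = 1 - rejection_rate(true_mean=0.5)
print(f"Type I rate ~ {type_1_rate:.3f}, Type II rate ~ {type_2_rate:.3f}")
```

With these settings, theory predicts a Type I rate of about 0.05 (fixed by the 1.96 threshold) and a Type II rate of roughly 0.22. Raising the threshold lowers the first at the cost of raising the second.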
