
Mathematical functions

This page lists some common functions used in Machine Learning and Data Science in general. If you want to reproduce the plots, you just need to import NumPy and Pyplot:

```python
import numpy as np
from matplotlib import pyplot as plt
```

Big O and little O notation

The *big O* notation is used in mathematics to describe the limiting behaviour of a function as its argument goes to $\infty$. Writing

$f(x) = O(g(x)) \ \text{as} \ x \to \infty$

means that $|f(x)| \leq M |g(x)|$ for some constant $M$ and all sufficiently large $x$.

The letter "O" is used as per *order of function*.

Note that in computer science, the big O notation is used to classify algorithms by how they respond to changes in the input size.

The *little o* notation instead,

$f(x) = o(g(x)) \ ,$

means that $g(x)$ grows much faster than $f(x)$; equivalently, $\lim_{x \to \infty} f(x)/g(x) = 0$.
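For instance, $x^2 = o(x^3)$ as $x \to \infty$, since

$\lim_{x \to \infty} \frac{x^2}{x^3} = \lim_{x \to \infty} \frac{1}{x} = 0 \ .$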

Convolution

The mathematical convolution of functions is the operation

$(f \star g)(x) = \int_{-\infty}^{+\infty} dy f(y) g(x - y)$

It is a symmetric operation. In fact,

$(g \star f)(x) = \int_{-\infty}^{+\infty} dy \, g(y) f(x-y) \ ;$

substituting $z = x-y$, so that $dy = -dz$, gives

$(g \star f)(x) = -\int_{+\infty}^{-\infty} dz \, g(x-z) f(z) = \int_{-\infty}^{+\infty} dz \, g(x-z) f(z) = (f \star g)(x) \ .$
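The same symmetry holds for the discrete convolution. A minimal numerical check with NumPy (the sample arrays here are arbitrary):

```python
import numpy as np

# Discrete analogue of the convolution integral: np.convolve flips and
# slides one sequence over the other. Symmetry means the order of the
# arguments does not matter.
f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

fg = np.convolve(f, g)
gf = np.convolve(g, f)
print(fg)                   # [0.  1.  2.5 4.  1.5]
print(np.allclose(fg, gf))  # True
```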

Some functions of common use in Machine Learning/Statistics

Heaviside step

The Heaviside step function appears in many applications. It is just a simple step:

$f(x) =
\begin{cases}
1 \text{ if } x \geq 0 \\
0 \text{ if } x < 0
\end{cases}$
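As a quick check, NumPy implements this function as `np.heaviside`; its second argument sets the value at $x = 0$, which the definition above takes to be 1:

```python
import numpy as np

# np.heaviside(x, h0): h0 is the value returned at x = 0.
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = np.heaviside(x, 1)
print(y)  # [0. 0. 1. 1. 1.]
```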

Softmax

The softmax is a normalised exponential, used in probability theory as a generalisation of the logistic function. It transforms a K-dimensional vector $\mathbf{x}$ of arbitrary real values into a vector of the same size whose elements are still real numbers but lie in the interval [0, 1] and sum to 1, so they can represent probabilities. The function has the form

$f(x_i) = \frac{e^{x_i}}{\sum_{j \in K} e^{x_j}}$

The softmax is also often employed in the context of neural networks. It is called this way because it represents a softening of the max function, in the sense that it is largest on the maximum of the array. See the example.

```python
def softmax(x):
    return np.exp(x) / np.sum(np.exp(x))

x = np.arange(-6, 7)
y = softmax(x)

plt.plot(x, y)
plt.title('Softmax function')
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.savefig('softmax.png', dpi=200)
plt.show()
```
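A practical caveat, not covered above: for large inputs `np.exp(x)` overflows. A common variant, sketched here, subtracts the maximum of the array before exponentiating; the result is unchanged because the shift cancels between numerator and denominator:

```python
import numpy as np

def softmax_stable(x):
    # Shifting by max(x) avoids overflow in exp; the result is identical
    # since e^{x_i - m} / sum_j e^{x_j - m} = e^{x_i} / sum_j e^{x_j}.
    z = x - np.max(x)
    return np.exp(z) / np.sum(np.exp(z))

x = np.array([1000.0, 1001.0, 1002.0])  # the naive version overflows here
print(softmax_stable(x))                # finite probabilities summing to 1
```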

Logit and logistic functions

Given probability $p$, the odds are defined as

$o = \frac{p}{1-p} \ .$

The *logit* function is the logarithm of the odds:

$L(p) = \ln{\frac{p}{1-p}}$

A negative logit corresponds to $p < 0.5$.

```python
p = np.arange(0.1, 1.0, 0.1)  # keep p strictly below 1 to avoid division by zero
y = np.log(p / (1 - p))

plt.plot(p, y)
plt.grid()
plt.title('Logit function')
plt.ylabel('$y$')
plt.xlabel('$p$')
plt.show()
```

Now, inverting this relation to express the probability as a function of the logit gives the *logistic* function:

$L = \ln{\frac{p}{1-p}} \Leftrightarrow -L = \ln{\frac{1}{p} - 1} \Leftrightarrow \frac{1}{p} = 1 + e^{-L} \Leftrightarrow p = \frac{1}{1+e^{-L}}$

```python
L = np.arange(-5, 5, 0.2)
p = 1 / (1 + np.exp(-L))

plt.plot(L, p)
plt.title('Logistic function')
plt.xlabel('logit')
plt.ylabel('$p$')
plt.show()
```
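To verify numerically that the logistic function inverts the logit, one can compose the two formulas above and recover the original probabilities:

```python
import numpy as np

p = np.arange(0.1, 1.0, 0.1)    # probabilities strictly between 0 and 1
L = np.log(p / (1 - p))         # logit
p_back = 1 / (1 + np.exp(-L))   # logistic applied to the logit
print(np.allclose(p, p_back))   # True
```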