Machine Learning: fundamental algorithms
The Monte Carlo method

What it is

Monte Carlo is a probability-based method, very popular in Physics circles, for numerically estimating quantities. It relies on the very simple idea of repeated random sampling and is often used to estimate integrals. In practice, you draw a large number of random numbers and count how many of them satisfy a certain property; that count gives you an estimate of the quantity you are looking for, one that would be difficult to calculate analytically.
The method is a very simple and elegant one, and this is why it is one of the silver bullets of Physics. It was proposed by S. Ulam while working on military-related projects at the Los Alamos labs in the 1940s and became a big contributor to the work of the Manhattan Project; see this historical review paper for a dive into those times. The name is a clear reference to the casino in Monaco.
Monte Carlo estimations are regularly used in problems of
  • optimisation
  • numerical integration
  • drawing from a probability distribution
The basic idea can be visualised by imagining that we can estimate the surface area of a lake by throwing stones across and counting the number of those which fall in the lake with respect to those that fall outside.
The Monte Carlo method is fundamentally based on the law of large numbers (see the dedicated page).
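The law of large numbers at work can be seen in a minimal sketch (the coin-flip setup and sample sizes here are illustrative assumptions): the observed frequency of heads in repeated fair coin flips approaches the true probability 0.5 as the number of flips grows.

```python
import numpy as np

rng = np.random.default_rng(42)

# observed frequency of heads over an increasing number of fair coin flips:
# by the law of large numbers it converges to the true probability, 0.5
for n in [100, 10_000, 1_000_000]:
    flips = rng.integers(0, 2, size=n)  # 0 = tails, 1 = heads
    print(f'{n} flips: frequency of heads = {flips.mean():.4f}')
```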

Numerical integration: an example estimating π

A notebook with the code presented here can be seen here.
This is a pedagogical example, cited in many places, for instance on Wikipedia. Because of the relation linking π to the area A of a circle of radius r, namely

A = πr² ,

we can easily estimate π by considering a quarter-circle inscribed in a unit square. The area of the square is simply r², so with r = 1 the ratio of the quarter-circle area to the square area is π/4; multiplying the ratio by 4 gives the estimate of π.
For other examples, see the reference below.
```python
import numpy as np


def f_quartercircle(x):
    """Quarter of a circle in the first quadrant."""
    return np.sqrt(1 - x**2)


# get 1000 numbers between 0 and 1, equally spaced, and the quarter circle function on them
x = np.linspace(0, 1, num=1000)
y = [f_quartercircle(item) for item in x]

# loop over the number of extracted points, and extract them uniformly between 0 and 1, both x and y
for n in [100, 1000, 5000, 10000, 20000, 30000, 50000]:

    points = np.random.uniform(0, 1, size=(n, 2))  # n points in the plane, randomly (uniformly) extracted in [0, 1]

    under_points = []
    over_points = []

    # select if point is below or above the circle
    for point in points:
        if point[1] <= f_quartercircle(point[0]):
            under_points.append(point)
        else:
            over_points.append(point)

    # estimate pi as the ratio of the number of points below the circle to the total, times 4
    est_pi = float(len(under_points)) / n * 4

    # compute the relative error with respect to the real pi
    rel_err = abs(est_pi - np.pi) / np.pi
    print('Estimated pi at %d points: %f, with relative error %f' % (n, est_pi, rel_err))
```
This will print the estimate of π, with a relative error that (on average) shrinks as the number of points grows.
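The inner loop over points can also be vectorised with NumPy, which is both shorter and much faster; a sketch of the same hit-or-miss estimator (the point count and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000
points = rng.uniform(0, 1, size=(n, 2))                 # n random points in the unit square
inside = points[:, 1] <= np.sqrt(1 - points[:, 0]**2)   # True where the point falls under the quarter circle
est_pi = 4 * inside.mean()                              # fraction of points under the arc, times 4
print(f'Estimated pi with {n} points: {est_pi:.4f}')
```

The boolean array's mean is exactly the fraction of "hits", so no explicit loop is needed.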
```python
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))

plt.title('Final estimation plot')
plt.plot(x, y, color='r', lw=3)
plt.plot([point[0] for point in under_points], [point[1] for point in under_points], 'x', alpha=0.5)
plt.plot([point[0] for point in over_points], [point[1] for point in over_points], 'x', alpha=0.5)
plt.show()
```
Monte-Carloing pi
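Hit-or-miss is not the only Monte Carlo estimator for this integral: averaging the integrand itself at uniform random points (the mean-value method) also works, since for X uniform on [0, 1] the expectation E[f(X)] equals the integral of f over [0, 1]. A minimal sketch under the same setup (sample size and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# mean-value Monte Carlo: the integral of sqrt(1 - x^2) over [0, 1] is pi/4,
# so the sample mean of the integrand, times 4, estimates pi
x = rng.uniform(0, 1, size=100_000)
est_pi = 4 * np.sqrt(1 - x**2).mean()
print(f'Mean-value estimate of pi: {est_pi:.4f}')
```

For this integrand the mean-value estimator typically has lower variance than hit-or-miss at the same number of samples.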

Distribution sampling

Using Monte Carlo for distribution sampling means doing this:
  1. generate some independent datasets under the condition of interest
  2. compute the numerical value of the estimator for each dataset, that is, the test statistic
  3. compute summary statistics over all the computed test statistics
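The three steps above can be sketched as follows, using the sample mean as the test statistic (the standard-normal population, the number of datasets, and the dataset size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

n_datasets, n_samples = 1000, 50

# 1. generate independent datasets under the condition of interest (here: standard normal samples)
datasets = rng.normal(loc=0, scale=1, size=(n_datasets, n_samples))

# 2. compute the test statistic (the sample mean) for each dataset
means = datasets.mean(axis=1)

# 3. summarise the statistics over all datasets
print(f'Mean of sample means: {means.mean():.3f}')
print(f'Std of sample means:  {means.std():.3f}  (theory: {1 / np.sqrt(n_samples):.3f})')
```

The spread of the sample means approximates the sampling distribution's standard error, σ/√N, here 1/√50 ≈ 0.141.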
If, for instance, we want to estimate the mean and standard deviation of a random variable Y, and we have a sample of N values of it, the sample mean and the sample standard deviation are effectively Monte Carlo estimates which, by the law of large numbers, we expect to converge to the population values when N is big enough.

References

  1. Stan Ulam, John von Neumann and the Monte Carlo method, Los Alamos Science special issue, 1987
  2. Wikipedia on the Monte Carlo method
  3. Some useful slides on the method, with examples, from the University of Geneva