Comments on the Monte Carlo Method

A general notice before the start of these comments: in numerical simulation, the Monte Carlo method is useful in higher dimensions, whereas the finite difference method is more useful than the Monte Carlo method in lower dimensions.

Lecture 1

  1. The generation of random numbers involves three steps: generating independent uniforms on [0,1]; generating independent standard normals; generating correlated normals.
  2. A widely used RNG is the Mersenne Twister, whose period is 2^19937 - 1.
  3. Four popular methods for generating random normals (a sketch of two of them, and of the correlation step, follows this list):
  4. Box-Muller: advantage: easy to understand. Disadvantage: log, cos and sin are quite expensive to evaluate.
  5. Marsaglia polar method: advantage: cheaper than Box-Muller (no cos/sin). Disadvantage: some of the candidate points have to be rejected, so it is less convenient for parallel/vectorised implementations.
  6. Marsaglia ziggurat method: advantage: fastest. Disadvantage: hardest to understand.
  7. Inverse normal (inverse CDF) method: advantage: as accurate as the other methods; disadvantage: evaluating the inverse normal CDF is still relatively costly.
  8. The normal CDF is related to the error function erf(x): Φ(x) = 1/2 + 1/2 erf(x / sqrt(2)).
  9. Two ways to compute correlated random normals:
  10. Cholesky factorization or PCA (eigenvalue) decomposition of the covariance matrix; multiplying independent normals by the resulting factor gives the correlated normals.
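
A rough illustration of points 4 and 5: a minimal NumPy sketch of the Box-Muller and Marsaglia polar generators (seed and sample sizes are arbitrary; note that NumPy's default generator is PCG64, while its legacy RandomState uses the Mersenne Twister):

```python
import numpy as np

rng = np.random.default_rng(0)   # default_rng uses PCG64; legacy np.random.RandomState uses MT19937

def box_muller(n, rng):
    """Box-Muller: pairs of uniforms -> pairs of independent standard normals (uses log, cos, sin)."""
    u1 = 1.0 - rng.random(n)              # in (0, 1], avoids log(0)
    u2 = rng.random(n)
    r = np.sqrt(-2.0 * np.log(u1))
    return r * np.cos(2.0 * np.pi * u2), r * np.sin(2.0 * np.pi * u2)

def marsaglia_polar(n, rng):
    """Marsaglia polar method: avoids cos/sin but rejects about 21% of candidate pairs."""
    out = np.empty(0)
    while out.size < n:
        v1 = rng.uniform(-1.0, 1.0, n)
        v2 = rng.uniform(-1.0, 1.0, n)
        s = v1**2 + v2**2
        keep = (s > 0.0) & (s < 1.0)                      # keep only points inside the unit disc
        f = np.sqrt(-2.0 * np.log(s[keep]) / s[keep])
        out = np.concatenate([out, v1[keep] * f, v2[keep] * f])
    return out[:n]

z1, _ = box_muller(100_000, rng)
z2 = marsaglia_polar(100_000, rng)
print(z1.mean(), z1.std(), z2.mean(), z2.std())           # all close to 0 and 1
```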
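
And a minimal sketch of point 10, correlating independent normals with a Cholesky factor or a PCA factor; the 2x2 covariance matrix below is assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])          # example covariance matrix (assumed for illustration)

L = np.linalg.cholesky(Sigma)           # Sigma = L L^T, L lower triangular
Z = rng.standard_normal((2, 100_000))   # independent standard normals
X = L @ Z                               # correlated normals with covariance Sigma

# The PCA alternative uses the eigendecomposition Sigma = V D V^T instead:
lam, V = np.linalg.eigh(Sigma)
X_pca = (V * np.sqrt(lam)) @ Z          # V diag(sqrt(lam)) Z has the same covariance

print(np.cov(X))                        # should be close to Sigma
print(np.cov(X_pca))
```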

Lecture 2

  1. Integrating a function over [0,1] is the same as computing the expectation of the function under the uniform distribution. Hence the integral can be estimated by averaging the function evaluated at random uniform [0,1] numbers (see the sketch after this list).
  2. The estimator above is unbiased and consistent.
  3. The error is the difference between the estimated value and the true value. Bias is the expectation of the error. The root mean square error is the square root of the expectation of the squared error.
  4. The empirical variance can be computed as the mean of the squares minus the square of the mean. To get the unbiased estimator, multiply this empirical variance by N/(N-1).
  5. The number of samples needed for a required accuracy ε is N = (c σ / ε)^2, where c is the constant for the chosen confidence level (e.g. c ≈ 1.96 for 95% confidence).
  6. For an unbiased estimator the mean square error equals the variance of the error, so the root mean square error is its standard deviation.
  7. When the dimension d > 4, the Monte Carlo method is much more efficient than the finite difference method.
  8. Expectations with respect to independent or correlated normals are simulated in a similar way: apply the inverse normal CDF to uniform samples to get normals, then average the function of those normals. For correlated normals, multiply the independent normals by the factor from the Cholesky or PCA decomposition of the covariance matrix.
  9. Higher accuracy always requires more computing time, so there is a trade-off between accuracy and efficiency.
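
A minimal sketch of points 1-5, assuming NumPy and an illustrative integrand exp(u): the equal-weight estimator, the unbiased empirical variance, and the sample-size rule for a given accuracy and confidence level:

```python
import numpy as np

def f(u):
    return np.exp(u)            # example integrand (assumed); exact integral over [0,1] is e - 1

rng = np.random.default_rng(2)
N = 100_000
u = rng.random(N)
fu = f(u)

estimate = fu.mean()
var_unbiased = fu.var(ddof=1)   # ddof=1 gives the (N-1) denominator, i.e. the unbiased estimator
std_error = np.sqrt(var_unbiased / N)

# samples needed for accuracy eps at roughly 95% confidence (c ~ 1.96)
eps, c = 1e-4, 1.96
N_required = int((c * np.sqrt(var_unbiased) / eps) ** 2)

print(estimate, np.e - 1, std_error, N_required)
```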

Lecture 3

  1. Variance reduction is very important in the Monte Carlo method, because in the right circumstances a simple technique can greatly reduce the variance.
  2. There are six main techniques for variance reduction.
  3. Antithetic method: note that it requires the distribution of the random input to be symmetric (W and -W identically distributed). Advantage: the variance is always reduced. Disadvantage: the computational cost doubles, so there is a net benefit only if the covariance of f(W) and f(-W) is negative (see the first sketch after this list).
  4. Best case: linear payoff. Worst case: symmetric payoff.
  5. Control variate: if there is another payoff g whose expectation E(g) is known, one can subtract a multiple of g - E(g) from f to reduce the error in estimating E(f).
  6. The good situation is when f and g are nearly linearly correlated; the worst situation is when f and g are independent.
  7. Importance sampling: the basic idea is a change of probability measure (see the second sketch after this list).
  8. Regarding the last sentence on page 20 of this lecture: it is the choice of μ2 that determines the distribution used for the new sampling.
  9. For the normal distribution, changing μ is useful when one tail is important, while changing σ is useful when both tails are important.
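
A minimal sketch of the antithetic and control variate ideas (points 3-6), assuming NumPy; the payoff f(W) = exp(W) and the control g(W) = W are illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000

f = lambda w: np.exp(w)          # example payoff; E[f(W)] = exp(1/2) for W ~ N(0,1)
g = lambda w: w                  # control variate with known expectation E[g(W)] = 0

w = rng.standard_normal(N)

# plain estimator
plain = f(w)

# antithetic: average f over the pair (W, -W); helps here because Cov(f(W), f(-W)) = 1 - e < 0.
# Note each pair costs two payoff evaluations.
anti = 0.5 * (f(w) + f(-w))

# control variate: subtract lambda*(g - E[g]) with an estimated variance-minimising lambda
fw, gw = f(w), g(w)
lam = np.cov(fw, gw)[0, 1] / np.var(gw)
ctrl = fw - lam * (gw - 0.0)     # E[g] = 0 here

exact = np.exp(0.5)
for name, x in [("plain", plain), ("antithetic", anti), ("control variate", ctrl)]:
    print(f"{name:16s} mean={x.mean():.5f}  std error={x.std(ddof=1)/np.sqrt(N):.5f}  (exact {exact:.5f})")
```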
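
A minimal sketch of importance sampling by shifting the mean (points 7-9); the tail probability P(W > 3) is an illustrative target and the shift mu = 3 an illustrative choice:

```python
import math
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
mu = 3.0                                    # shift towards the important tail (illustrative)

# plain Monte Carlo: very few samples land in the tail, so the relative error is huge
w = rng.standard_normal(N)
plain = (w > 3.0).astype(float)

# importance sampling: sample Y ~ N(mu, 1) and weight by phi(y)/phi_mu(y) = exp(-mu*y + mu^2/2)
y = mu + rng.standard_normal(N)
weights = np.exp(-mu * y + 0.5 * mu**2)
isamp = (y > 3.0) * weights

exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))   # P(W > 3), via the erf relation from Lecture 1
print("plain MC           :", plain.mean(), "+/-", plain.std(ddof=1) / np.sqrt(N))
print("importance sampling:", isamp.mean(), "+/-", isamp.std(ddof=1) / np.sqrt(N))
print("exact              :", exact)
```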

Lecture 4

  1. Stratified sampling: the key idea is to achieve a more regular sampling of the most important dimensions of the uncertainty.
  2. Procedure: divide the [0,1] interval into M strata ——> take L samples from each stratum. ML = N, the total sample size.
  3. The simulation procedure: break [0,1] into M strata ——> take L uniform samples U in each stratum ——> define the normal samples by applying the inverse normal CDF to the uniform samples ——> compute the average for each stratum and the overall average (see the first sketch after this list).
  4. There is a trade-off between efficiency and confidence.
  5. Notice that it is better to sample more from the strata with higher variability.
  6. The multivariate application is similar.
  7. In higher dimensions, the number of cubes (strata) to sample from can be very large, which forces the number of samples per cube to be very small. Hence a new method, Latin Hypercube sampling, is introduced.
  8. Latin Hypercube: generate M points dimension by dimension, using stratified sampling with one value per stratum and assigning the values randomly to the M points, so that there is precisely one point in each stratum ——> take L such independently generated sets of points, each giving its own average (see the second sketch after this list).
  9. In the special case where the function can be written as a sum of one-dimensional functions, Latin Hypercube sampling gives a very large variance reduction for large M.
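
A minimal sketch of one-dimensional stratified sampling (points 1-3), assuming NumPy and an illustrative integrand:

```python
import numpy as np

rng = np.random.default_rng(5)
f = lambda u: np.exp(u)                    # exact integral over [0,1] is e - 1

M, L = 100, 100                            # M strata, L samples per stratum
N = M * L

# plain Monte Carlo on the same total budget
plain = f(rng.random(N)).mean()

# stratified: the i-th stratum is [i/M, (i+1)/M); sample uniformly within each stratum
i = np.arange(M).reshape(M, 1)             # stratum index, shape (M, 1)
u = (i + rng.random((M, L))) / M           # L uniform samples in each stratum
strat_means = f(u).mean(axis=1)            # per-stratum averages
stratified = strat_means.mean()            # overall average (equal-width strata)

print("plain     :", plain)
print("stratified:", stratified)
print("exact     :", np.e - 1)
```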
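
And a minimal sketch of Latin Hypercube sampling (points 8-9); the dimension, sample sizes and the sum-of-exponentials integrand are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(6)

def latin_hypercube(M, d, rng):
    """Return an (M, d) array with exactly one point per stratum in each dimension."""
    pts = np.empty((M, d))
    for k in range(d):
        # one uniform sample in each of the M strata, then a random permutation
        strata = (np.arange(M) + rng.random(M)) / M
        pts[:, k] = rng.permutation(strata)
    return pts

# example: a sum of one-dimensional functions, the favourable case in point 9
f = lambda x: np.sum(np.exp(x), axis=1)          # exact mean over [0,1]^d is d*(e-1)

d, M, L = 5, 1000, 20
lhs_averages = [f(latin_hypercube(M, d, rng)).mean() for _ in range(L)]
print("LHS mean of averages:", np.mean(lhs_averages),
      " spread:", np.std(lhs_averages), " exact:", d * (np.e - 1))
```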

Lecture 5

  1. Quasi Monte Carlo: standard quasi Monte Carlo uses the same equal-weight estimator but chooses the points systematically, so that the estimate is biased, the error is roughly proportional to 1/N, and there is no confidence interval.
  2. One way to construct the point set for quasi Monte Carlo is the rank-1 lattice rule (see notes page 9): the points are x_n = frac(n z / N) for n = 0, ..., N-1, where z is the generating vector with integer components co-prime with N (see the sketch after this list).
  3. Sobol sequence: the idea is to subdivide each dimension into halves, quarters, etc., so that each such cube contains the same number of sample points.
  4. Randomised QMC: apply independent random shifts (mod 1) to the whole point set; the spread of the resulting averages gives back a confidence interval.
  5. QMC points have the property of being more uniformly distributed in the lowest dimensions. Consequently, it is important to think about how the dimensions are allocated to the problem. Previously, correlated normals were generated through a decomposition of the variance-covariance matrix; with QMC the PCA decomposition is often preferred because it puts most of the variance into the leading dimensions.
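
A minimal sketch of a rank-1 lattice rule with a random shift (points 2 and 4), assuming NumPy; the generating vector z and the integrand are illustrative, not the ones from the notes:

```python
import numpy as np

rng = np.random.default_rng(7)

def rank1_lattice(N, z, shift=None):
    """Points x_n = frac(n*z/N + shift), n = 0..N-1, for generating vector z."""
    n = np.arange(N).reshape(N, 1)
    x = (n * np.asarray(z)) / N
    if shift is not None:
        x = x + shift                       # randomised QMC: add a uniform random shift
    return x % 1.0                          # take the fractional part

f = lambda x: np.prod(np.exp(x), axis=1)    # example integrand on [0,1]^2; exact value (e-1)^2

N, z = 1021, [1, 76]                        # N prime, so both components are co-prime with N
estimates = [f(rank1_lattice(N, z, shift=rng.random(2))).mean() for _ in range(20)]
print("randomised lattice:", np.mean(estimates), "+/-", np.std(estimates),
      " exact:", (np.e - 1) ** 2)
```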

Lecture 6

  1. Finite precision arithmetic: a floating point number is represented as f = x × 2^n, where the integer exponent n is stored in some number of bits and the mantissa x, with 1/2 ≤ |x| < 1, is stored in some number of bits.
  2. The relative error is about 10^-16 for double (long) precision and 10^-7 for single (short) precision.
  3. For a sum, the standard error of the rounding error is roughly 2^(-S) sqrt(N), where S is the number of mantissa bits and N is the number of terms in the sum.
  4. This rounding error can be fatal when we want to approximate derivatives by finite differences.
  5. Complex-step trick: using complex numbers gives the same result with much less error. One simply takes the imaginary part of the function evaluated at x + i·dx, divided by dx, so dx can be taken extremely small without cancellation. The only requirement is that the function is analytic (see the sketch after this list).
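
A minimal sketch of the complex-step trick in point 5, compared with a central finite difference, for an illustrative analytic function:

```python
import numpy as np

f = lambda x: np.exp(np.sin(x))            # example analytic function
x0 = 1.0
exact = np.cos(x0) * np.exp(np.sin(x0))    # true derivative at x0

for dx in (1e-4, 1e-8, 1e-12, 1e-20):
    fd = (f(x0 + dx) - f(x0 - dx)) / (2 * dx)      # suffers cancellation for tiny dx
    cs = np.imag(f(x0 + 1j * dx)) / dx             # complex step: no cancellation
    print(f"dx={dx:.0e}  finite diff err={abs(fd - exact):.2e}  complex step err={abs(cs - exact):.2e}")
```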

Lecture 7

  1. The Greeks are sensitivities measuring the change in the value of a derivative with respect to a change in one parameter.
  2. The variance can be quite large if an independent random input is used for each of X(θ + Δθ) and X(θ - Δθ).
  3. To avoid this, we use the same random inputs for both X(θ + Δθ) and X(θ - Δθ) (see the sketch after this list).
  4. Finite difference sensitivity: there can be problems when the payoff function is not continuous, so we need to be careful about the payoff jumping near the discontinuity.
  5. The probability that the payoff jumps within the interval [θ - Δθ, θ + Δθ] is O(Δθ). As a result, the variance can become very large when Δθ is small.
  6. Hence what we want to minimise is the mean square error, and the best choice is Δθ = O(N^(-1/5)).
  7. Similarly, a discontinuity makes the variance quite large when estimating second derivatives.
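
A minimal sketch of points 2-3, a central finite-difference delta with and without common random numbers; the Black-Scholes call setup below is an illustrative choice of model and payoff:

```python
import numpy as np

rng = np.random.default_rng(8)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
N, dS = 100_000, 0.5

def discounted_payoff(S0, Z):
    """Discounted call payoff from terminal GBM values driven by standard normals Z."""
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0)

Z = rng.standard_normal(N)

# common random numbers: the SAME Z is used for both bumped valuations
delta_crn = (discounted_payoff(S0 + dS, Z) - discounted_payoff(S0 - dS, Z)) / (2 * dS)

# independent samples: a fresh Z for the second bump -> much larger variance
Z2 = rng.standard_normal(N)
delta_ind = (discounted_payoff(S0 + dS, Z) - discounted_payoff(S0 - dS, Z2)) / (2 * dS)

print("CRN   delta:", delta_crn.mean(), " std error:", delta_crn.std(ddof=1) / np.sqrt(N))
print("indep delta:", delta_ind.mean(), " std error:", delta_ind.std(ddof=1) / np.sqrt(N))
```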

Lecture 8

  1. Likelihood ratio method and pathwise sensitivity: the likelihood ratio method differentiates the probability density with respect to θ, while pathwise sensitivity differentiates the function whose expectation we are taking.
  2. In the likelihood ratio method we do not change the measure; we change the function we take the expectation of.
  3. For this method the variance is very large when σ is small, and for the delta estimate it is also large when T is small (this refers to the sensitivity estimators).
  4. Pathwise sensitivity differentiates the function inside the expectation instead, but in this case we need to assume that the function is differentiable. The same idea can be applied to second-order derivatives (see the sketch after this list).
  5. For discontinuous payoffs, we can approximate the discontinuous function by smooth ones and take the limit as the final result, e.g. using the cumulative normal to smooth a digital call payoff.
  6. Both methods are still based on simulation, so after the transformation the expectation of the new integrand is computed with the same Monte Carlo machinery.
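
A minimal sketch of the pathwise and likelihood ratio delta estimators for a European call under geometric Brownian motion (an illustrative but standard example, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(9)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
N = 100_000

Z = rng.standard_normal(N)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
disc = np.exp(-r * T)

# pathwise: differentiate the payoff along the path; dST/dS0 = ST/S0, valid because
# the call payoff is continuous (Lipschitz) in ST
delta_pw = disc * (ST > K) * ST / S0

# likelihood ratio: multiply the payoff by d(log density)/dS0 = Z / (S0 * sigma * sqrt(T));
# no differentiability of the payoff is needed, but the variance grows as sigma*sqrt(T) shrinks
delta_lr = disc * np.maximum(ST - K, 0.0) * Z / (S0 * sigma * np.sqrt(T))

print("pathwise delta        :", delta_pw.mean(), " std error:", delta_pw.std(ddof=1) / np.sqrt(N))
print("likelihood ratio delta:", delta_lr.mean(), " std error:", delta_lr.std(ddof=1) / np.sqrt(N))
```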