Comment on Notes of Statistics and Financial Data Analysis

Lecture 1

Hypothesis Test:

  1. All tests are carried out under the assumption that the null hypothesis is true.
  2. There are two ways of testing: check whether the value specified by the null hypothesis lies within the confidence interval, or check whether the probability of observing the data under the null hypothesis (the p-value) is small.
  3. The t distribution with n degrees of freedom is a standard normal rv divided by the square root of a chi-square rv with n degrees of freedom over n, i.e. t = Z / sqrt(χ²_n / n).

Multiple Comparison

  1. Consider the linear model given in the notes, Y = Xβ + σε; the comparison between two models is based on the ANOVA test, i.e. analyzing the differences among group means in a sample.
  2. To compare nested models, we use the F-test with statistic f = ((RSS0 − RSS1)/k) / (RSS1/m), where k is the number of predictors excluded (those with β = 0) and m = n − k is the residual degrees of freedom. Then f follows the F(k, m) distribution. (See the R sketch after this list.)
  3. There is another method, the approximate F test, which is used for testing non-nested models (it can be useful when the sample size is large).
  4. Some ideas on the approximate F test are given under Lecture 2 below.
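A minimal R sketch of the nested-model F-test via anova(), on simulated data; all variable names here are hypothetical:

```r
# Nested-model F-test: does adding x2 improve the fit?
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)   # x2 has no true effect

fit0 <- lm(y ~ x1)            # reduced model (beta for x2 set to 0)
fit1 <- lm(y ~ x1 + x2)       # full model

anova(fit0, fit1)             # reports the F statistic and p-value
```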

Lecture 2

Polynomial Regression

  1. Polynomial regression is a regression method for capturing nonlinear effects.
  2. When the order of the polynomial gets higher, the data may not actually look like a simple polynomial, and this can cause huge edge effects (i.e., the polynomial may behave very peculiarly as the predictor tends to extreme values). A small R sketch follows.
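A minimal sketch of polynomial regression in R using the orthogonal polynomial basis poly(); the data are simulated for illustration:

```r
# Polynomial regression: a degree-5 fit to simulated nonlinear data
set.seed(2)
x <- runif(100, -2, 2)
y <- sin(2 * x) + rnorm(100, sd = 0.2)

fit_poly <- lm(y ~ poly(x, 5))   # orthogonal polynomial basis of degree 5
# Predictions far outside the range of x can behave wildly (edge effects)
```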

Piecewise Linear Approximation

  1. To overcome the edge effects of polynomial regression, we tend to use piecewise linear approximation instead of a simple polynomial regression.
  2. The idea involves two things: first choose the number of knots used in the model, then fit each section with a linear model. Notice the restriction that the fitted model should be continuous at all knots. For details, see continuous piecewise linear approximation (a paper from MIT, somewhat advanced).
  3. See also page 141 of The Elements of Statistical Learning for general piecewise polynomials.
  4. Notice that there are packages in R that fit piecewise continuous linear regressions (a hand-rolled version is sketched after this list).
  5. The rule of thumb for placing the knots is to use quantiles of the predictor. Detailed in the Lecture 2 notes, page 1.
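A minimal sketch of a continuous piecewise linear fit built by hand with hinge basis functions, placing knots at predictor quantiles; the data and names are hypothetical (dedicated R packages exist for the same job):

```r
# Continuous piecewise linear regression with quantile knots
set.seed(3)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

knots <- quantile(x, probs = c(0.25, 0.50, 0.75))   # rule-of-thumb knot placement
hinge <- sapply(knots, function(k) pmax(x - k, 0))  # hinge terms keep the fit continuous

fit_pw <- lm(y ~ x + hinge)   # linear in each section, continuous at every knot
```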

Spline Bases (Basis)

  1. The idea of a spline is to use cubic polynomials instead of linear regression. The reason we choose the given basis is that it is more convenient to work with. Notice that there is a command in R (bs() from the splines package) to compute the spline basis for given data; a sketch follows.
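A minimal sketch of cubic spline regression via the B-spline basis bs() from the splines package (the df value is an arbitrary choice for illustration):

```r
# Cubic spline regression with a B-spline basis
library(splines)
set.seed(4)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

fit_bs <- lm(y ~ bs(x, df = 6))   # cubic B-spline basis with 6 degrees of freedom
```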

Natural Spline

  1. Things may still be peculiar when the predictor is extreme; natural splines address this by constraining the fit to be linear beyond the boundary knots. There is a command in R (ns() from the splines package) that gives the natural spline basis for given data; a sketch follows this list.
  2. Advantage: the boundary constraints free up degrees of freedom that can be spent in the interior, giving a better overall fit.
  3. For more detail, see Hastie, Tibshirani, and Friedman in the book ESL, and an introduction to splines.
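A minimal sketch using ns() for a natural cubic spline, which is linear beyond the boundary knots (df chosen arbitrarily for illustration):

```r
# Natural cubic spline: linear beyond the boundary knots
library(splines)
set.seed(5)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

fit_ns <- lm(y ~ ns(x, df = 4))   # natural spline basis with 4 degrees of freedom
```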

Approximate F test

  1. The approximate F test can be applied to two models that are not nested. Several things need care: the amount of data should be moderately large to get relatively good performance in practice.
  2. In R it can be carried out with the anova function.

Lecture 3

Information Criteria & Model Selection

  1. The AIC is introduced for model selection: it rewards an increase in the likelihood and simultaneously penalizes overfitting. For small sample sizes, the corrected criterion AICc is used rather than AIC.
  2. Another criterion is the Bayesian Information Criterion (BIC). It penalizes complex models more heavily.
  3. Remark: for finite sample sizes, AIC gives better performance, while BIC gives better performance as the sample size tends to infinity.
  4. There are also functions AIC() and BIC() in R for these calculations; see the sketch after this list.
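A minimal sketch comparing two candidate models by AIC and BIC in R; the models and data are hypothetical:

```r
# Model selection by information criteria: lower values are preferred
set.seed(6)
x <- rnorm(100)
y <- 1 + x + rnorm(100)

fit_lin  <- lm(y ~ x)
fit_poly <- lm(y ~ poly(x, 4))   # more complex candidate

AIC(fit_lin, fit_poly)   # penalty of 2 per parameter
BIC(fit_lin, fit_poly)   # heavier complexity penalty, log(n) per parameter
```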

Heteroscedasticity and Weighted Regression

  1. Heteroscedasticity means the variance of the error is a function of the predictor, i.e. the variance is not constant.
  2. With the normality assumption on the error, the weight applied to each residual can be taken as the variance function at the predictor raised to the power −1/2. Consequently, the residual sum of squares is computed with the corresponding weights.
  3. Hence the weight of each observation is determined by the variance at the given predictor value, and the weight matrix is obtained by placing the corresponding weights on the diagonal. (See the R sketch after this list.)
  4. Cook's distance is used to identify influential outliers in the data. When D is greater than 0.5 the point may influence the prediction, and when D is greater than 1 it is highly likely to be an outlier.
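A minimal sketch of weighted least squares under a hypothetical variance function v(x) = x², plus Cook's distance; lm() takes weights proportional to 1/v(x):

```r
# Weighted regression when the error variance grows with the predictor
set.seed(7)
x <- runif(100, 1, 10)
y <- 1 + 2 * x + rnorm(100, sd = x)     # heteroscedastic: sd proportional to x

fit_wls <- lm(y ~ x, weights = 1 / x^2) # weights = 1 / v(x), with v(x) = x^2 assumed
cooks.distance(fit_wls)                 # flag points with D > 0.5 or D > 1
```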

Lecture 4

Normality Testing

  1. QQ-plot: this checks normality by plotting the sample quantiles against the theoretical quantiles. If the data follow the normal distribution, the QQ-plot is close to a straight line. See QQ-plot.
  2. For the formal tests: see the Jarque–Bera test and the Shapiro–Wilk test.
  3. The J–B test involves the skewness and kurtosis of the normal distribution.
  4. The procedure of the J–B test is: standardize the data → calculate J → reject for large J.
  5. The procedure for the S–W test is: standardize the data → calculate B → reject for large B. (An R sketch follows this list.)
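A minimal sketch of these normality checks in R. qqnorm()/qqline() and shapiro.test() are in base R; the Jarque–Bera test is assumed to come from the tseries package as jarque.bera.test():

```r
# Normality checks: graphical and formal
set.seed(8)
r <- rnorm(250)        # hypothetical return sample

qqnorm(r); qqline(r)   # roughly a straight line under normality
shapiro.test(r)        # Shapiro-Wilk test

library(tseries)       # assumption: tseries provides the J-B test
jarque.bera.test(r)    # based on sample skewness and kurtosis
```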

Risk Measurement

Value at Risk

  1. Notice that L denotes the loss distribution: the larger the value, the greater the loss. Hence there are two ways of defining VaR (value at risk): one based on the loss and the other on the profit = −loss. So it should be specified whether the given distribution is a loss distribution or a profit distribution. (An empirical sketch follows this list.)
  2. There are four main issues in the definition of value at risk: model risk, liquidity risk, parameter-choice risk, and non-subadditivity.
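A minimal sketch of empirical VaR under the loss-distribution convention; the loss sample and level are hypothetical:

```r
# Empirical VaR at level alpha: the alpha-quantile of the loss distribution
set.seed(9)
losses <- rnorm(10000, mean = 0, sd = 2)   # hypothetical loss sample (larger = worse)
alpha  <- 0.95

VaR <- quantile(losses, probs = alpha)     # loss convention; use -profits if given profits
VaR
```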

Expected Shortfall

  1. ES is introduced to solve the problem of non-subadditivity.
  2. The definition of Expected Shortfall means averaging the value at risk over all levels beyond the given parameter α; equivalently, it is the expected loss given that the loss exceeds VaR at level α. (See the sketch after this list.)
  3. ES is subadditive.
  4. The second definition of ES should take the coefficient (1 − α)^(−1) instead of the original one.
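A minimal empirical sketch of ES as the average loss beyond the VaR (self-contained; all names hypothetical):

```r
# Empirical expected shortfall: mean loss in the tail beyond VaR
set.seed(10)
losses <- rnorm(10000, sd = 2)     # hypothetical loss sample
alpha  <- 0.95

VaR <- quantile(losses, probs = alpha)
ES  <- mean(losses[losses > VaR])  # average of the worst (1 - alpha) share of losses
ES
```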

Lecture 5

Financial Return

  1. The return is defined as today's price minus yesterday's price, divided by yesterday's price: R_t = (P_t − P_{t−1}) / P_{t−1}.
  2. The annualized return simply multiplies the return by 252 (the number of trading days per year). A short sketch follows.
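A minimal sketch computing simple daily returns and the naive annualization; the price series is made up for illustration:

```r
# Simple returns and naive annualization
prices  <- c(100, 101, 100.5, 102, 103)      # hypothetical daily closing prices
returns <- diff(prices) / head(prices, -1)   # (P_t - P_{t-1}) / P_{t-1}

annualized <- mean(returns) * 252            # 252 trading days per year
annualized
```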

Stationarity

  1. There are two types of stationarity: weak and strong.
    1. Weak stationarity only requires that the first and second moments are constant over time and that the autocovariance is a function of the lag τ alone.
    2. Strong (strict) stationarity requires that the joint distributions over all times are the same regardless of the lag τ.
  2. Notice that the T on page 2 can be regarded as the number of observations we have, if we consider the time interval to be the same between each observation.

PACF

  1. To calculate the PACF at lag k, we regress X_t on its previous k lags and take the coefficient of X_{t−k}; that is, it is the correlation between X_t and X_{t−k} after the effect of the intermediate lags has been removed, rather than the autocorrelation calculated directly from the data.
  2. The matrix W can be obtained from the formula in the line under Theorem 1, and its diagonal elements via the formula for w_hh. Using the asymptotic multivariate normality with this variance, we can test whether the sample autocorrelations are significantly different from zero.
  3. The test for autocorrelation is:
    1. The Ljung–Box portmanteau test. (The distribution used is the chi-square distribution.) See the R sketch below.
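A minimal sketch of the sample ACF/PACF and the Ljung–Box test in R on simulated AR(1) data:

```r
# Sample ACF/PACF and the Ljung-Box portmanteau test
set.seed(11)
x <- arima.sim(model = list(ar = 0.6), n = 500)   # simulated AR(1) series

acf(x)    # autocorrelations
pacf(x)   # partial autocorrelations: coefficient of the k-th lag in the regression

Box.test(x, lag = 10, type = "Ljung-Box")         # chi-square null distribution
```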

Lecture 6

  1. Estimation methods for AR(1):
    1. Method of moments.
    2. MLE method.
  2. Forecasting the AR(1): the forecast is obtained by writing X_{t+m} recursively in terms of X_t. A sketch in R follows.
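A minimal sketch fitting an AR(1) by MLE with arima() and producing recursive m-step-ahead forecasts via predict(); the data are simulated for illustration:

```r
# AR(1) estimation and recursive forecasting
set.seed(12)
x <- arima.sim(model = list(ar = 0.7), n = 300)

fit <- arima(x, order = c(1, 0, 0))   # fits by maximum likelihood
predict(fit, n.ahead = 5)             # X_{t+m} written recursively in terms of X_t
```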

Lecture 7

  1. ARMA(p, q): it generalises the AR model and the MA model. With the backshift operator, we can write the model as Φ(B)X_t = Θ(B)ε_t.
  2. For stationarity, the roots of Φ(B) should all be outside the unit circle.
  3. The autocorrelation of the MA(q) model can be calculated directly.
  4. X is causal if and only if Φ has no zeros inside the closed complex unit circle.
  5. X is invertible if and only if Θ has no zeros inside the closed complex unit circle.
  6. Response to shock: for an AR process, we generally have that ψ_i collapses exponentially quickly. For a pure MA process, ψ_i is zero for i large enough. (A root-checking sketch follows.)
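A minimal sketch checking the stationarity/causality condition by computing the roots of Φ(z) for a hypothetical AR(2); polyroot() is base R:

```r
# Check that all roots of Phi(z) = 1 - phi1*z - phi2*z^2 lie outside the unit circle
phi   <- c(0.5, 0.3)            # hypothetical AR(2) coefficients
roots <- polyroot(c(1, -phi))   # coefficients in increasing order of power

Mod(roots)                      # moduli of the (possibly complex) roots
all(Mod(roots) > 1)             # TRUE => stationary and causal
```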

Lecture 8

  1. Fitting AR(p):
    1. Regression method: a basic approach to calibrating an ARMA model is to first fit a long autoregressive model to the data. This allows estimation of the innovations via the residuals.
    2. Yule–Walker equations.
    3. MLE method: it may be quite slow with large datasets. See page 3 of the Lecture 8 notes.
  2. Diagnostics: the keys are to ensure that there is no further relationship between the residuals from the fitted model and the predictors, and that a normal approximation is appropriate. (A sketch follows.)
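A minimal sketch fitting an AR(p) by Yule–Walker and by MLE, followed by standard residual diagnostics; the data are simulated and the order is a hypothetical choice:

```r
# Fitting AR(p) two ways, then residual diagnostics
set.seed(13)
x <- arima.sim(model = list(ar = c(0.5, 0.2)), n = 400)

fit_yw  <- ar(x, method = "yule-walker")   # Yule-Walker; order picked by AIC
fit_mle <- arima(x, order = c(2, 0, 0))    # MLE; can be slow on large datasets

tsdiag(fit_mle)                            # residual ACF and Ljung-Box p-values
qqnorm(residuals(fit_mle)); qqline(residuals(fit_mle))  # normality check
```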
