What is the importance of normality and heteroscedasticity?

Having discussed linearity and multicollinearity in Classical Assumption Test Part 1, this time I will cover the normality test, the heteroscedasticity test, and the autocorrelation test, which are also part of the classical assumption tests.

Before discussing the three tests further, I would like to explain again why the classical assumption tests matter. The classical assumption tests are carried out before running a regression (simple or multiple) so that the resulting equation meets the Best Linear Unbiased Estimator (BLUE) criteria. If the classical assumption tests are skipped, the ability of the resulting equation to produce accurate predictions is doubtful.

NORMALITY TEST

In simple terms, the normality test checks whether the residuals, the differences between the actual Y and the predicted Y, are normally distributed. Why should they be normal? The normal distribution is the usual requirement for parametric analysis on ratio or interval data. If the errors (residuals) are normally distributed, it is straightforward to work with confidence levels at α = 0.05 or 0.1, and the resulting confidence levels are reliable. This matters because parametric analyses often involve very sensitive data, for example data on bacterial growth in response to a disease.

Let’s immediately practice the test.

The first step is to determine the residual value. The residual value is obtained from the difference between the actual Y value and the predicted Y value. Here is the data I have:

[Screenshot: data for the normality test]

Select Analyze > Regression > Linear.

Enter Y as the dependent variable and X1 and X2 as the independent variable(s).

Click the Save button, select Unstandardized under Residuals, then click Continue.

Click OK and wait for the result; the worksheet now shows a new variable, RES_1.

[Screenshot: normality and heteroscedasticity]
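For readers who prefer to reproduce this step outside SPSS, here is a minimal sketch in Python with statsmodels that does the same thing as saving unstandardized residuals. The file name data.csv and the column names Y, X1, and X2 are assumptions standing in for the data shown above.

```python
# Minimal sketch (assumed file/column names) of SPSS's "save unstandardized
# residuals" step: fit Y ~ X1 + X2 by OLS and keep actual Y minus predicted Y.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("data.csv")            # assumed file with columns Y, X1, X2
X = sm.add_constant(data[["X1", "X2"]])   # add the intercept term
model = sm.OLS(data["Y"], X).fit()        # ordinary least squares regression

data["RES_1"] = model.resid               # residual = actual Y - predicted Y
print(data.head())
```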

Then we will use this data to determine whether the residuals are normally distributed, using the Kolmogorov-Smirnov test.

Select Analyze > Nonparametric Tests > One Sample.

[Screenshot: heteroscedasticity]

Move the variables to be tested into the Fields box. You can enter only the RES_1 variable, or all of the variables. Here I enter all of them, because each variable is analyzed separately. Then click Run.

[Screenshot: hypothesis test summary]

It can be seen in the fourth row that H0 is retained at the 0.05 significance level, so the residuals are normally distributed.
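As a rough cross-check of this step, the one-sample Kolmogorov-Smirnov test can also be run on the saved residuals in Python with scipy, continuing from the sketch above. Note that scipy's kstest does not apply the Lilliefors correction that SPSS may use, so the p-value can differ slightly from the SPSS output.

```python
# Kolmogorov-Smirnov check on the residuals (continuing from the sketch above,
# where data["RES_1"] holds the unstandardized residuals).
from scipy import stats

res = data["RES_1"]
z = (res - res.mean()) / res.std(ddof=1)   # standardize before comparing to N(0, 1)
stat, p_value = stats.kstest(z, "norm")

print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3f}")
print("Retain H0: residuals look normal" if p_value > 0.05 else "Reject H0: not normal")
```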

HETEROSCEDASTICITY TEST

This test is used to see whether the variance of the residuals differs from one observation to another. If it does, this is called a symptom of heteroscedasticity.

You see, regression requires regularity in the variance of the residuals; in other words, the residual variance should be the same (constant) across observations. If the variance differs, this is called heteroscedasticity. Because the variance varies, the concern is that predictions of Y and the inferences drawn from the model become inconsistent or biased.

Let's jump into practice.

We still use the RES_1 variable from the normality test above, but this time we use its absolute value (no negative values). How?

Choose Transform > Compute Variable.

[Screenshot: Compute Variable]

Then, using the absolute value function (ABS), we create a new variable called abs_1.

[Screenshot: transforming the values of the variable]

The result:

[Screenshot: result of transforming the data for the heteroscedasticity test]

Then we regress the variable abs_1 as the dependent variable (Y) and the variables X1 and X2 as the independent variables.

[Screenshot: regression of the data]

The results obtained:

It can be seen from the F value and R Squared that the independent variables do not significantly explain the variation in the dependent variable (in this case, the absolute value of the residuals), so the model is said to show no symptoms of heteroscedasticity.

The test we use is called the Glejser test.
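The same Glejser test can be sketched in Python, again continuing from the assumed data and fitted model in the earlier sketches: regress the absolute residuals on X1 and X2 and check whether the coefficients (and the overall F test) are significant.

```python
# Glejser test sketch: |residuals| regressed on the independent variables.
import numpy as np

abs_res = np.abs(data["RES_1"])      # the abs_1 variable from the SPSS steps
glejser = sm.OLS(abs_res, X).fit()   # X already contains the constant, X1, X2

print(glejser.summary())
# Non-significant X1/X2 coefficients (and F test) -> no sign of heteroscedasticity.
```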

AUTOCORRELATION TEST

Autocorrelation is a correlation (relationship) between observations arranged in a time sequence, and it usually occurs in time series data. If the time period t or t-1 turns out to affect the value of Y, that is what is called autocorrelation. It is usually recognizable because the residuals, plotted as values or as a graph, show a pattern, for example a linear trend or a zigzag that follows a certain cycle. What does this mean? It means the appropriate analysis to describe the equation is probably a time series analysis. If you still want to use regression as the analytical tool, you should include time as an independent variable. Including time as an independent variable does not mean entering the calendar "year"; it means including the previous value of Y (Y at t-1) as an independent variable, or in statistical terms lag-1, lag-2, and so on. That is the simplest technique.
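As an illustration of that simplest technique, here is a hedged sketch, still using the assumed Y, X1, X2 columns from the earlier sketches, of adding the lag-1 value of Y as an extra independent variable.

```python
# Adding the lagged dependent variable (Y at t-1) as an extra regressor.
data["Y_lag1"] = data["Y"].shift(1)     # value of Y one period earlier
lagged = data.dropna()                  # the first row has no lag value

X_lag = sm.add_constant(lagged[["X1", "X2", "Y_lag1"]])
model_lag = sm.OLS(lagged["Y"], X_lag).fit()
print(model_lag.params)
```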

Jump to the application. We will use the Durbin-Watson value.

Choose Analyze > Regression > Linear.

Then click the Statistics button after entering the Y and X variables.

[Screenshot: autocorrelation]

Check Durbin-Watson and click Continue.

[Screenshot: Durbin-Watson]

Click OK and wait for the process to finish.

[Screenshot: model summary with the Durbin-Watson statistic]

We get the Durbin-Watson value of 1.248.

We can interpret the Durbin-Watson statistic, which ranges from 0 to 4, with a common rule of thumb:

If the DW value is much lower than 2 (toward 0), positive autocorrelation is indicated.
If the DW value is around 2, it is indicated that there is no autocorrelation.
If the DW value is much higher than 2 (toward 4), negative autocorrelation is indicated.

Since the Durbin-Watson value of 1.248 sits in the middle of the range rather than near 0 or 4, by this rule of thumb there is no autocorrelation in the data I used. (For a stricter decision, the value can be compared against the dL and dU bounds in the Durbin-Watson table.)
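The same statistic can be computed directly from the residuals of the original regression; here is a minimal sketch, continuing the assumed data and model from the earlier sketches.

```python
# Durbin-Watson statistic from the residuals of the original Y ~ X1 + X2 model.
from statsmodels.stats.stattools import durbin_watson

dw = durbin_watson(model.resid)
print(f"Durbin-Watson = {dw:.3f}")   # near 2: no autocorrelation;
                                     # toward 0: positive; toward 4: negative
```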

Thank you.

