Ironing of Regression. Data-fitting techniques are needed because regression is usually taken to mean linear regression: the standard regression pattern is a straight line, Y = a + bx + e, where a is a constant, b is a coefficient that reflects the relationship between the variables x and Y, and e is the error, that is, the part of Y that the regression model does not explain.
The problem arises when we regress Y on x and the relationship between the two turns out to be quadratic. If we force a straight-line fit, we will usually get a small R-squared. One way to overcome this problem is to use quadratic regression.
What is quadratic regression?
Actually, the regression is still linear in its coefficients. We only modify the variable x so that the equation becomes quadratic. If, initially, the equation is:
Y = a + bx + e, then we change it to:
Y = a + bx² + e.
Squaring x produces a new variable for the regression equation. We then process it just like any other regression.
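The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is hypothetical (generated from Y = 3 + 0.5x², not taken from this article), and `fit_line` is a helper name made up for the sketch. It fits Y on x and then Y on the squared variable X1, and compares the R-squared of the two fits.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical data that follows Y = 3 + 0.5 * x^2 exactly (not the article's data).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3 + 0.5 * v ** 2 for v in x]

# The "ironed" predictor: X1 = x^2.
x1 = [v ** 2 for v in x]

a_lin, b_lin, r2_lin = fit_line(x, y)   # straight line forced onto curved data
a_sq, b_sq, r2_sq = fit_line(x1, y)     # same regression, squared predictor

print(f"Y on X : R-sq = {r2_lin:.4f}")
print(f"Y on X1: Y = {a_sq:.2f} + {b_sq:.2f} X^2, R-sq = {r2_sq:.4f}")
```

Because the regression itself is unchanged and only the predictor column differs, the same helper serves both fits.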
The above technique is one of the data-ironing techniques for regression, used to find the right equation for the data we are processing. Sometimes we get stressed wondering why we never get an R-squared high enough to apply the regression equation to the research problem. Meanwhile, the data has already been collected, and it is not possible to collect data for other variables. Hence the saying we often hear: “The data is ironed first, then regressed”.
This technique is usually not taught because of time constraints, so solving a non-linear problem like this takes some creativity from the researcher. However, if we are observant, there are actually many who teach this technique under the name non-linear regression.
Various non-linear regression techniques include quadratic regression, hyperbolic regression, exponential regression, and geometric/log linear regression.
Ironing of Regression: Quadratic Regression
I have the following data:
Then I ran the regression in Minitab by copying the data into the Minitab worksheet and selecting Stat > Regression > Regression > Fit Regression Model.
Enter Y under Responses and X under Continuous predictors, then click OK.
The results look as follows:
It can be seen that the equation results in an R-sq(adj) value of 90.71%, meaning that the X variable is able to explain 90.71% of the value of Y, while the rest is explained by errors or other variables outside of X. This equation is actually good for explaining the relationship between Y and X. However, I will try a quadratic equation, and we will then compare the results.
The first step is to create a new variable, X1 = X². I use Excel to transform the data because it is easier. Then I repeat the steps above, but this time I enter Y under Responses and X1 under Continuous predictors.
And the results are as follows:
It can be seen that the regression of Y versus X1 is better than Y versus X because its R-sq(adj) of 98.32% is higher than before. This means that the quadratic equation is able to describe 98.32% of the value of Y; the rest is explained by variables outside the equation, referred to as error. The error of the quadratic equation is smaller than that of the linear regression equation.
Then, how is the equation?
If we look at the Minitab output, we can conclude that the equation is
Y = 2.54 + 0.8029 X1. Because X1 = X², we can write the equation as:
Y = 2.54 + 0.8029 X²
Extra session: how to read Minitab regression output
- First, we look at the Analysis of Variance section, which SPSS calls ANOVA, and go straight to the P-value, which indicates significance. If we use a significance level of α = 0.05, we can use the equation if the P-value is below 0.05; likewise if we use α = 0.1. This ANOVA assesses the equation as a whole, commonly called the goodness of fit.
- The second goodness-of-fit measure is the R-squared. It is located under Analysis of Variance, in the section titled Model Summary. This R-squared shows that the resulting equation is able to explain the Y data by the R-sq value; the rest (i.e., 100% minus the R-sq value) is error. As a rule of thumb, a good equation has an R-sq of at least 75%.
- If the equation has met the goodness-of-fit rules, we look at which variables and constants are significant in the Coefficients section. Each coefficient has a t-value with its own P-value. Just as with the F test or ANOVA, we look directly at the P-value to determine whether the variable significantly affects Y: a coefficient is said to significantly affect Y if its P-value < α. Alpha, the significance level, can be set at 0.05 or 0.1, according to the methodology you use.
- The final part, the regression equation, is the resulting equation.
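To make the quantities in that output less mysterious, here is a rough sketch of how R-sq, R-sq(adj), the F statistic and the slope's t statistic are computed for a one-predictor regression. The data is hypothetical, and the variable names are chosen only for this sketch. Converting F and t into P-values needs the F and t distributions, which is the part left to Minitab or SPSS here.

```python
import math

# Hypothetical data for illustration only (not the article's data).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(x)
k = 1  # number of predictors in the model

# Least-squares estimates for Y = a + bX.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = sxy / sxx
a = mean_y - b * mean_x

# Sums of squares shown in the Analysis of Variance table.
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # error
ss_tot = sum((yi - mean_y) ** 2 for yi in y)                    # total
ss_reg = ss_tot - ss_res                                        # regression

# Model Summary: R-sq and R-sq(adj).
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# ANOVA F statistic and the t statistic of the slope.
mse = ss_res / (n - k - 1)
f_stat = (ss_reg / k) / mse
t_slope = b / math.sqrt(mse / sxx)

print(f"Y = {a:.3f} + {b:.3f} X")
print(f"R-sq = {r2:.4f}, R-sq(adj) = {adj_r2:.4f}")
print(f"F = {f_stat:.2f}, t(slope) = {t_slope:.2f}")
```

With a single predictor, F equals the square of the slope's t statistic, which is why the F test and the t test of X agree in the one-variable examples.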
Hyperbolic Regression (inverse)
Remember the hyperbolic equation? Hyperbolic is usually 1/x.
This means that we convert all x values to 1/x.
Let’s try to practice it right away… the data I use is as follows:
This time I processed the data using IBM SPSS. Copy the data into the SPSS worksheet. SPSS has a Data View and a Variable View. We first go to the Variable View and set the variable names to Y and X. Then we select Analyze > Regression > Linear.
Then we enter Y as Dependent and X as Independent. Make sure the method is Enter. Click OK.
Wait for SPSS to process, and the results will come out as follows:
The Model Summary shows an adjusted R-square of only 0.476, meaning that X can explain only 47.6% of the value of Y; the remaining 52.4% is explained by errors or variables outside X. Because the error is so large, the resulting equation is poor. However, if we look at the ANOVA or F test, it turns out that this equation is significant at α = 0.05. In the t tests of the coefficients, the constant is significant at α = 0.1 and the X variable is significant at α = 0.05.
We can use the inverse in this exercise by converting x to 1/x. How? If you are using SPSS, it provides a feature to compute new variables.
Choose Transform > Compute Variable.
We type the variable name and the formula, then click OK.
The result shows X1 as the new variable, with X1 = 1/X.
Then we regress Y on X1 in the same way as above; the difference is that we enter Y as Dependent and X1 as Independent.
The results are as follows:
The important thing that I will explain here is that the R-squared value has changed to 0.952. This means that the modification of the x variable is very successful in determining the Y equation. Even the F and T tests are better than the previous equation.
The equation is:
Y = 2.580 – 6.813 X1. Because X1 = 1/X, it can be written as
Y = 2.580 – (6.813/X)
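For readers without SPSS, the same inverse transformation can be sketched in Python. Assumptions: the data is hypothetical (generated from Y = 2 – 5/x, not the data used in this exercise), x contains no zeros, and `fit_line` is a helper name invented for the sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical hyperbolic data: Y = 2 - 5/x (not the article's data).
x = [1, 2, 3, 4, 5, 6, 8, 10]
y = [2 - 5 / v for v in x]

# Equivalent of Transform > Compute Variable with the formula X1 = 1/X.
x1 = [1 / v for v in x]

_, _, r2_lin = fit_line(x, y)      # Y on X
a, b, r2_inv = fit_line(x1, y)     # Y on X1 = 1/X

print(f"Y on X : R-sq = {r2_lin:.3f}")
print(f"Y on X1: Y = {a:.3f} + ({b:.3f})/X, R-sq = {r2_inv:.3f}")
```

The fitted slope on X1 is the numerator of the 1/X term, so the equation can be read off directly in the form Y = a + b/X.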
Do you have any questions before moving on to the next technique? Please write in the comments.
Unlike the techniques above, exponential regression modifies the Y variable into Ln Y. Let’s practice right away.
Examples of data that I have:
I regressed the data and came up with the following. It is up to you whether to use Minitab or SPSS.
Then I modified the data by changing Y to Ln Y. I used Excel.
Then I regressed back as follows:
It can be seen that the R-sq is 97%. The explanation is the same as for the previous technique.
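A minimal Python sketch of the exponential case follows, assuming hypothetical data generated from Y = 2·e^(0.3X) (not the data above) and an invented helper `fit_line`. Regressing Ln Y on X gives Ln Y = a + bX, which can be turned back into Y = e^a · e^(bX). Note that Ln Y only exists when every Y is positive.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical exponential data: Y = 2 * e^(0.3 X) (not the article's data).
x = [0, 1, 2, 3, 4, 5, 6]
y = [2 * math.exp(0.3 * v) for v in x]

# Modify Y into Ln Y; X stays as it is.
ln_y = [math.log(v) for v in y]

_, _, r2_raw = fit_line(x, y)      # Y on X
a, b, r2_log = fit_line(x, ln_y)   # Ln Y on X

# Back-transform Ln Y = a + bX into Y = A * e^(bX), with A = e^a.
A = math.exp(a)
print(f"Y on X   : R-sq = {r2_raw:.3f}")
print(f"Ln Y on X: Ln Y = {a:.3f} + {b:.3f} X, R-sq = {r2_log:.3f}")
print(f"Back-transformed: Y = {A:.3f} * e^({b:.3f} X)")
```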
Geometric regression modifies the variables Y and X. Y is modified to Ln Y, and X becomes Ln X.
I regressed it and the results are as follows:
The result shows that the R-sq is 70.48%
Then I modified the variables to X1 and Y1
Then I regressed again and the result was…..
It can now be seen that the modified equation has an R-sq of 95.12%, with the equation:
Y1 = 1.104 + 3.492 X1. Because Y1 = Ln Y and X1 = Ln X, it can be written:
Ln Y = 1.104 + 3.492 Ln X.
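To use this equation for prediction, it helps to turn it back into the original units: exponentiating both sides gives Y = e^1.104 · X^3.492. The short sketch below does that with the two coefficients from the equation above. The function name `predict_y` and the example X values are made up for the illustration.

```python
import math

# Coefficients taken from the fitted equation Ln Y = 1.104 + 3.492 Ln X.
A_COEF = 1.104
B_COEF = 3.492


def predict_y(x):
    """Back-transform the log-log equation: Y = e^a * X^b (requires x > 0)."""
    if x <= 0:
        raise ValueError("Ln X is only defined for x > 0")
    return math.exp(A_COEF) * x ** B_COEF


# Arbitrary example inputs, chosen only to show the call.
for x in (1.0, 2.0, 5.0):
    print(f"X = {x}: predicted Y = {predict_y(x):.3f}")
```

In this form the coefficient 3.492 acts as a power on X, while e^1.104 is the predicted Y when X = 1.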
This Ln technique is what most people call the ironing technique. Taking Ln turns values in the hundreds or even thousands into small numbers with decimals, much like a cloth that has been ironed flat. Heheheh…
Okay, that’s the material about various regression data processing techniques. The point is, don’t complain if you get stuck processing research data. Data can be modified with full responsibility. Remember, modifying is very different from manipulating. A researcher is strictly prohibited from manipulating data.
Then the question is: “How do we know whether to use quadratic, inverse, geometric, or exponential?”
First, you must be familiar with the data you are using. Try plotting the data (especially the Y values) to see whether the pattern is linear, quadratic, log-linear, inverse, and so on. This will tell you whether your data is linear or not and which technique is appropriate for the next step.
This image is the example I used for the inverse regression exercise. It can be seen that the Y-value plot forms a hyperbolic or inverse graph.
Second, if you want to be sure, use all the techniques and then compare the output. After all, the software processes the data for us, so nothing needs to be calculated manually, right?
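The "use them all" approach can be sketched as a small loop. Everything here is an assumption made for illustration: the data is hypothetical (generated from Y = 10 + 20/x, with x and Y both positive so that Ln is defined), and `r_squared` is an invented helper. One caution: for the exponential and geometric fits, R-sq is measured on Ln Y rather than on Y, so the comparison is a rough guide and is best checked against a plot.

```python
import math


def r_squared(xs, ys):
    """R-squared of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data following Y = 10 + 20/x (not the article's data).
x = [1, 2, 3, 4, 5, 6, 8, 10]
y = [10 + 20 / v for v in x]

ln_x = [math.log(v) for v in x]
ln_y = [math.log(v) for v in y]

# Each technique is the same linear regression on a modified X and/or Y.
candidates = {
    "linear": (x, y),
    "quadratic": ([v ** 2 for v in x], y),
    "inverse": ([1 / v for v in x], y),
    "exponential": (x, ln_y),
    "geometric": (ln_x, ln_y),
}

results = {name: r_squared(xs, ys) for name, (xs, ys) in candidates.items()}
for name, r2 in results.items():
    print(f"{name:12s} R-sq = {r2:.4f}")

best = max(results, key=results.get)
print("Highest R-sq:", best)
```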
Thank you for visiting.