This is All About Logistic Regression

Logistic regression is a type of regression that relates one or more independent variables to a dependent variable that is categorical, usually coded 0 and 1. This categorical dependent variable is what distinguishes logistic regression from multiple regression and other forms of linear regression.

Category values are usually written as 0 and 1. At the time of writing this article, most researchers use logistic regression with only two categories. 0 is usually used for the “no” or “not yet” category, while 1 is usually used for respondents who match the outcome the research is interested in. For example, the thesis that I wrote in 2008 examined “Factors influencing the decision of carrot farmers to choose an organic farming system in Tugu Selatan village, Cisarua sub-district, Bogor district.”


The dependent variable in the thesis consists of two groups: farmers who do not or have not yet used organic farming systems (coded 0) and farmers who have used organic farming systems (coded 1).

Because the dependent variable takes only the values 0 and 1, the equation that connects the independent variables to the dependent variable cannot be linear, as it is in ordinary regression. Logistic regression is needed instead to calculate the probability, a value between 0 and 1, that a respondent belongs to category 1.

Purpose of Using Logistic Regression

You need to understand the purpose of logistic regression before you use it as a research tool. Once you are clear about the purpose, you can explore and discuss your results in detail. After reading a number of journals that use logistic regression, I concluded that it is generally used for three purposes:

Calculating the odds

The equation obtained from logistic regression can be used to calculate the probability for respondents outside the group included in the study. An easy example is the credit application process. A bank usually evaluates whether or not a person is eligible to receive a loan. The bank asks the prospective borrower several questions. The answers, which describe the characteristics of the prospective borrower, are the independent variables that bank officers enter into the model. From these variables, bank officers can estimate the probability, a value between 0 and 1, that the prospective borrower will repay the loan.

Of course, the model used by bank officers is a logistic regression model built from data on previous borrowers. In such a model, a borrower with an income below a certain amount, an existing loan of a certain size, and a certain number of dependents has a certain probability (a value between 0 and 1) of repaying a loan of a given amount.

Viewing characteristics

This second objective is often used to look at differences in characteristics between two groups. One example is my thesis, which I mentioned above. The thesis describes the characteristics of inorganic farmers and organic farmers. The conclusion was that farmers' chances of switching from inorganic to organic farming depend on the difference in price between the products of the two systems. Organic farmers are willing to switch even though organic productivity is lower than inorganic productivity, because the large price difference gives organic farmers a higher income than inorganic farmers.

Looking at characteristics usually means discussing the odds ratio of each independent variable (the odds ratio is the exponential of the variable's coefficient). In the case above, the odds ratio describes how the odds of switching to organic change with that variable. It is interpreted differently from an ordinary regression coefficient. A linear regression coefficient says: “If variable X increases by 1 unit, then the value of Y increases by the value of the coefficient.” In logistic regression, exp(coefficient), or the odds ratio, says: “For each 1-unit increase in X, the odds of choosing organic (in the case above) are multiplied by exp(coefficient).” In other words, the odds ratio describes a multiplicative change in the odds of Y = 1, not an additive change in the value of Y.
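The relationship between a coefficient and its odds ratio can be checked numerically. The following is a minimal sketch in Python, assuming a one-variable model with made-up coefficients (`b0` and `b1` are hypothetical and do not come from the thesis). It compares the odds at two values of X that are one unit apart.

```python
import math

def probability(b0, b1, x):
    """Probability of Y = 1 under a one-variable logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical coefficients, for illustration only.
b0, b1 = -1.0, 0.7

# Odds of Y = 1 at X = 3 divided by the odds at X = 2 (one unit apart).
ratio = odds(probability(b0, b1, 3.0)) / odds(probability(b0, b1, 2.0))

# The two printed numbers agree up to floating-point error.
print(ratio, math.exp(b1))
```

The ratio is the same whichever pair of X values one unit apart is chosen, which is why a single odds ratio per variable is enough.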

Influencing Factors

This third objective builds on the second: researchers can find out which factors explain the difference between the two groups. A high odds ratio indicates that the variable has a strong influence on which group a respondent falls into. The hope is that the significant factors are ones that researchers or policymakers can control, so that other respondents can be led to behave like the respondents who were coded 1.

In my thesis, for example, price is the most influential factor in farmers' decisions to choose organic farming. If the government wants to develop organic farming, it must therefore adopt policies that keep organic prices stable and above the price of inorganic products, so that farmers stay interested and organic farming can continue to grow.

What is the Logistic Regression Equation Model?

Linear regression has the equation:

Y = a + b1X1 + … + bnXn, where a is a constant and b1 to bn are coefficients. Logistic regression output from Minitab or SPSS has the same form: a constant and one coefficient per variable. However, it would be a mistake to plug those values into the linear equation to explain or discuss probability, because the coefficients describe the log of the odds, not the probability itself.

[Figure: SPSS logistic regression output]

The coefficient of each variable in the figure above is in column B, while the odds ratio of each variable is in column Exp(B). If you use logistic regression to build an equation and predict probabilities for other respondents, your discussion will focus on column B. If you discuss the effect of each variable individually, you will discuss the odds ratio, or Exp(B), column.

The logistic regression equation is:

ln(p / (1 - p)) = B0 + B1X1 + … + BnXn

Here p is the probability that Y = 1, B0 is a constant, and B1 to Bn are the coefficients of the variables.

The value of p, the probability that Y = 1, can be found with the equation:

p = e^(B0 + B1X1 + … + BnXn) / (1 + e^(B0 + B1X1 + … + BnXn))

You can use this equation to calculate the probability for a respondent with given values of the variables in the equation; the resulting p always lies between 0 and 1.
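For readers who want to see where the probability equation comes from, here is a short derivation. It starts from the log-odds form of the model and solves for p, using the same symbols as the text (p, B0, B1 to Bn, X1 to Xn).

```latex
\begin{aligned}
\ln\!\left(\frac{p}{1-p}\right) &= B_0 + B_1 X_1 + \dots + B_n X_n \\
\frac{p}{1-p} &= e^{B_0 + B_1 X_1 + \dots + B_n X_n} \\
p &= (1-p)\, e^{B_0 + B_1 X_1 + \dots + B_n X_n} \\
p \left(1 + e^{B_0 + B_1 X_1 + \dots + B_n X_n}\right) &= e^{B_0 + B_1 X_1 + \dots + B_n X_n} \\
p &= \frac{e^{B_0 + B_1 X_1 + \dots + B_n X_n}}{1 + e^{B_0 + B_1 X_1 + \dots + B_n X_n}}
\end{aligned}
```

Because the exponential is always positive, the last fraction is always greater than 0 and less than 1.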

Stages of the Logistic Regression Process

Let’s practice directly using SPSS 22. Open SPSS and copy in your data. The logistic regression process starts with clicking Analyze – Regression – Binary Logistic.

[Figure: SPSS menu, Analyze – Regression – Binary Logistic]

Then put the Y variable in the Dependent box and the independent variables in the Covariates box. You can use various methods to eliminate variables and get the best equation for your research; see my article on how to eliminate variables in regression. In this exercise, we choose the Enter method. Click OK.
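For readers without SPSS, the fitting step can be imitated in plain Python. The following is only a rough sketch, not a reproduction of what SPSS does internally: it fits a one-variable model by maximum likelihood using Newton's method. The data set is invented for illustration, and the names `fit_logistic` and `score` are hypothetical helpers, not part of any library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(xs, ys, b0, b1):
    """Gradient of the log-likelihood with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        residual = y - sigmoid(b0 + b1 * x)
        g0 += residual
        g1 += residual * x
    return g0, g1

def fit_logistic(xs, ys, iterations=25):
    """Fit Y ~ X with Newton's method, starting from b0 = b1 = 0."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = score(xs, ys, b0, b1)
        # Entries of the 2x2 information matrix [[a, b], [b, c]].
        a = b = c = 0.0
        for x in xs:
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            a += w
            b += w * x
            c += w * x * x
        det = a * c - b * b
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1

# Invented data: the two groups overlap, so the estimates stay finite.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(b0, b1)
```

This corresponds to entering all variables at once, with no variable selection. If the two groups can be perfectly separated by X, the coefficients grow without bound and this simple loop is not suitable.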

[Figure: SPSS Binary Logistic dialog]

Output Interpretation

The interpretation of this SPSS output begins with the goodness of fit of the logistic regression model, that is, whether the model is good enough to be used to interpret the Y value. The goodness-of-fit requirements to check are:

Omnibus Test and R Squared

[Figure: Omnibus Test and Nagelkerke R Square output]

The significance value of the omnibus test must be below 0.05 if you use a 95% confidence level. Here the Omnibus Test, which tests all of the independent variables together, produces a significance value lower than 0.05. This shows that the independent variables simultaneously have a significant effect on the dependent variable. The Nagelkerke R Square value plays a role similar to that of R squared in linear regression. With a Nagelkerke R Square of 0.86, the independent variables explain about 86 percent of the variation in the dependent variable, while the other 14 percent is explained by factors outside the logistic regression equation.
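To make these two numbers less of a black box, here is a small sketch of how they relate to the log-likelihoods of the model. It assumes you know the log-likelihood of the intercept-only model (`ll_null`), the log-likelihood of the fitted model (`ll_model`), and the sample size `n`. The values below are invented, and the function names are hypothetical.

```python
import math

def likelihood_ratio_chi_square(ll_null, ll_model):
    """Omnibus test statistic: the drop in -2 log-likelihood."""
    return 2.0 * (ll_model - ll_null)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox and Snell R squared, rescaled so that its maximum is 1."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

# Invented values for illustration only (not taken from the SPSS output).
ll_null, ll_model, n = -69.3, -20.0, 100

print(likelihood_ratio_chi_square(ll_null, ll_model))
print(nagelkerke_r2(ll_null, ll_model, n))
```

The chi-square statistic is compared with a chi-square distribution whose degrees of freedom equal the number of independent variables; that comparison produces the significance value.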

Hosmer and Lemeshow Test

Unlike the omnibus test, the Hosmer and Lemeshow test indicates a good fit when the significance value is greater than 0.05.

[Figure: Hosmer and Lemeshow Test output]

The Hosmer and Lemeshow significance value is greater than α = 0.05, so H0 is accepted: the logistic regression model is able to explain the data, and there is no significant difference between the model's predictions and the observed values. This shows that the logistic regression equation can be used to explain the relationship between the independent variables and the dependent variable.

Significance of Each Variable

These three indicators (the Omnibus Test, Nagelkerke R Square, and the Hosmer and Lemeshow test) show the goodness of fit of the model and whether it is good enough to interpret the relationship between the independent variables and the dependent variable. If there are problems with these three indicators, you can select variables using backward or forward techniques, as I explained in the article How to Eliminate Variables in Regression.

Next, you must assess which individual variables affect the dependent variable by looking at the significance value of each variable. A variable is said to significantly affect the dependent variable if its significance value is below 0.05.
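The significance value of a single variable can be illustrated with a Wald test, which divides the coefficient by its standard error. The sketch below assumes that test and uses invented numbers. `wald_test` is a hypothetical helper, and it only needs Python's standard library.

```python
import math

def wald_test(coefficient, std_error):
    """Return the Wald statistic (B / SE)^2 and its two-sided p-value."""
    z = coefficient / std_error
    wald = z * z
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return wald, p_value

# Invented coefficient and standard error, for illustration only.
wald, p_value = wald_test(1.2, 0.4)
print(wald, p_value)
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

A coefficient that is large relative to its standard error gives a large Wald statistic and a small significance value.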

Odds ratio value interpretation

In the first figure above, it can be seen that SPSS provides a range for the odds ratio, from a lower limit (Lower) to an upper limit (Upper). This means that each variable can be interpreted by quoting this range. For example, for farmers with a larger land area, the odds of Y = 1 are 3.267 to 176.130 times those of farmers with a smaller land area.

Discuss these significant variables in detail, adding strong descriptions, arguments, and references so that readers can accept your analysis. The odds ratio is the core of your discussion if you aim to differentiate the characteristics of two groups or analyze the influencing factors.
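A range like this is usually a confidence interval for the odds ratio. As a sketch, assuming a 95% interval built from the coefficient B and its standard error, the limits are exp(B - 1.96·SE) and exp(B + 1.96·SE). The numbers below are invented and are not the values from the output above.

```python
import math

def odds_ratio_interval(coefficient, std_error, z=1.96):
    """Odds ratio with lower and upper limits (z = 1.96 for about 95%)."""
    lower = math.exp(coefficient - z * std_error)
    upper = math.exp(coefficient + z * std_error)
    return math.exp(coefficient), lower, upper

# Invented coefficient and standard error, for illustration only.
odds_ratio, lower, upper = odds_ratio_interval(1.2, 0.4)
print(odds_ratio, lower, upper)
```

If the interval does not contain 1, the variable is significant at the corresponding level, which agrees with the significance column.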

Creating an equation

An equation is needed when you want to predict the probability for a case whose variable values you already know. A simple example, like the one above, is deciding whether or not someone deserves a loan. You can also predict the chances of success of a program if its conditions are similar to those in the equation.

I have explained above how to build the equation, but here is an example as an illustration.

Suppose the logistic regression output gives the following values:

B0 = -4.2
B1 = 2.3

The independent variable is the student's first-semester GPA, and the dependent variable is coded 0 for graduating in 4 years or more and 1 for graduating in less than 4 years.

If a student’s first-semester GPA is 3, what is the probability that the student will graduate in less than 4 years?

We determine the equation, which is:

p = e^(B0 + B1X) / (1 + e^(B0 + B1X))
p = e^(-4.2 + 2.3(3)) / (1 + e^(-4.2 + 2.3(3)))
p = 0.94

The probability of that student graduating in less than 4 years is 0.94, or 94%. In the same way, a student with a first-semester GPA of 2 has a probability of graduating in less than 4 years of 0.60, or 60%.
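The calculation above can be reproduced with a few lines of Python. This is a minimal sketch using the B0 and B1 values from the example; `graduation_probability` is a hypothetical name chosen for illustration.

```python
import math

B0 = -4.2  # constant from the example
B1 = 2.3   # coefficient of first-semester GPA from the example

def graduation_probability(gpa):
    """p = e^(B0 + B1*X) / (1 + e^(B0 + B1*X))"""
    z = B0 + B1 * gpa
    return math.exp(z) / (1.0 + math.exp(z))

print(f"{graduation_probability(3):.2f}")  # prints 0.94
print(f"{graduation_probability(2):.2f}")  # prints 0.60
```

Rounded to two decimals, a GPA of 3 gives 0.94 and a GPA of 2 gives 0.60.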

Thank you.

