If you like football, you are surely familiar with football statistics. They cover the match records of clubs both big and small, and they are often used to predict the winner of an upcoming match based on previous results. For example, suppose there will be a match between Barcelona and Real Sociedad. If the statistics show that Barcelona nearly always wins against Real Sociedad, whether at home or away, it is no wonder that many people favor Barcelona in this match. If the match is between Barcelona and Real Madrid, things are different: fans' predictions will probably be split about evenly between the two. This is because both teams are equally strong and have a balanced head-to-head history.
The story above is a simple illustration of how history, or past data, can serve as a basis for estimating what will happen in future matches, or in the future more generally. It is a simple story, but one that football lovers know well, and it shows how data can help us decide what to do next.
In everyday life, football is not the only field that uses historical data, or time series, for forecasting. BMKG, Indonesia's meteorology agency, for example, places the rainy season from October to April. This is determined not only from wind direction and climate conditions at sea level, but also from time series data for the previous 10 or 20 years, which show that it usually rains from October to April.
Another example is the number of visitors to a mall. There are certain hours of the day when the mall is always crowded, usually from the end of the workday until late in the evening. Looking at a longer horizon, in certain months, such as those before Eid or Christmas, malls are usually very crowded and goods may even run out of stock.
So, what is the problem?
Forecasts produced from time series also affect the decisions of individuals, institutions, and agencies. Take the rainy season from October through April. The DKI Jakarta government, of course, has to repair its rivers before then so that flooding can be reduced or eliminated. Farmers also decide when to plant based on when the rainy season starts. The Ministry of Agriculture has issued a planting calendar application that guides farmers in cultivating their land and commodities based on rainfall and climate predictions as well as land conditions.
The second example is the mall. A producer must calculate how much to supply in the month before the holiday, how much to produce beforehand, and how much reserve stock to keep ready before the holiday arrives. All of this is based on the company's time series forecasts, so that profit is maximized and the company does not lose sales because goods are missing from the mall. Lost sales due to out-of-stock products can be fatal: consumers may switch to competitors' products, and if they turn out to like those products, the company may lose them for good.
I am very familiar with this supply chain problem because I used to work for a supply chain and production planning company.
Given how important planning and forecasting are, business forecasting, one of the quantitative sciences, is a course that prospective businesspeople and decision-makers must master.
Definition of Time Series
A time series is data collected and observed over a period of time. Time series data has four components: trend, seasonality, cycle, and a random (irregular) component. A trend is usually seen as a graph that goes up or down over a long period of time (5, 10, 15, or 20 years). Seasonal data, by contrast, rises and falls over the short term, for example within one year. This is what distinguishes seasonality from cycles: cycles also show patterns that go up and down, but over a long period of time. The last component is random: the variation that cannot be explained by the three previous components.
Time Series Technique
The time series technique uses historical data to forecast the next values. As in regression, Y is the historical data and X is the period or time index itself: 1 for the earliest observation, 2 for the next, and so on. The resulting model is then used to forecast the next Y value. So can we use R-squared? The answer is yes, although the accuracy of a time series model is not usually measured with R-squared. Because a time series trend model is also an equation, R-squared can still be used to assess how well the resulting equation fits.
Various time series techniques are as follows:
The Naive technique is the simplest time series technique. In brief, we forecast using the data from the one previous period. For example, if March sales were 20 units, then we predict April sales will also be 20 units. Or, in terms of years, if sales in 2010 were 200 units, then we predict sales in 2011 will also be 200 units.
The equation can be written as follows:
Ypred(t) = Y(t-1), where t is the period being forecast
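The Naive rule can be sketched in a few lines of Python. This is only an illustration: the function names and the sales figures are made up for the example, apart from the March value of 20 units mentioned above.

```python
def naive_forecasts(history):
    """One-step-ahead naive forecasts: the forecast for a period
    is simply the actual value of the period before it."""
    # The first period has no previous value, so it gets no forecast.
    return [None] + list(history[:-1])


def naive_next(history):
    """Naive forecast for the period right after the last observation."""
    return history[-1]


# Hypothetical sales for January to March (units); only March = 20
# comes from the example in the text.
sales = [18, 22, 20]

print(naive_forecasts(sales))  # forecasts for January, February, March
print(naive_next(sales))       # forecast for April
```

Here the April forecast is just the March value, which is all the Naive technique does.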
The moving average (MA) technique is a development of the Naive technique. Whereas Naive uses only the data from one previous period to forecast the next value, MA takes several periods of data and averages them. The number of periods used is usually called the order. It is called a moving average because the window being averaged moves along with the data point you want to predict. Confused? Consider the following illustrative example:
1 = A
2 = B
3 = C
4 = D
5 = E
To forecast the sixth value, let's say we use MA(2), a moving average of order 2. Then:
Ypred4 = average(B,C)
Ypred5 = average(C,D)
Ypred6 = average(D,E)
Notice that the averaging window moves along with the predicted Y. If the order is 3, then the three values just before the predicted Y are averaged. Got it?
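The A to E illustration can be reproduced with a small Python sketch. The numbers standing in for A, B, C, D, and E are hypothetical, and the function name is only for this example.

```python
def moving_average_forecasts(history, order):
    """Forecast each position as the mean of the `order` values before it.

    Positions are 1-based, as in the illustration: with order 2 the first
    position that can be forecast is 3, and the last is len(history) + 1.
    """
    if order < 1 or order > len(history):
        raise ValueError("order must be between 1 and len(history)")
    forecasts = {}
    for t in range(order + 1, len(history) + 2):
        window = history[t - 1 - order : t - 1]  # the `order` values before t
        forecasts[t] = sum(window) / order
    return forecasts


# Hypothetical values standing in for A, B, C, D, E.
data = [10, 12, 14, 16, 18]

ma2 = moving_average_forecasts(data, 2)
print(ma2[4])  # average(B, C)
print(ma2[5])  # average(C, D)
print(ma2[6])  # average(D, E)

ma3 = moving_average_forecasts(data, 3)
print(ma3[6])  # average(C, D, E)
```

Changing the second argument changes the order, so it is easy to see how the window widens from two values to three.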
The MA technique can then be extended into the double MA technique: the moving averages computed from the actual values are themselves smoothed with a second moving average, so the moving average is applied twice.
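As a rough sketch of "doing the moving average twice", the Python below applies the same averaging window first to the actual values and then to the result. The data and the order of 3 are hypothetical, and the sketch only shows the two smoothing passes, not how the two series would then be turned into a forecast.

```python
def trailing_ma(values, order):
    """Moving average where each result is the mean of `order`
    consecutive values, reported at the last value of the window."""
    return [
        sum(values[i - order + 1 : i + 1]) / order
        for i in range(order - 1, len(values))
    ]


# Hypothetical actual values.
data = [10, 12, 14, 16, 18, 20, 22]

single_ma = trailing_ma(data, 3)       # first pass, on the actual values
double_ma = trailing_ma(single_ma, 3)  # second pass, on the first pass

print(single_ma)
print(double_ma)
```

Each pass shortens the series by order minus one values, which is why the double MA needs more history than a single MA.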
The trend technique is commonly used in quantitative forecasting. Basically, we look for a trend pattern in the data we have (for example linear, quadratic, S-curve, or exponential) and then use the fitted model to estimate the next values.
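Linear model: Ypred = a + bT + e
Quadratic model: Ypred = a + bT^2 + cT + e
S-curve model: Ypred = L / (1 + exp(a + bT)) + e
Exponential model: Ypred = a × exp(bT)
Here T is the time index (1, 2, 3, ...), a, b, c, and L are coefficients estimated from the data, and e is the error term.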
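To make the linear and quadratic models concrete, here is a minimal Python sketch that fits them by ordinary least squares against the time index T = 1, 2, 3, ... It is not how any particular software does it; the function names and the production figures are hypothetical, and the coefficients are returned in order of increasing power of T (constant first).

```python
def fit_polynomial_trend(y, degree):
    """Least-squares fit of y on T = 1..n.

    Returns [c0, c1, ..., c_degree] for c0 + c1*T + c2*T^2 + ...
    (degree 1 = linear trend, degree 2 = quadratic trend).
    """
    n = len(y)
    if n <= degree:
        raise ValueError("need more observations than the polynomial degree")
    t = range(1, n + 1)
    size = degree + 1
    # Normal equations (X'X) c = X'y, where column j of X is T^j.
    a = [[float(sum(ti ** (i + j) for ti in t)) for j in range(size)]
         for i in range(size)]
    b = [float(sum((ti ** i) * yi for ti, yi in zip(t, y))) for i in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * size
    for i in reversed(range(size)):
        known = sum(a[i][k] * coef[k] for k in range(i + 1, size))
        coef[i] = (b[i] - known) / a[i][i]
    return coef


def predict_trend(coef, t):
    """Evaluate the fitted trend at time index t."""
    return sum(c * t ** i for i, c in enumerate(coef))


# Hypothetical yearly production figures (not the data used in this article).
production = [5.2, 5.9, 7.1, 8.0, 9.8, 11.5, 13.9, 16.0]

linear = fit_polynomial_trend(production, 1)
quadratic = fit_polynomial_trend(production, 2)

next_t = len(production) + 1
print("linear forecast:", predict_trend(linear, next_t))
print("quadratic forecast:", predict_trend(quadratic, next_t))
```

The S-curve and exponential models are not linear in their coefficients, so they need a different fitting approach and are left out of this sketch.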
You do not need to worry about the number of options available for forecasting, because software can do the data processing for you.
Which technique is the best?
We can say a forecasting model is the best if it has the smallest error. The model produces Ypred, which is compared to Yact, and the error is calculated from the difference. Some measures of error are:
Mean Absolute Error (MAE) or Mean Absolute Deviation (MAD)
As the name suggests, it is the average of the absolute values of the errors. It can be written as:
MAD = (1/n) × Σ |Yact − Ypred|
Mean Squared Error (MSE) or Mean Squared Deviation (MSD)
It is the average of the squared errors, and can be written as:
MSD = (1/n) × Σ (Yact − Ypred)^2
Mean Absolute Percentage Error (MAPE)
It is the average of the absolute errors expressed as a percentage of the actual values:
MAPE = (100/n) × Σ |(Yact − Ypred) / Yact|
In each formula, n is the number of periods that have both an actual and a predicted value.
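The three error measures are simple enough to compute by hand or in a few lines of Python. The sketch below uses the standard definitions (average absolute error, average squared error, average absolute percentage error); the actual and predicted values are made up for illustration.

```python
def mad(actual, predicted):
    """Mean Absolute Deviation: average of |Yact - Ypred|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def msd(actual, predicted):
    """Mean Squared Deviation: average of (Yact - Ypred)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.
    Assumes no actual value is zero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)


# Hypothetical actual values and model predictions.
y_act = [100, 200, 400]
y_pred = [110, 190, 380]

print("MAD :", mad(y_act, y_pred))
print("MSD :", msd(y_act, y_pred))
print("MAPE:", mape(y_act, y_pred))
```

MSD punishes large errors more heavily because the errors are squared, while MAPE is easier to compare across series with different units.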
Now let's move on to practice with software. I usually analyze trends using Minitab. This is not a promotion; I simply find that this application makes it easier to decide which trend model to use.
I have 11 consecutive years of corn production data.
Then click Stat > Time Series > Trend Analysis.
Enter the corn variable, then select the model type. This time I start with linear. You can choose which output Minitab produces, such as graphs, with the Graphs button, but I usually leave that until I know which model is most appropriate. Then click OK.
A graph then appears showing the fitted linear trend line together with the actual values (Yact). It also reports the MAPE, MAD, and MSD values.
I repeated the steps above to get the quadratic trend, exponential growth, and S-curve models. The results I obtained are as follows:
For the s-curve, Minitab immediately told me that the data above was not suitable for the S-curve model.
I then put the MSD, MAD, and MAPE values side by side and check which model has the lowest values.
Among these trend models, the quadratic model turns out to describe corn production best. So what if we try moving averages? We can make another comparison.
Still in Minitab, click Stat > Time Series > Moving Average.
Fill in the variable. MA length is the order of the MA; here we fill in 3, for example. You can try 2 or other values, but this time I go straight to 3 because this article is getting long. Minitab also lets you choose whether the moving average is centered. The explanation in the section above does not use a centered MA. A centered MA obtains each value by averaging a window with the target period in the middle. For MA(3), this means the average of Yt-1, Yt, and Yt+1. In this exercise, I do not use the centered MA. Click OK.
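The idea of a centered MA, where the target period sits in the middle of the window, can be sketched as follows. This is a generic illustration for odd orders only, not a reproduction of Minitab's output, and the data are hypothetical.

```python
def centered_ma(values, order):
    """Centered moving average for an odd `order`.

    The value at position i is the mean of the window that has i in the
    middle, e.g. for order 3: Y(t-1), Y(t), Y(t+1). Positions too close
    to either end have no complete window and are left as None.
    """
    if order % 2 == 0:
        raise ValueError("this sketch handles odd orders only")
    half = order // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        result[i] = sum(values[i - half : i + half + 1]) / order
    return result


# Hypothetical data.
data = [10, 12, 14, 16, 18]
print(centered_ma(data, 3))
```

Because a centered window needs values after the target period, it describes the past smoothly but cannot by itself produce a value for a period that has not happened yet.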
The results I obtained
Then I compare the error value with the previous models.
It turns out that the quadratic trend still gives the lowest error, so it remains the most appropriate model for forecasting this data.
So, in conclusion, we will use the quadratic model to forecast the data. To do so, click Stat > Time Series > Trend Analysis again and select the quadratic model. Check Generate forecasts and enter 5 as the number of forecasts, starting from the 11th observation (this is only an example; set it according to your research objectives). Also remember to store the Ypred values, which Minitab calls Fits, and the residuals, so that R-squared can be calculated later: click Storage and check Fits, Forecasts, and Residuals. Click OK, then OK again.
The results obtained are like this:
The forecasts for periods 12 to 16 can be seen in column C4 of the worksheet or in the Minitab Session window.
As I promised earlier, we can also compute R-squared to find out how good our model is, just as we would assess a regression model.
The R-squared formula is: R-squared = 1 − (SSE/SST)
SSE, the sum of squares of the residuals = Σ (Yact − Ypred)^2
SST, the total sum of squares = Σ (Yact − Ymean)^2
Ymean is the average value of Yact.
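The same R-squared calculation can be done outside a spreadsheet. The Python sketch below follows the formula above, one minus the residual sum of squares divided by the total sum of squares; the actual and fitted values are hypothetical, not the corn data.

```python
def r_squared(actual, fitted):
    """R-squared = 1 - (sum of squared residuals / total sum of squares)."""
    y_mean = sum(actual) / len(actual)
    ss_residual = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_total = sum((a - y_mean) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


# Hypothetical actual values (Yact) and stored fits (Ypred).
y_act = [2, 4, 6, 8]
y_fit = [3, 4, 6, 7]

print(r_squared(y_act, y_fit))
```

In this made-up example the residual sum of squares is 2 and the total sum of squares is 20, so the model explains about 90% of the variation.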
I took the Fits and residual values from Minitab, moved them to Excel, and calculated R-squared there. I obtained the following results:
The R-squared result is 76%, meaning the quadratic model explains 76% of the variation in Y, while the rest is error. A model at this level can reasonably be considered good enough for forecasting the next values.
There are still other forecasting techniques. At first I wanted to explain them all in one article, but that does not seem possible. Hopefully I can write a follow-up article about smoothing, ARIMA, and SARIMA, which are also quantitative forecasting techniques.