Determinant Analysis of Open Unemployment Level in Banten Province, 2018 Using Panel Data Regression

This study analyses the relationship of the gross school participation rate, the number of poor people, and the GRDP growth rate to the open unemployment rate in Banten. The analytical method is panel data regression with fixed effect model estimation, processed using the statistical software EViews 10. The data are secondary data from Badan Pusat Statistik (BPS), taken annually for each district in Banten Province from 2011 to 2018. The results indicate that the GRDP growth rate significantly affects the open unemployment rate, whereas the gross school participation rate and the number of poor people do not significantly affect the open unemployment rate in Banten.


INTRODUCTION

Background
One of the socio-demographic problems in Banten is the high level of open unemployment. The open unemployment rate is the percentage of the unemployed population relative to the total labor force. From an economic standpoint, Banten has many industries, yet according to BPS data for 2018 its unemployment rate, at 7.7%, was one of the highest in Indonesia, while the national open unemployment rate was only 5.3%. The following graph shows the open unemployment rate from 2014 to 2018 in Indonesia and Banten Province.
One of the causes of high unemployment is the relatively large population and the number of poor people. Banten is one of the most densely populated provinces in Indonesia: BPS data show that in 2018 its population density was 1,313 people/km², and 5,596,963 of its residents were in the labor force. Meanwhile, the number of poor people in Banten Province as of March 2018 was 661,360, or 5.24% of the total population of Banten.
According to Sukirno (1994), economic development in a region can improve the welfare of the people living there. The higher the economic growth, the greater the opportunity for companies to develop, which is then followed by the creation of employment opportunities for the community. An increase in economic growth, measured through the GRDP, also raises production capacity, so more workers are expected to be absorbed. Conversely, a decline in GRDP is an indication of high unemployment in an area. Nowadays, not only people with low education but also those with high education can be unemployed (Sukirno, 2008). On the other hand, education is positioned as a means of improving welfare through access to job opportunities. The ultimate goal of an education program is the employment opportunities it is expected to open. The level of education affects the work that can be obtained: the higher someone's level of education, the lower their chance of being unemployed.
Given the importance of the problems related to the open unemployment rate, this study aims to determine the characteristics of the open unemployment rate in Banten Province and to analyze the influence of economic growth, education, and the number of poor people on the open unemployment rate in each district or city in Banten Province in 2018.

Open Unemployment
Unemployment is a condition in which a member of the labor force does not have a job or is not looking for work (Nanga, 2005: 249). Open unemployment covers those who:
a. Do not have a job and are looking for one.
b. Do not have a job and are preparing a business.
c. Do not have a job and are not looking for work because they consider it impossible to get a job.
d. Already have a job but have not yet started working.
The open unemployment rate is the percentage of the number of unemployed people compared to the total labor force. The population included in the labor force is the population of working age (15 years or more) who is working, or has a job but is temporarily not working, and the unemployed.
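As a minimal sketch of the definition above, the rate can be computed as the unemployed share of the labor force. The function name and the counts below are hypothetical and are not BPS figures.

```python
def open_unemployment_rate(unemployed: float, labor_force: float) -> float:
    """Return the open unemployment rate as a percentage of the labor force."""
    if labor_force <= 0:
        raise ValueError("labor_force must be positive")
    return 100.0 * unemployed / labor_force


# Hypothetical counts chosen only to illustrate the arithmetic.
rate = open_unemployment_rate(unemployed=77_000, labor_force=1_000_000)
print(f"{rate:.1f}%")
```

The labor force in the denominator includes both the employed and the unemployed, so the rate is bounded between 0 and 100.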

GRDP
Economic growth can be interpreted as a continuous change in the economic condition of a region over a certain period. The rate of economic growth of a region can be calculated from the region's GRDP. Gross Regional Domestic Product (GRDP) at market prices is the total gross value added arising from all economic sectors in a region. Value added is the value created by combining factors of production and raw materials during the production process.
A region is said to experience economic growth if its GRDP increases. Economic growth indicates that economic development in the region is successful.

Education
The quality of human resources is one of the determining factors in the economic conditions of a region, and one indicator of that quality is education. Education is an effort to improve competence so that individuals can adjust to the changes and challenges they face. Education should also equip individuals with the abilities needed for certain types of work so that they can participate in development (Boediono, 1992). Educational programs must be implemented in accordance with needs, anticipating the various changes that occur both now and in the future (Han, 1994; Dertouzas, Lester, and Solow, 1989).

Poor People
The residents of a region are the people living in the region or the people who legally have the right to live there. Residents of Indonesia are Indonesian citizens and foreign nationals residing in Indonesia (UUD 1945, Article 26, paragraph 2). According to Dukcapil, a resident is someone who holds a KTP (identity card) and/or a KK (family card).
To measure poverty, BPS uses the concept of the ability to meet basic needs (basic needs approach). Using this approach, poverty is seen as an economic inability to meet basic food and non-food needs that are measured in terms of expenditure. So, the poor population are those who have an average per capita expenditure per month below the poverty line.

a. Model Estimation of Panel Data Regression
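Panel data regression analysis is a data analysis method that combines cross section and time series data. The panel data regression model can be written as follows:

$$Y_{it} = \beta_0 + \beta_1 X_{1it} + \beta_2 X_{2it} + \dots + \beta_k X_{kit} + \varepsilon_{it}$$

where $Y_{it}$ is the dependent variable for individual $i$ in period $t$, $X_{1it}, \dots, X_{kit}$ are the independent variables, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_k$ are the slopes, and $\varepsilon_{it}$ is the error term.

The multiple linear regression model consists of several independent variables and one dependent variable. In panel data regression, the model to be estimated requires assumptions about the intercepts, slopes, and errors. According to Widarjono (2007: 251), there are three techniques for estimating model parameters with panel data, namely: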

Common Effect Model
This is the simplest technique for estimating panel data parameters. It combines cross section and time series data into a single unit, without regard to differences across time and entities (individuals). The approach most often used is the Ordinary Least Squares (OLS) method. The Common Effect Model assumes that the behavior of the data is the same across individuals and time periods.
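The pooling idea can be sketched with NumPy as follows. This is an illustration only, not the EViews procedure used in the study: the function name `pooled_ols` and the synthetic panel (8 individuals, 7 periods, 3 regressors) are assumptions made for the example.

```python
import numpy as np


def pooled_ols(y, X):
    """OLS on the stacked panel with one intercept shared by all individuals."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([np.ones(len(y)), X])      # intercept + regressors
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)   # least-squares solution
    resid = y - Z @ beta
    return beta, resid


# Synthetic data (hypothetical): rows are individual-period observations.
rng = np.random.default_rng(0)
n_obs, k = 8 * 7, 3
X = rng.normal(size=(n_obs, k))
true_beta = np.array([2.0, 0.5, -1.0, 0.8])        # intercept first
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n_obs)

beta_hat, resid = pooled_ols(y, X)
print(np.round(beta_hat, 2))
```

Because the rows are simply stacked, the estimator ignores which individual or period each observation belongs to, which is exactly the common effect assumption.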

Fixed Effect Model
The Fixed Effect model approach assumes that the intercepts of each individual are different while the slopes between individuals are the same. This technique uses dummy variables to capture the existence of intercept differences for each individual.
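A minimal sketch of the dummy-variable approach is given below, together with the equivalent "within" transformation that subtracts each individual's means. The function names and the synthetic data are assumptions for illustration; the study itself used EViews.

```python
import numpy as np


def lsdv(y, X, unit_ids):
    """Fixed effects by OLS with one dummy variable per individual."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    units, idx = np.unique(unit_ids, return_inverse=True)
    dummies = np.eye(len(units))[idx]              # one column per individual
    Z = np.column_stack([dummies, X])              # no common intercept
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[:len(units)], coef[len(units):]    # intercepts, slopes


def within(y, X, unit_ids):
    """Same slopes, obtained after removing each individual's means."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    _, idx = np.unique(unit_ids, return_inverse=True)
    counts = np.bincount(idx)
    y_dm = y - (np.bincount(idx, weights=y) / counts)[idx]
    X_means = np.column_stack(
        [np.bincount(idx, weights=X[:, j]) / counts for j in range(X.shape[1])]
    )
    X_dm = X - X_means[idx]
    slopes, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return slopes


# Synthetic balanced panel (hypothetical): 8 individuals, 7 periods.
rng = np.random.default_rng(1)
unit_ids = np.repeat(np.arange(8), 7)
alpha = rng.normal(5.0, 2.0, size=8)               # individual intercepts
X = rng.normal(size=(56, 3))
true_slopes = np.array([0.5, -1.0, 0.8])
y = alpha[unit_ids] + X @ true_slopes + rng.normal(scale=0.1, size=56)

intercepts, slopes_lsdv = lsdv(y, X, unit_ids)
slopes_within = within(y, X, unit_ids)
```

Both routes give the same slope estimates; the dummy-variable form additionally returns one intercept per individual.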

Random Effect Model
The Random Effect approach assumes that each individual has a different intercept and that these intercepts are random (stochastic) variables. This model is used if the individuals (entities) in the sample are chosen randomly and are representative of the population. The technique also takes into account that errors may be correlated across the cross section and the time series.
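For reference, the three specifications can be set side by side. This is a sketch in standard textbook notation, which is an assumption here: $\beta_0$ is a common intercept, $\beta_{0i}$ an individual-specific intercept, $\mu_i$ a random individual effect, and $\varepsilon_{it}$ the remaining error.

```latex
\begin{align*}
\text{Common effect:} \quad & Y_{it} = \beta_0 + \sum_{k=1}^{K} \beta_k X_{kit} + \varepsilon_{it} \\
\text{Fixed effect:}  \quad & Y_{it} = \beta_{0i} + \sum_{k=1}^{K} \beta_k X_{kit} + \varepsilon_{it} \\
\text{Random effect:} \quad & Y_{it} = \beta_0 + \sum_{k=1}^{K} \beta_k X_{kit} + (\mu_i + \varepsilon_{it})
\end{align*}
```

The slopes are common to all individuals in each case; the models differ only in how the individual-specific component is treated.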
The panel data estimation model can be selected according to the circumstances of the study, considering the number of individuals and variables studied. However, there are formal tests for determining which technique is most appropriate for estimating the panel data parameters. According to Widarjono (2007: 258), three tests are needed to choose a panel data estimation technique. First, the F statistical test is used to choose between the common effect method and the fixed effect method. Second, the Hausman test is used to choose between the fixed effect method and the random effect method. Third, the Lagrange Multiplier (LM) test is used to choose between the common effect method and the random effect method.
According to Nachrowi (2006: 318), the choice between the fixed effect method and the random effect method can be made by considering the purpose of the analysis, or it may be that the data used in the model can only be processed by one method. In EViews, the random effect method can only be used when the number of individuals is greater than the number of coefficients, including the intercept. According to some econometricians, if the panel data have more time periods (t) than individuals (i), the fixed effect method is more suitable, whereas if they have fewer time periods (t) than individuals (i), the random effect method is recommended.

b. Selection of Panel Data Regression Model

F-Test (Chow Test)
The Chow test adds dummy variables to capture differences in intercepts, which can then be tested with an F statistic. The aim is to find out whether panel data regression with the fixed effect method (with dummy variables) is better than panel data regression without dummy variables (the common effect method).
The hypotheses in the Chow test are:
H0: the common effect model is more suitable
H1: the fixed effect model is more suitable
If the calculated F value is greater than the F value from the F table, H0 is rejected, which means the right model to use is the fixed effect model. Conversely, if the calculated F value is smaller than the F value from the table, we fail to reject H0, which means the right model to use is the common effect model.
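One common way to obtain the F value is from the residual sums of squares of the two models, treating the common effect model as the restricted model. The sketch below assumes this standard restricted-versus-unrestricted form; the function name and the input values are hypothetical and do not reproduce the study's output.

```python
def chow_f(rss_common, rss_fixed, n_units, n_obs, n_regressors):
    """F statistic comparing the common effect (restricted) and fixed effect models."""
    df1 = n_units - 1                        # number of extra intercepts
    df2 = n_obs - n_units - n_regressors     # residual df of the fixed effect model
    f_stat = ((rss_common - rss_fixed) / df1) / (rss_fixed / df2)
    return f_stat, df1, df2


# Hypothetical residual sums of squares for 8 individuals and 56 observations.
f_stat, df1, df2 = chow_f(rss_common=150.0, rss_fixed=75.0,
                          n_units=8, n_obs=56, n_regressors=3)
print(round(f_stat, 4), df1, df2)
```

The resulting value is compared with the F table value for `df1` and `df2` degrees of freedom at the chosen significance level.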

Hausman Test
The Hausman test is carried out when the Chow test indicates that the fixed effect model is more suitable. It is used to choose between the fixed effect method and the random effect method.
The hypotheses for the Hausman test can be written as follows:
H0: the random effect model is better
H1: the fixed effect model is better
The Hausman test statistic follows the Chi-Square distribution with degrees of freedom equal to the number of independent variables. If the Hausman statistic is greater than the Chi-Square value from the table, H0 is rejected, which means the right model to use is the fixed effect model. Conversely, if the Hausman statistic is smaller than the Chi-Square value from the table, we fail to reject H0, which means the right model to use is the random effect model.
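The Hausman statistic is usually computed as a quadratic form in the difference between the two sets of slope estimates. The following sketch assumes that standard form; the function name and all numbers are hypothetical.

```python
import numpy as np


def hausman(b_fe, b_re, v_fe, v_re):
    """Hausman statistic from slope estimates and their covariance matrices."""
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_diff = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    stat = float(diff @ np.linalg.solve(v_diff, diff))   # diff' V^{-1} diff
    return stat, diff.size                               # statistic, degrees of freedom


# Hypothetical estimates for three independent variables.
b_fe = [0.5, -0.2, 0.8]
b_re = [0.4, -0.1, 0.6]
v_fe = np.diag([0.04, 0.04, 0.04])
v_re = np.diag([0.02, 0.02, 0.02])

stat, df = hausman(b_fe, b_re, v_fe, v_re)
print(round(stat, 4), df)
```

With three independent variables the 5% Chi-Square table value is 7.815, so a statistic of this size would fail to reject H0 and favor the random effect model.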

Lagrange Multiplier Test
According to Widarjono (2007: 260), a Lagrange Multiplier (LM) test must be performed to find out whether the random effect model or the common effect model is better to use.
The hypotheses for the LM test can be written as follows:
H0: the common effect model is more suitable
H1: the random effect model is more suitable
The LM statistic follows the Chi-Square distribution with one degree of freedom. If the calculated LM value is greater than the Chi-Square value from the table, H0 is rejected, which means the more suitable model is the random effect model. Conversely, if the calculated LM value is smaller than the Chi-Square value from the table, we fail to reject H0, which means the more appropriate model to use is the common effect model.
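The calculated LM value can be obtained from the residuals of the common effect model. The sketch below assumes the Breusch-Pagan form of the statistic, which the text does not spell out, and a balanced panel whose residuals are sorted by individual and then by period. The function name and the toy residuals are hypothetical.

```python
import numpy as np


def breusch_pagan_lm(resid, n_units, n_periods):
    """Breusch-Pagan LM statistic from common effect residuals (balanced panel)."""
    e = np.asarray(resid, dtype=float).reshape(n_units, n_periods)  # rows = individuals
    ratio = (e.sum(axis=1) ** 2).sum() / (e ** 2).sum()
    return n_units * n_periods / (2.0 * (n_periods - 1)) * (ratio - 1.0) ** 2


# Toy residuals: individual 1 is always above the line, individual 2 always below.
lm = breusch_pagan_lm([1.0, 1.0, -1.0, -1.0], n_units=2, n_periods=2)
print(lm)
```

The statistic grows when residuals of the same individual tend to share a sign, which is the pattern a random individual effect would leave behind in a pooled regression.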

c. Classic Assumption Test

Normality
According to Gujarati (2008), the classical normal linear regression model assumes that each $u_i$ is normally distributed with:

Mean: $E(u_i) = 0$
Variance: $E(u_i^2) = \sigma^2$
Covariance: $\mathrm{cov}(u_i, u_j) = 0$ for $i \neq j$

These assumptions can be stated compactly as $u_i \sim N(0, \sigma^2)$.

The OLS estimator remains BLUE even if $u_i$ is not normally distributed. Under the normality assumption, the OLS estimators of the regression coefficients are themselves normally distributed, so the t and F tests can be used to test statistical hypotheses regardless of the sample size. If $u_i$ is not normally distributed, the t and F tests are still asymptotically valid in large samples, but not in small samples. Normality is tested with the Jarque-Bera (JB) statistic: if the JB value is smaller than the table value at a given significance level (α), we fail to reject H0, and it can be concluded that the errors are normally distributed.
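The JB value is built from the skewness and kurtosis of the residuals. A minimal sketch of the usual formula, JB = n/6 · (S² + (K − 3)²/4), is shown below; the function name and the toy residuals are hypothetical.

```python
import numpy as np


def jarque_bera(resid):
    """Jarque-Bera statistic from sample skewness S and kurtosis K."""
    e = np.asarray(resid, dtype=float)
    n = e.size
    d = e - e.mean()
    m2, m3, m4 = (d ** 2).mean(), (d ** 3).mean(), (d ** 4).mean()
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


# Toy symmetric residuals: skewness is 0 and kurtosis is 1.7.
jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(jb, 4))
```

A normal distribution has skewness 0 and kurtosis 3, so the statistic is close to zero when the residuals are approximately normal.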

Multicollinearity
Multicollinearity is a condition in which there is a strong relationship between independent variables. It should be avoided because it reduces the accuracy with which the estimated parameters approximate the true values.
As Chatterjee and Price explain in Nachrowi (2002), the existence of multicollinearity makes the interpretation of regression coefficients inaccurate. Correlation between independent variables is still acceptable as long as it is not too high, that is, not close to 1 (near perfect).
There are several ways to identify multicollinearity. The easiest is to compute the Pearson correlation coefficient between pairs of independent variables:

$$r_{XY} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

where X and Y are the two independent variables whose correlation is sought and n is the number of observations on X and Y. The absolute value of the correlation coefficient ranges from zero to one. The closer it is to one, the stronger the relationship between the two variables, and the greater the possibility of multicollinearity.
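As an illustration, the Pearson correlation coefficient can be computed directly from the sums of the two variables. The function name and the small data set are hypothetical; the result is cross-checked against NumPy's `corrcoef`.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient from raw sums of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    num = n * (x * y).sum() - x.sum() * y.sum()
    den = np.sqrt((n * (x ** 2).sum() - x.sum() ** 2)
                  * (n * (y ** 2).sum() - y.sum() ** 2))
    return num / den


# Hypothetical observations on two independent variables.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = pearson_r(x, y)
print(round(r, 4), round(np.corrcoef(x, y)[0, 1], 4))
```

In practice the coefficient is computed for every pair of independent variables, and pairs with an absolute value close to one are flagged.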

Data Source
All the data used in this study are secondary data obtained from publications of BPS Banten. The data comprise the open unemployment rate and its influencing factors for 8 regencies/cities in Banten from 2011 to 2018. The analysis was carried out with the statistical software EViews 10.

Research Variables
The dependent variable in this study is the open unemployment rate in Banten from 2011 to 2018, excluding 2016. The independent variables are as follows:
1. Gross Participation Rate (X1)
2. Total Poor Population (X2)
3. Growth Rate of Gross Regional Domestic Product (GRDP) at Constant Price (X3)

Variables Operational Definition
This study uses two types of variables, namely dependent and independent variables. The operational definitions of the independent variables are as follows:
- Gross Participation Rate (X1): a comparison between the number of students at a certain level of education at a certain age and the population of the same age, presented as a percentage (source: Kemdikbud).
- Total Poor Population (X2): the number of people whose average monthly per capita expenditure is below the poverty line.
- GRDP Growth Rate (X3): the change in economic conditions, with increasing production capacity, in a region continuously over a certain period.

Analysis Steps
In this study, the analysis to obtain the best model was conducted with the following steps:
1. Determine the panel data to be examined.
a) Normality Test
The Jarque-Bera (JB) test is conducted to determine the normality of the residuals.
- If we fail to reject H0, the residuals follow the normal distribution.
- If H0 is rejected, the residuals do not follow the normal distribution, so the data need to be transformed into log/ln form.
b) Heteroscedasticity Test
A Lagrange Multiplier (LM) test is performed to detect heteroscedasticity.
- If we fail to reject H0, there is no heteroscedasticity in the model, that is, the model is homoscedastic.
- If H0 is rejected, there are signs of heteroscedasticity in the model, so the estimation must be carried out with the Cross Section Weight method.
c) Nonautocorrelation Test
d) Multicollinearity Test
6. Form the best panel data regression model based on model verification.

Case Analysis with Panel Data Regression
To estimate the panel data regression model for the open unemployment rate in Banten Province in 2011-2018, the Chow test is performed first to make a preliminary choice of model. Based on the Chow test output, the F value obtained is 6.871249, with a probability value smaller than α = 0.05. Thus, at a significance level of 5%, there is enough evidence to state that the fixed effect model is better than the common effect model.
The Chow test shows that the more suitable model is the fixed effect model, so the Hausman test needs to be conducted to determine which of the fixed effect and random effect models is the better estimation model.

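Test statistic
The test statistic used is the Wald statistic:

$$W = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\mathrm{Var}(\hat{\beta}_{FE}) - \mathrm{Var}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$$

where $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ are the vectors of slope estimates from the fixed effect and random effect models.

Based on the Hausman test output, the chi-squared statistic obtained is 7.834238, with a probability value smaller than α = 0.05. Thus, at a significance level of 5%, there is enough evidence to state that the fixed effect model is better than the random effect model. After the Chow and Hausman tests, the model used is therefore the fixed effect model.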

Testing the Classical Assumptions of Panel Data Regression
The assumption tests conducted in this study are the normality and multicollinearity tests. In the normality test, the Jarque-Bera statistic is 1.5794 with a probability of 0.45 (greater than α = 0.05), so it can be concluded that the residuals are normally distributed. In the multicollinearity test, the regression coefficients between the X variables appear significant and the R² value is less than 0.80, so it can be concluded that there is no multicollinearity among the X variables.
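The reported probabilities can be checked by hand, because the Chi-Square survival function has a closed form for two and three degrees of freedom. The sketch below assumes 3 degrees of freedom for the Hausman statistic (the number of independent variables) and the usual 2 degrees of freedom for the Jarque-Bera statistic; the function names are hypothetical.

```python
import math


def chi2_sf_df2(x):
    """P(Chi-Square with 2 df > x)."""
    return math.exp(-x / 2.0)


def chi2_sf_df3(x):
    """P(Chi-Square with 3 df > x)."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


p_hausman = chi2_sf_df3(7.834238)   # Hausman statistic reported in the text
p_jb = chi2_sf_df2(1.5794)          # Jarque-Bera statistic reported in the text
print(round(p_hausman, 4), round(p_jb, 4))
```

Under these assumptions the Hausman probability falls just below 0.05 and the Jarque-Bera probability is about 0.45, in line with the conclusions stated in the text.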

Panel Data Regression Estimation
The following is the estimated parameter results from the fixed effect model.

Table 4. Fixed Effect Model Estimation Results
From the estimation results, the Adjusted R-squared value is 0.479696, which means that the independent variables in the model explain 47.97% of the variation in the dependent variable, while the rest is explained by other variables.
The F-statistic is 6.070738, with a probability (p-value) less than α = 5%. This leads to the rejection of H0, so it can be concluded that at least one explanatory variable significantly influences the open unemployment rate in Banten Province in 2011-2018.
The fixed effect model estimated from the table above has the following form:

$$\widehat{TPT}_{it} = \hat{\beta}_0 + \hat{\beta}_1 APK_{it} + \hat{\beta}_2 JPM_{it} + \hat{\beta}_3 GRDP_{it} + \hat{\mu}_i$$

where:
TPT_it: open unemployment rate in regency/city i in year t
APK_it: gross participation rate in regency/city i in year t
JPM_it: total poor population in regency/city i in year t
GRDP_it: GRDP growth rate in regency/city i in year t
μ_i: cross-sectional effect of regency/city i

Partial testing is then based on the t-statistic and its probability value at a significance level of 5%. Based on the results, the only explanatory variable with a significant partial effect on the open unemployment rate is the GRDP growth rate, whereas the gross participation rate and the total poor population do not significantly influence the open unemployment rate.
The GRDP growth rate has a significant positive effect, with a coefficient of 0.837799. This means that each 1 percentage point increase in the GRDP growth rate is followed by an increase in the open unemployment rate of 0.837799 percentage points, assuming the other variables are constant. This reflects the continuing improvement of the economy of Banten Province, which attracts migrants to the province. The influx, however, intensifies competition for employment in Banten, so that more native residents of Banten are not absorbed as workers. Therefore, the open unemployment rate in Banten Province moves in line with the growth rate of its GRDP.