The objective of this project is to analyze the relationship between innovation and economic development using the Global Innovation Index (GII) dataset for 2013-2021, focusing on the G20 group or selected countries and comparing them with the full cohort.
Indicator | Source | Definition |
---|---|---|
Country | G20 Members | The Group of Twenty (G20) comprises 19 countries (Argentina, Australia, Brazil, Canada, China, France, Germany, India, Indonesia, Italy, Japan, Republic of Korea, Mexico, Russia, Saudi Arabia, South Africa, Türkiye, United Kingdom and United States) and the European Union. |
Global Innovation Index (GII) | GII Indicators | A composite index that measures the innovation performance of countries based on their innovation inputs and outputs. |
Institutions | GII Indicators | The extent to which a country has effective and supportive public and private institutions for innovation, including government effectiveness, regulatory quality, rule of law, and control of corruption. |
Human Capital and Research | GII Indicators | The level of education, skills, and research activities in a country, including education expenditure, tertiary education enrollment, research and development (R&D) activities, and scientific and technical publications. |
Infrastructure | GII Indicators | The availability and quality of physical and digital infrastructure to support innovation, including information and communication technology (ICT) infrastructure, transport infrastructure, and energy infrastructure. |
Market Sophistication | GII Indicators | The degree to which a country's markets are efficient and advanced, including the ease of starting a business, the intensity of local competition, and the availability of venture capital. |
Business Sophistication | GII Indicators | The quality of business networks, collaborations, and strategies to promote innovation, including the level of enterprise creation, production process sophistication, and marketing capabilities. |
Knowledge and Technology Outputs | GII Indicators | The quantity and quality of innovations and creative outputs generated by a country, including scientific articles, patents, trademarks, and creative goods and services. |
Creative Outputs | GII Indicators | The level of cultural and creative outputs, including creative goods and services, online creativity, and national feature films. |
GII indicators are measured as scores from 0 (weakest) to 100 (maximum strength).
@TODO
@TODO
Panel data is a collection of observations on multiple individuals, recorded at regular time intervals and ordered chronologically. Examples of individuals include people, countries, and companies.
Panel data modeling centers on addressing the likely dependence among observations within the same group. The primary difference between panel data models and time series models is that panel data models allow for heterogeneity across groups and introduce individual-specific effects.
As an example, we can consider a sample of our panel data series which includes data on innovation input sub-indices, innovation output sub-indices, and economic development variables of 10 different countries, including China, India, the United States, Japan, and Germany:
- A global pandemic, such as COVID-19, is likely to impact all 10 countries and cause changes in the innovation and economic development variables across all countries.
- A change in government policy on intellectual property rights, such as increased patent protection, may only affect certain countries, such as the United States and Japan, which have a high number of patents.
- A shift towards renewable energy sources may only impact certain countries, such as Germany and China, which have invested heavily in renewable energy innovation.
- An economic downturn in a specific region, such as Europe, may have a larger impact on certain countries within the region, such as Germany and France, which are major players in the European economy.
Panel data models include techniques that can address this heterogeneity across individuals. Pure cross-sectional methods and pure time series models, by contrast, may not be valid in its presence.
Because our dataset is heterogeneous, we need a heterogeneous panel data model, that is, one that allows any or all of the model parameters to vary across individuals. In other words, we need a model that allows for individual-specific effects.
Individual-specific effects capture unobserved heterogeneity across individuals that may be time-invariant, and therefore cannot be captured by time-varying regressors alone. They allow for individual-level variation in the intercept, or level, of the dependent variable, which can be especially important when studying economic or social phenomena.
For example, in a panel data set that tracks the earnings of individuals over time, individual-specific effects can capture unobserved factors such as differences in innate ability, personality traits, or social networks that affect an individual's earnings potential, but remain constant over time.
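As a sketch of how such effects enter a model, a linear panel specification with individual-specific effects is commonly written as follows. The notation is an assumption introduced here for illustration and is not taken from the project itself:

$$
y_{it} = \alpha_i + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + \varepsilon_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T
$$

Here $y_{it}$ is the dependent variable for individual $i$ in period $t$, $\mathbf{x}_{it}$ collects the time-varying regressors, $\alpha_i$ is the time-invariant individual-specific effect, and $\varepsilon_{it}$ is the idiosyncratic error. In the earnings example, $\alpha_i$ would absorb factors such as innate ability or personality traits.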
To capture individual-specific effects, we can use either fixed effects or random effects models.
- Fixed effects models allow the individual-specific effects to be correlated with the independent variables. They treat each individual's effect as constant over time and estimate the coefficients from variation within each individual, so any time-invariant characteristic is absorbed into the effect.
- Random effects models assume that the individual-specific effects are uncorrelated with the independent variables. They treat the effects as random draws from a common distribution, estimate their variance, and use it to weight within-individual and between-individual variation when estimating the coefficients.
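To make the idea of removing individual-specific effects concrete, the following is a minimal sketch of the within (fixed effects) transformation for a single regressor, using only the Python standard library. The function name `within_estimator` and the toy numbers are hypothetical and chosen purely for illustration; they are not GII data, and a real analysis would normally rely on a dedicated panel estimation package.

```python
from collections import defaultdict


def within_estimator(rows):
    """Slope of y on x after subtracting each group's own means.

    rows: iterable of (group, x, y) tuples, one per observation.
    """
    # Collect observations by group (for example, by country).
    by_group = defaultdict(list)
    for group, x, y in rows:
        by_group[group].append((x, y))

    sxy = 0.0  # sum of (demeaned x) * (demeaned y)
    sxx = 0.0  # sum of (demeaned x) squared
    for obs in by_group.values():
        x_mean = sum(x for x, _ in obs) / len(obs)
        y_mean = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_mean) * (y - y_mean)
            sxx += (x - x_mean) ** 2
    return sxy / sxx


# Made-up toy panel: both groups share the slope 2.0, but group "B" has a
# much higher intercept AND higher x values, so its individual-specific
# effect is correlated with the regressor.
panel = [
    ("A", 1.0, 12.0), ("A", 2.0, 14.0), ("A", 3.0, 16.0),  # y = 10 + 2x
    ("B", 4.0, 48.0), ("B", 5.0, 50.0), ("B", 6.0, 52.0),  # y = 40 + 2x
]

# Fixed effects: demean within each group, then regress.
beta_fe = within_estimator(panel)

# Pooled OLS: treat every observation as one group, ignoring the effects.
beta_pooled = within_estimator([("all", x, y) for _, x, y in panel])

print(beta_fe, beta_pooled)
```

In this toy panel the within estimator recovers the common slope, while the pooled slope is inflated because it attributes the difference in intercepts to the regressor. This is the situation in which fixed effects are usually preferred over pooled or random effects estimation.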
Panel datasets can be organized in either long or wide format.
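- In long format, each row is one individual at one point in time (for example, one country in one year), so the values of each variable across all periods are stacked in a single column.
- In wide format, each individual occupies one row, and each variable has a separate column for every time period.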
Long format is generally preferred for panel analysis because it is easier to manipulate, it handles missing observations naturally (a missing country-year is simply an absent row), and it is the layout that most panel estimation routines expect. Therefore, we will use long format for our panel dataset in this project.
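The difference between the two layouts can be sketched with a small reshape using only the Python standard library. Everything here is hypothetical: the helper `wide_to_long`, the column naming pattern `<variable>_<year>`, and the placeholder countries and scores are assumptions for illustration, not values from the GII dataset. In practice a dataframe library would typically perform this step.

```python
def wide_to_long(wide_rows, id_key="country"):
    """Reshape one-row-per-country records into one-row-per-country-year records.

    Assumes every non-id column is named "<variable>_<year>", e.g. "gii_2013".
    """
    long_rows = {}  # (country, year) -> row in long format
    for row in wide_rows:
        for column, value in row.items():
            if column == id_key:
                continue
            # Split on the last underscore so variable names may contain "_".
            variable, year = column.rsplit("_", 1)
            key = (row[id_key], int(year))
            record = long_rows.setdefault(
                key, {id_key: row[id_key], "year": int(year)}
            )
            record[variable] = value
    # Sort by country, then year, so the panel is in chronological order.
    return [long_rows[key] for key in sorted(long_rows)]


# Placeholder wide-format data: one row per country, one column per
# variable-year combination. The numbers are made up.
wide = [
    {"country": "Country A", "gii_2013": 50.0, "gii_2014": 51.0,
     "institutions_2013": 60.0, "institutions_2014": 61.0},
    {"country": "Country B", "gii_2013": 30.0, "gii_2014": 31.0,
     "institutions_2013": 40.0, "institutions_2014": 41.0},
]

long = wide_to_long(wide)
for record in long:
    print(record)
```

Each resulting record holds one country in one year, with one key per variable, which matches the long layout described above.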