This project addresses the following questions about the number of PhD students in Germany:
- How has the number of students changed over the last 4 years?
- How does this change vary by nationality, gender and type of course?
As a Master's student in Data Science myself, I was interested in the statistics of higher education students. Analysing this data provides several insights that we can use to dig deeper into research questions such as:
- Which courses do foreigners usually opt for, and how can they be marketed better?
- Which courses are male- or female-dominated?
- Why are some courses more popular than others? Do we need better marketing, or is it a language barrier issue?
Download the repository and make sure IPython, matplotlib and pandas are installed. The project was tested with Python 3.11.3; I give no guarantee that it works with older versions.
Copy the IPython/Jupyter notebook and open it in your preferred editor. Run the notebook to view the results.
Program files are stored in the `src` folder. Further documentation can be found in `description.txt`.
The dataset comes from GENESIS, a statistical data service provided by the German government. The table used is GENESIS-Tabelle 21352-0003, Statistics of doctoral students. I created two CSV files from it: the first contains course groups only, while the second contains all courses.
- Data is recorded from 2019 to 2022.
- The main header has Year, Course Group/Course, German, Foreigner and Total columns, where German, Foreigner and Total refer to the number of students.
- The second header has Male and Female columns under each of the German, Foreigner and Total columns.
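
The two-level header described above could be loaded with pandas roughly as follows. This is a minimal sketch, not the notebook's actual code: the inline sample rows are made-up placeholder numbers (not real GENESIS figures), the function names `load_students` and `yearly_totals` are hypothetical, and the real CSV files may need different `skiprows` or separator settings.

```python
import io

import pandas as pd

# Made-up placeholder rows that mimic the described layout.
# These are NOT real GENESIS figures.
SAMPLE_CSV = """Year,Course Group,German,German,Foreigner,Foreigner,Total,Total
,,Male,Female,Male,Female,Male,Female
2019,Engineering,100,40,30,10,130,50
2019,Humanities,50,70,10,20,60,90
2020,Engineering,110,45,35,15,145,60
2020,Humanities,55,75,12,22,67,97
"""

# Flat names used only while parsing (an assumption for this sketch).
FLAT_COLUMNS = [
    "Year", "Course Group",
    "German Male", "German Female",
    "Foreigner Male", "Foreigner Female",
    "Total Male", "Total Female",
]


def load_students(source):
    """Read the CSV and rebuild the two header rows as a column MultiIndex."""
    # Skip both header rows and supply explicit names instead.
    df = pd.read_csv(source, skiprows=2, header=None, names=FLAT_COLUMNS)
    df = df.set_index(["Year", "Course Group"])
    # Level 0: nationality group, level 1: gender.
    df.columns = pd.MultiIndex.from_product(
        [["German", "Foreigner", "Total"], ["Male", "Female"]],
        names=["Nationality", "Gender"],
    )
    return df


def yearly_totals(df):
    """Sum the Total (Male + Female) counts over all course groups per year."""
    return df["Total"].sum(axis=1).groupby(level="Year").sum()


students = load_students(io.StringIO(SAMPLE_CSV))
totals = yearly_totals(students)
change = totals.diff()  # year-over-year change in student numbers
print(totals)
print(change)
```

With the real data, `io.StringIO(SAMPLE_CSV)` would be replaced by the path to one of the two CSV files, and the same `df["German"]` or `df["Foreigner"]` selection can be used to break the change down by nationality and gender.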