lois4801 / HR_Analysis--SQLtoPOWERBI

Analysis of HR Data from year 2000 to 2023 and providing valuable insights to the company

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

giphy (1)

Please support me by following me in my social media accounts, For more projects to share like this.

Github>>>>>>>https://github.com/lois4801

LinkedIn>>>>> https://www.linkedin.com/in/artemis-jay/).

HR_Analysis

This project is an Analysis of HR Data from year 2000 to 2023.

  • This project has a raw data that hasnt been cleaned yet. Rows is comprised of 22215. It could be cleaned and corrected through excel or python, but for this project I will be focusing more in SQL. Then it will be imported in PowerBI for data visualization.If I have some more time, I will also include google looker studio.

Problem Statement

This project has a raw data that hasn’t been cleaned yet. Rows is comprised of 22215. It could be cleaned and corrected through excel or python, but for this project I will be focusing more in SQL. Then it will be imported in PowerBI for data visualization. If I have some more time, I will also include google looker studio.

Here are the following questions needed to be answered ( Revised, corrected and communicated)

For Dashboard 1- Age and Gender Employee Analysis

** Ages>=18 means all employees who have hired from 18 above regardless if they are terminated or not ( termdate column is null or with value)

  • What is the gender distribution/ breakdown of employees in the company? Ages>=18
  • What is the age vs gender distribution of employees in the company?
  • How does the gender distribution across departments ?
  • How does the gender distribution across job titles?
  • Termination rate by age?

For Dashboard 2- Employee Summary1 Analysis

** Ages>=18 means all employees who have hired from 18 above regardless if they are terminated or not (( termdate column is null or with value)

  • What is the race/ethnicity breakdown of employees in the company? Ages>=18
  • How many employees work at headquarters versus remote locations?
  • What is the distribution of employees across locations by state?
  • What is the average length of employment for employees who have been terminated?
  • Which department has the highest turnover or termination rate?
  • How has the company's employee count changed over time based on hire and term dates?

For Dashboard 3- Employee Summary2 Analysis

  • What is the average length of employment for employees who have been terminated?
  • What is the average length of employment for employees who have been terminated?
  • Which department has the highest turnover rate?
  • How has the company's employee count changed over time based on hire and term dates?
  • What is the tenure distribution for each department?

Project Planning

s4


Process Map

s5


Data ETL

Problems Encountered and how I solved it.

  1. During date formatting change, I mistakenly made all the dates into null during the process. So be be sure to never update the data into the main server itself. Make a copy of csv file and do your analysis using that file.

  2. During data transformation after data analysis

  • The data under date (column), hire_date(column) has a combination of different date format

  • example 06-04-91 (month- day- year) and 6/29/1984 (month-day- year). Some data included has the right format 2012-01-01 (year- month- year)

  • I had a hard time during data transformation, errors were encountered via excel and sql because the dataset has multiple formats included in each column.

  • Importing the data first in excel and did power query to transform the data but it created error on some data. -- A friend of mine were able to transform all the data into (month / day/ year) through his developer power query ms excel software. But mine cant, so I have to think another way. Screenshot 2023-08-13 222944

    Screenshot 2023-08-13 222920

Then using sql. I made different trials before I finally cracked it.

2a. Converting birthdate and hiredate troubleshooting

This is the orignal data

image

First TRIAL

  • I keep receiving error for running this program. Incorrect value: '06-04-91' for function str_to_date. Tried tweaking some but it is just the same.

Screenshot 2023-08-13 224520

Second trial

  • Created another column and hoping to transform the data all at once. But I failed.

second trial

second trial2

Third trial

  • Tried transforming one data at a time in the same column and see if it will work. If it will work then I will try transforming the next one.

third trial

third trial2

  • Then now, the second data will be transformed.

fourth trial

fourth trial2

Final Conclusion

  • Now that both can be transformed separately succesfully. I dropped the whole hr table and imported again the raw file to try mixing both programs together and test if it can be run successfully as one program and get just one sweep result.

final trial

final trial2

  • It dit run perfectly. So now, I will be using this program to change the data for the hire_date column.

    Converted the data under hire_date column.

hire_date

hire_date2

2b. Converting the data under termdate column

Original data

  • termdate

TRIALS I MADE

  • I got mixed up at first between null, not null, not empty, empty, filled, unfilled data, and blank cells. But to get my brain wrap around it. I have troubleshooted what kind of data I want to appear.

  • In my experience, during designing the relational database it is important to know and set the critical values that you would want to appear in your database system otherwise people like me will get confuse and get a hard time cleaning and analyzing data.

  • There are many ways to solve this. It depends on your goal. I can convert first cells with data into YEAR-MONTH-DATE then fill the blank cells with null values or 0000-00-00 or just leave it blank. On my end I considered using trial 5.

  • SEE BELOW TROUBLESHOOTING I HAVE MADE

-- TRIAL 1

termdate1

  • TRIAL 2

termdate2

  • TRIAL 3

termdate3

  • TRIAL 4

    termdate4

  • TRIAL 5 ( THE ONE I CONSIDERED USING FOR NOW)

    termdate5

  • Converting the blank cells into null values

    termdate6

3. AGE ANOMALIES

  • After adding an age column and calculated the age . I was expecting age like positive values. But it seems that these unprecendented anomalies are along the way that I have to analyze and clean up as well.

1

  • I will walk you through my process of analyzing and cleaning the data

age1

age2

age3

age4

age5

age 6

age7

age8

YEAR IN COMPANY DATA ANOMALIES

y1

y2

  • It’s possible that there could be errors or inconsistencies in the data that needed to be addressed. For example, the hire date or term date for those employees might be incorrect, or there could be issues with the calculation itself. If the negative values are indeed errors, then you have a few options for how to handle them.

  • One option would be to delete those rows from my dataset, but this could potentially bias my results if those employees are not representative of the overall population.

  • Second option would be to retain those rows but exclude them from any calculations or visualizations that involve the “years in company” variable.

  • Third option is to correct the errors, if possible, by updating the hire date or term date for those employees based on the assumption that it was mistype during data entry. This will allow me to include them to the rows with ZERO YEARS IN THE COMPANY dataset later.

y3

image

PowerBI- Troubleshooting

  • Transforming my line graph into smooth curved. I found it in the following order Format Visual>> Visual>> Lines>> Linear.

s5

  • Creating my Area Chart. Choose all the variables you needed and be sure to place the year in the x-axis.
  • It should appear with a word YEAR instead of sum of year.

s6

  • I have changed the colors of the area charts to match the color scheme of my dashboard. It took me sometime to find it.

image


  • Two maps I have created didn’t match with my goal to present remote and hybrid employees per state and city.
  • Because this project computations are heavily created under the MySQL platform. I have to find a way to match my graphs and for them to communicate and interacting with each other.

HERES HOW for >>> Interactive Map and Bar chart and Pie Chart for REMOTE and HYBRID EMPLOYEE COUNT PER STATE and CITY

  • Select REMOTE VS HEADQUARTERS dataset. (From MySQL computed) and HR File( Cleaned and Transformed under MySQL). Then follow the checked marks for each file.

s8

s9

s10


PowerBI DASHBOARD RESULTS

For Dashboard 1- Age and Gender Employee Analysis

s1

For Dashboard 2- Employee Summary1 Analysis

s2

For Dashboard 3- Employee Summary2 Analysis

s3


My Biggest Takeaway from this project

-• Well defining and planning the dashboard contents during the first process is very important especially if the owners’ questions are all over the place and vague.

-• Be sure to also clarify things if the question of the owner is vague. Example is turnover or termination count, rate, and total count.

-• Be sure to also clarify or ask if some variables can be adjusted ( like using “ages>=18” instead of “ termdate is null” ) to capture all employees in the company.

-• For as long as you have now the formula for termination rate etc. You can use the cleaned and transformed whole HR file to select for certain columns (attributions like age, gender, first name etc).

-• What I did with this project>>>>computed the termination rate by department, then when I found that it is also important to include termination rate by age. I just click the age in the cleaned dataset.

-• After finalizing all the questions, I have decided to provide three dashboards for more meaningful insights for each.