Please support me by following me in my social media accounts, For more projects to share like this.
Github>>>>>>>https://github.com/lois4801
LinkedIn>>>>> https://www.linkedin.com/in/artemis-jay/).
- This project has a raw data that hasnt been cleaned yet. Rows is comprised of 22215. It could be cleaned and corrected through excel or python, but for this project I will be focusing more in SQL. Then it will be imported in PowerBI for data visualization.If I have some more time, I will also include google looker studio.
This project has a raw data that hasn’t been cleaned yet. Rows is comprised of 22215. It could be cleaned and corrected through excel or python, but for this project I will be focusing more in SQL. Then it will be imported in PowerBI for data visualization. If I have some more time, I will also include google looker studio.
** Ages>=18 means all employees who have hired from 18 above regardless if they are terminated or not ( termdate column is null or with value)
- What is the gender distribution/ breakdown of employees in the company? Ages>=18
- What is the age vs gender distribution of employees in the company?
- How does the gender distribution across departments ?
- How does the gender distribution across job titles?
- Termination rate by age?
** Ages>=18 means all employees who have hired from 18 above regardless if they are terminated or not (( termdate column is null or with value)
- What is the race/ethnicity breakdown of employees in the company? Ages>=18
- How many employees work at headquarters versus remote locations?
- What is the distribution of employees across locations by state?
- What is the average length of employment for employees who have been terminated?
- Which department has the highest turnover or termination rate?
- How has the company's employee count changed over time based on hire and term dates?
- What is the average length of employment for employees who have been terminated?
- What is the average length of employment for employees who have been terminated?
- Which department has the highest turnover rate?
- How has the company's employee count changed over time based on hire and term dates?
- What is the tenure distribution for each department?
-
During date formatting change, I mistakenly made all the dates into null during the process. So be be sure to never update the data into the main server itself. Make a copy of csv file and do your analysis using that file.
-
During data transformation after data analysis
-
The data under date (column), hire_date(column) has a combination of different date format
-
example 06-04-91 (month- day- year) and 6/29/1984 (month-day- year). Some data included has the right format 2012-01-01 (year- month- year)
-
I had a hard time during data transformation, errors were encountered via excel and sql because the dataset has multiple formats included in each column.
-
Importing the data first in excel and did power query to transform the data but it created error on some data. -- A friend of mine were able to transform all the data into (month / day/ year) through his developer power query ms excel software. But mine cant, so I have to think another way.
- I keep receiving error for running this program. Incorrect value: '06-04-91' for function str_to_date. Tried tweaking some but it is just the same.
- Created another column and hoping to transform the data all at once. But I failed.
- Tried transforming one data at a time in the same column and see if it will work. If it will work then I will try transforming the next one.
- Then now, the second data will be transformed.
- Now that both can be transformed separately succesfully. I dropped the whole hr table and imported again the raw file to try mixing both programs together and test if it can be run successfully as one program and get just one sweep result.
-
It dit run perfectly. So now, I will be using this program to change the data for the hire_date column.
-
I got mixed up at first between null, not null, not empty, empty, filled, unfilled data, and blank cells. But to get my brain wrap around it. I have troubleshooted what kind of data I want to appear.
-
In my experience, during designing the relational database it is important to know and set the critical values that you would want to appear in your database system otherwise people like me will get confuse and get a hard time cleaning and analyzing data.
-
There are many ways to solve this. It depends on your goal. I can convert first cells with data into YEAR-MONTH-DATE then fill the blank cells with null values or 0000-00-00 or just leave it blank. On my end I considered using trial 5.
-
SEE BELOW TROUBLESHOOTING I HAVE MADE
-- TRIAL 1
- TRIAL 2
- TRIAL 3
- After adding an age column and calculated the age . I was expecting age like positive values. But it seems that these unprecendented anomalies are along the way that I have to analyze and clean up as well.
- I will walk you through my process of analyzing and cleaning the data
YEAR IN COMPANY DATA ANOMALIES
-
It’s possible that there could be errors or inconsistencies in the data that needed to be addressed. For example, the hire date or term date for those employees might be incorrect, or there could be issues with the calculation itself. If the negative values are indeed errors, then you have a few options for how to handle them.
-
One option would be to delete those rows from my dataset, but this could potentially bias my results if those employees are not representative of the overall population.
-
Second option would be to retain those rows but exclude them from any calculations or visualizations that involve the “years in company” variable.
-
Third option is to correct the errors, if possible, by updating the hire date or term date for those employees based on the assumption that it was mistype during data entry. This will allow me to include them to the rows with ZERO YEARS IN THE COMPANY dataset later.
- Transforming my line graph into smooth curved. I found it in the following order Format Visual>> Visual>> Lines>> Linear.
- Creating my Area Chart. Choose all the variables you needed and be sure to place the year in the x-axis.
- It should appear with a word YEAR instead of sum of year.
- I have changed the colors of the area charts to match the color scheme of my dashboard. It took me sometime to find it.
- Two maps I have created didn’t match with my goal to present remote and hybrid employees per state and city.
- Because this project computations are heavily created under the MySQL platform. I have to find a way to match my graphs and for them to communicate and interacting with each other.
HERES HOW for >>> Interactive Map and Bar chart and Pie Chart for REMOTE and HYBRID EMPLOYEE COUNT PER STATE and CITY
- Select REMOTE VS HEADQUARTERS dataset. (From MySQL computed) and HR File( Cleaned and Transformed under MySQL). Then follow the checked marks for each file.
-• Well defining and planning the dashboard contents during the first process is very important especially if the owners’ questions are all over the place and vague.
-• Be sure to also clarify things if the question of the owner is vague. Example is turnover or termination count, rate, and total count.
-• Be sure to also clarify or ask if some variables can be adjusted ( like using “ages>=18” instead of “ termdate is null” ) to capture all employees in the company.
-• For as long as you have now the formula for termination rate etc. You can use the cleaned and transformed whole HR file to select for certain columns (attributions like age, gender, first name etc).
-• What I did with this project>>>>computed the termination rate by department, then when I found that it is also important to include termination rate by age. I just click the age in the cleaned dataset.
-• After finalizing all the questions, I have decided to provide three dashboards for more meaningful insights for each.