Class was taken at the University of South Florida under Professor Hagen in Fall 2017. Many of these projects come from "Data Science and Big Data Analytics" by EMC Education Services. Projects were assigned on an as-needed basis and will appear sporadically.
Anscombe: Shows the importance of plotting your data before trusting summary statistics. The four datasets have nearly identical means and regression lines, but plotting them shows that the points are distributed very differently. This also shows that linear regression is not the best fit for every dataset.
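The Anscombe point can be checked numerically. The following is a minimal Python sketch, not the project's own code (which appears to be written in R); it uses the published Anscombe quartet values, and the helper name `linear_fit` is made up for illustration.

```python
from statistics import mean

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Collect mean(x), mean(y), slope, and intercept for each dataset.
summary = {}
for name, (xs, ys) in quartet.items():
    slope, intercept = linear_fit(xs, ys)
    summary[name] = (mean(xs), mean(ys), slope, intercept)
    print(f"{name:>3}: mean_x={mean(xs):.2f} mean_y={mean(ys):.2f} "
          f"slope={slope:.3f} intercept={intercept:.3f}")
```

All four rows print almost the same numbers, which is why the plots, not the summaries, are what separate the datasets.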
Education and Household Income: Creates a scatterplot that shows the relationship between education and income. Depends on the zipIncome.csv file.
Mean Household Income by Zip Code: Creates several box-and-whisker plots, colored by group, that compare the income distributions of zip codes 0-9. Depends on the zipIncome.csv file.
Linear and Local Polynomial Regression: Plots ten thousand numbers between zero and ten along a polynomial curve. This shows the benefits of fitting with the loess function instead of the lm (linear regression) function.
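To illustrate why a local fit can track a curved trend better than a single straight line, here is a rough Python sketch. It is not the project's code: the cubic trend, the noise level, the 500-point sample size, and the helper names `truth`, `weighted_line`, and `loess_at` are all assumptions made for illustration. The smoother is a simplified local linear regression with tricube weights, not a reimplementation of R's loess.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def truth(x):
    # Hypothetical cubic trend; the project's actual polynomial is not stated.
    return (x - 5) ** 3 / 10


# Hypothetical sample: 500 noisy points between zero and ten.
xs = [random.uniform(0, 10) for _ in range(500)]
ys = [truth(x) + random.gauss(0, 1) for x in xs]


def weighted_line(xs, ys, ws):
    """Weighted least-squares line; returns (slope, intercept)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def loess_at(x0, xs, ys, span=0.3):
    """Local linear estimate at x0 using the nearest `span` fraction of points."""
    k = max(2, int(span * len(xs)))
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # distance to k-th nearest point
    # Tricube weights: close to 1 near x0, falling to 0 at distance h.
    ws = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0 for x in xs]
    slope, intercept = weighted_line(xs, ys, ws)
    return slope * x0 + intercept


# One global straight line (equal weights) versus the local fit.
slope, intercept = weighted_line(xs, ys, [1.0] * len(xs))
grid = [0.5 + i for i in range(10)]
lin_err = sum((slope * g + intercept - truth(g)) ** 2 for g in grid) / len(grid)
loess_err = sum((loess_at(g, xs, ys) - truth(g)) ** 2 for g in grid) / len(grid)
print(f"mean squared error vs. true curve: line={lin_err:.3f} local={loess_err:.3f}")
```

The single line cannot bend, so its error against the true curve stays large, while the local fit follows the curvature window by window.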
Miles Per Gallon (MPG) of Car Models Grouped by Cylinder: Shows that the fewer cylinders a car has, the better its MPG tends to be.
Distribution of Car Cylinder Counts and Gears: Groups the cars by number of cylinders, then shows the distribution of gears within each group.
Purchasing Power of the US Minimum Wage, 1938-2016: A line graph showing the purchasing power of the US federal minimum wage from its introduction in 1938 to 2016. Depends on the "Purchasing Power 1938-2016.xlsx" file.
K means analysis of student data: Uses k-means clustering to group students by their results in math, science, and English.
Grocery Store Association Rules: Uses the grocery data bundled with 'arules' to create three different graphs. The first shows the top ten items, the second shows how likely a person is to buy certain other items based on a purchase, and the last is a more refined version of the second graph.
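The "how likely a person is to buy certain other items" idea in the association rules project rests on three standard measures: support, confidence, and lift. The sketch below computes them in plain Python on a tiny, made-up set of baskets. It does not use 'arules' or its grocery data, and the function names are illustrative only.

```python
from collections import Counter

# Hypothetical baskets, invented for illustration (not the arules grocery data).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "yogurt"},
    {"milk", "bread", "yogurt"},
]


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(lhs, rhs):
    """Estimated P(rhs in basket | lhs in basket)."""
    return support(set(lhs) | set(rhs)) / support(lhs)


def lift(lhs, rhs):
    """Confidence relative to how common rhs is overall; 1.0 means no association."""
    return confidence(lhs, rhs) / support(rhs)


# Item frequencies, the raw material for a "top items" chart.
item_counts = Counter(item for basket in transactions for item in basket)

print("item counts:", dict(item_counts))
print("support(milk, bread)   =", support({"milk", "bread"}))
print("confidence(milk->bread) =", confidence({"milk"}, {"bread"}))
print("lift(milk->bread)       =", lift({"milk"}, {"bread"}))
```

A lift below 1, as for milk and bread in this toy data, means the two items appear together slightly less often than their individual popularity would suggest.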