The data exploration is included in the HTML file here. Since the file is too large to view directly on GitHub, I've uploaded it to my website, where it can be viewed.
The Python script was adapted from: https://github.com/GregorUT/vgchartzScrape
vgsales.csv was scraped from the VGChartz.com charts of regional and global video game sales (in millions of units). The data was obtained on 5/22/2017 using a Python 3 script, vgchartz.py, which uses BeautifulSoup to parse the HTML.
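To illustrate the parsing step, here is a minimal sketch that pulls the cell text out of an HTML table row by row. It swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup so it runs without third-party packages, and it assumes a plain `<table>` of `<tr>`/`<td>` elements; the real VGChartz markup may differ, and the sample table below is hypothetical.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    A stdlib stand-in for the BeautifulSoup step (an assumption, not the
    original script): it expects <tr> rows containing <td> cells.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a> links) is also captured.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows with only <th> cells stay empty and are skipped
                self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table body as a list of rows of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample table, not real VGChartz data.
sample = """
<table>
  <tr><th>Name</th><th>Platform</th><th>Global_Sales</th></tr>
  <tr><td>Game A</td><td>Wii</td><td>1.25</td></tr>
  <tr><td>Game B</td><td>PS4</td><td>0.80</td></tr>
</table>
"""
print(parse_table(sample))  # → [['Game A', 'Wii', '1.25'], ['Game B', 'PS4', '0.80']]
```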
After the data was scraped from the table on the VGChartz website, it was limited to the top 10,000 rows and formatted as a dataframe before being written to CSV for use in R.
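The truncate-and-export step can be sketched as follows. This uses the standard `csv` module in place of a dataframe, and it assumes the scraped rows are already in chart order (best-selling first), so keeping the first 10,000 gives the top entries. The function name `write_top_rows` is hypothetical.

```python
import csv
import io
from itertools import islice

# Scraped columns of vgsales.csv, in the order documented below.
COLUMNS = ["Name", "Platform", "Year", "Genre", "Publisher",
           "NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]


def write_top_rows(rows, out, limit=10_000):
    """Write a header plus at most `limit` rows from `rows` to file-like `out`.

    For a real file, open it with open(path, "w", newline="") so the csv
    module controls line endings.
    """
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(islice(rows, limit))  # keep only the first `limit` rows


# Hypothetical rows, not real sales figures.
rows = [["Game A", "Wii", "2008", "Sports", "Pub X", "1.0", "0.5", "0.1", "0.2", "1.8"]] * 3
buf = io.StringIO()
write_top_rows(rows, buf, limit=2)
print(len(list(csv.reader(io.StringIO(buf.getvalue())))))  # → 3 (header + 2 rows)
```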
Variables used in the exploration (R type in parentheses):
- Name (factor); Title of the video game
- Platform (factor); Console/platform the game was released on
- Year (num); Year the game was released
- Genre (factor); Genre/Category of the game title
- Publisher (factor); Publisher of the video game
- NA_Sales (num); Sales in millions of units in North America
- EU_Sales (num); Sales in millions of units in Europe
- JP_Sales (num); Sales in millions of units in Japan
- Other_Sales (num); Sales in millions of units in the rest of the world
- Global_Sales (num); Total worldwide sales in millions of units
- Decade (factor); Decade the game was released
- Franchise (factor); Name of the franchise the game is from
- Company_Name (factor); Company that manufactured the console/platform the game was released on
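Two of these variables lend themselves to a quick sketch: deriving Decade from Year, and sanity-checking that Global_Sales matches the four regional columns. The "1990s"-style label and the rounding tolerance are assumptions for illustration; the R exploration may encode Decade differently, and the function names are hypothetical.

```python
import math

REGIONS = ("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")


def decade_label(year):
    """Map a release year to a decade label such as '1990s' (assumed format).

    Missing or non-numeric years (e.g. 'N/A' in scraped data) return None.
    """
    try:
        year = int(float(year))
    except (TypeError, ValueError, OverflowError):
        return None
    return f"{year // 10 * 10}s"


def regional_sum_matches(row, tol=0.02):
    """Check that Global_Sales is close to the sum of the regional columns.

    Sales are reported rounded to two decimals, so an exact match is not
    expected; `tol` (millions of units) is a hypothetical rounding allowance.
    """
    regional = sum(float(row[k]) for k in REGIONS)
    return math.isclose(regional, float(row["Global_Sales"]), abs_tol=tol)


print(decade_label("2006"))  # → 2000s
print(decade_label("N/A"))   # → None
```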
Files:
- VGChartzExploration.html; HTML report knitted from the R Markdown (.rmd) file created during the exploration
- VGChartzExplortation.rmd; R Markdown file used to conduct the exploration and knit the HTML report
- vgsales.csv; flat file containing the dataset scraped from VGChartz and used in the exploration
- vgsales.py; Python script used to scrape the data
- VGCharts_AnalysisInR.ipynb; Jupyter Notebook used for the initial work in R before it was moved to R Studio in Visual Studio
- vgsales_overview.txt; text file describing the dataset and how it was obtained
- references.txt; references used during the exploration