DataExploration_R

P4: Explore and Summarize Data

The data exploration is included in the HTML file (VGChartzExploration.html). Because it is too large to be viewed directly on GitHub, I've uploaded it to my website, where it can be viewed.

The Python script was adapted from this reference: https://github.com/GregorUT/vgchartzScrape

vgsales.csv was scraped from the VGChartz.com charts of regional and global video game sales (in millions of units). The data was obtained on 5/22/2017 using a Python 3 script, vgchartz.py, which uses BeautifulSoup to parse the HTML.

After the dataset was scraped from the table on the VGChartz website, it was limited to the top 10,000 rows and formatted as a DataFrame before being written to CSV for use in R.
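
A minimal sketch of the scrape, limit, and export steps described above. It assumes a simplified chart table whose `<td>` cells appear in the same order as the Scraped Data columns; the real VGChartz markup and vgchartz.py differ. To stay dependency-free it swaps BeautifulSoup for Python's standard-library `html.parser` and writes with the `csv` module instead of a DataFrame. The names `ChartTableParser`, `scrape_rows`, `write_csv`, and `MAX_ROWS` are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Assumed column order of the simplified chart table (mirrors the Scraped Data list).
COLUMNS = ["Name", "Platform", "Year", "Genre", "Publisher",
           "NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]
MAX_ROWS = 10_000  # the dataset was limited to the top 10,000 rows


class ChartTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None      # cells of the <tr> currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data  # a cell's text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells are skipped
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False


def scrape_rows(html, limit=MAX_ROWS):
    """Parse the table and keep at most `limit` rows in page (chart) order."""
    parser = ChartTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows[:limit]


def write_csv(rows, out):
    """Write a header line plus the rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up one-row table standing in for a downloaded chart page.
    html = ("<table><tr><th>Name</th></tr>"
            "<tr><td>Game A</td><td>Wii</td><td>2006</td><td>Sports</td><td>Pub</td>"
            "<td>1.00</td><td>0.50</td><td>0.25</td><td>0.25</td><td>2.00</td></tr></table>")
    buffer = io.StringIO()
    write_csv(scrape_rows(html), buffer)
    print(buffer.getvalue())
```

Keeping rows in page order means slicing the first `limit` rows is the same as keeping the top-ranked titles, assuming the chart is already sorted by sales.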

Scraped Data

Name (factor); Title of the video game
Platform (factor); Console/platform the game was released on
Year (num); Year game was released
Genre (factor); Genre/Category of the game title
Publisher (factor); Publisher of the video game
NA_Sales (num); Sales in millions of units in North America
EU_Sales (num); Sales in millions of units in Europe
JP_Sales (num); Sales in millions of units in Japan
Other_Sales (num); Sales in millions of units in other regions of the globe
Global_Sales (num); Total worldwide sales in millions of units

Columns generated in R

Decade (factor); Decade the game was released
Franchise (factor); Name of the franchise the game is from
Company_Name (factor); Company that made the console/platform the game was published on
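
The README does not show the R code behind these columns. Sketched in Python rather than R, purely for illustration: Decade floors Year to a multiple of ten, and Company_Name is a lookup on Platform. The `PLATFORM_COMPANY` table, its platform codes (assumed to match VGChartz abbreviations such as "X360"), and the "Other" fallback are hypothetical; Franchise would need a similar hand-built lookup from game titles and is not shown.

```python
# Hypothetical, partial lookup from VGChartz-style platform codes to the
# company that made the console; unlisted platforms fall back to "Other".
PLATFORM_COMPANY = {
    "NES": "Nintendo", "SNES": "Nintendo", "N64": "Nintendo", "GB": "Nintendo",
    "DS": "Nintendo", "Wii": "Nintendo", "WiiU": "Nintendo", "3DS": "Nintendo",
    "PS": "Sony", "PS2": "Sony", "PS3": "Sony", "PS4": "Sony", "PSP": "Sony",
    "XB": "Microsoft", "X360": "Microsoft", "XOne": "Microsoft",
}


def decade(year):
    """Return a decade label such as '1990s', or None if Year is missing."""
    try:
        y = int(float(year))
    except (TypeError, ValueError, OverflowError):
        return None
    return f"{(y // 10) * 10}s"


def company_name(platform):
    """Map a Platform value to its console maker (see PLATFORM_COMPANY)."""
    return PLATFORM_COMPANY.get(platform, "Other")


if __name__ == "__main__":
    for year, platform in [(1996, "N64"), ("2006", "Wii"), ("N/A", "PC")]:
        print(year, platform, decade(year), company_name(platform))
```

Returning labels like "1990s" keeps Decade a small categorical value, which matches its listing as a factor once the data is back in R.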

Files

VGChartzExploration.html; Report knitted from the RMD file created during exploration
VGChartzExploration.rmd; RMD file used to conduct the exploration and knit the report
vgsales.csv; flat file containing the dataset scraped from VGChartz and explored
vgchartz.py; Python script used to scrape the data
VGCharts_AnalysisInR.ipynb; Jupyter Notebook used to do the work in R before moving it to RStudio and Visual Studio
vgsales_overview.txt; text file containing information about the data set and how it was obtained
references.txt; references used during the exploration

WhitneyOnTheWeb/DataExploration_R

Folders and files

Latest commit

History

Repository files navigation

DataExploration_R

P4: Explore and Summarize Data

Scraped Data

Columns generated in R

Files

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages