updating the draft of TDM 20100 Project 13
mdw333 committed Nov 14, 2024
1 parent 0969147 commit d63296d
Showing 1 changed file with 27 additions and 25 deletions.

Using the `seminar-r` kernel in Jupyter Lab, open a connection to the Lahman database using the `dbConnect` process that is outlined above.

Revisit your work from Project 8, Question 4, using the Lahman baseball database, but this time, make a `dotchart`, as follows:

Use the `Batting` table to find the top 5 players of all time in terms of their total number of hits, in other words, according to `SUM(H)`. Instead of printing the output, this time make a dotchart with 5 rows. Each row should show the `playerID` of a player and that player's career total of hits.
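A minimal sketch of this query-then-plot pattern, assuming the `DBI` and `RSQLite` packages are installed. So that it runs on its own, it builds a tiny in-memory `Batting` table with made-up rows (the `playerA01`-style IDs and the hit counts are hypothetical); with the real Lahman connection you would skip the setup and run only the `dbGetQuery` and `dotchart` steps.

```r
# Sketch: top-5 career hits as a dotchart.
# Assumes DBI and RSQLite are installed; the table below is hypothetical toy data.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical rows: one row per player per season, hits in column H
batting <- data.frame(
  playerID = c("playerA01", "playerA01", "playerB01", "playerC01",
               "playerD01", "playerE01", "playerF01"),
  H = c(150, 120, 200, 180, 90, 60, 30)
)
dbWriteTable(con, "Batting", batting)

# Add up each player's hits across all seasons and keep the top 5
myDF <- dbGetQuery(con, "
  SELECT playerID, SUM(H) AS totalhits
  FROM Batting
  GROUP BY playerID
  ORDER BY totalhits DESC
  LIMIT 5;")

# dotchart draws the first value at the bottom, so reverse the order
# to put the player with the most hits at the top
dotchart(rev(myDF$totalhits), labels = rev(myDF$playerID),
         xlab = "Career hits (SUM(H))")

dbDisconnect(con)
```

Reversing with `rev()` is only a display choice; without it the dotchart still shows the same 5 players, just with the leader on the bottom row.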


.Deliverables
====
Make a dotchart with 5 rows for the top 5 players of all time in terms of their total number of hits, `SUM(H)`. Each row should show the `playerID` of a player and that player's career total of hits.
====

=== Question 2 (2 pts)

Revisit your work from Project 8, Question 5, using the Lahman baseball database, but this time, make a `dotchart`, as follows:

Using the `Schools` table, group the schools by state. Find the number of schools in each group with `SELECT COUNT(*) AS mycounts, state`, so that you see each state's abbreviation and how many schools it has. Order your results by `mycounts` in descending order (denoted by `DESC`), so that the states with the most schools appear first in your list.

By adding `LIMIT 5` to this query, you can make a dotchart that displays the 5 states with the most schools and the number of schools in each of those states.
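A minimal sketch of the `GROUP BY` / `ORDER BY ... DESC` / `LIMIT 5` query feeding a dotchart, assuming the `DBI` and `RSQLite` packages are installed. The in-memory `Schools` table is hypothetical toy data (6 states with 6, 5, 4, 3, 2, and 1 schools), so `LIMIT 5` visibly drops the last state; against the real Lahman connection only the query and the plot are needed.

```r
# Sketch: 5 states with the most schools, as a dotchart.
# Assumes DBI and RSQLite are installed; the table below is hypothetical toy data.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical rows: 6 states with 6, 5, 4, 3, 2, and 1 schools
states <- rep(c("CA", "TX", "NY", "FL", "PA", "OH"), times = c(6, 5, 4, 3, 2, 1))
schools <- data.frame(schoolID = paste0("school", seq_along(states)),
                      state = states)
dbWriteTable(con, "Schools", schools)

# Count schools per state, largest counts first, keep only the top 5
myDF <- dbGetQuery(con, "
  SELECT COUNT(*) AS mycounts, state
  FROM Schools
  GROUP BY state
  ORDER BY mycounts DESC
  LIMIT 5;")

# Reverse so the state with the most schools is drawn on the top row
dotchart(rev(myDF$mycounts), labels = rev(myDF$state),
         xlab = "Number of schools")

dbDisconnect(con)
```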

.Deliverables
====
Make a dotchart that displays the 5 states with the most schools, and the number of schools in each state.
====



=== Question 3 (2 pts)

Revisit your work from Project 11, Question 2, using the IMDB Movies database, but this time, make a `dotchart`, as follows:

Join the `ratings` and `basics` tables to find the 13 titles that each have more than 2 million ratings. Make a dotchart for these 13 titles, showing the `primaryTitle` and the number of ratings for each title.
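A minimal sketch of the JOIN-then-filter step, assuming the `DBI` and `RSQLite` packages are installed. The column names `tconst` (the shared title ID) and `numVotes` (the number of ratings) are assumptions about the IMDB tables, and the four titles are hypothetical toy data; check the real column names in your database before reusing the query.

```r
# Sketch: titles with more than 2 million ratings, as a dotchart.
# Assumes DBI and RSQLite are installed; tconst and numVotes are assumed
# column names, and the rows below are hypothetical toy data.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

ratings <- data.frame(tconst = c("tt01", "tt02", "tt03", "tt04"),
                      numVotes = c(2500000, 1900000, 2100000, 3000000))
basics <- data.frame(tconst = c("tt01", "tt02", "tt03", "tt04"),
                     primaryTitle = c("Movie A", "Movie B", "Movie C", "Movie D"))
dbWriteTable(con, "ratings", ratings)
dbWriteTable(con, "basics", basics)

# Join on the shared title ID and keep titles with more than 2 million ratings
myDF <- dbGetQuery(con, "
  SELECT b.primaryTitle AS primaryTitle, r.numVotes AS numVotes
  FROM ratings AS r
  JOIN basics AS b ON r.tconst = b.tconst
  WHERE r.numVotes > 2000000
  ORDER BY r.numVotes DESC;")

# Reverse so the most-rated title is drawn on the top row
dotchart(rev(myDF$numVotes), labels = rev(myDF$primaryTitle),
         xlab = "Number of ratings")

dbDisconnect(con)
```

The explicit `AS primaryTitle` and `AS numVotes` aliases make the data frame's column names predictable, so `myDF$primaryTitle` works regardless of how the driver would otherwise name `b.primaryTitle`.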



.Deliverables
====
Make a dotchart for these 13 titles, showing the `primaryTitle` and the number of ratings for each of these 13 titles.
====


=== Question 4 (2 pts)

Revisit your work from Project 11, Question 3, using the IMDB Movies database, but this time, make a `plot`, as follows:

a. Using the `startYear` values from the `basics` table, find the total number of entries for each `startYear`. Make a plot that shows the `startYear` on the x-axis and the number of entries from each `startYear` on the y-axis.

b. Now fix your plot from part (a) so that it only shows the results in which `myDF$startYear > 0`.
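A minimal sketch of parts (a) and (b), assuming the `DBI` and `RSQLite` packages are installed. The in-memory `basics` table is hypothetical toy data, and we assume here that a `startYear` of 0 marks an entry whose year is missing; part (b) then drops those rows in R with the same `myDF$startYear > 0` condition used above.

```r
# Sketch: entries per startYear, plotted before and after removing year 0.
# Assumes DBI and RSQLite are installed; the rows below are hypothetical
# toy data in which a startYear of 0 is assumed to mean "missing year".
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

basics <- data.frame(tconst = paste0("tt", 1:8),
                     startYear = c(0, 0, 1999, 1999, 2000, 2001, 2001, 2001))
dbWriteTable(con, "basics", basics)

# Part (a): count the entries for each startYear
myDF <- dbGetQuery(con, "
  SELECT startYear, COUNT(*) AS mycounts
  FROM basics
  GROUP BY startYear
  ORDER BY startYear;")
plot(myDF$startYear, myDF$mycounts,
     xlab = "startYear", ylab = "Number of entries")

# Part (b): keep only the rows with startYear > 0 and plot again
myDF <- myDF[myDF$startYear > 0, ]
plot(myDF$startYear, myDF$mycounts,
     xlab = "startYear", ylab = "Number of entries")

dbDisconnect(con)
```

Filtering in R keeps the SQL identical between the two parts; adding `WHERE startYear > 0` to the query would give the same plot.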



.Deliverables
====
- Make a plot that shows the `startYear` on the x-axis and the number of entries from each `startYear` on the y-axis.
- Now fix your plot from part (a) so that it only shows the results in which `myDF$startYear > 0`.
====


=== Question 5 (2 pts)

Revisit your work from Project 12, Question 2, using the flights database that you built, but this time, make a `dotchart`, as follows:

Join the `flights` and `airports` tables, matching the `Origin` column to the `iata` column. Find the total number of flights in the database for each `Origin` airport located in Texas. Make a dotchart that shows, for each `Origin` airport in Texas, the 3-letter `Origin` airport code and the total number of flights.
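A minimal sketch of the Texas-only JOIN, assuming the `DBI` and `RSQLite` packages are installed. The `state` column in `airports` is an assumed name for where the state abbreviation is stored, and the flight counts are hypothetical toy data; the Phoenix rows show that non-Texas origins are filtered out by the `WHERE` clause.

```r
# Sketch: flights per Texas origin airport, as a dotchart.
# Assumes DBI and RSQLite are installed; the "state" column is an assumed
# name, and the flight rows below are hypothetical toy data.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

flights <- data.frame(Origin = c("DAL", "DAL", "DAL", "IAH", "IAH", "AUS", "PHX", "PHX"))
airports <- data.frame(iata = c("DAL", "IAH", "AUS", "PHX"),
                       state = c("TX", "TX", "TX", "AZ"))
dbWriteTable(con, "flights", flights)
dbWriteTable(con, "airports", airports)

# Match each flight's Origin to an airport, keep Texas, count per Origin
myDF <- dbGetQuery(con, "
  SELECT f.Origin AS Origin, COUNT(*) AS mycounts
  FROM flights AS f
  JOIN airports AS a ON f.Origin = a.iata
  WHERE a.state = 'TX'
  GROUP BY f.Origin
  ORDER BY mycounts DESC;")

# Reverse so the busiest Texas airport is drawn on the top row
dotchart(rev(myDF$mycounts), labels = rev(myDF$Origin),
         xlab = "Number of flights")

dbDisconnect(con)
```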



.Deliverables
====
Make a dotchart that shows, for each `Origin` airport in Texas, the total number of flights and the 3-letter `Origin` airport code.
====


== Submitting your Work

We have now applied the same SQL skills to the baseball database, the movies database, and the flights database that we built, and this time we visualized our query results in R with dotcharts and plots.
Now we know how to leverage our knowledge of SQL when working in R!



.Items to submit
====
- firstname-lastname-project13.ipynb
====

[WARNING]