# Load the SQL Alchemy Python library
-import sqlalchemy
-import pandas as pd
First, let’s load in the basic_examples.db
database.
%load_ext sql
+import duckdb
+conn = duckdb.connect()
+conn.query("INSTALL sqlite")
%%sql
-sqlite:///data/basic_examples.db
%sql duckdb:///data/basic_examples.db --alias basic
Aggregating with GROUP BY
At this point, we’ve seen that SQL offers much of the same functionality that was given to us by pandas
. We can extract data from a table, filter it, and reorder it to suit our needs.
In pandas
, much of our analysis work relied heavily on being able to use .groupby()
to aggregate across the rows of our dataset. SQL’s answer to this task is the (very conveniently named) GROUP BY clause. The outputs of GROUP BY are similar to those of .groupby() (in both cases, we obtain an output table where some column has been used for grouping), but the syntax and logic used to group data in SQL are fairly different from the pandas implementation.
To illustrate GROUP BY
, we will consider the Dish
table from the basic_examples.db
database.
%%sql
-SELECT *
-FROM Dish;
* sqlite:///data/basic_examples.db
- sqlite:///data/imdbmini.db
-Done.
-name | -type | -cost | -
---|---|---|
ravioli | -entree | -10 | -
ramen | -entree | -13 | -
taco | -entree | -7 | -
edamame | -appetizer | -4 | -
fries | -appetizer | -4 | -
potsticker | -appetizer | -4 | -
ice cream | -dessert | -5 | -
Say we wanted to find the total costs of dishes of a certain type
. To accomplish this, we would write the following code.
%%sql
-SELECT type, SUM(cost)
-FROM Dish
-GROUP BY type;
* sqlite:///data/basic_examples.db
- sqlite:///data/imdbmini.db
-Done.
-type | -SUM(cost) | -
---|---|
appetizer | -12 | -
dessert | -5 | -
entree | -30 | -
To illustrate GROUP BY
, we will consider the Dish
table from our database.
%%sql
+SELECT *
+FROM Dish;
Notice that there are multiple dishes of the same type
. What if we wanted to find the total costs of dishes of a certain type
? To accomplish this, we would write the following code.
%%sql
+SELECT type, SUM(cost)
+FROM Dish
+GROUP BY type;
What is going on here? The statement GROUP BY type
tells SQL to group the data based on the value contained in the type
column (whether a record is an appetizer, entree, or dessert). SUM(cost)
sums up the costs of dishes in each type
and displays the result in the output table.
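To make the grouping step concrete, here is a minimal sketch that rebuilds the Dish table in an in-memory SQLite database and runs the same query from Python. The use of Python's built-in sqlite3 module (rather than the notebook's %%sql magic) and the variable names are assumptions made for illustration; the rows are the ones shown in the Dish table above.

```python
import sqlite3

# Assumption: an in-memory SQLite copy of the Dish table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dish (name TEXT, type TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO Dish VALUES (?, ?, ?)",
    [
        ("ravioli", "entree", 10),
        ("ramen", "entree", 13),
        ("taco", "entree", 7),
        ("edamame", "appetizer", 4),
        ("fries", "appetizer", 4),
        ("potsticker", "appetizer", 4),
        ("ice cream", "dessert", 5),
    ],
)

# GROUP BY type produces one output row per distinct type;
# SUM(cost) is evaluated separately within each group.
totals = conn.execute(
    "SELECT type, SUM(cost) FROM Dish GROUP BY type ORDER BY type"
).fetchall()
print(totals)  # [('appetizer', 12), ('dessert', 5), ('entree', 30)]
```

The ORDER BY is added only so that the row order is deterministic; without it, SQL makes no promise about the order of the groups.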
You may be wondering: why does SUM(cost) come before the command to GROUP BY type? Don’t we need to form groups before we can sum the costs within each? Remember that SQL is a declarative programming language: a SQL programmer simply states what end result they would like to see, and leaves the task of figuring out how to obtain this result to SQL itself. This means that SQL queries sometimes don’t follow what a reader sees as a “logical” sequence of thought. Instead, SQL requires that we follow its set order of operations when constructing queries. So long as we follow this order, SQL will handle the underlying logic.
Aggregating with
AVG: find the average value of each group
We can easily compute multiple aggregations all at once (a task that was very tricky in pandas).
%%sql
-SELECT type, SUM(cost), MIN(cost), MAX(name)
-FROM Dish
-GROUP BY type;
* sqlite:///data/basic_examples.db
- sqlite:///data/imdbmini.db
-Done.
-type | -SUM(cost) | -MIN(cost) | -MAX(name) | -
---|---|---|---|
appetizer | -12 | -4 | -potsticker | -
dessert | -5 | -5 | -ice cream | -
entree | -30 | -7 | -taco | -
%%sql
+SELECT type, SUM(cost), MIN(cost), MAX(name)
+FROM Dish
+GROUP BY type;
To count the number of rows associated with each group, we use the COUNT
keyword. Calling COUNT(*)
will compute the total number of rows in each group, including rows with null values. Its pandas
equivalent is .groupby().size()
.
Recall the Dragon
table from the previous lecture:
%%sql
-SELECT * FROM Dragon;
* sqlite:///data/basic_examples.db
- sqlite:///data/imdbmini.db
-Done.
-name | -year | -cute | -
---|---|---|
hiccup | -2010 | -10 | -
drogon | -2011 | --100 | -
dragon 2 | -2019 | -0 | -
Notice that COUNT(*)
and COUNT(cute)
result in different outputs/
%%sql
-SELECT year, COUNT(*)
-FROM Dragon
-GROUP BY year;
* sqlite:///data/basic_examples.db
- sqlite:///data/imdbmini.db
-Done.
-year | -COUNT(*) | -
---|---|
2010 | -1 | -
2011 | -1 | -
2019 | -1 | -
%%sql
-
-SELECT year, COUNT(cute)
-FROM Dragon
-GROUP BY year;
* sqlite:///data/basic_examples.db
- sqlite:///data/imdbmini.db
-Done.
-year | -COUNT(cute) | -
---|---|
2010 | -1 | -
2011 | -1 | -
2019 | -1 | -
%%sql
+SELECT * FROM Dragon;
Notice that COUNT(*)
and COUNT(cute)
result in different outputs.
%%sql
+SELECT year, COUNT(*)
+FROM Dragon
+GROUP BY year;
%%sql
+
+SELECT year, COUNT(cute)
+FROM Dragon
+GROUP BY year;
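The difference between COUNT(*) and COUNT(column) only shows up when the column contains null values. The sketch below uses Python's built-in sqlite3 module and a small Dragon table in which one cute value is deliberately set to NULL; that NULL is an assumption made for illustration, not necessarily what the course database contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT, year INTEGER, cute INTEGER)")
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [
        ("hiccup", 2010, 10),
        ("drogon", 2011, -100),
        ("dragon 2", 2019, None),  # assumption: a NULL cute value, for illustration
    ],
)

# COUNT(*) counts every row in the group;
# COUNT(cute) counts only the rows where cute is not NULL.
counts = conn.execute(
    "SELECT year, COUNT(*), COUNT(cute) FROM Dragon GROUP BY year ORDER BY year"
).fetchall()
print(counts)  # [(2010, 1, 1), (2011, 1, 1), (2019, 1, 0)]
```

Only the 2019 group differs: it has one row, but no non-null cute value.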
With this definition of GROUP BY
in hand, let’s update our SQL order of operations. Remember: every SQL query must list clauses in this order.
SELECT <column expression list>
@@ -441,97 +215,31 @@ Aggregating with
[OFFSET <number of rows>];
Note that we can use the AS
keyword to rename columns during the selection process and that column expressions may include aggregation functions (MAX
, MIN
, etc.).
Summary
-Let’s summarize what we’ve learned so far. We know that SELECT
and FROM
are the fundamental building blocks of any SQL query. We can augment these two keywords with additional clauses to refine the data in our output table.
Any clauses that we include must follow a strict ordering within the query:
-SELECT <column list>
+
+Filtering Groups
+Now, what if we only want groups that meet a certain condition? HAVING
filters groups by applying some condition across all rows in each group. We interpret it as a way to keep only the groups HAVING
some condition. Note the difference between WHERE
and HAVING
: we use WHERE
to filter rows, whereas we use HAVING
to filter groups. WHERE
precedes HAVING
in terms of how SQL executes a query.
+Let’s take a look at the Dish
table to see how we can use HAVING
. Say we want to group dishes with a cost greater than 4 by type
and only keep groups where the max cost is less than 10.
+
+%%sql
+SELECT type, COUNT(*)
+FROM Dish
+WHERE cost > 4
+GROUP BY type
+HAVING MAX(cost) < 10;
+
+Here, we first use WHERE
to filter for rows with a cost greater than 4. We then group our values by type
before applying the HAVING
operator. With HAVING
, we can filter our groups based on whether the max cost is less than 10.
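As a rough check on the WHERE-then-HAVING order, the sketch below runs the query against an in-memory SQLite copy of the Dish table, alongside a variant with no WHERE clause. The sqlite3 setup and variable names are assumptions for illustration; the rows are those of the Dish table shown earlier.

```python
import sqlite3

# Assumption: an in-memory SQLite copy of the Dish table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dish (name TEXT, type TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO Dish VALUES (?, ?, ?)",
    [
        ("ravioli", "entree", 10), ("ramen", "entree", 13), ("taco", "entree", 7),
        ("edamame", "appetizer", 4), ("fries", "appetizer", 4),
        ("potsticker", "appetizer", 4), ("ice cream", "dessert", 5),
    ],
)

# WHERE removes rows first (all appetizers cost 4, so they drop out);
# HAVING then removes whole groups (entree has a max cost of 13).
where_then_having = conn.execute(
    """
    SELECT type, COUNT(*)
    FROM Dish
    WHERE cost > 4
    GROUP BY type
    HAVING MAX(cost) < 10
    ORDER BY type
    """
).fetchall()

# Without WHERE, the appetizer group survives because no rows were filtered out.
having_only = conn.execute(
    """
    SELECT type, COUNT(*)
    FROM Dish
    GROUP BY type
    HAVING MAX(cost) < 10
    ORDER BY type
    """
).fetchall()

print(where_then_having)  # [('dessert', 1)]
print(having_only)        # [('appetizer', 3), ('dessert', 1)]
```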
+
+
+Summary: SQL
+With this definition of GROUP BY
and HAVING
in hand, let’s update our SQL order of operations. Remember: every SQL query must list clauses in this order.
+SELECT <column expression list>
FROM <table>
[WHERE <predicate>]
[GROUP BY <column list>]
[ORDER BY <column list>]
[LIMIT <number of rows>]
-[OFFSET <number of rows>]
-Here, any clause contained in square brackets [ ]
is optional —— we only need to use the keyword if it is relevant to the table operation we want to perform. Also note that by convention, we use all caps for keywords in SQL statements and use newlines to make code more readable.
-
-
-Filtering Groups
-HAVING
filters groups by applying some condition across all rows in each group. We interpret it as a a way to keep only the groups HAVING
some condition. Note the difference between WHERE
and HAVING
: we use WHERE
to filter rows, whereas we use HAVING
to filter groups. WHERE
precedes HAVING
in terms of how SQL executes a query.
-Let’s take a look at the Dish
table to see how we can use HAVING
.
-The code below groups the different dishes by type, and only keeps those groups wherein the max cost is still less than 8.
-
-%%sql
-SELECT type, COUNT(*)
-FROM Dish
-GROUP BY type
-HAVING MAX(cost) < 8;
-
- * sqlite:///data/basic_examples.db
- sqlite:///data/imdbmini.db
-Done.
-| type | COUNT(*) |
-|---|---|
-| appetizer | 3 |
-| dessert | 1 |
-
-
-In contrast, the code below first filters for rows where the cost is less than 8, and then does the grouping. Note the difference in outputs - in this case, “taco” is also included, whereas other entries in the same group having cost greater than or equal to 8 are not included.
-
-%%sql
-SELECT type, COUNT(*)
-FROM Dish
-WHERE cost < 8
-GROUP BY type;
-
- * sqlite:///data/basic_examples.db
-Done.
-| type | COUNT(*) |
-|---|---|
-| appetizer | 3 |
-| dessert | 1 |
-| entree | 1 |
-
-In general, to filter rows, we use WHERE
, while to filter groups, we use HAVING
. Note that WHERE
precedes HAVING
when they are both used.
+[OFFSET <number of rows>];
+Note that we can use the AS
keyword to rename columns during the selection process and that column expressions may include aggregation functions (MAX
, MIN
, etc.).
EDA in SQL
@@ -539,253 +247,80 @@EDA in SQL
Our typical workflow when working with “big data” is:
- Use SQL to query data from a database -
- Use
python
(withpandas
) to analyze this data in detail
+ - Use Python (with
pandas
) to analyze this data in detail
We can, however, still perform simple data cleaning and re-structuring using SQL directly. To do so, we’ll use the Title
table from the imdbmini
database.
We can, however, still perform simple data cleaning and re-structuring using SQL directly. To do so, we’ll use the Title
table from the imdb_duck
database, which contains information about movies and actors.
Let’s load in the imdb_duck
database.
import os
+if os.path.exists("/home/jovyan/shared/sql/imdb_duck.db"):
+    imdbpath = "duckdb:////home/jovyan/shared/sql/imdb_duck.db"
+elif os.path.exists("data/imdb_duck.db"):
+    imdbpath = "duckdb:///data/imdb_duck.db"
+else:
+    import gdown
+    url = 'https://drive.google.com/uc?id=10tKOHGLt9QoOgq5Ii-FhxpB9lDSQgl1O'
+    output_path = 'data/imdb_duck.db'
+    gdown.download(url, output_path, quiet=False)
+    imdbpath = "duckdb:///data/imdb_duck.db"
+print(imdbpath)
from sqlalchemy import create_engine
+imdb_engine = create_engine(imdbpath, connect_args={'read_only': True})
+%sql imdb_engine --alias imdb
Since we’ll be working with the Title
table, let’s take a quick look at what it contains.
%%sql imdb
+
+SELECT *
+FROM Title
+WHERE primaryTitle IN ('Ginny & Georgia', 'What If...?', 'Succession', 'Veep', 'Tenet')
+LIMIT 10;
Matching Text using LIKE
One common task we encountered in our first look at EDA was needing to match string data. For example, we might want to remove entries beginning with the same prefix as part of the data cleaning process.
In SQL, we use the LIKE
operator to (you guessed it) look for strings that are like a given string pattern.
%%sql
-sqlite:///data/imdbmini.db
%%sql
-
-SELECT titleType, primaryTitle
-FROM Title
-WHERE primaryTitle LIKE "Star Wars: Episode I - The Phantom Menace"
sqlite:///data/basic_examples.db
- * sqlite:///data/imdbmini.db
-Done.
-titleType | -primaryTitle | -
---|---|
movie | -Star Wars: Episode I - The Phantom Menace | -
What if we wanted to find all Star Wars movies? %
is the wildcard operator, it means “look for any character, any number of times”. This makes it helpful for identifying strings that are similar to our desired pattern, even when we don’t know the full text of what we aim to extract. In contrast, _
means “look for exactly 1 character”, as you can see in the Harry Potter example that follows.
%%sql
-
-SELECT titleType, primaryTitle
-FROM Title
-WHERE primaryTitle LIKE "%Star Wars%"
-LIMIT 10;
sqlite:///data/basic_examples.db
- * sqlite:///data/imdbmini.db
-Done.
-titleType | -primaryTitle | -
---|---|
movie | -Star Wars: Episode IV - A New Hope | -
movie | -Star Wars: Episode V - The Empire Strikes Back | -
movie | -Star Wars: Episode VI - Return of the Jedi | -
movie | -Star Wars: Episode I - The Phantom Menace | -
movie | -Star Wars: Episode II - Attack of the Clones | -
movie | -Star Wars: Episode III - Revenge of the Sith | -
tvSeries | -Star Wars: Clone Wars | -
tvSeries | -Star Wars: The Clone Wars | -
movie | -Star Wars: The Clone Wars | -
movie | -Star Wars: Episode VII - The Force Awakens | -
%%sql
-
-SELECT titleType, primaryTitle
-FROM Title
-WHERE primaryTitle LIKE "Harry Potter and the Deathly Hallows: Part _"
sqlite:///data/basic_examples.db
- * sqlite:///data/imdbmini.db
-Done.
-titleType | -primaryTitle | -
---|---|
movie | -Harry Potter and the Deathly Hallows: Part 1 | -
movie | -Harry Potter and the Deathly Hallows: Part 2 | -
%%sql
+
+SELECT titleType, primaryTitle
+FROM Title
+WHERE primaryTitle LIKE 'Star Wars: Episode I - The Phantom Menace'
What if we wanted to find all Star Wars movies? %
is the wildcard operator; it means “look for any character, any number of times”. This makes it helpful for identifying strings that are similar to our desired pattern, even when we don’t know the full text of what we aim to extract.
%%sql
+
+SELECT titleType, primaryTitle
+FROM Title
+WHERE primaryTitle LIKE '%Star Wars%'
+LIMIT 10;
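The two LIKE wildcards can be compared side by side on a tiny table. The sketch below uses Python's built-in sqlite3 module instead of the DuckDB connection used in these notes, and the five-row Title table is a made-up stand-in for the real one; both are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Title (primaryTitle TEXT)")
conn.executemany(
    "INSERT INTO Title VALUES (?)",
    [
        ("Star Wars: Episode I - The Phantom Menace",),
        ("Star Wars: The Clone Wars",),
        ("Harry Potter and the Deathly Hallows: Part 1",),
        ("Harry Potter and the Deathly Hallows: Part 2",),
        ("Tenet",),
    ],
)


def titles_like(pattern):
    """Return the titles matching a LIKE pattern, sorted for a stable order."""
    rows = conn.execute(
        "SELECT primaryTitle FROM Title WHERE primaryTitle LIKE ? ORDER BY primaryTitle",
        (pattern,),
    ).fetchall()
    return [title for (title,) in rows]


# % matches any run of characters (including none) on either side.
star_wars = titles_like("%Star Wars%")
# _ matches exactly one character, so this finds "Part 1" and "Part 2".
hallows = titles_like("Harry Potter and the Deathly Hallows: Part _")
# With no wildcard, LIKE has to match the whole string.
exact = titles_like("Star Wars")

print(len(star_wars), len(hallows), len(exact))  # 2 2 0
```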
Alternatively, we can use RegEx! DuckDB and most real DBMSs allow for this. Note that here, we have to use the SIMILAR TO operator rather than LIKE.
%%sql
+
+SELECT titleType, primaryTitle
+FROM Title
+WHERE primaryTitle SIMILAR TO '.*Star Wars.*'
+LIMIT 10;
CAST
ing Data Types
A common data cleaning task is converting data to the correct variable type. The CAST
keyword is used to generate a new output column. Each entry in this output column is the result of converting the data in an existing column to a new data type. For example, we may wish to convert numeric data stored as a string to an integer.
%%sql
-
-SELECT primaryTitle, CAST(runtimeMinutes AS INT), CAST(startYear AS INT)
-FROM Title
-LIMIT 5
sqlite:///data/basic_examples.db
- * sqlite:///data/imdbmini.db
-Done.
-primaryTitle | -CAST(runtimeMinutes AS INT) | -CAST(startYear AS INT) | -
---|---|---|
A Trip to the Moon | -13 | -1902 | -
The Birth of a Nation | -195 | -1915 | -
The Cabinet of Dr. Caligari | -76 | -1920 | -
The Kid | -68 | -1921 | -
Nosferatu | -94 | -1922 | -
%%sql
+
+SELECT primaryTitle, CAST(runtimeMinutes AS INT)
+FROM Title;
We use CAST when SELECTing columns for our output table. In the example above, we want to SELECT the columns of integer year and runtime data that are created by the CAST.
SQL will automatically name a new column according to the command used to SELECT
it, which can lead to unwieldy column names. We can rename the CAST
ed column using the AS
keyword.
%%sql
-
-SELECT primaryTitle AS title, CAST(runtimeMinutes AS INT) AS minutes, CAST(startYear AS INT) AS year
-FROM Title
-LIMIT 5;
sqlite:///data/basic_examples.db
- * sqlite:///data/imdbmini.db
-Done.
-title | -minutes | -year | -
---|---|---|
A Trip to the Moon | -13 | -1902 | -
The Birth of a Nation | -195 | -1915 | -
The Cabinet of Dr. Caligari | -76 | -1920 | -
The Kid | -68 | -1921 | -
Nosferatu | -94 | -1922 | -
%%sql
+
+SELECT primaryTitle AS title, CAST(runtimeMinutes AS INT) AS minutes, CAST(startYear AS INT) AS year
+FROM Title
+LIMIT 5;
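To see both the type conversion and the renaming at once, the sketch below stores runtime and year as text in an in-memory SQLite table and reads them back through CAST ... AS. The sqlite3 module, the TEXT column types, and the two sample rows are assumptions for illustration; the real Title table may store these columns differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: numeric data stored as strings, as described in the text.
conn.execute(
    "CREATE TABLE Title (primaryTitle TEXT, runtimeMinutes TEXT, startYear TEXT)"
)
conn.executemany(
    "INSERT INTO Title VALUES (?, ?, ?)",
    [("A Trip to the Moon", "13", "1902"), ("The Kid", "68", "1921")],
)

cursor = conn.execute(
    """
    SELECT primaryTitle AS title,
           CAST(runtimeMinutes AS INT) AS minutes,
           CAST(startYear AS INT) AS year
    FROM Title
    ORDER BY year
    """
)
# cursor.description holds one entry per output column; item 0 is the column name.
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(columns)  # ['title', 'minutes', 'year']
print(rows)     # [('A Trip to the Moon', 13, 1902), ('The Kid', 68, 1921)]
```

The values come back as Python integers rather than strings, and the output columns carry the aliases instead of the full CAST expressions.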
Usi ... ELSE <yet another value> END -
Scanning through the skeleton code above, you can see that the logic is similar to that of an if
statement in python
. The conditional statement is first opened by calling CASE
. Each new condition is specified by WHEN
, with THEN
indicating what value should be filled if the condition is met. ELSE
specifies the value that should be filled if no other conditions are met. Lastly, END
indicates the end of the conditional statement; once END
has been called, SQL will continue evaluating the query as usual.
Scanning through the skeleton code above, you can see that the logic is similar to that of an if
statement in Python. The conditional statement is first opened by calling CASE
. Each new condition is specified by WHEN
, with THEN
indicating what value should be filled if the condition is met. ELSE
specifies the value that should be filled if no other conditions are met. Lastly, END
indicates the end of the conditional statement; once END
has been called, SQL will continue evaluating the query as usual.
Let’s see this in action. In the example below, we give the new column created by the CASE
statement the name movie_age
.
%%sql
-/* If a movie was filmed before 1950, it is "old"
-Otherwise, if a movie was filmed before 2000, it is "mid-aged"
-Else, a movie is "new" */
-
-SELECT titleType, startYear,
-CASE WHEN startYear < 1950 THEN "old"
-     WHEN startYear < 2000 THEN "mid-aged"
-     ELSE "new"
-     END AS movie_age
-FROM Title
-LIMIT 10;
sqlite:///data/basic_examples.db
- * sqlite:///data/imdbmini.db
-Done.
-titleType | -startYear | -movie_age | -
---|---|---|
short | -1902 | -old | -
movie | -1915 | -old | -
movie | -1920 | -old | -
movie | -1921 | -old | -
movie | -1922 | -old | -
movie | -1924 | -old | -
movie | -1925 | -old | -
movie | -1925 | -old | -
movie | -1927 | -old | -
movie | -1926 | -old | -
%%sql
+/* If a movie was filmed before 1950, it is "old"
+Otherwise, if a movie was filmed before 2000, it is "mid-aged"
+Else, a movie is "new" */
+
+SELECT titleType, startYear,
+CASE WHEN startYear < 1950 THEN 'old'
+     WHEN startYear < 2000 THEN 'mid-aged'
+     ELSE 'new'
+     END AS movie_age
+FROM Title;
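The CASE expression can be exercised on a three-row table that hits each branch once. This is a sketch using Python's built-in sqlite3 module; the rows and the INTEGER type for startYear are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Title (titleType TEXT, startYear INTEGER)")
# Assumption: one made-up row per branch of the CASE expression.
conn.executemany(
    "INSERT INTO Title VALUES (?, ?)",
    [("short", 1902), ("movie", 1977), ("movie", 2015)],
)

# WHEN branches are checked in order; the first one that is true supplies the value.
labelled = conn.execute(
    """
    SELECT titleType, startYear,
           CASE WHEN startYear < 1950 THEN 'old'
                WHEN startYear < 2000 THEN 'mid-aged'
                ELSE 'new'
           END AS movie_age
    FROM Title
    ORDER BY startYear
    """
).fetchall()
print(labelled)
# [('short', 1902, 'old'), ('movie', 1977, 'mid-aged'), ('movie', 2015, 'new')]
```

Because the branches are tested in order, the 1902 row is labelled 'old' even though it also satisfies startYear < 2000.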
JOIN
ing Tables
-At this point, we’re well-versed in using SQL as a tool to clean, manipulate, and transform data in a table. Notice that this sentence referred to one table, specifically. What happens if the data we need is distributed across multiple tables? This is an important consideration when using SQL – recall that we first introduced SQL as a language to query from databases. Databases often store data in a multidimensional structure. In other words, information is stored across several tables, with each table containing a small subset of all the data housed by the database.
-A common way of organizing a database is by using a star schema. A star schema is composed of two types of tables. A fact table is the central table of the database – it contains the information needed to link entries across several dimension tables, which contain more detailed information about the data.
+At this point, we’re well-versed in using SQL as a tool to clean, manipulate, and transform data in a table. Notice that this sentence referred to one table, specifically. What happens if the data we need is distributed across multiple tables? This is an important consideration when using SQL —— recall that we first introduced SQL as a language to query from databases. Databases often store data in a multidimensional structure. In other words, information is stored across several tables, with each table containing a small subset of all the data housed by the database.
+A common way of organizing a database is by using a star schema. A star schema is composed of two types of tables. A fact table is the central table of the database —— it contains the information needed to link entries across several dimension tables, which contain more detailed information about the data.
Say we were working with a database about boba offerings in Berkeley. The dimension tables of the database might contain information about tea varieties and boba toppings. The fact table would be used to link this information across the various dimension tables.
JOIN
ing Tables
JOIN table_2
ON key_1 = key_2;
We also need to specify what column from each table should be used to determine matching entries. By defining these keys, we provide SQL with the information it needs to pair rows of data together.
-In a cross join, all possible combinations of rows appear in the output table, regardless of whether or not rows share a matching key. Because all rows are joined, even if there is no matching key, it is not necessary to specify what keys to consider in an ON
statement. A cross join is also known as a cartesian product.
The most commonly used type of SQL JOIN
is the inner join. It turns out you’re already familiar with what an inner join does, and how it works – this is the type of join we’ve been using in pandas
all along! In an inner join, we combine every row in our first table with its matching entry in the second table. If a row from either table does not have a match in the other table, it is omitted from the output.
Another way of interpreting the inner join: perform a cross join, then remove all rows that do not share a matching key. Notice that the output of the inner join above contains all rows of the cross join example that contain a single color across the entire row.
-In a full outer join, all rows that have a match between the two tables are joined together. If a row has no match in the second table, then the values of the columns for that second table are filled with null. In other words, a full outer join performs an inner join while still keeping rows that have no match in the other table. This is best understood visually:
+In a cross join, all possible combinations of rows appear in the output table, regardless of whether or not rows share a matching key. Because all rows are joined, even if there is no matching key, it is not necessary to specify what keys to consider in an ON
statement. A cross join is also known as a cartesian product.
We have kept the same output achieved using an inner join, with the addition of partially null rows for entries in s
and t
that had no match in the second table. Note that FULL OUTER JOIN
is not supported by SQLite, the “flavor” of SQL that will be used in lab and homework.
A left outer join is similar to a full outer join. In a left outer join, all rows in the left table are kept in the output table. If a row in the right table shares a match with the left table, this row will be kept; otherwise, the rows in the right table are omitted from the output.
+Conceptually, we can interpret an inner join as a cross join, followed by removing all rows that do not share a matching key. Notice that the output of the inner join above contains all rows of the cross join example that contain a single color across the entire row.
+In a left outer join, all rows in the left table are kept in the output table. If a row in the right table shares a match with the left table, this row will be kept; otherwise, the rows in the right table are omitted from the output. We can fill in any missing values with NULL
.
A right outer join keeps all rows in the right table. Rows in the left table are only kept if they share a match in the right table. Right outer joins are not supported by SQLite.
+A right outer join keeps all rows in the right table. Rows in the left table are only kept if they share a match in the right table. Again, we can fill in any missing values with NULL
.
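The relationship between cross, inner, and left outer joins can be checked on two small tables. The sketch below uses Python's built-in sqlite3 module; the tables s and t and their contents are hypothetical and only loosely modeled on the examples discussed in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: s holds names, t holds breeds, linked by id.
conn.execute("CREATE TABLE s (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE t (id INTEGER, breed TEXT)")
conn.executemany(
    "INSERT INTO s VALUES (?, ?)",
    [(0, "Apricot"), (1, "Boots"), (2, "Cally"), (4, "Eugene")],
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "persian"), (2, "ragdoll"), (4, "bengal"), (5, "persian")],
)

# Cross join: every row of s paired with every row of t (4 x 4 rows).
cross_count = conn.execute("SELECT COUNT(*) FROM s CROSS JOIN t").fetchone()[0]

# Inner join: only the pairs whose keys match.
inner = conn.execute(
    "SELECT s.id, s.name, t.breed FROM s INNER JOIN t ON s.id = t.id ORDER BY s.id"
).fetchall()

# The same rows, obtained by filtering the cross join.
cross_filtered = conn.execute(
    "SELECT s.id, s.name, t.breed FROM s CROSS JOIN t WHERE s.id = t.id ORDER BY s.id"
).fetchall()

# Left outer join: every row of s is kept; unmatched rows get NULL (None in Python).
left = conn.execute(
    "SELECT s.id, s.name, t.breed FROM s LEFT JOIN t ON s.id = t.id ORDER BY s.id"
).fetchall()

print(cross_count)              # 16
print(inner == cross_filtered)  # True
print(left[0])                  # (0, 'Apricot', None)
```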
In the examples above, we performed our joins by checking for equality between the two tables (i.e., by setting s.id = t.id
). SQL also supports joining rows on inequalities, which is something we weren’t able to do when working in pandas
. Consider a new dataset that contains information about students and teachers.
In a full outer join, all rows that have a match between the two tables are joined together. If a row has no match in the second table, then the values of the columns for that second table are filled with NULL
. In other words, a full outer join performs an inner join while still keeping rows that have no match in the other table. This is best understood visually:
Often, we wish to compare the relative values of rows in different tables, rather than check that they are exactly equal. For example, we may want to join rows where students are older than the corresponding teacher. We can do so by specifying an inequality in our ON
statement.
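A join on an inequality might look like the sketch below, which pairs each student with every teacher who is younger than them. The table names, column names, and rows are all hypothetical, and Python's built-in sqlite3 module stands in for the notebook's SQL connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, for illustration only.
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE teachers (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Sameer", 20), ("Jack", 16), ("Lily", 30)],
)
conn.executemany(
    "INSERT INTO teachers VALUES (?, ?)",
    [("John", 25), ("Tina", 17)],
)

# The ON condition is an inequality: keep each (student, teacher) pair
# in which the student is older than the teacher.
older_students = conn.execute(
    """
    SELECT st.name, te.name
    FROM students AS st
    INNER JOIN teachers AS te
    ON st.age > te.age
    ORDER BY st.name, te.name
    """
).fetchall()
print(older_students)  # [('Lily', 'John'), ('Lily', 'Tina'), ('Sameer', 'Tina')]
```

A student can appear several times in the output (once per qualifying teacher) or not at all, as with Jack here.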
We have kept the same output achieved using an inner join, with the addition of partially null rows for entries in s
and t
that had no match in the second table.
Aliasing in JOIN
s
+When joining tables, we often create aliases for table names (similarly to what we did with column names in the last lecture). We do this as it is typically easier to refer to aliases, especially when we are working with long table names. We can even reference columns using aliased table names.
+Let’s say we want to determine the average rating of various movies:
+%%sql
+
+
+SELECT primaryTitle, averageRating
+FROM Title AS T INNER JOIN Rating AS R
+ON T.tconst = R.tconst;
Note that the AS
is actually optional! We can create aliases for our tables even without it, but we usually include it for clarity.
%%sql
+
+
+SELECT primaryTitle, averageRating
+FROM Title T INNER JOIN Rating R
+ON T.tconst = R.tconst;
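A quick way to confirm that AS is optional for table aliases is to run both spellings and compare the results. The sketch below does this with Python's built-in sqlite3 module; the two-row Title and Rating tables, including the identifiers and rating values, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up rows; the identifiers and ratings are placeholders, not real IMDb data.
conn.execute("CREATE TABLE Title (tconst TEXT, primaryTitle TEXT)")
conn.execute("CREATE TABLE Rating (tconst TEXT, averageRating REAL)")
conn.executemany("INSERT INTO Title VALUES (?, ?)", [("t1", "Movie A"), ("t2", "Movie B")])
conn.executemany("INSERT INTO Rating VALUES (?, ?)", [("t1", 7.0), ("t2", 8.0)])

with_as = conn.execute(
    """
    SELECT primaryTitle, averageRating
    FROM Title AS T INNER JOIN Rating AS R
    ON T.tconst = R.tconst
    ORDER BY primaryTitle
    """
).fetchall()

without_as = conn.execute(
    """
    SELECT primaryTitle, averageRating
    FROM Title T INNER JOIN Rating R
    ON T.tconst = R.tconst
    ORDER BY primaryTitle
    """
).fetchall()

print(with_as == without_as)  # True
```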
Common Table Expression
+For more sophisticated data problems, the queries can become very complex. Common Table Expressions allow us to break down these complex queries into more manageable parts. This involves creating temporary tables which correspond to different aspects of the problem and then referencing them in the final query. The following format is an example of how we can create two temporary tables and then use them for further querying:
+WITH
+table_name1 AS (
+ SELECT ...
+),
+table_name2 AS (
+ SELECT ...
+)
+SELECT ...
+FROM
+table_name1,
+table_name2, ...
+Let’s say we want to identify the top 10 action movies that are highly rated (with an average rating greater than 7) and popular (having more than 5000 votes), along with the primary actors who are the most popular. We can use a Common Table Expression to break this query down into separate problems. Initially, we can filter to find good action movies and prolific actors separately. This way, in our final join, we only need to change the order.
+%%sql
+
+WITH
+good_action_movies AS (
+    SELECT *
+    FROM Title T JOIN Rating R ON T.tconst = R.tconst
+    WHERE genres LIKE '%Action%' AND averageRating > 7 AND numVotes > 5000
+),
+prolific_actors AS (
+    SELECT N.nconst, primaryName, COUNT(*) as numRoles
+    FROM Name N JOIN Principal P ON N.nconst = P.nconst
+    WHERE category = 'actor'
+    GROUP BY N.nconst, primaryName
+)
+SELECT primaryTitle, primaryName, numRoles, ROUND(averageRating) AS rating
+FROM good_action_movies m, prolific_actors a, principal p
+WHERE p.tconst = m.tconst AND p.nconst = a.nconst
+ORDER BY rating DESC, numRoles DESC
+LIMIT 10;
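The same two-CTE structure can be tried end to end on a toy database. The sketch below uses Python's built-in sqlite3 module, and every row in it is invented for illustration; only the table and column names follow the query above. The first CTE selects explicit columns rather than *, a small simplification that avoids carrying two tconst columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy versions of the four tables; all rows are invented for illustration.
conn.executescript(
    """
    CREATE TABLE Title (tconst TEXT, primaryTitle TEXT, genres TEXT);
    CREATE TABLE Rating (tconst TEXT, averageRating REAL, numVotes INTEGER);
    CREATE TABLE Name (nconst TEXT, primaryName TEXT);
    CREATE TABLE Principal (tconst TEXT, nconst TEXT, category TEXT);

    INSERT INTO Title VALUES ('t1', 'Movie A', 'Action,Drama'),
                             ('t2', 'Movie B', 'Comedy'),
                             ('t3', 'Movie C', 'Action');
    INSERT INTO Rating VALUES ('t1', 8.2, 9000), ('t2', 9.0, 20000), ('t3', 6.5, 8000);
    INSERT INTO Name VALUES ('n1', 'Actor X'), ('n2', 'Actor Y');
    INSERT INTO Principal VALUES ('t1', 'n1', 'actor'), ('t3', 'n1', 'actor'),
                                 ('t1', 'n2', 'actor'), ('t2', 'n2', 'director');
    """
)

result = conn.execute(
    """
    WITH
    good_action_movies AS (
        SELECT T.tconst, primaryTitle, averageRating
        FROM Title T JOIN Rating R ON T.tconst = R.tconst
        WHERE genres LIKE '%Action%' AND averageRating > 7 AND numVotes > 5000
    ),
    prolific_actors AS (
        SELECT N.nconst, primaryName, COUNT(*) AS numRoles
        FROM Name N JOIN Principal P ON N.nconst = P.nconst
        WHERE category = 'actor'
        GROUP BY N.nconst, primaryName
    )
    SELECT primaryTitle, primaryName, numRoles
    FROM good_action_movies m, prolific_actors a, Principal p
    WHERE p.tconst = m.tconst AND p.nconst = a.nconst
    ORDER BY numRoles DESC
    LIMIT 10
    """
).fetchall()
print(result)  # [('Movie A', 'Actor X', 2), ('Movie A', 'Actor Y', 1)]
```

Each CTE can be run and inspected on its own before the final query combines them.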