Skip to content

chesterheng/sql-database

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Complete SQL + Databases Bootcamp: Zero to Mastery [2020]

Table of Contents

Section 1: Introduction

4. The Big Data Breach At Keiko Corp

  • Somebody stole our trade secrets!

⬆ back to top

Section 2: History And Story Of Data

5. What Is a Database?

It's a collection of data, a method for accessing and manipulating that data.

Data are collected by many sources

  • Drone
  • Mobile App
  • Web App
  • Videocam
  • Automobile

⬆ back to top

6. I Didn't Learn Anything, Try Again...

Why is a database always represented with a cylinder?

Data are collected and use by different people

  • Product Managers: learn from data to improve the product
  • Marketers: analyse business decision
  • Web Developers
  • Data Analysts: analyse and build machine learning model
  • Data Engineer
  • Data Administrator

⬆ back to top

7. Database Management System (DBMS)

  • How to put data in DB
  • How to use/update/learn from data
  • How to remove data

Confusing Acronyms

  • DBMS: Database Management System
  • RDBMS: Relational Database Management System
  • SQL: Structured Query Language

Confusing Anagrams

⬆ back to top

8. Exercise: Building Amazon

SQL Playground

⬆ back to top

9. Exercise: Building Amazon 2

⬆ back to top

10. 5 Types Of Databases

7 Database Paradigms

5 data models

  • Relational: support asset transactions
  • Document: data is in a document, scalability
  • Key Value: simple way to access data
  • Graph: good for connected data
  • Wide Columnar

⬆ back to top

11. Exercise: What Is A Database?

  • A database is a hardware and software system that allows a user to store, organize and use data.
  • A database is a hardware and software system stores data and answer questions.

⬆ back to top

12. Section Review

  • Do you have useful data?
  • Install a DB to Store Data
  • Use SQL to Setup + Use Data

⬆ back to top

Section 3: Databases + SQL Fundamentals

13. SQL Playground

DB Fiddle - SQL Database Playground

⬆ back to top

14. Exercise: Setting Up Your First Database

CREATE TABLE Class (
  id character(255),
  year integer NOT NULL,
  PRIMARY KEY(id)
);

CREATE TABLE Student (
  id varchar(255) NOT NULL,
  class varchar(255) NOT NULL,
  firstName varchar(255) NOT NULL,
  lastName varchar(255) NOT NULL,
  dob date NOT NULL,
  sex varchar(1) NOT NULL,
  FOREIGN KEY (class) REFERENCES Class(id),
  PRIMARY KEY (id)
);

INSERT INTO Class(id, year) VALUES('c1', 1);
INSERT INTO Class(id, year) VALUES('c2', 2);
INSERT INTO Class(id, year) VALUES('c2a', 2);

INSERT INTO Student VALUES ('s1', 'c1', 'George', 'Jacobson', '1992-01-01', 'm');
INSERT INTO Student VALUES ('s2', 'c2', 'Macy', 'Waterson', '1992-01-01', 'f');
INSERT INTO Student VALUES ('s3', 'c1', 'Bill', 'Peters', '1992-01-01', 'm');
INSERT INTO Student VALUES ('s4', 'c1', 'Janine', 'Wilson', '1992-01-01', 'f');
INSERT INTO Student VALUES ('s5', 'c2', 'Jason', 'Lipton', '1992-01-01', 'm');

⬆ back to top

15. What Is SQL?

  • SQL is programming language to talk to database
  • Database is a structured set of data

⬆ back to top

16. What Is A Query?

  • Query is question written in SQL statement

CREATE TABLE User (
  id varchar(255) NOT NULL,
  name varchar(255) NOT NULL,
  lastName varchar(255) NOT NULL,
  dob date NOT NULL,
  sex varchar(1) NOT NULL,
  role varchar(255) NOT NULL,
  PRIMARY KEY (id)
);

INSERT INTO User VALUES ('u1', 'George', 'Jacobson', '1992-01-01', 'm', 'manager');
INSERT INTO User VALUES ('u2', 'Macy', 'Waterson', '1992-01-01', 'f', 'employee');
INSERT INTO User VALUES ('u3', 'Bill', 'Peters', '1992-01-01', 'm', 'employee');
INSERT INTO User VALUES ('u4','Janine', 'Wilson', '1992-01-01', 'f', 'manager');
INSERT INTO User VALUES ('u5', 'Jason', 'Lipton', '1992-01-01', 'm', 'manager');

SELECT * 
FROM User
WHERE role = 'employee';

⬆ back to top

17. Imperative vs Declarative

  • Declarative: What will happen?
  • Imperative: How it will happen?

⬆ back to top

18. History of SQL

⬆ back to top

19. Exercises: The Select Statement

Exercises: The Select Statement

-- Insert the missing statement to get all the columns from the Customers table.
SELECT * 
FROM Customers;

-- Write a statement that will select the City column from the Customers table.
SELECT City
FROM Customers;

-- Select all the different values from the Country column in the Customers table.
SELECT DISTINCT Country 
FROM Customers;

⬆ back to top

20. Optional: History of SQL Deep Dive

History of Databases

⬆ back to top

21. SQL Standards

The SQL Standard – ISO/IEC 9075:2016 (ANSI X3.135)

⬆ back to top

22. What Is A Database? Revisited

File processing systems

  • save data on individual files
  • no correlation between the files
  • no relationship between the data
  • update in one file is not reflected in data of relationed files
  • no common standards per system
  • no common models per data
  • data are isolated and duplicated across systems

Model: structure of data

⬆ back to top

23. Database Oriented Approach

⬆ back to top

24. Exercise: SQL Starter Quiz

SQL Starter Quiz

⬆ back to top

25. Database Models

How you organize and store your data ?

  • Hierarchical
  • Networking
  • Entity-Relationship
  • Relational
  • Object Oriented
  • Flat
  • Semi-Structured

⬆ back to top

26. Hierarchical And Networking Model

Hierarchical Model

  • each child can have only one parent
  • child is tightly couple with parent
  • represent 1 to many relationship

Networking Model

  • represent many to many relationship

⬆ back to top

27. Relational Model

⬆ back to top

28. DBMS Revisited

DBMS features

  • App <-> SQL <-> DBMS
  • CRUD operations
  • Manage your data
  • Secure your data
  • Transactional management

Codd's Rules for RDBMS

⬆ back to top

29. Relational Model Revisited

⬆ back to top

30. Tables

⬆ back to top

31. Columns

⬆ back to top

32. Rows

⬆ back to top

33. Primary And Foreign Keys

⬆ back to top

34. OLTP vs OLAP

  • OLTP: Support day to day

  • OLAP: Support analysis

⬆ back to top

35. Exercise: OLTP vs OLAP

  • A database is being used to log orders and customers: OLTP
  • A database is being used to figure out what new products we should offer: OLAP
  • A database is being used to derive statistics for reporting to the executives: OLAP
  • A database is being used to keep track of logged in users: OLTP

⬆ back to top

36. Exercise: Relational Model Quiz

Exercise: Relational Model Quiz

⬆ back to top

Section 4: Environment Setup

38. Why PostgreSQL

PostgreSQL is the worlds’ best database

⬆ back to top

39. Environment Tooling

⬆ back to top

40. SQL Tooling Alternatives

our choice was based on multiple criteria

  • Is it cross-platform?
  • Is it easy to use?
  • Is it free?
  • Can it be extended to multiple database providers?

Alternatives

⬆ back to top

41. Command Line 101

Command Line 101

⬆ back to top

45. MAC Setup

⬆ back to top

46. MAC Commandline tools

CLI Tools

⬆ back to top

49. Importing Data On MAC

psql -U chesterheng -d Employees < employees.sql
psql -U chesterheng -d World < world.sql
psql -U chesterheng -d Store < store.sql
psql -U chesterheng -d France < france.sql

⬆ back to top

Section 5: SQL Deep Dive

52. Starting With Query

  • DCL: Data control language
  • DDL: Data definition language
  • DQL: Data query language
  • DML: Data modification language

What is a select? Retrieve data

⬆ back to top

53. Exercise: Simple Queries

SELECT * FROM "public"."employees";
SELECT * FROM "public"."departments";

⬆ back to top

54. Exercise: Renaming Columns

SELECT emp_no AS "Employee #",
  birth_date AS "Birthday",
  first_name AS "First Name"
FROM "public"."employees";

⬆ back to top

55. Concat Function

  • Make your data prettier/simpler
SELECT emp_no,
  CONCAT(first_name, ' ', last_name) AS "full name"
FROM "public"."employees";

⬆ back to top

56. What Is A Function In SQL?

  • input -> function -> output
  • a function is a set of steps that creates a single value

Types of functions

  • Aggregate: operate on many records to produce 1 value
  • Aggregate: produce a summary
  • Scalar: operate on each record independently

⬆ back to top

57. Aggregate Functions

Aggregate Functions

  • AVG()
  • COUNT
  • MIN()
  • MAX()
  • SUM()
SELECT COUNT(emp_no) FROM "public"."employees";
SELECT MAX(salary) FROM "public"."salaries";
SELECT SUM(salary) FROM "public"."salaries";

⬆ back to top

58. Exercise: Aggregate Functions

--
/*
 * What database should I use for these exercises?
 * Name: Employees
 */
-- 
-- Question 1: What is the average salary for the company?
-- Table: Salaries
-- Result: 63810.744836143706
select avg(salary) from salaries;

-- Question 2: What year was the youngest person born in the company?
-- Table: employees
select max(birth_date) from employees;

--
/*
 * What database should I use for these exercises?
 * Name: France
 */
-- 
-- Question 1: How many towns are there in france?
-- Table: Towns
-- Result: 36684
select count(id) from towns;

--
/*
 * What database should I use for these exercises?
 * Name: World
 */
-- 
-- Question 1: How many official languages are there?
-- Table: countrylanguage
-- Result: 238
select count(countrycode) from countrylanguage
where isofficial = true;

-- Question 2: What is the average life expectancy in the world?
-- Table: country
-- Result: 66.48603611164265
select avg(lifeexpectancy) from country;

-- Question 3: What is the average population for cities in the netherlands?
-- Table: city
-- Result: 185001
select AVG(population) from city
where countrycode = 'NLD';

⬆ back to top

59. Commenting Your Queries

How to make comments the most important ‘code’ you write

⬆ back to top

60. Common SELECT Mistakes

Ten Common SQL Programming Mistakes

  • Misspell commands
  • Use ; instead of , and vice versa
  • Use " instead of '
  • " is for tables
  • ' is for text
  • Invalid column name

⬆ back to top

61. Filtering Data

SELECT first_name
FROM "public"."employees"
WHERE gender = 'F';

⬆ back to top

62. AND and OR

SELECT first_name, last_name, hire_date
FROM "public"."employees"
WHERE first_name = 'Georgi' 
  AND last_name = 'Facello'
  OR first_name = 'Bezalel' 
  AND last_name = 'Simmel';

⬆ back to top

63. Exercise: Filtering Data

SELECT firstname, lastname, gender, state 
FROM customers
WHERE gender = 'F' AND (state = 'OR' OR state = 'NY') ;

⬆ back to top

64. Exercise: The Where Clause

-- Select all records where the City column has the value "Berlin".
SELECT * 
FROM Customers
WHERE City = 'Berlin';

-- Use the NOT keyword to select all records where City is NOT "Berlin".
SELECT * 
FROM Customers
WHERE NOT City = 'Berlin';

-- Select all records where the CustomerID column has the value 32.
SELECT * 
FROM Customers
WHERE CustomerID = 32 ;

-- Select all records where the City column has the value 'Berlin' and the PostalCode column has the value 12209.
SELECT * 
FROM Customers
WHERE City = 'Berlin'
  AND PostalCode = 12209;

-- Select all records where the City column has the value 'Berlin' or 'London'.
SELECT * FROM Customers
WHERE City = 'Berlin'
  OR City = 'London';

⬆ back to top

65. The NOT Keyword

What if you want to filter on everything but ... ?

SELECT age FROM customers WHERE NOT age = 55;
SELECT COUNT(age) FROM customers WHERE NOT age = 55;

⬆ back to top

66. Comparison Operators

Comparison Functions and Operators

⬆ back to top

67. Exercise: Comparison Operators

-- How many female customers do we have from the state of Oregon (OR)?
-- Result: 106
SELECT COUNT(firstName)
FROM customers
WHERE gender = 'F' and state = 'OR';

-- Who over the age of 44 has an income of 100 000 or more? (excluding 44)
-- Result: 2497
SELECT COUNT(income)
FROM customers
WHERE age > 44 and income >= 100000;

-- Who between the ages of 30 and 50 has an income less than 50 000?
-- (include 30 and 50 in the results)
-- Result: 2362
SELECT COUNT(income)
FROM customers
WHERE age >= 30 and age <= 50 AND income < 50000;

-- What is the average income between the ages of 20 and 50? (Excluding 20 and 50)
-- Result: 59409.926240780098
SELECT AVG(income)
FROM customers
WHERE age > 20 and age < 50;

⬆ back to top

68. Logical Operators

Logical Operators

  • AND
  • OR
  • NOT

Order of operations

  • FROM
  • WHERE
  • SELECT

⬆ back to top

69. Operator Precedence

A statement having multiple operators is evaluated based on the priority of operators

⬆ back to top

71. Exercise: Operator Precedence

/*
* DB: Store
* Table: Customers
* Question: 
* Select people either under 30 or over 50 with an income above 50000
* Include people that are 50
* that are from either Japan or Australia
*/
SELECT firstname, income, age 
FROM customers
WHERE income > 50000 
  AND (age < 30 OR age >= 50)
  AND (country = 'Japan' OR country = 'Australia')

/*
* DB: Store
* Table: Orders
* Question: 
* What was our total sales in June of 2004 for orders over 100 dollars?
*/
SELECT SUM(totalamount) 
FROM orders
WHERE (orderdate >= '2004-06-01' AND orderdate <= '2004-06-30') 
  AND totalamount > 100

⬆ back to top

72. Checking For NULL Values

  • When a record does not hava a value, it is considered empty
  • NULL is used to represent a missing/empty value
  • Operate with NULL will get NULL, eg. SELECT NULL = NULL
  • Best practice is to avoid NULL

⬆ back to top

73. IS Keyword

Database Design Follies: NULL vs. NOT NULL

Should you use NULL?

  • Optional or required?
  • Future info?
  • Retional? What impact?

Be defensive

  • Always check for NULL when necessary
  • Filter out NULL with IS operator
  • Clean up your data

In database table creation, do you allow NULL or use a default value to represent empty data?

  • Use default value: cannot filter NULL data
  • Use default value: cannot distinguish user entered or default

⬆ back to top

74. NULL Coalescing

  • Replace NULL values to operate on the data
  • Return the first non-null value in a list
SELECT COALESCE(<column>, 'Empty')
FROM <table>

⬆ back to top

75. Exercise: Null Value Coalescing

CREATE TABLE "Student" (
	id serial PRIMARY KEY,
  name varchar(255),
  lastName varchar(255),
  age int
);

INSERT INTO "Student" (name, lastName, age) VALUES ('STUDENT 1', NULL, NULL);
INSERT INTO "Student" (name, lastName, age) VALUES ('STUDENT 2', NULL, 25);
INSERT INTO "Student" (name, lastName, age) VALUES (null, 'LAST NAME 3', 20);
INSERT INTO "Student" (name, lastName, age) VALUES (null, null, 34);
INSERT INTO "Student" (name, lastName, age) VALUES ('STUDENT 5', NULL, 58);

-- Assuming a students minimum age for the class is 15, what is the average age of a student?
SELECT avg(coalesce(age, 15)) FROM "Student";

-- Replace all empty first or last names with a default?
SELECT id, coalesce(name, 'fallback'), coalesce(name, 'lastName'), age FROM "Student";

⬆ back to top

76. 3 Valued Logic

  • TRUE | NULL | FALSE
  • NULL value can be anything
  • NULL is always unknown
  • NULL = NULL (NULL)
  • IS NULL (TRUE)

⬆ back to top

77. Exercise: 3 Valued Logic

/*
* DB: Store
* Table: customers
* Question: adjust the following query to display the null values as "No Address"
*/
SELECT COALESCE(address2, 'No Address')
FROM customers

/*
* DB: Store
* Table: customers
* Question: Fix the following query to apply proper 3VL
*/

SELECT *
FROM customers
WHERE address2 IS NOT null;

/*
* DB: Store
* Table: customers
* Question: Fix the following query to apply proper 3VL
*/

SELECT coalesce(lastName, 'Empty'), * from customers
where (age IS NULL);

⬆ back to top

78. BETWEEN + AND

  • Shorthand to match against a range of value
  • More readability and maintainability
WHERE <column> >= X AND <column> <= Y
WHERE <column> BETWEEN X AND Y

⬆ back to top

79. IN Keyword

  • Filter multiple values
  • Check if a value matched any values in a list
WHERE <column> IN (value1, value2, value3)
WHERE <column> = value1 
  OR <column> = value2 
  OR <column> = value3

⬆ back to top

80. Exercise: IN Keyword

/*
* DB: Store
* Table: orders
* Question: How many orders were made by customer 7888, 1082, 12808, 9623
*/

SELECT COUNT(orderid)
FROM orders
WHERE customerid IN (7888, 1082, 12808, 9623)


/*
* DB: World
* Table: city
* Question: How many cities are in the district of Zuid-Holland, Noord-Brabant and Utrecht?
*/

SELECT COUNT(id)
FROM city
WHERE district IN ('Zuid-Holland', 'Noord-Brabant', 'Utrecht');

⬆ back to top

81. LIKE

  • partial lookups
  • What if you don't know exactly what you are searching for?
  • You need to build patterns to match when use LIKE
  • postgres LIKE only does text comparison, so must cast whatever we use to text

Pattern wildcards

  • %: match any number of characters
  • -: match 1 character

⬆ back to top

82. Exercise: Like Keyword

/*
* DB: Employees
* Table: employees
* Question: Find the age of all employees who's name starts with M.
* Sample output: https://imgur.com/vXs4093
*/

SELECT emp_no, first_name, EXTRACT (YEAR FROM AGE(birth_date)) as "age" FROM employees
WHERE first_name ILIKE 'M%';

/*
* DB: Employees
* Table: employees
* Question: How many people's name start with A and end with R?
* Expected output: 1846
*/

SELECT count(emp_no) FROM employees
WHERE first_name ILIKE 'A%R';
                                                  
/*
* DB: Store
* Table: customers
* Question: How many people's zipcode have a 2 in it?.
* Expected output: 4211 
*/

SELECT count(customerid) FROM customers
WHERE zip::text LIKE '%2%';


/*
* DB: Store
* Table: customers
* Question: How many people's zipcode start with 2 with the 3rd character being a 1.
* Expected output: 109 
*/

SELECT count(customerid) FROM customers
WHERE zip::text LIKE '2_1%';

/*
* DB: Store
* Table: customers
* Question: Which states have phone numbers starting with 302?
* Replace null values with "No State"                                                  
* Expected output: https://imgur.com/AVe6G4c
*/

SELECT coalesce(state, 'No State') as "State" FROM customers
WHERE phone::text LIKE '302%';

⬆ back to top

83. Dates And Timezones

UTC is enough for everyone...right?

  • Dates are hard
  • What is a timezone? set by Sir Sanford Fleming
  • Greenwich Mean Time (GMT): a time zone
  • Universal Coordinate Time (UTC): a time standard
  • No territories use UTC
  • GMT and UTC share the same current time
SHOW TIMEZONE;
SET TIME ZONE 'UTC';

⬆ back to top

84. Setting Up Timezones

ALTER USER chesterheng SET timezone='UTC';
SHOW TIMEZONE;

⬆ back to top

85. How Do We Format Date And Time?

  • postgres uses ISO-8601

How do dates look like?

  • YYYY-MM-DDTHH:MM:SS
  • 2017-08-17T12:47:16+02:00

What is a format?

  • Is a way of representing a date and time

⬆ back to top

86. Timestamps

  • A timestamp is a date with time and timezone info
SELECT NOW();

CREATE TABLE timezones (
  ts TIMESTAMP WITHOUT TIME ZONE,
  tz TIMESTAMP WITH TIME ZONE
);

INSERT INTO timezones VALUES (
  TIMESTAMP WITHOUT TIME ZONE '2000-01-01 10:00:00',
  TIMESTAMP WITH TIME ZONE '2000-01-01 10:00:00-05'
);

timestamps or date ?

  • depends on what to store
  • normally date is enough

⬆ back to top

87. Date Functions

-- Current Date
SELECT NOW()::date;
SELECT CURRENT_DATE;

-- format
SELECT TO_CHAR(CURRENT_DATE, 'dd/mm/yyyy');
SELECT TO_CHAR(CURRENT_DATE, 'DDD');
SELECT TO_CHAR(CURRENT_DATE, 'WW');

⬆ back to top

88. Date Difference And Casting

  • Subtract dates returns the difference in days
SELECT NOW() - '1800/01/01';
SELECT date '1800/01/01';

⬆ back to top

89. Age Calculation

  • cast to date before calculate age
SELECT AGE(date '1800/01/01');
SELECT AGE(date '1992/11/13', date '1800/01/01');
SELECT EXTRACT (DAY FROM date '1992/11/13') DAY;

⬆ back to top

90. Extracting Information

SELECT EXTRACT (DAY FROM date '1992/11/13') AS DAY;
SELECT EXTRACT (MONTH FROM date '1992/11/13') AS MONTH;
SELECT EXTRACT (YEAR FROM date '1992/11/13') AS YEAR;
SELECT DATE_TRUNC ('year', date '1992/11/13');
SELECT DATE_TRUNC ('month', date '1992/11/13');
SELECT DATE_TRUNC ('day', date '1992/11/13');

⬆ back to top

91. Intervals

-- 30 days before given date
SELECT NOW() - interval '30 days';
SELECT *
FROM orders
WHERE purchaseDate <= now() - interval '30 days'
SELECT NOW() - INTERVAL '1 year 2 months 3 days';
SELECT NOW() - INTERVAL '2 weeks ago';
SELECT NOW() - INTERVAL '1 year 3 hours 20 minutes';

⬆ back to top

92. Exercise: Date and Timestamp

/*
* DB: Employees
* Table: employees
* Question: Get me all the employees above 60, use the appropriate date functions
*/

SELECT AGE(birth_date), * FROM employees
WHERE (
   EXTRACT (YEAR FROM AGE(birth_date))
) > 60 ;

-- alternative
SELECT count(birth_date) FROM employees
WHERE birth_date < now() - interval '61 years' -- 61 years before the current date
            
/*
* DB: Employees
* Table: employees
* Question: How many employees where hired in February?
*/

SELECT count(emp_no) FROM employees
where EXTRACT (MONTH FROM hire_date) = 2;

/*
* DB: Employees
* Table: employees
* Question: How many employees were born in november?
*/

SELECT COUNT(emp_no) FROM employees
WHERE EXTRACT (MONTH FROM birth_date) = 11;

/*
* DB: Employees
* Table: employees
* Question: Who is the oldest employee?
*/

SELECT MAX(AGE(birth_date)) FROM employees;

/*
* DB: Store
* Table: orders
* Question: How many orders were made in January 2004?
*/

SELECT COUNT(orderid)
FROM orders
WHERE DATE_TRUNC('month', orderdate) = date '2004-01-01';

⬆ back to top

93. DISTINCT

  • remove duplicate
  • keep one row from each group of duplicates
  • multiple columns will evaulate based on the combination of columns

⬆ back to top

94. Exercise: Distinct Keyword

/*
* DB: Employees
* Table: titles
* Question: What unique titles do we have?
*/

SELECT DISTINCT title FROM titles;

/*
* DB: Employees
* Table: employees
* Question: How many unique birth dates are there?
*/

SELECT COUNT(DISTINCT birth_date)
from employees;

/*
* DB: World
* Table: country
* Question: Can I get a list of distinct life expectancy ages
* Make sure there are no nulls
*/

SELECT DISTINCT lifeexpectancy FROM country
WHERE lifeexpectancy IS NOT NULL
ORDER BY lifeexpectancy;

⬆ back to top

95. Sorting Data

  • sort data ascending or descending by column
  • using expressions
SELECT * FROM customers
ORDER BY LENGTH(name)

⬆ back to top

96. Multi Table SELECT

  • A join combines columns from one table to another
SELECT a.emp_no, b.salary, b.from_date
FROM employees AS a, salaries AS b
WHERE a.emp_no = b.emp_no
ORDER BY a.emp_no;

⬆ back to top

97. Inner Join

  • Table A [Matched] Table B
  • Inner join syntax is generally considered a best pratice. It's more readable than using the WHERE syntax. It shows you what's being joined
  • can become more complicated as you add more tables to combine
SELECT a.emp_no, b.salary, b.from_date
FROM employees AS a
INNER JOIN salaries AS b
ON a.emp_no = b.emp_no
ORDER BY a.emp_no;
-- You want to know the original salary and also salary at a promotion
SELECT a.emp_no, 
  CONCAT(a.first_name, a.last_name) AS "name",
  b.salary, 
  COALESCE(c.title, 'No title change'),
  COALESCE(c.from_date::text, '-') AS "title taken on"
FROM employees AS a
INNER JOIN salaries AS b
ON a.emp_no = b.emp_no
INNER JOIN titles AS c
ON c.emp_no = a.emp_no AND (
  c.from_date = (b.from_date + interval '2 days') OR
  c.from_date = b.from_date
)
ORDER BY a.emp_no;

⬆ back to top

98. Self Join

  • join a table to itself
  • a table has a foreign reference its primary key
  • similar to inner join
id name startDate supervisorId
1 Binni 1990/01/13 2
2 Andrei 1980/01/23 2
CREATE TABLE employee( 
  id varchar(5) NOT NULL,
  name varchar(20) NULL,
  start_date date NULL,
  supervisorId varchar(5) NULL,
  CONSTRAINT id PRIMARY KEY(id),
  CONSTRAINT supervisorId FOREIGN KEY(supervisorId) REFERENCES employee(id)
);

INSERT INTO employee VALUES ('1', 'Boss Man', date '1980/01/23', '1');
INSERT INTO employee VALUES ('1a', 'Andrei', date '1980/01/23', '1');
INSERT INTO employee VALUES ('1abc', 'Boss Man', date '1980/01/23', '1a');
SELECT a.id, 
  a.name AS "employee", 
  b.name AS "supervisor"
FROM employee AS a, 
  employee AS b
WHERE a.supervisorId = b.id;
SELECT a.id, 
  a.name AS "employee", 
  b.name AS "supervisor"
FROM employee AS a
INNER JOIN employee AS b
ON a.supervisorId = b.id;

⬆ back to top

99. Outer Join

  • What if I need the rows that don't match?
  • Table A <-> Table B
  • Left outer add data that don't have a match from table A
  • Right outer add data that don't have a match from table B
  • Any value that does not match is made to be NULL
SELECT *
FROM <table A> AS a
LEFT [OUTER] JOIN <table B> AS b
ON a.id = b.id;
-- You want to know every salary and also salary at a promotion
SELECT a.emp_no, 
  CONCAT(a.first_name, a.last_name) AS "name",
  b.salary, 
  COALESCE(c.title, 'No title change'),
  COALESCE(c.from_date::text, '-') AS "title taken on"
FROM employees AS a
INNER JOIN salaries AS b
ON a.emp_no = b.emp_no
LEFT JOIN titles AS c
ON c.emp_no = a.emp_no AND (
  c.from_date = (b.from_date + interval '2 days') OR
  c.from_date = b.from_date
)
ORDER BY a.emp_no;

⬆ back to top

100. Less Common Joins

-- cross join: create a combination of every row
SELECT * FROM <tableA>
CROSS JOIN <tableB>;

-- full outer join: returns result from both table whether they match or not
SELECT * FROM <tableA> AS a
FULL JOIN <tableB> AS b
ON a.id = b.id;

⬆ back to top

101. Inner-Join Exercises

/*
* DB: Store
* Table: orders
* Question: Get all orders from customers who live in Ohio (OH), New York (NY) or Oregon (OR) state
* ordered by orderid
*/

SELECT c.firstname, c.lastname, o.orderid FROM orders AS o
INNER JOIN customers AS c ON o.customerid = c.customerid
WHERE c.state IN ('NY', 'OH', 'OR')
ORDER BY o.orderid;

/*
* DB: Store
* Table: products
* Question: Show me the inventory for each product
*/

SELECT p.prod_id, i.quan_in_stock
FROM products as p
INNER JOIN inventory AS i oN p.prod_id = i.prod_id 


/*
* DB: Employees
* Table: employees
* Question: Show me for each employee which department they work in
*/

SELECT e.first_name, dp.dept_name
FROM employees AS e
INNER JOIN dept_emp AS de ON de.emp_no = e.emp_no
INNER JOIN departments AS dp ON dp.dept_no = de.dept_no

⬆ back to top

102. USING Keyword

  • simplify the join syntax
SELECT * 
FROM <tableA>
INNER JOIN <tableB> USING(id)
SELECT e.emp_no, e.first_name, d.dept_name 
FROM employees AS e
INNER JOIN dept_emp AS de ON de.emp_no = e.emp_no
INNER JOIN departments AS d USING(dept_no);

⬆ back to top

Section 6: Advanced SQL

103. GROUP BY

  • summarise or aggregate data by groups
  • why group data? to get in-depth information by group
  • GROUP BY splits data into groups to apply functions to group rather than entire table
  • When we GROUP BY, we apply the function by group
  • reduce all records found in the group to a single record
  • GROUP BY utilizes SPLIT-APPLY-COMBINE strategy
  • GROUP BY happens after WHERE / FROM
  • FROM -> WHERE -> GROUP BY -> SELECT -> ORDER

⬆ back to top

104. HAVING Keyword

  • what if I want to filter groups?
  • HAVING apply filters to a group as a whole
  • FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER
SELECT col1, COUNT(col2)
FROM <table>
WHERE col2 > X
GROUP BY col1
HAVING col1 === Y;
SELECT d.dept_name, 
  COUNT(e.emp_no) AS "# of employees"
FROM employees AS e
INNER JOIN dept_emp AS de
ON de.emp_no = e.emp_no
INNER JOIN departments AS d
ON de.dept_no = d.dept_no
WHERE e.gender = 'M'
GROUP BY d.dept_name
HAVING COUNT(e.emp_no) > 25000;

⬆ back to top

105. Ordering Grouped Data

SELECT d.dept_name, 
  COUNT(e.emp_no) AS "# of employees"
FROM employees AS e
INNER JOIN dept_emp AS de
ON de.emp_no = e.emp_no
INNER JOIN departments AS d
ON de.dept_no = d.dept_no
GROUP BY d.dept_name
ORDER BY COUNT(e.emp_no) DESC;

⬆ back to top

106. Group By Mental Model

  • GROUP BY emp_no
  • we can apply aggregate function to from_date and salary
  • however, we cannot get the salary of MAX(from_date)
  • or cannot get the from_date of MAX(salary)
SELECT emp_no, MAX(from_date), MAX(salary)
FROM salaries
GROUP BY emp_no;

⬆ back to top

107. Grouping Sets

  • What if we want to combine the results of multiple groupings?
  • UNION removes duplicate records
  • UNION ALL does not remove duplicate records
  • Grouping Sets is a subclause of GROUP BY that allows you to define multiple groupings
SELECT NULL AS "prod_id", SUM(ol.quantity)
FROM "public"."orderlines" AS ol

UNION

SELECT prod_id AS "prod_id", SUM(ol.quantity)
FROM "public"."orderlines" AS ol
GROUP BY prod_id
ORDER BY prod_id DESC;
SELECT prod_id AS "prod_id", SUM(ol.quantity)
FROM "public"."orderlines" AS ol
GROUP BY 
  GROUPING SETS (
    (),
    (prod_id)
  )
ORDER BY prod_id DESC;
SELECT prod_id AS "prod_id", orderlineid, SUM(ol.quantity)
FROM "public"."orderlines" AS ol
GROUP BY 
  GROUPING SETS (
    (),
    (prod_id),
    (orderlineid)
  )
ORDER BY prod_id DESC, orderlineid DESC;

⬆ back to top

108. Rollup

  • useful to replace GROUPING SETS that need all combinations
SELECT EXTRACT (YEAR FROM orderdate) AS "year",
  EXTRACT (MONTH FROM orderdate) AS "month",
  EXTRACT (DAY FROM orderdate) AS "day",
  sum(ol.quantity)
FROM orderlines AS ol
GROUP BY
  GROUPING SETS (
    (EXTRACT (YEAR FROM orderdate)),
    (
      EXTRACT (YEAR FROM orderdate),
      EXTRACT (MONTH FROM orderdate)
    ),
    (
      EXTRACT (YEAR FROM orderdate),
      EXTRACT (MONTH FROM orderdate),
      EXTRACT (DAY FROM orderdate)
    ),
    (
      EXTRACT (MONTH FROM orderdate),
      EXTRACT (DAY FROM orderdate)
    ),
    (EXTRACT (MONTH FROM orderdate)),
    (EXTRACT (DAY FROM orderdate)),
    ()
  )
ORDER BY
  EXTRACT (YEAR FROM orderdate),
  EXTRACT (MONTH FROM orderdate),
  EXTRACT (DAY FROM orderdate);
SELECT EXTRACT (YEAR FROM orderdate) AS "year",
  EXTRACT (MONTH FROM orderdate) AS "month",
  EXTRACT (DAY FROM orderdate) AS "day",
  sum(ol.quantity)
FROM orderlines AS ol
GROUP BY
  ROLLUP (
    EXTRACT (YEAR FROM orderdate),
    EXTRACT (MONTH FROM orderdate),
    EXTRACT (DAY FROM orderdate)
  )
ORDER BY
  EXTRACT (YEAR FROM orderdate),
  EXTRACT (MONTH FROM orderdate),
  EXTRACT (DAY FROM orderdate);

⬆ back to top

109. Group By Exercises

/*
*  How many people were hired on did we hire on any given hire date?
*  Database: Employees
*  Table: Employees
*/
SELECT a.hire_date, COUNT(b.hire_date) as "amount"
FROM employees as a, employees as b
WHERE a.hire_date = b.hire_date
GROUP BY a.hire_date
ORDER BY "amount" DESC;

/*
*  Show me all the employees, hired after 1991, that have had more than 2 titles
*  Database: Employees
*/
SELECT e.emp_no, count(t.title) as "amount of titles"
FROM employees as e
JOIN titles as t USING(emp_no)
WHERE EXTRACT (YEAR FROM e.hire_date) > 1991
GROUP BY e.emp_no
HAVING count(t.title) > 2
ORDER BY e.emp_no;

/*
*  Show me all the employees that have had more than 15 salary changes that work in the department development
*  Database: Employees
*/
SELECT e.emp_no, count(s.from_date) as "amount of raises"
FROM employees as e
JOIN salaries as s USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
WHERE de.dept_no = 'd005'
GROUP BY e.emp_no
HAVING count(s.from_date) > 15
ORDER BY e.emp_no;

/*
*  Show me all the employees that have worked for multiple departments
*  Database: Employees
*/
SELECT e.emp_no, count(de.dept_no) as "worked for # departments"
FROM employees as e
JOIN dept_emp AS de USING(emp_no)
GROUP BY e.emp_no
HAVING count(de.dept_no) > 1
ORDER BY e.emp_no;

⬆ back to top

110. Window What?

  • group data is useful
  • group happens after FROM / WHERE
  • HAVING is a special filter for groups
  • GROUPING SETS AND ROLLUP are useful for multiple groupings in a single query
  • Group data is not a silver bullet

Window functions

  • How do we apply functions against a set of rows related to the current row?
  • add average to every salary to see how much each employee is from average

⬆ back to top

111. Looking Through The Window

  • Window Functions
  • window functions create a new column based on functions performed on a subset or "window" of the data
window_function(arg1, arg2,..) OVER (
  [PARTITION BY partition_expression]
  [ORDER BY sort_expression [ASC | DESC] [NULLS {FIRST | LAST}]
)
-- how to display max salary in each indivodual row?
SELECT *, MAX(salary) OVER() 
FROM salaries
LIMIT 100;

⬆ back to top

112. PARTITION BY

  • divide rows into groups to apply the function against
SELECT *, AVG(salary) 
OVER (
  PARTITION BY d.dept_name
) 
FROM salaries
JOIN dept_emp AS de USING (emp_no)
JOIN departments AS d USING (dept_no);

⬆ back to top

113. Order By Acting Strange

  • ORDER BY: order the results

window function ORDER BY

  • ORDER BY changes the frame of the window function
  • accumulating: take account of everything before me and myself
SELECT emp_no, salary, COUNT(salary) 
OVER (ORDER BY emp_no) 
FROM salaries

⬆ back to top

114. Using Framing In Window Function

  • when use a frame, we can create a sub-range or frame
  • without ORDER BY, by default the frame is usually all partition rows
  • with ORDER BY, by default the frame is usually everything before current row and current row

Window Function Calls

Key: Meaning

  • Rows or range: use a range or rows as a frame
  • preceding: rows before the current one
  • following: rows after the current one
  • unbounded preceding or following: returns all before or after
  • current row: your current row
PARTITION BY category 
ORDER BY price 
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
SELECT emp_no, salary, 
COUNT(salary) OVER (ORDER BY emp_no) 
FROM salaries
SELECT emp_no, salary, 
COUNT(salary) OVER (
  PARTITION BY emp_no
  ORDER BY salary
) 
FROM salaries
SELECT emp_no, salary, 
COUNT(salary) OVER (
  PARTITION BY emp_no
  ORDER BY salary
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) 
FROM salaries

⬆ back to top

115. Solving For Current Salary

Window Functions

SELECT e.emp_no, e.first_name, d.dept_name, MAX(s.salary)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
GROUP BY e.emp_no, e.first_name, d.dept_name
ORDER BY e.emp_no;
SELECT DISTINCT e.emp_no, 
  LAST_VALUE(salary) OVER (
    PARTITION BY s.emp_no
    ORDER BY s.from_date
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;
SELECT DISTINCT e.emp_no, 
  LAST_VALUE(s.from_date) OVER (
    PARTITION BY s.emp_no
    ORDER BY s.from_date
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ),
  LAST_VALUE(salary) OVER (
    PARTITION BY s.emp_no
    ORDER BY s.from_date
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;
SELECT DISTINCT e.emp_no, 
  s.from_date,
  s.salary
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;

⬆ back to top

116. FIRST_VALUE

  • return a value evaluated against the first row within its partition

SELECT prod_id, price, category
  FIRST_VALUE(price) OVER (
    PARTITION BY category
    ORDER BY price
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS "cheapest in category"
FROM products
SELECT prod_id, price, category,
  MIN(price) OVER (
    PARTITION BY category
  ) AS "cheapest in category"
FROM products

⬆ back to top

117. LAST_VALUE

  • return a value evaluated against the last row within its partition
SELECT prod_id, price, category,
  LAST_VALUE(price) OVER (
    PARTITION BY category
    ORDER BY price
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS "most expensive in category"
FROM products
SELECT prod_id, price, category,
  MAX(price) OVER (
    PARTITION BY category
  ) AS "most expensive in category"
FROM products

⬆ back to top

118. SUM

  • sum values within a group depend on the frame

SELECT o.orderid, o.customerid, o.netamount,
  SUM(o.netamount) OVER (
    PARTITION BY o.customerid
    ORDER BY o.orderid
  ) AS "cum sum"
FROM orders as o
ORDER BY o.customerid

⬆ back to top

119. ROW_NUMBER

  • number the current row within the partition starting from 1 regardless of framing
SELECT prod_id, price, category,
  ROW_NUMBER() OVER(
    PARTITION BY category
    ORDER BY price
  ) AS "position in category by price"
FROM products

⬆ back to top

120. Window Function Exercises

/*
*  Show the population per continent
*  Database: World
*  Table: Country
*/

SELECT
  DISTINCT continent,
  SUM(population) OVER w1 as "continent population"
FROM country 
WINDOW w1 AS( PARTITION BY continent );

/*
*  To the previous query add on the ability to calculate the percentage of the world population
*  What that means is that you will divide the population of that continent by the total population and multiply by 100 to get a percentage.
*  Make sure you convert the population numbers to float using `population::float` otherwise you may see zero pop up
*
*  Database: World
*  Table: Country
*/

SELECT
  DISTINCT continent,
  SUM(population) OVER w1 as"continent population",
  CONCAT( 
      ROUND( 
          ( 
            SUM( population::float4 ) OVER w1 / 
            SUM( population::float4 ) OVER() 
          ) * 100    
      ),'%' ) as "percentage of population"
FROM country 
WINDOW w1 AS( PARTITION BY continent );


/*
*  Count the number of towns per region
*
*  Database: France
*  Table: Regions (Join + Window function)
*/

SELECT 
DISTINCT r.id, 
r."name", 
COUNT(t.id) OVER (
    PARTITION BY r.id
    ORDER BY r."name"
) AS "# of towns"
FROM regions AS r
JOIN departments AS d ON r.code = d.region 
JOIN towns AS t ON d.code = t.department
ORDER BY r.id;

⬆ back to top

121. Conditional Statements

  • What if you only want to select something when a certain criteria is met?
  • case statements can be used in multiple places in a query
  • each return must be a single output

Case statement

SELECT a,
  CASE 
    WHEN a=1 THEN 'one'
    WHEN a=2 THEN 'two'
  END
FROM test;

Render custom row data

SELECT 
  o.orderid,
  o.customerid,
  CASE 
    WHEN o.customerid = 1 
    THEN 'my first customer'
    ELSE 'not my first customer'
  END,
  o.netamount
FROM orders as o
ORDER BY o.customerid;

Filter in a where

SELECT 
  o.orderid,
  o.customerid,
  o.netamount
FROM orders as o
WHERE CASE 
  WHEN o.customerid > 10 
  THEN o.netamount < 100 
  ELSE o.netamount > 100 
END
ORDER BY o.customerid;

In an aggregate function

SELECT SUM(
  CASE 
    WHEN o.netamount <100
    THEN -100 
    ELSE o.netamount
  END
) AS "returns",
  SUM(o.netamount) AS "normal total"
FROM orders as o;

⬆ back to top

122. NULLIF

  • what if you want to return NULL if a condition is met?
  • if value 1 = value 2, return NULL
  • fill in empty spots with a NULL to avoid divide by zero issues
NULLIF(val_1, val_2)
NULLIF(0, 0)  -- NULL
NULLIF('ABC', 'DEF')  -- 'ABC'

⬆ back to top

123. Views...What Are They Good For?

  • What if you want to store the results of a query?
  • What if you want to query the results of a query?
  • view allow you to store and query previously run queries
  • there are 2 types of views materialized and non-materialized
  • non-materialized: query get re-run each time the view is called
  • materialized: store the data physically and periodically updates it when tables change

⬆ back to top

124. View Syntax

  • view is the output of the query we ran
  • view act like tables you can query them
  • view take very little space to store
  • we only store the definition of a view
  • not all of the data that it returns
CREATE VIEW view_name AS query;
CREATE OR REPLACE view_name AS query;
ALTER VIEW <view_name> RENAME TO <view_name>;
DROP VIEW [ IF EXISTS] <view_name>;

⬆ back to top

125. Using Views

  • get the most recent salary of an employee
  • use window functions
  • view is easier to reason then window functions
SELECT e.emp_no, e.first_name, d.dept_name, MAX(s.salary)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
GROUP BY e.emp_no, e.first_name, d.dept_name
ORDER BY e.emp_no;
SELECT DISTINCT e.emp_no, 
  LAST_VALUE(salary) OVER (
    PARTITION BY s.emp_no
    ORDER BY s.from_date
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;
CREATE OR REPLACE VIEW last_salary_change AS
SELECT e.emp_no, 
  MAX(s.from_date)
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
GROUP BY e.emp_no
ORDER BY e.emp_no;
SELECT * FROM salaries
JOIN last_salary_change AS l USING(emp_no)
WHERE from_date = l.max
ORDER BY emp_no;
SELECT s.emp_no, 
  d.dept_name, 
  s.from_date, 
  s.salary
FROM last_salary_change
JOIN salaries AS s USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
WHERE max = s.from_date;

⬆ back to top

126. Indexes

Poor database indexing – a SQL query performance killer – recommendations

What is Indexes?

  • An index is a pointer to data in a table
  • An index is a construct to speed up data access in a table
  • An index is used to find data without having to scan the whole table
  • An index is used to improve query performance with filter conditions
  • An index work like a table of contents to help you find a piece of data

What does it do?

  • speed up queries
  • slow down data insertion and updates

Types of indexes

  • Single column
  • Multi column
  • Unique (PK)
  • Partial
  • Implicit Indexes (be default)
CREATE UNIQUE INDEX <name>
on <table> (column1, column2, ...)

DROP INDEX <name>

When to use

  • Index foreign keys
  • Index primary keys and unique columns
  • Index on columns that end up in the ORDER BY / WHERE clause often

When NOT to use

  • Don't add an index just to add an index
  • Don't use indexes on small tables
  • Don't use on tables that are updated frequently
  • Don't use on columns that can contain NULL values
  • Don't use on columns that have large values

⬆ back to top

127. Index Types

Index Types

  • Single column
    • Most frequently used column in a query
    • when? retrieve data that satisfies one condition
  • Multi column
    • Most frequently used columns in a query
    • when? retrieve data that satisfies multiple conditions
  • Unique
    • normally primary key or column with unique value
    • for speed and integrity
CREATE UNIQUE INDEX <name>
ON <table> (column1); 
  • Partial
    • index over a subset of a table
CREATE INDEX <name>
ON <table> (<expression>); 
  • Implicit Indexes (be default)
    • auto created by database
    • PRIMARY KEY
    • UNIQUE KEY
EXPLAIN ANALYZE
SELECT "name", "district", "countrycode"
FROM "public"."city"
WHERE countrycode IN ('TUN', 'BE', 'NL');
CREATE INDEX idx_countrycode
ON city (countrycode);

Without Index

With Index

CREATE INDEX idx_countrycode
ON city (countrycode)
WHERE countrycode IN ('TUN', 'BE', 'NL');

With Partial Index

⬆ back to top

128. Index Algorithms

Postgres provides several types of indexes

  • B-Tree: default algorithm
    • best used for comparisons
    • <, <=, =, >, >=, BETWEEN, IN, IS NULL, IS NOT NULL
  • Hash
    • can only handle equality (=) operations
  • Gin: generalized inverted index
    • best used when multiple values are stored in a single field
  • Gist: generalized search tree
    • useful in index geometric data and full-text search

Each index type uses a different algorithm

CREATE [UNIQUE] INDEX <name>
ON <table> USING <method> (column1, ...)

⬆ back to top

129. What Are Subqueries?

  • a construct that allow you to build extremely complex query
  • also called inner query or inner select
  • is a query within another query
  • most often found in the WHERE clause
  • can also used it in SELECT, FROM and HAVING clause
  • return a single value the SELECT clause,
  • return a single column with single or multiple rows the WHERE and HAVING clause
  • return a table in the FROM or JOIN clause
SELECT *
FROM <table>
WHERE <column> <condition> (
  SELECT <column> 
  FROM <table>
  [WHERE | GROUP BY | ORDER BY | ...] 
)
SELECT (
  SELECT <column> 
  FROM <table>
  [WHERE | GROUP BY | ORDER BY | ...] 
)
FROM <table> AS <name>
SELECT *
FROM (
  SELECT <column>, <column>, ...
  FROM <table>
  [WHERE | GROUP BY | ORDER BY | ...] 
) AS <name>
SELECT *
FROM <table> AS <name>
GROUP BY <column>
HAVING (
  SELECT <column> 
  FROM <table>
  [WHERE | GROUP BY | ORDER BY | ...] 
) 

⬆ back to top

130. Subqueries vs Joins

  • how a subquery different from a join?
  • both combine data from different tables
  • subquery could stand alone but not join
  • join combines rows from one or more tables based on match condition
  • subquery can return a single result or a row set(table)
  • joins can only return a row set(table)
  • a subquery's results cannot be reference by outer query
  • a joined table can be reference by outer query
  • use a join if possible over subquery for performance reason
SELECT AVG(price) FROM products

SELECT title, price, 
  (SELECT AVG(price) FROM products) AS "global average price"
FROM products;
SELECT prod_id, title, price, i.quan_in_stock 
FROM products
JOIN inventory AS i USING(prod_id)
SELECT title, price, 
  (SELECT AVG(price) FROM products) AS "global average price"
FROM (
  SELECT * FROM products WHERE price < 10
) AS "products_sub";

⬆ back to top

131. Subquery Guidelines As Types

  • a subquery must be enclosed in parentheses
  • must be placed on the right side of the comparision operator
  • cannot manipulate results internally (order by ignored)
  • use single row operators with single-row subquery
  • subquery return NULL may not return results
SELECT *
FROM <table> AS <name>
WHERE X >= | <= | = | != (
  SELECT MAX(<column>)
  FROM <table>
  [WHERE | GROUP BY | ORDER BY | ...] 
)

Types of subquery

  • single row: return 0 or 1 row
  • multiple row: return 0 or more rows
  • multiple column: return 0 or more columns
  • correlated: reference 1 or more columns in the other statement - runs against each row
  • correlated: can have performance bottleneck
  • nested: subquery in a subquery

single row

SELECT name, salary
FROM salaries
WHERE salary = 
  (SELECT AVG(salary) FROM salaries);
SELECT name, salary,
  (SELECT AVG(salary) FROM salaries) 
  AS "Company average salary"
FROM salaries;

multiple row

SELECT title, price, category
FROM products
WHERE category IN (
  SELECT category FROM categories
  WHERE categoryname IN ('Comedy', 'Family', 'Classics')
);

multiple column

SELECT emp_no, salary, 
  dea.avg AS "Department average salary"
FROM salaries AS s
JOIN dept_emp AS de USING(emp_no)
JOIN (
  SELECT dept_no, AVG(salary) 
  FROM salaries AS s2
  JOIN dept_emp AS e USING(emp_no)
  GROUP BY dept_no
) AS dea USING(dept_no)
WHERE salary > dea.avg;

correlated

SELECT emp_no, salary, from_date
FROM salaries AS s
WHERE from_date = (
  SELECT MAX(s2.from_date) AS max
  FROM salaries AS s2
  WHERE s2.emp_no = s.emp_no
)
ORDER BY emp_no;

nested

SELECT orderlineid, prod_id, quantity
FROM orderlines
JOIN (
  SELECT prod_id
  FROM products
  WHERE category IN (
    SELECT category FROM categories
    WHERE categoryname IN ('Comedy', 'Family', 'Classics')
  )
) AS limited USING(prod_id)

⬆ back to top

132. Using Subqueries

show all employees older than the average age

SELECT 
  first_name, 
  last_name, 
  birth_date, 
  AGE(birth_date),
  (
    SELECT AVG(AGE(birth_date)) 
    FROM employees
  )
FROM employees
WHERE AGE(birth_date) > (
  SELECT AVG(AGE(birth_date)) 
  FROM employees
);

Show the title by salary for each employee

SELECT 
  emp_no, 
  salary, 
  from_date, 
  (
    SELECT title 
    FROM titles AS t 
    WHERE t.emp_no = s.emp_no 
      AND t.from_date = s.from_date
  )
FROM salaries AS s
ORDER BY emp_no;

use join instead of subquery if possible for performance reason

SELECT 
  emp_no, 
  salary, 
  from_date,
  t.title
FROM salaries AS s
JOIN titles AS t USING(emp_no, from_date)
ORDER BY emp_no;

⬆ back to top

133. Getting The Latest Salaries

Show the most recent employee salary

SELECT 
  emp_no, 
  salary AS "most recent salary", 
  from_date
FROM salaries AS s
WHERE from_date = (
  SELECT MAX(from_date)
  FROM salaries AS sp
  WHERE sp.emp_no = s.emp_no
)
ORDER BY emp_no ASC;
SELECT   
  emp_no, 
  salary AS "most recent salary", 
  from_date 
FROM salaries AS s
JOIN last_salary_change AS ls USING(emp_no)
WHERE from_date = ls.max
ORDER BY emp_no;
SELECT 
  emp_no, 
  salary AS "most recent salary", 
  from_date
FROM salaries AS s
JOIN (
  SELECT emp_no, MAX(from_date)
  FROM salaries AS sp
  GROUP BY emp_no
) AS ls USING(emp_no)
WHERE ls.max = from_date
ORDER BY emp_no ASC;

⬆ back to top

134. Subquery Operators

  • operators you can apply in the WHERE clause on subquery
  • EXISTS: check if subquery returns any rows
  • IN: check if value is equal to any of the rows in the return (NULL yields NULL)
  • NOT IN: check if value is not equal to any of the rows in the return (NULL yields NULL)
  • ANY/SOME: check each row against the operator and if any comparison matches return TRUE
  • ALL: check each row against the operator and if all comparisons match return TRUE
  • SINGLE VALUE COMPARISON: subquery must return a single row check comparatore against row
SELECT firstname, lastname, income
FROM customers AS c
WHERE EXISTS (
  SELECT * FROM orders AS o
  WHERE c.customerid = o.customerid 
    AND totalamount > 400 
) AND income > 90000;
SELECT prod_id
FROM products
WHERE category IN (
  SELECT category FROM categories
  WHERE categoryname IN ('Comedy', 'Family', 'Classics')
);
SELECT prod_id
FROM products
WHERE category IN (
  SELECT category FROM categories
  WHERE categoryname NOT IN ('Comedy', 'Family', 'Classics')
);
SELECT prod_id
FROM products
WHERE category = ANY (
  SELECT category FROM categories
  WHERE categoryname IN ('Comedy', 'Family', 'Classics')
);
SELECT prod_id, title, sales
FROM products
JOIN inventory AS i USING(prod_id)
WHERE i.sales > ALL (
  SELECT AVG(sales) FROM inventory
  JOIN products AS pl USING(prod_id)
  GROUP BY pl.category
);
SELECT prod_id
FROM products
WHERE category = (
  SELECT category FROM categories
  WHERE categoryname IN ('Comedy')
);

⬆ back to top

135. Subquery Exercises

/* TRY TO WRITE THESE AS JOINS FIRST */
/*
* DB: Store
* Table: orders
* Question: Get all orders from customers who live in Ohio (OH), New York (NY) or Oregon (OR) state
* ordered by orderid
*/

SELECT c.firstname, c.lastname, o.orderid 
FROM orders AS o, (
    SELECT customerid, state, firstname, lastname
    FROM customers
) AS c
WHERE  o.customerid = c.customerid AND 
c.state IN ('NY', 'OH', 'OR')
ORDER BY o.orderid;

/*
* DB: Employees
* Table: employees
* Question: Filter employees who have emp_no 110183 as a manager
*/

SELECT emp_no, first_name, last_name
FROM employees
WHERE emp_no IN (
    SELECT emp_no
    FROM dept_emp
    WHERE dept_no = (
        SELECT dept_no 
        FROM dept_manager
        WHERE emp_no = 110183
    )
)
ORDER BY emp_no

-- Written with JOIN
SELECT e.emp_no, first_name, last_name
FROM employees as e
JOIN dept_emp as de USING (emp_no)
JOIN dept_manager as dm USING (dept_no)
WHERE dm.emp_no = 110183

⬆ back to top

Section 7: Database Management

136. What we'll be learning

  • different types of data
  • how to create a database
  • how to create tables
  • how to insert/update data
  • ... so more more

⬆ back to top

137. Time To Create Some Stuff!

Why is it important?

  • create and maintain data is extremely hard

Data definition language (DDL)

  • commands to create, modify, delete different structures
  • commands for add, delete, modify data

⬆ back to top

138. Types Of Databases In A RDBMS

Types Of Databases

  • regular
  • template: build a regular database based on a tempate

Create a database

  • when you setup your postgres, 3 databases were created
    • postgres
    • template0
    • template1
  • postgres is the default database that is created when you setup postgres (initdb)

⬆ back to top

139. Default PostgreSQL Database

Postgres Database

  • So the first time you initialize postgres, you can connect to postgres database to setup your first database
    • psql -U <user> <database>
  • postgres by default will assume a connection to a database with the same name as the user if no database is supplied
    • psql -U postgres
  • This command shows you your current connection
    • \conninfo

⬆ back to top

140. Template Databases

Template0 Database

  • This is the template that is used to create template1 - never change it
  • It is in essence a backup template

Template1 Database

  • This is the template that is used to create new databases

  • Any changes we make to the template are automatically applied to all new databases

  • one thing to note is that becase template1 is the default template - if it is being accessed / connected to no new databases can be created until the connection is closed!

  • Case 1: connect to Template1 and try to create new database

psql -U postgres template1
template1=#
psql -U postgres Employees
Employees=# CREATE DATABASE myblockeddb;
ERROR:  source database "template1" is being accessed by other users
DETAIL:  There is 1 other session using the database.
  • Case 2: exit Template1 and try to create new database
psql -U postgres Employees
Employees=# CREATE DATABASE myblockeddb;
CREATE DATABASE

Case 3: connect to Template1 and try to create new database with template0

psql -U postgres template1
template1=#
psql -U postgres Employees
Employees=# CREATE DATABASE myblockeddb2 WITH TEMPLATE template0;
CREATE DATABASE

Template Databases

  • You can create your own template database to create other database with. But this however not very common!
CREATE DATABASE mysupertemplate;
CREATE TABLE supertable ();
CREATE DATABASE mysuperdatabase with TEMPLATE mysupertemplate;

⬆ back to top

141. Creating A Database

CREATE DATABASE name
  [ [ WITH ] 
    [ OWNER [=] user_name ]
    [ TEMPLATE [=] template ]
    [ ENCODING [=] encoding ]
    [ LC_COLLATE [=] lc_collate ]
    [ LC_CTYPE [=] lc_ctype ]
    [ TABLESPACE [=] tablespace ]
    [ CONNECTION_LIMIT [=] connlimit ] ]
Setting Default
TEMPLATE template1
ENCODING UTF8
CONNECTION_LIMIT 100
OWNER Current User

Lets create an exmple database to store courses!

psql -U postgres postgres
postgres=# CREATE DATABASE ztm;
CREATE DATABASE

postgres=# DROP DATABASE ztm;
DROP DATABASE

⬆ back to top

142. Database Organization

Postgres schema

  • Postgres offers the concept of schemas. Think of it like a box in which you can organise tables, views, indexes, etc
  • by default, each database gets a public schema
  • unless you specific a schema, the default us always assumed to be public!
SELECT * FROM employees
SELECT * FROM public.employees
postgres=# \dn
  List of schemas
  Name  |  Owner   
--------+----------
 public | postgres
(1 row)

Create a schema

postgres=# CREATE SCHEMA sales;
CREATE SCHEMA
postgres=# \dn
   List of schemas
  Name  |    Owner    
--------+-------------
 public | postgres
 sales  | postgres
(2 rows)

Reasons to use schemas

  • allow many users to use one database without interfering with each other
  • organise database objects into logical groups to make them more management
  • 3rd party applications can be put into seperate schemas so they do not collide with the names of other objects

Restricted

  • create databases is a restricted action
  • not everyone is allowed to do it

⬆ back to top

143. Roles In Postgres

Roles

  • roles are vital to any dbms they determine what's allowed
  • a role can be a user or a group
  • it depends on how you setup the role
  • roles have the ability to grant membership to another role
  • a role have attributes which define privileges

⬆ back to top

144. Role Attributes And Creation

Role Attribute

  • privileges of a role are determined in part by its attributes
  • attribute example
    • SUPERUSER | NOSUPERUSER
    • CREATEDB | NOCREATEDB
    • CREATEROLE | NOCREATEROLE
    • LOGIN | NOLOGIN

Create role

  • always encrypt when storing a role that can log in
CREATE ROLE readonly WITH LOGIN ENCRYPTED PASSWORD 'redaonly';
  • by default only the creator of the database or superuser has access to its onject
SELECT * FROM departments;
ERROR: permission denied for table departments
postgres=# \du
                                    List of roles
  Role name  |                         Attributes                         | Member of 
-------------+------------------------------------------------------------+-----------
 chesterheng | Superuser, Create role, Create DB                          | {}
 postgres    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

⬆ back to top

145. Creating Users And Configuring Login

  • A user is the same a role
  • It assumes login by default
CREATE USER user1 WITH ENCRYPTED PASSWORD 'user1';
postgres=# CREATE role test_role_with_login WITH LOGIN ENCRYPTED PASSWORD 'password';
CREATE ROLE

postgres=# \du
                                         List of roles
      Role name       |                         Attributes                         | Member of 
----------------------+------------------------------------------------------------+-----------
 chesterheng          | Superuser, Create role, Create DB                          | {}
 postgres             | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 test_role_with_login |    

postgres=# CREATE USER test_user_with_login WITH ENCRYPTED PASSWORD 'password';
CREATE ROLE

postgres=# \du
                                         List of roles
      Role name       |                         Attributes                         | Member of 
----------------------+------------------------------------------------------------+-----------
 chesterheng          | Superuser, Create role, Create DB                          | {}
 postgres             | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 test_role_with_login |                                                            | {}
 test_user_with_login |   
createuser --interactive
Enter name of role to add: test_interactive
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) y
postgres=# \du
                                         List of roles
      Role name       |                         Attributes                         | Member of 
----------------------+------------------------------------------------------------+-----------
 chesterheng          | Superuser, Create role, Create DB                          | {}
 postgres             | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 test_interactive     | Create role                                                | {}
 test_role_with_login |                                                            | {}
 test_user_with_login |   

postgres=# ALTER ROLE test_interactive WITH ENCRYPTED PASSWORD 'password';
ALTER ROLE
  • Did not prompt for password login to Employees database with test_interactive user
psql -U test_interactive Employees
Employees=> 
  • Force to prompt for password login to Employees database with test_interactive user
  • Can get in with any password
  • Configuration of postgres with trust all default connection by default
psql -U test_interactive Employees -W
Employees=> 
postgres=# show hba_file;
                                  hba_file                                  
----------------------------------------------------------------------------
 /Users/chesterheng/Library/Application Support/Postgres/var-12/pg_hba.conf
(1 row)

postgres=# show config_file;
                                  config_file                                   
--------------------------------------------------------------------------------
 /Users/chesterheng/Library/Application Support/Postgres/var-12/postgresql.conf
(1 row)
  • Set method to scram-sha-256 in hba_file
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     scram-sha-256
# IPv4 local connections:
host    all             all             127.0.0.1/32            scram-sha-256
# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     scram-sha-256
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256
  • Login to Employees database with test_interactive user
  • will ask for password now
psql -U test_interactive Employees
Password for user test_interactive: 
psql: error: could not connect to server: FATAL:  password authentication failed for user "test_interactive"
  • need to tell postgres to use scram-sha-256 in config_file
password_encryption = scram-sha-256		# md5 or scram-sha-256
  • Set method back to trust in hba_file
  • Alter test_interactive to use password
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

postgres=# ALTER USER test_interactive WITH PASSWORD 'password';
ALTER ROLE
  • Set method to scram-sha-256 in hba_file
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     scram-sha-256
# IPv4 local connections:
host    all             all             127.0.0.1/32            scram-sha-256
# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     scram-sha-256
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256
psql -U test_interactive Employees
Password for user test_interactive: 

Employees=> 
  • set method trust for postgres user
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             postgres                                trust
psql -U postgres postgres
postgres=# exit

⬆ back to top

146. Privileges

  • attributes can only define so many privileges
  • when you are not a superuser
  • what you can do is fairly limited
  • depending on what default provoleges are set
  • by default objects are only available to the one who creates them
  • privileges need to be granted for new roles and users to have access to certain data

Grant Privileges

  • be wise about the privileges you grant
GRANT ALL PRIVILEGES ON <table> TO <user>;
GRANT ALL ON ALL TABLES [IN SCHEMA <schema>] TO <user>;
GRANT [SELECT, INSERT, UPDATE, DELETE, ...] ON <table> [IN SCHEMA <schema>] TO <user>;

⬆ back to top

147. Granting Privileges and Role Management

  • Set method back to trust in hba_file for convenient
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
  • create new user privilegetest
  • privilegetest has no privilege to query titles table created by postgres
createuser --interactive
Enter name of role to add: privilegetest
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n

psql -U privilegetest Employees

Employees=# \conninfo
You are connected to database "Employees" as user "privilegetest" via socket in "/tmp" at port "5432".

Employees=# \dt
              List of relations
 Schema |     Name     | Type  |    Owner    
--------+--------------+-------+-------------
 public | departments  | table | postgres
 public | dept_emp     | table | postgres
 public | dept_manager | table | postgres
 public | employees    | table | postgres
 public | salaries     | table | postgres
 public | titles       | table | postgres
(6 rows)

Employees-# SELECT * FROM titles;
ERROR:  syntax error at or near "SELECT"
LINE 2: SELECT * FROM titles;
        ^
  • use postgres user to grant privilegetest the privilege to query titles table created by postgres
psql -U postgres Employees
Employees=# GRANT SELECT ON titles TO privilegetest;
GRANT
  • use privilegetest to query titles table created by postgres
psql -U privilegetest Employees
Employees=> SELECT * FROM titles;
 emp_no |       title        | from_date  |  to_date   
--------+--------------------+------------+------------
  10001 | Senior Engineer    | 1986-06-26 | 9999-01-01
  10002 | Staff              | 1996-08-03 | 9999-01-01
  10003 | Senior Engineer    | 1995-12-03 | 9999-01-01
  10004 | Engineer           | 1986-12-01 | 1995-12-01
  10004 | Senior Engineer    | 1995-12-01 | 9999-01-01
  10005 | Senior Staff       | 1996-09-12 | 9999-01-01
  10005 | Staff              | 1989-09-12 | 1996-09-12
  10006 | Senior Engineer    | 1990-08-05 | 9999-01-01
  10007 | Senior Staff       | 1996-02-11 | 9999-01-01
  10007 | Staff              | 1989-02-10 | 1996-02-11
  10008 | Assistant Engineer | 1998-03-11 | 2000-07-31
  10009 | Assistant Engineer | 1985-02-18 | 1990-02-18
  10009 | Engineer           | 1990-02-18 | 1995-02-18
  10009 | Senior Engineer    | 1995-02-18 | 9999-01-01
  10010 | Engineer           | 1996-11-24 | 9999-01-01
  10011 | Staff              | 1990-01-22 | 1996-11-09
  10012 | Engineer           | 1992-12-18 | 2000-12-18
  10012 | Senior Engineer    | 2000-12-18 | 9999-01-01
  10013 | Senior Staff       | 1985-10-20 | 9999-01-01
  10014 | Engineer           | 1993-12-29 | 9999-01-01
  10015 | Senior Staff       | 1992-09-19 | 1993-08-22
:
  • use postgres user to revoke privilegetest the privilege to query titles table created by postgres
psql -U postgres Employees
Employees=# REVOKE SELECT ON titles FROM privilegetest;
REVOKE
psql -U privilegetest Employees
Employees=> SELECT * FROM titles;
ERROR:  permission denied for table titles
  • grant all privileges to privilegetest in public schema
psql -U postgres Employees
Employees=# GRANT ALL ON ALL TABLES IN SCHEMA public TO privilegetest;
GRANT
Employees=# REVOKE ALL ON ALL TABLES IN SCHEMA public FROM privilegetest;
REVOKE
  • create role with granular option such as employee_read
  • grant role with granular option to specific user
Employees=# CREATE ROLE employee_read;
CREATE ROLE

Employees=# GRANT SELECT ON ALL TABLES IN SCHEMA public TO employee_read;
GRANT

Employees=# GRANT employee_read TO privilegetest;
GRANT ROLE

Employees=# \du
                                            List of roles
      Role name       |                         Attributes                         |    Member of    
----------------------+------------------------------------------------------------+-----------------
 chesterheng          | Superuser, Create role, Create DB                          | {}
 employee_read        | Cannot login                                               | {}
 postgres             | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 privilegetest        |                                                            | {employee_read}
 test_interactive     | Create role                                                | {}
 test_role_with_login |                                                            | {}
 test_user_with_login |                                                            | {}

Employees=# REVOKE employee_read FROM privilegetest;
REVOKE ROLE

⬆ back to top

148. Best Practices For Role Management

  • when manage roles and permissions, always go with the principle of least privilege
  • assign attributes and grant privileges to a role
  • assign a role to a specific user
  • don't use admin / super user by default

⬆ back to top

149. Data Types & Boolean Type

Before we start

  • before we create a full-fledged database, it's important to know what types of data we can store in a database

Data types

  • when we talk about types of data, we can store we categorize them into buckets or types
  • type: Numberic Types, Arrays, Character Types, Date/Time Types, Boolean Types, UUID Types
  • Data Types
  • A data type is nothing more than a constraint place on a field to only allow that type of data to be filled in
  • Specific a type is important because it tells the system how it can / should handle the data in that field
  • it's also what allows the dbms to optimize it's algorithms for processing data by knowing its type

Boolean

  • a boolean type can hold TRUE, FALSE or NULL
  • it can also do smart conversions
    • 1, yes, y , t, true => TRUE
    • 0, no, f false => FALSE

⬆ back to top

150. Storing Text

Character

  • postgres provides 3 character data types: CHAR(N), VARCHAR(N) and TEXT
  • CHAR(N): fixed length with space padding
  • VARCHAR(N): variable length with no padding
  • TEXT: unlimited length text
CREATE TABLE test_text (
  fixed char(4),
  variable varchar(20),
  unlimited text
);

INSERT INTO test_text VALUES ('mo', 'mo', 'I have unlimited space');

SELECT * FROM test_text;

⬆ back to top

151. Storing Numbers

  • there are 2 types of number in postgres: integers and floating points
  • integers: a whole number, a number that is not a fraction
  • 3 types of integers: SMALLINT, INT, BIGINT
  • SMALLINT: number that can range from -32,768 to 32,767
  • INT: number that can range from -2,147,483,648 to 2,147,483,647
  • BIGINT: number that can range from -9223372036854775808 to 9223372036854775807
  • floating point: number that can contain a decimal point
  • FLOAT4: single or 6 digits precision
    • 1.123456789 => 1.1234568 (reserve 6 digits and round off from 7th digits)
  • FLOAT8: double or 15 digits precision
    • 1.12345678901234567 => 1.1234567890123457 (reserve 15 digits and round off from 16th digits)
  • DECIMAL / NUMBERIC: up to 131072 digits before the decimal point; up to 16383 digits after the demical point
CREATE TABLE test_float (
  four float4,
  eight float8,
  big decimal
);

INSERT INTO test_float VALUES (1.123456789, 1.12345678901234567, 1.12345678901234567);

SELECT * FROM test_float;
four eight big
1.1234568 1.1234567890123457 1.12345678901234567

⬆ back to top

152. Storing Arrays

  • an array is a group of elements of the same type
  • an array is denoted by a bracket syntax
  • every data type in postgres has an array equivalent
  • Example: store phone numbers
CREATE TABLE test_array (
  four char(2)[],
  eight text[],
  big float4[]
);

INSERT INTO test_array VALUES (ARRAY ['mo', 'm', 'm', 'd'], ARRAY ['test', 'long text', 'longer text'], ARRAY [1.23, 2.11, 3.23, 5.32]);

SELECT * FROM test_array;
four eight big
["mo","m ","m ","d "] ["test","long text","longer text"] [1.23,2.11,3.23,5.32]

⬆ back to top

153. Data Models And Naming Conventions

Database Models

  • create a model before we create a database

What is a model?

  • a model is a design that is used to visualize what we are going to build

Naming conventions

  • table names must be singular!
  • columns must be lowercase with underscores
  • columns with mixed case are acceptable
  • columns with upper case are unacceptable
  • be consistence. write down your rules

⬆ back to top

154. CREATE TABLE

CREATE TABLE <name> (
  <col1> TYPE [CONSTRAINT],
  table_constraint [CONSTRAINT]
) [INHERITS <existing_tables>];
  • Create Student table
CREATE TABLE student (
  student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL,
  date_of_birth DATE NOT NULL
);
psql -U postgres ztm

ztm=# \conninfo
You are connected to database "ztm" as user "postgres" via socket in "/tmp" at port "5432".

ztm=# CREATE TABLE student (
ztm(#   student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
ztm(#   first_name VARCHAR(255) NOT NULL,
ztm(#   last_name VARCHAR(255) NOT NULL,
ztm(#   email VARCHAR(255) NOT NULL,
ztm(#   date_of_birth DATE NOT NULL
ztm(# );
ERROR:  function uuid_generate_v4() does not exist
LINE 2:   student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
                                              ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

ztm=# CREATE extension if not exists "uuid-ossp";
CREATE EXTENSION

ztm=# CREATE TABLE student (
ztm(#   student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
ztm(#   first_name VARCHAR(255) NOT NULL,
ztm(#   last_name VARCHAR(255) NOT NULL,
ztm(#   email VARCHAR(255) NOT NULL,
ztm(#   date_of_birth DATE NOT NULL
ztm(# );
CREATE TABLE

ztm=# \dt
          List of relations
 Schema |  Name   | Type  |  Owner   
--------+---------+-------+----------
 public | student | table | postgres
(1 row)

ztm=# \d student
                               Table "public.student"
    Column     |          Type          | Collation | Nullable |      Default       
---------------+------------------------+-----------+----------+--------------------
 student_id    | uuid                   |           | not null | uuid_generate_v4()
 first_name    | character varying(255) |           | not null | 
 last_name     | character varying(255) |           | not null | 
 email         | character varying(255) |           | not null | 
 date_of_birth | date                   |           | not null | 
Indexes:
    "student_pkey" PRIMARY KEY, btree (student_id)

⬆ back to top

155. Extra information on CREATE TABLE

Temporary tables

  • They are a type of table that exist in a special schema, so you cannot define a schema name when declaring a temporary table.
  • These types of tables will be dropped at the end of your session. It’s important to also note that they are only visible to the creator.
  • Now you may be wondering, why would I ever use these? Well if you’re writing intensive queries against a data set it might be beneficial to temporarily create a table based off another table. This is because:
    • Temporary tables behave just like normal ones
    • Postgres will apply less “rules” (logging, transaction locking, etc.) to temporary tables so they execute more quickly
    • You have full access rights to the data, if you otherwise didn’t so you can test things out.
CREATE TEMPORARY TABLE <name> (<columns>);

⬆ back to top

156. Column Constraints

Constraints

  • Constraints are a tool to apply validation methods against data that will be inserted

Column Constraints

  • A column constraints is defined as part of a column definition

Examples

  • NOT NULL: cannot be NULL
  • PRIMARY KEY: column will be primary key
  • UNIQUE: can only contain unique values (NULL is unique)
  • CHECK: apply a special condition check against the values in the column
  • REFERENCES: constraints the values of the column to only values that exist in the column of another table (Foreign Key)

⬆ back to top

157. Table Constraints

Table Constraints

  • a table constraint definition is not tied to a particular column, and it can encompass more than one column

Examples

  • UNIQUE(column_list): can only contain unique values (NULL is unique)
  • PRIMARY KEY(column_list): columns that will be the primary key
  • CHECK(condition): a condition to check when insert or update
  • REFERENCES: Foreign Key relationship to a column
  • every column constraint can be written as a table constraint
  • use column constraint when involve one column only
  • use table constraint when involve many columns
CREATE TABLE student (
  student_id UUID DEFAULT uuid_generate_v4(),
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL,
  date_of_birth DATE NOT NULL,
  CONSTRAINT pk_student_id PRIMARY KEY (student_id)
);
CREATE TABLE category (
  cat_id SMALLINT PRIMARY KEY,
  type TEXT
);

CREATE TABLE column_constraints (
  cc_id SMALLINT PRIMARY KEY,
  something TEXT NOT NULL,
  email TEXT CHECK (email ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$'),
  cat_id SMALLINT REFERENCES category(cat_id)
);

CREATE TABLE table_constraints (
  cc_id SMALLINT,
  something TEXT NOT NULL,
  email TEXT,
  cat_id SMALLINT REFERENCES category(cat_id),
    CONSTRAINT pk_table_constraints PRIMARY KEY (cc_id),
    CONSTRAINT valid_email CHECK (email ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$')
);

INSERT INTO category VALUES (1, 'category 1');
INSERT INTO column_constraints VALUES (1, 'something', '[email protected]', 1);
INSERT INTO table_constraints VALUES (1, 'something', '[email protected]', 1);

SELECT * FROM category;
SELECT * FROM table_constraints;
SELECT * FROM column_constraints;
  • Query Error when insert wrong email
INSERT INTO column_constraints VALUES (1, 'something', '@email.com', 1);
Query Error: error: new row for relation "column_constraints" violates check constraint "column_constraints_email_check"

INSERT INTO table_constraints VALUES (1, 'something', '@email.com', 1);
Query Error: error: new row for relation "table_constraints" violates check constraint "valid_email"

⬆ back to top

158. Regexes!

⬆ back to top

159. UUID Explained

  • need to install an extension
  • extension are pieces of software to expand what postgres can do or expand how certain processes run
  • UUID extension allows us to generate unique identifiers for PRIMARY KEY
  • UUID stands for universally unique identifier
CREATE extension if not exists "uuid-ossp";
PROS CONS
unique everywhere large values to store
easy to shard can have performance impact
easy to merge / replicate more difficult to debug
expose less info about your system

⬆ back to top

160. Custom Data Types And Domains

Custom Data Types

  • Feedback custom data type
  • postgres allow to create custom data types to store shapes of data that are more complex

Domains

  • domain is a alias for an exitsing type that can have a check constraint
CREATE DOMAIN Rating SMALLINT 
  CHECK (VALUE > 0 AND VALUE <= 5);

CREATE TYPE Feedback AS (
  student_id UUID,
  rating Rating,
  feedback TEXT
);

⬆ back to top

161. Creating The Tables For ZTM

CREATE TABLE student (
  student_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL,
  date_of_birth DATE NOT NULL
);

CREATE TABLE subject (
  subject_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  subject TEXT NOT NULL,
  description TEXT
);

CREATE TABLE teacher (
  teacher_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL,
  date_of_birth DATE NOT NULL
);

CREATE TABLE course (
  course_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  "name" TEXT NOT NULL,
  description TEXT,
  subject_id UUID REFERENCES subject(subject_id),
  teacher_id UUID REFERENCES teacher(teacher_id),
  feedback feedback[]
);

CREATE TABLE enrollment (
  course_id UUID REFERENCES course(course_id),
  student_id UUID REFERENCES student(student_id),
  enrollment_date DATE NOT NULL,
  CONSTRAINT pk_enrollment PRIMARY KEY (course_id, student_id)
);
ALTER TABLE student
ADD COLUMN email VARCHAR(255) NOT NULL;

⬆ back to top

162. Extra information on ALTER TABLE

ALTER TABLE

  • Change the SCHEMA of a table
  • Change the name of a table-level CONSTRAINT
  • Add and remove constraints
  • Change a column level constraint
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
  ADD COLUMN <col> <type> <constraint>
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
  ALTER COLUMN <name> TYPE <new type> [USING <expression>]
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
  RENAME COLUMN <old name> TO <new name>
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
  DROP COLUMN <col> [ RESTRICT | CASCADE ]

Our main advice when using this command is to make sure you communicate changes early and set in place a plan for other systems/software to be able to migrate, the last thing you want to happen is changing a column and suddenly all your services start breaking!

⬆ back to top

163. Adding Students And Teachers

INSERT

INSERT INTO student (first_name, last_name, email, date_of_birth) VALUES ( 'Mo', 'Binni', '[email protected]', '1992-11-13'::DATE);

INSERT INTO teacher (first_name, last_name, email, date_of_birth) VALUES ( 'Mo', 'Binni', '[email protected]', '1992-11-13'::DATE);

⬆ back to top

164. Creating A Course

INSERT INTO subject (subject, description) VALUES ('SQL', 'A database management language');

DELETE FROM subject WHERE subject = 'SQL';

INSERT INTO course ("name", description) VALUES ('SQL Zero to Mastery', 'The #1 resource for for SQL mastery');

UPDATE course
SET subject_id = 'e4a6e53f-db6f-438f-b4d0-80d4e6a9341e'
WHERE subject_id is NULL;

ALTER TABLE course 
ALTER COLUMN subject_id SET NOT NULL; 

UPDATE course
SET teacher_id = '90de5068-bc2f-4d37-b4f0-1364671fb89e'
WHERE teacher_id is NULL;

ALTER TABLE course 
ALTER COLUMN teacher_id SET NOT NULL; 

INSERT INTO course ("name", description, subject_id, teacher_id) VALUES ('SQL Zero to Mastery', 'The #1 resource for for SQL mastery', 'e4a6e53f-db6f-438f-b4d0-80d4e6a9341e', '90de5068-bc2f-4d37-b4f0-1364671fb89e');

⬆ back to top

165. Adding Feedback To A Course

  • problem is there is not data integrity check for student_id when append feedback object into feedback array
  • if we need to check student_id is valid in student table, we should use a feedback table instead of feedback array
INSERT INTO enrollment (student_id, course_id, enrollment_date) VALUES ('f914ba46-9ac4-42cf-b0ac-d10cfe625be8', '6174589a-7f97-4d40-b063-6a974a20a7ec', NOW()::DATE);

UPDATE course
SET feedback = array_append(
  feedback, ROW('f914ba46-9ac4-42cf-b0ac-d10cfe625be8', 5, 'Great course!')::feedback
)
WHERE course_id = '6174589a-7f97-4d40-b063-6a974a20a7ec';

⬆ back to top

166. A Tale Of 2 Feedbacks

  • Feedback is important
  • Store feedback in a table instead of feedback array data type
  • Deprecated feedback array data type by rename feedback to feedback_deprecated
  • Deprecated feedback field in course by rename feedback to feedback_deprecated
  • Use array data type for not so important data such as phone numbers

CREATE TABLE feedback (
  student_id UUID NOT NULL REFERENCES student(student_id),
  course_id UUID NOT NULL REFERENCES course(course_id),
  feedback TEXT,
  rating rating,
  CONSTRAINT pk_feedback PRIMARY KEY (student_id, course_id)
);

INSERT INTO feedback (student_id, course_id, feedback, rating) VALUES ('f914ba46-9ac4-42cf-b0ac-d10cfe625be8', '6174589a-7f97-4d40-b063-6a974a20a7ec', 'Interesting Course', 4)

⬆ back to top

167. SQL Exercises

SQL Exercises

⬆ back to top

168. SQL Quiz

SQL Quiz

⬆ back to top

169. Backups And Why They Are Important

  • disaster will strike
  • so it's important

Have a plan

  • backup plan
  • disaster recovery plan
  • test your plans

What can go wrong?

  • hardware failures
  • viruses
  • power outages
  • hackers
  • human error

How do I make a plan?

  • 1 determine what needs to be backed up
  • 2 What's the appropriate way to back up?
  • 3 Decide how frequently you are going to back up
  • 4 decide where you are going to store them
  • 5 have a retention policy for the backup

What to backup?

Type Purpose Frequency
full backup backup all the data less often
incremental backup since last incremental often
differential backup since last full backup often
transaction log backup of the database transactions a lot

⬆ back to top

170. Backing Up In Postgres

⬆ back to top

171. Restoring A Database

psql -d postgres
postgres=# \conninfo
You are connected to database "postgres" as user "postgres" via socket in "/tmp" at port "5432".
postgres=# \i ~/Desktop/Employees_20200927_153756.sql

⬆ back to top

172. Transactions

Transactions

  • a database is a shared resource, so many users may access it concurrently
  • a transaction is a unit of instructions (query for instance)
  • so how do you keep things consistent?
  • dbms has a mechanism in place to manage transactions
  • dbms wrap INSERT, UPDATE and DELETE commands in transaction by default
  • Transaction lifecycle

  • session 1
Employees=# BEGIN;
BEGIN

Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 5

Employees=# SELECT * FROM employees;
 emp_no | birth_date |   first_name   |    last_name     | gender | hire_date  
--------+------------+----------------+------------------+--------+------------
  10006 | 1953-04-20 | Anneke         | Preusig          | F      | 1989-06-02
  10007 | 1957-05-23 | Tzvetan        | Zielinski        | F      | 1989-02-10
  10008 | 1958-02-19 | Saniya         | Kalloufi         | M      | 1994-09-15
  10009 | 1952-04-19 | Sumant         | Peac             | F      | 1985-02-18
  10010 | 1963-06-01 | Duangkaew      | Piveteau         | F      | 1989-08-24
  • session 2
  • session 2 hang when DELETE as session 1 lock employees table
Employees=# SELECT * FROM employees;
 emp_no | birth_date |   first_name   |    last_name     | gender | hire_date  
--------+------------+----------------+------------------+--------+------------
  10001 | 1953-09-02 | Georgi         | Facello          | M      | 1986-06-26
  10002 | 1964-06-02 | Bezalel        | Simmel           | F      | 1985-11-21
  10003 | 1959-12-03 | Parto          | Bamford          | M      | 1986-08-28
  10004 | 1954-05-01 | Chirstian      | Koblick          | M      | 1986-12-01
  10005 | 1955-01-21 | Kyoichi        | Maliniak         | M      | 1989-09-12
  10006 | 1953-04-20 | Anneke         | Preusig          | F      | 1989-06-02
  10007 | 1957-05-23 | Tzvetan        | Zielinski        | F      | 1989-02-10
  10008 | 1958-02-19 | Saniya         | Kalloufi         | M      | 1994-09-15
  10009 | 1952-04-19 | Sumant         | Peac             | F      | 1985-02-18
  10010 | 1963-06-01 | Duangkaew      | Piveteau         | F      | 1989-08-24

Employees=# BEGIN;
BEGIN

Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
  • session 1 rollback
Employees=# ROLLBACK;
  • session 2 complete after session 1 rollback
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 5
  • session 1 begin another transaction and try to DELETE
  • session 1 hang now as session 2 locked employees table
Employees=# BEGIN;
BEGIN
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
  • session 2 commit and end transaction
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 5
Employees=# END;
COMMIT
  • session 1 delete no records as session 2 commit
Employees=# BEGIN;
BEGIN
Employees=# DELETE FROM employees WHERE emp_no BETWEEN 10000 AND 10005;
DELETE 0
Employees=# END;
COMMIT

Transactions

  • Maintain the integrity of a database, all transactions must obey ACID properties
  • Atomicty: either execute entirely or not at all
  • Consistency: each transaction should leave the database in a consistent state (commit or rollback)
  • Isolation: transaction should be executed in isolation from other transactions
  • Durability: after completion of a transaction, the changes in the database should persist

⬆ back to top

Section 8: Solving The Mystery

173. Clue #1 and #2

  • Clue #1: Bruno insists it's one of his employees
  • Clue #2: Data breach happened during a certain time

⬆ back to top

174. Getting ready to solve the Mystery!

We have two solid clues so far:

  • It was an inside job
  • It was done on 2020-06-23

For this part of the course you'll need 2 databases:

  • Movr_Employees: This database contains all relevant information about the Employees of Keiko Corp.
  • Movr: A ride-sharing product that Keiko offers to consumers.

To restore the Movr backup make sure to do the following steps in order!

  • Create a database named Movr
  • Restore the schema.sql first to the Movr database
  • Restore the data.sql next
psql -U postgres Movr
Movr=# \i ~/Desktop/schema.sql
Movr=# \i ~/Desktop/data.sql

To restore the Movr_Employees backup make sure to do the following steps in order!

  • Create a database named Movr_Employees
  • Restore the movr_employees.sql
psql -U postgres Movr_Employees
Movr_Employees=# \i ~/Desktop/movr_employees.sql

⬆ back to top

175. Clue #3

You had to be on location to steal the data

⬆ back to top

176. Exercise: Solving The First Clues

  • Date of incident: 2020-06-23
  • Keiko Corp Latitude: -74.997 to -74.9968
  • Keiko Corp Longitude: 40.5 to 40.6

⬆ back to top

177. Solution: Solving The First Clues

CREATE VIEW suspected_rides AS
SELECT *
FROM "public"."vehicle_location_histories"
WHERE 
  "city" = 'new york' AND
	"timestamp"::date = '2020-06-23'::date AND
	"lat" BETWEEN -74.997 AND -74.9968 AND
	"long" BETWEEN 40.5 AND 40.6
ORDER BY long;

SELECT DISTINCT r.vehicle_id
FROM suspected_rides AS sr
JOIN rides AS r ON r.id = sr.ride_id; 

SELECT 
  DISTINCT r.vehicle_id, 
  u.name AS "owner name", 
  u.address, 
  v.status, 
  v.current_location
FROM suspected_rides AS sr
JOIN rides AS r ON r.id = sr.ride_id
JOIN vehicles AS v ON v.id = r.vehicle_id
JOIN users AS u ON u.id = v.owner_id; 

⬆ back to top

178. Clue #4

It's not the drivers

⬆ back to top

179. Exercise: Clue #4

So with our current setup, we go ahead and filter out all of the unique riders that were on those suspected rides on that horrible day of the theft.

⬆ back to top

180. Solution: Clue #4

SELECT DISTINCT r.vehicle_id, u.name AS "rider name", u.address
FROM suspected_rides AS sr
JOIN rides AS r ON r.id = sr.ride_id
JOIN users AS u ON u.id = r.rider_id; 

⬆ back to top

181. Clue #5 and #6

  • Clue #5: It's not a rider
  • Clue #6: It was an inside job

⬆ back to top

182. Exercise: Clue #5 and #6

We're going to have to cross-reference data between 2 separate databases.

-- run this to run queries across databases
CREATE extension dblink; 

SELECT *
FROM dblink(
  'host=localhost user=postgres password=postgres dbname=Employees', 
  'SELECT <column> FROM employees;') 
AS t1(<column> NAME) 

⬆ back to top

183. Solution: Clue #5 and #6

CREATE VIEW suspected_rider_names AS
SELECT DISTINCT
    split_part(u.name, ' ', 1) AS "first_name",
    split_part(u.name, ' ', 2) AS "last_name"
FROM suspected_rides AS vlh
JOIN rides AS r ON r.id = vlh.ride_id
JOIN users AS u ON u.id = r.rider_id; 

SELECT * FROM suspected_rider_names;

SELECT DISTINCT
  CONCAT(t1.first_name, ' ', t1.last_name) AS "employee",
  CONCAT(u.first_name, ' ', u.last_name) AS "rider"
FROM dblink(
  'host=localhost user=chesterheng password=postgres dbname=Movr_Employees', 
  'SELECT first_name, last_name FROM employees;') 
AS t1(first_name NAME, last_name NAME)
JOIN suspected_rider_names AS u ON t1.last_name = u.last_name
ORDER BY "rider";

⬆ back to top

Section 9: Database Design

185. System Design And SDLC

Software development lifecycle

  • Phase 1: System planning and selection
  • Phase 2: System analysis
  • Phase 3: System design
  • Phase 4: System implementation and operation
  • Goal of SDLC is to design robust systems
  • The process can be implemented in different ways: agile, waterfall, v-model (for hardware system)

⬆ back to top

186. SDLC Phases

  • different models cycle through the steps at different paces
  • Phase 1: requirements analysis
    • get the information we needs to be done (scope)
  • Phase 2: system analysis
    • take in requirements and analyze if it can be done on time and on budget
  • Phase 3: system design
    • design the system architecture for all related databasesm=, apps, etc
  • Phase 4: system implementation and operation
    • build the software
  • there are more phases that can be added: testing, maintenance, ... (inside phase 4)

⬆ back to top

187. System Design Deep Dive

  • From chaos to structure: create structure that can be understood and communicated

Database design

  • In any system, there is data
  • There are different techniques to design database
  • Top-Down vs Bottom-Up

⬆ back to top

188. Top-Down vs Bottom-Up

Top-Down

  • Start from 0
  • Optimal choice when create a new database
  • All requirements are gathered up-front

Bottom-Up

  • There is an exitsing system or specific data in place (exmaple: use by doctore)
  • Shape a new system around existing data
  • Optimal choice when migrate an exitsing database

What to use? Often you'll end up using a bit of both

⬆ back to top

189. DRIVEME Academy

  • In top-down design, you create the universe based on the requirements
  • So, let'd do based on DRIVEME, an driving school
  • Every school has instructors on payroll and an inventory of cars, truck and motorcycles for teaching
  • Initial stake holders: CEO, CTO, BA's, customers
  • Core mission: become a household name across USA for learning how to drive
  • Current situation: has an outdated website and their customer acquistion is mostly word of mouth
  • THey want to start gaining marketshare through an online presence
  • Core requirements are a well-thought out document with sections of functionality
    • There is a vehicle inventory for students to rent
    • Thare are employees at every branch
    • There is maintenance for the vehicles
    • There is an optional exam at the end of lessons
    • You can only take the exam twice, fail twice and you must take more lessons

⬆ back to top

190. Top Down Design

  • Goal: create a data model based on requirements
  • What are requirements?
    • high-level requirements
    • user interviews
    • data collection
    • deep understanding of the business
  • one of the key method to top-down deisgn is ER modeling

⬆ back to top

191. ER Model

Phase 3: system design

  • How to design?
  • Top-down: ER modeling
  • Bottom-up: ???

ER Model: Entity-Relationship Model or diagram

  • A diagram that functions as a way to structure high-level requirments

⬆ back to top

192. Step 1: Determining Entities

  • Step 1: Determine what entities are in the system

What is an entity?

  • a person / place or a thing
  • has a singular name
  • has an identifier
  • should contain more than one instance of data

⬆ back to top

193. Tooling For Diagramming

⬆ back to top

194. DRIVEME Academy Entities

Core requirements

  • There is a vehicle inventory for students to rent
  • Thare are employees at every branch
  • There is maintenance for the vehicles
  • There is an optional exam at the end of lessons
  • You can only take the exam twice, fail twice and you must take more lessons

⬆ back to top

195. Step 2: Attributes

  • give entities the information they will store

Attributes

  • must be a property of the entity
  • must be atomic
    • can be store in an individual field
    • not address
    • address can breakdown into street name, postal code, region
  • single / mutivalued (single or many phone numbers)
  • keys

⬆ back to top

196. Relational Model Extended

  • relation schema
  • attribute
  • degree
  • cardinality
  • tuple
  • column
  • relation key
  • domain
  • tables
  • relation instance

⬆ back to top

197. Relational Schema And Instance

Relation schema

  • table schema
  • represent data that is going into table

Relation instance

  • set of data related to relation schema

⬆ back to top

198. Super Key and Candidate Key

⬆ back to top

199. Primary Key and Foreign Key

  • primary key: uniquely identify a row
  • foreign key: reference a key in foreign table

⬆ back to top

200. Compound Composite And Surrogate Key

  • compound key: multiple keys to uniquely identify a row and include foreign key
  • composite key: multiple keys to uniquely identify a row without include foreign key
  • surrogate key: use to uniquely identify a row and has no meaning to the data in the table

⬆ back to top

201. DRIVEME Attributes

School:

  • school_id, street_name, street_nr, postal_code, state, city

Instructor:

  • instructor_id, first_name, last_name, date_of_birth, hiring_date, school_id

Student:

  • student_id, first_name, last_name, date_of_birth, enrollment_date, ...

Exam:

  • student_id, teacher_id, date_taken, passed, lesson_id

Lesson:

  • lesson_id, date_of_enrollment, package, student_id

⬆ back to top

202. Step 3: Relationships

  • determine relationships between entities
  • links 2 entities together
    • 1 to 1
    • 1 to many
    • many to many
  • connect entities with a line that tells is the relationship

⬆ back to top

203. DRIVEME Relationships

⬆ back to top

204. Step 4: Solving Many To Many

  • in relational model it isn't possible to store a many to many relationship
  • technially you can do it, but you really don't want to
  • you create more overhead: INSERT overhead, UPDATE overhead, DELETE overhead, potential redundancy
  • as a rule of thumb always try to resolve many to many
  • solution is use intermediate entities

⬆ back to top

205. Step 5: Subject Areas

  • divide entities into logical groups that are related (think schemas)
  • Subject Areas rules
    • all entities must belong to one
    • an entity can only belong to one
    • you can nest subject areas

⬆ back to top

206. DRIVEME Subject Areas

⬆ back to top

207. Exercise: Painting Reservations

System requirements

  • a rich business man has tons of paintings
  • he wants to build a system to catalog and track where his art is
  • he lends it to museums all across the world
  • he even wants to see reservations

Some constraints to note:

  • a painting can only have one artist

Let's answer some questions about the system:

  • what's the goal of the system?
    • track painting reservations for a wealthy man
  • who are our stakeholders
    • owner, museums

5 steps of top-down to create an erd of the system

Step 1: entities

  • Painting
  • Reservation
  • Museum
  • Artist

Step 2: attributes

Entity Attributes
Painting name, creation_date, style
Reservation creation_date, date_from, date_to, accepted
Artist name, birth_date, email
Museum name, address, phone_nr, email

Step 3: Relationships

Step 4: Solving Many To Many

⬆ back to top

208. Exercise: Movie Theatre

⬆ back to top

209. Bottom Up Design

  • to create a data model from specific details, existing systems, legacy systems, ...

Steps

  • identify the data (attributes)
  • group them (entities)

Goal

  • create a perfect data model without redundancy and anomalies

⬆ back to top

210. Anomalies

  • modification anomalies are problems that arise when database is not structured correctly

There are 3 types

  • Insert anomalies
  • Update anomalies
  • Delete anomalies

⬆ back to top

211. Normalization

  • use Normalization is key to avoid anomalies
  • Edgar Codd proposed the theory of normalization

Goal

  • a design technique to reduce redundancy and anomalies

Normalization

  • to do Normalization we must understand 2 things
    • functional dependencies
    • normal forms

⬆ back to top

212. Functional Dependencies

  • A functional Dependency show a relationship between attributes
  • functional Dependency exists when a relationship between 2 attributes allows you to uniquely determine corresponding attribute's value
  • we refer to an entity as R and attributes as A and B
  • A is functional dependent on B when a value of B determines a value of A: B -> A
  • it means that based on the value of B, you can determine the value of A: EMP_NO -> FIRST_NAME
  • DETERMINANT -> DEPENDANT
  • example: STUIDENT_ID -> BIRTH_DATE
  • create unique relationship between the data

⬆ back to top

213. Functional Dependencies 2

emp_no proj_id salary
1 1 300
2 1 500
1 2 400
3 2 300

Example:

EMP_NO -> SALARY (no) PROJ_ID -> SALARY (no) PROJ_ID, EMP_NO -> SALARY (yes)

student_no activity contribution
1 swimming 300
2 sailing 500
3 bungee 150
4 shark diving 200
1 shark diving 300
2 bungee 300

Example:

STUDENT_NO -> ACTIVITY (no) ACTIVITY -> STUDENT_NO (no) ACTIVITY -> CONTRIBUTION (no) STUDENT_NO, ACTIVITY -> CONTRIBUTION (yes)

⬆ back to top

214. The Normal Forms

  • 0 NF -> 1 NF -> 2 NF -> 3 NF -> BCNF -> 4 NF -> 5 NF -> 6 NF
  • normalization happens through a process of running attributes through the normal forms
  • each normal form aims to further seperate relationships into smaller instances as to create less redundancy and anomalies
  • normal forms 0 through BCNF are the most common normal forms to run through
  • NF 4 and 5 are there to further reduce anomalies and 6 NF is not yet standardized

⬆ back to top

215. Going from 0NF to 1NF

data is in 0 NF when it is unnormalized

  • repeating groups of fields
  • positional dependence of data
  • non-atomic data

1 NF

  • eliminate repeating columns of the same data
  • each attribute should contain a single value
  • determine a primary key

A repeating group means that a table contains 2 or more columns that are closely related

Example 1:

O NF

book author 1 author 2 author 3
1 1 2 3
2 2 1 3
3 3 2 1

1 NF

BOOK

book_id title

BOOK_AUTHOR

book_id author_id author_name author_address author_email

Example 2:

O NF

color quantity price
red, green, blue 20 9.99
yellow, orange, purple 10 10.99
blue, cyan 15 3.99
green, magento 200 15.99

1 NF

PRODUCT

prod_id quantity price

PRODUCT_COLOUR

prod_id color

⬆ back to top

216. Going from 1NF to 2NF

  • it is in 1 NF
  • all non-key attributes are fully functional dependent on the primary key

Example:

O NF

  • book
  • title
  • author
  • author_name
  • author_address
  • author_email

1 NF

BOOK

  • book_id (pk)
  • title

BOOK_AUTHOR

  • book_id (pk)
  • author_id (pk)
  • author_name
  • author_address
  • author_email

2 NF

BOOK

  • book_id (pk)
  • title

BOOK_AUTHOR

  • book_id (pk / fk)
  • author_id (pk / fk)

AUTHOR

  • author_id (pk)
  • author_name
  • author_address
  • author_email

⬆ back to top

217. Going from 2NF to 3NF

  • It is in 2 NF
  • no transitive dependencies
  • A is functional dependent on B and B is functional dependent on C, C is transitively dependent on A via B
  • C -> B and B -> A means that C -> A

Example:

O NF

  • branch
  • first name
  • last name
  • title
  • hours

1 NF

EMPLOYEE

  • emp_no (pk)
  • first name
  • last name
  • title

BRANCH

  • branch_no (pk)
  • emp_no (fk)
  • street
  • street_no
  • province
  • postal_code
  • hours_logged
  • work_date
  • country

2 NF

EMPLOYEE

  • emp_no (pk)
  • first name
  • last name
  • title

BRANCH

  • branch_no (pk)
  • street
  • street_no
  • province
  • postal_code
  • country

TIMESHEET

  • branch_no (pk / fk)
  • emp_no (pk / fk)
  • hours_logged
  • work_date

3 NF

EMPLOYEE

  • emp_no (pk)
  • first name
  • last name
  • title

BRANCH

  • branch_no (pk)
  • street
  • street_no
  • postal_code
  • province_id (fk)

PROVINCE

  • province_id (pk)
  • province
  • country

TIMESHEET

  • branch_no (pk / fk)
  • emp_no (pk / fk)
  • hours_logged
  • work_date

⬆ back to top

218. Boyce-Codd Normal Form

  • it is in 3 NF
  • for any dependency A -> B, A should be a super key
  • most relationship in 3 NF are also in BCNF, but not all of them
  • 3 NF allows attributes to be part of a candidate key that is not the primary key - BCNF does not

A relationship is not in BCNF if

  • the primary key is a composite key
  • there is more than one candidate key
  • some attributes have keys in common

3 NF

student_id tutor_id tutor_sin
1 998 838 383 494
2 234 343 535 352
3 999 112 134 134
4 1234 354 464 234

candidate keys

  • STUDENT_ID, TUTOR_ID
  • STUDENT_ID, TUTOR_SIN

functional dependency

  • TUTOR_ID -> TUTOR_SIN
  • TUTOR_SIN -> TUTOR_ID
  • STUDENT_ID, TUTOR_ID -> TUTOR_SIN
  • STUDENT_ID, TUTOR_SIN -> TUTOR_ID

BCNF

student_id tutor_id
1 998
2 234
3 999
4 1234
tutor_id tutor_sin
999 838 383 494
234 343 535 352
999 112 134 134
1234 354 464 234

⬆ back to top

219. Why 4NF And 5NF Are Not Useful

  • over normalization
  • incur functionality loss

⬆ back to top

220. Exercise: Database Design Quiz

Database Design Quiz

⬆ back to top

Section 10: Database Landscape, Performance and Security

222. Scalability

Scalability

  • capability of database to handle growing amount of data
  • more data to store into database
  • reach limit on how much data to store
  • more operations - lots of users query database for more information
  • how many operations per second

2 strategies

  • vertical scalability: add more capacity to a single database, add more disk space, add more memory, add physically device, normally for relation database
  • horizontal scalability: add more database and worked as single database, normally for NoSQL such as MongoDB

⬆ back to top

223. Sharding

  • split data into many databases
  • for example: 3 database
  • 1st database store A to J
  • 2nd database store K to T
  • 3rd database store U to z
  • need a routing system to know data is located in which database
  • distribute query to different database

⬆ back to top

224. Replication

  • replicate data across mutiple machines and different locations
  • happen in real time
  • eventual consistency: user write data in one database and database updates all its replicas

2 strategies

  • synchronous: client update a database, database updates all its replicas before response to the client
  • asynchronous: client update a database and response to the client and updates all its replicas in the background

⬆ back to top

225. Backups

  • not done often
  • normally at night
  • dump data into a sql file "database backup "

⬆ back to top

226. Distributed vs Centralized Databases

  • Centralized: keep in single location on a given network or control by one company, centrally control
    • data integrity is more easy
  • Distributed: compose of multiple databadr store in multiple locations or controlled by different organizations
    • more complexity

⬆ back to top

227. Database Security

  • users can only see the data they are authorized to see
  • keep unauthorized uses from access the database
  • prevent data corruption
  • database availability, no system crashes
  • create database to store necessary info and give access to necessary users
  • principle of least privilege state to give privilege to exactly what is needed

⬆ back to top

232. Relational vs NoSQL, PostgreSQL vs MongoDB Databases

Big drop in developer job postings

  • NoSQL (MongoDB)

    • duplicated data
    • no standard way to query
    • no need to have schema
    • easy to scale horizontally
    • related data in one locaton
    • data is not linked to other location
  • Relational (PostgreSQL)

    • normalisation, no data duplication
    • standard way to query
    • need a schema
    • hard to scale horizontally
    • need to get related data from different tables
    • data is linked to many locations

⬆ back to top

233. Future Of Relational Databases

NewSQL

  • relational database and easy to scale horizontally

⬆ back to top

234. Elasticsearch

  • good for data need to search
  • build for full text search
  • less reliable than relational database
  • fast in search
  • relational database as primary data source
  • relational database to store important info
  • elasticsearch database as secondary data source
  • copy data to search to elasticsearch database

⬆ back to top

235. S3 Object Storage

  • document model
  • use for object
  • dump data and retrieve data

⬆ back to top

236. Top Databases To Use

Databases

  • Postgres
  • Microsoft SQL Server
  • MongoDB
  • Amazon DocumentDB
  • Firebase
  • Elasticsearch
  • Redis
  • Amazon S3

⬆ back to top

Section 14: Extra Bits: Databases In A Web App (with Node.js)

265. Setting Up Your Database

CREATE TABLE users (
  id serial PRIMARY KEY,
  "name" VARCHAR(100),
  email text UNIQUE NOT NULL,
  entries BIGINT DEFAULT 0,
  joined TIMESTAMP NOT NULL
);

CREATE TABLE login (
  id serial PRIMARY KEY,
  hash VARCHAR(100) NOT NULL,
  email text UNIQUE NOT NULL
);

⬆ back to top

266. Connecting To The Database

Knex.js

const postgres = knex({
  client: 'pg',
  connection: {
    host : 'localhost',
    user : 'postgres',
    password : '',
    database : 'smart-brain'
  }
});

postgres.select('*').from('users');

⬆ back to top

267. Registering A User Part 1

app.post('/register', (req, res) => {
  const { email, name } = req.body;
  db('users')
    .returning('*')
    .insert({
      name,
      email,
      joined: new Date()
  })
  .then(user => res.json(user[0]))
  .catch(err => res.status(400).json('unable to register'));
});

⬆ back to top