-
Notifications
You must be signed in to change notification settings - Fork 0
/
Balaguer Daniel DA03_4.sql
205 lines (170 loc) · 11.5 KB
/
Balaguer Daniel DA03_4.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
/*
Question #1: What are the unique states values available in the customer data?
Count the number of customers associated to each state.
Expected columns: state, total_customers
*/
-- q1 solution : --solution with a simple query, can be done with temp table but i choose to keep it simple
select distinct (state), count (distinct customer_id)
from customers
group by distinct (state);
/*
Question #2: It looks like the state data is not 100% clean and your manager already one issue:
“US State” customers should be assigned to California.
What is the total number of orders that have been completed for every state?
Only include orders for which customer data is available.
Expected columns: clean_state, total_completed_orders
*/
-- q2 solution:
--temp table to assign data from US State to California
with cleaned_state as (select *, case when state in ('US State') then 'California'
else state end as clean_state
from customers),
--temp table to get the complete orders joining cleaned states
orders_completed as (select * from orders o join cleaned_state cs
on o.user_id = cs.customer_id where status in ('Complete'))
--main table to get complete orders by cleaned states excluding Customer Null values
select clean_state, count (distinct (order_id)) as total_complete_orders
from orders_completed
where customer_id is not null
group by clean_state;
/*
Question #3: After excluding some orders since the customer information was not available,
your manager gets back to and stresses what we can never presented a number that is missing any
orders even if our customer data is bad.
What is the total number of orders, number of Nike Official orders, and number of Nike Vintage
orders that are completed by every state?
If customer data is missing, you can assign the records to ‘Missing Data’.
Expected columns: clean_state, total_completed_orders, official_completed_orders,
vintage_completed_orders
*/
-- q3 solution:
--temp table to assign data from US State to California
with cleaned_state as (select *, case when state in ('US State') then 'California'
else state end as clean_state from customers),
--temp table to get the complete orders joining cleaned states
orders_completed as (select * from orders o
left join cleaned_state cs on o.user_id = cs.customer_id
where status in ('Complete'))
--main table to do the calculations required grouped x clean_state
select coalesce (clean_state, 'Missing Data') as clean_state,
count (distinct oc.order_id) as total_completed_orders,
count (distinct oi.order_id) as official_completed_orders,
count (distinct oiv.order_id) as vintage_completed_orders
from orders_completed oc left join order_items oi on oc.order_id = oi.order_id
left join order_items_vintage oiv on oc.order_id = oiv.order_id
group by clean_state order by clean_state asc;
/*
Question #4: When reviewing sales performance, there is one metric we can never forget; revenue.
Reuse the query you created in question 3 and add the revenue (aggregate of the sales price)
to your table: (1) Total revenue for the all orders (not just the completed!)
Expected columns: clean_state, total_completed_orders, official_completed_orders,
vintage_completed_orders, total_revenue.
*/
-- q4 solution:
--temp table to assign data from US State to California
with cleaned_state as (select *, case when state in ('US State') then 'California'
else state end as clean_state from customers),
--temp table to get Nike Official + Nike Vintage bizz
total_bizz as (select * from order_items union all select * from order_items_vintage),
--temp table getting total rev from total bizz x state, assigning Null values to Missing Data
revenues_state as (select coalesce (cs.clean_state, 'Missing Data') as clean_state,
sum (tb.sale_price) as total_rev
from total_bizz tb left join cleaned_state cs on tb.user_id = cs.customer_id
group by cs.clean_state),
--temp table doing the required calculations x complete orders x clean state, assigning also Null values to Missing Data
completed_orders as (select coalesce (cs.clean_state, 'Missing Data') as clean_state,
count (distinct o.order_id) as total_completed_orders,
count (distinct oi.order_id) as official_completed_orders,
count (distinct oiv.order_id) as vintage_completed_orders
from orders o left join order_items oi on oi.order_id = o.order_id
left join order_items_vintage oiv on o.order_id = oiv.order_id
left join cleaned_state cs on o.user_id = cs.customer_id
where status = 'Complete' group by cs.clean_state )
--main table to show the results
select co.clean_state, total_completed_orders, official_completed_orders, vintage_completed_orders, total_rev
from completed_orders co join revenues_state rs on co.clean_state = rs.clean_state
order by co.clean_state;
/*
Question #5: The leadership team is also interested in understanding the number of order items
that get returned. Reuse the query of question 4 and add an additional metric to the table:
(1) Number of order items that have been returned (items where the return date is populated)
Expected columns: clean_state, total_completed_orders, official_completed_orders, vintage_completed_orders,
total_revenue,returned_items
*/
-- q5 solution:
--temp table to assign data from US State to California
with cleaned_state as (select *, case when state in ('US State') then 'California'
else state end as clean_state from customers),
--temp table to get Nike Official + Nike Vintage bizz
total_bizz as (select * from order_items union all select * from order_items_vintage),
--temp table getting total rev from total bizz x state, assigning Null values to Missing Data
revenues_state as (select coalesce (cs.clean_state, 'Missing Data') as clean_state,
sum (tb.sale_price) as total_rev
from total_bizz tb left join cleaned_state cs on tb.user_id = cs.customer_id
group by cs.clean_state),
--temp table doing the required calculations x complete orders x clean state, assigning also Null values to Missing Data
completed_orders as (select coalesce (cs.clean_state, 'Missing Data') as clean_state,
count (distinct o.order_id) as total_completed_orders,
count (distinct oi.order_id) as official_completed_orders,
count (distinct oiv.order_id) as vintage_completed_orders
from orders o left join order_items oi on oi.order_id = o.order_id
left join order_items_vintage oiv on o.order_id = oiv.order_id
left join cleaned_state cs on o.user_id = cs.customer_id
where status = 'Complete' group by cs.clean_state ),
--temp table calculating nr. items returned
returned as (select coalesce (cs.clean_state, 'Missing Data') as clean_state,
count (tb.returned_at) as returned_items
from total_bizz tb left join cleaned_state cs on tb.user_id = cs.customer_id
where tb.returned_at is not null group by cs.clean_state)
--main table showing the results
select r.clean_state, total_completed_orders, official_completed_orders, vintage_completed_orders, total_rev,
returned_items
from completed_orders co join revenues_state rs on co.clean_state = rs.clean_state
join returned r on co.clean_state = r.clean_state
order by r.clean_state;
/*
Question #6: When looking at the number of returned items by itself, it is hard to understand what number of
returned items is acceptable. This is mainly caused by the fact that we don’t have a benchmark at the moment.
Because of that, it is valuable to add an additional metric that looks at the percentage of returned order
items divided by the total order items, we can call this the return rate.
Reuse the query of question 5 and integrate the return rate into your table.
Expected columns: clean_state, total_completed_orders, official_completed_orders, vintage_completed_orders,
total_revenue,returned_items,return_rate
*/
-- q6 solution:
--temp table to assign data from US State to California
with cleaned_state as (select *, case when state in ('US State') then 'California'
else state end as clean_state from customers),
--temp table to get Nike Official + Nike Vintage bizz
total_bizz as (select * from order_items union all select * from order_items_vintage),
--temp table getting total rev from total bizz x state, assigning Null values to Missing Data
revenues_state as (select coalesce (cs.clean_state, 'Missing Data') as clean_state,
sum (tb.sale_price) as total_rev
from total_bizz tb left join cleaned_state cs on tb.user_id = cs.customer_id
group by cs.clean_state),
--temp table doing the required calculations x complete orders x clean state, assigning also Null values to Missing Data
completed_orders as (select coalesce (cs.clean_state, 'Missing Data') as clean_state,
count (distinct o.order_id) as total_completed_orders,
count (distinct oi.order_id) as official_completed_orders,
count (distinct oiv.order_id) as vintage_completed_orders
from orders o left join order_items oi on oi.order_id = o.order_id
left join order_items_vintage oiv on o.order_id = oiv.order_id
left join cleaned_state cs on o.user_id = cs.customer_id
where status = 'Complete' group by cs.clean_state ),
--temp table calculating nr. items returned
returned as (select coalesce (cs.clean_state, 'Missing Data') as clean_state,
count (tb.returned_at) as returned_items
from total_bizz tb left join cleaned_state cs on tb.user_id = cs.customer_id
where tb.returned_at is not null group by cs.clean_state),
--temp table calculating total_orders x cleaned_state
r_rate as (select coalesce (cs.clean_state, 'Missing Data') as clean_state,
cast (count (tb.order_id) as float ) as total_orders
from total_bizz tb left join cleaned_state cs on tb.user_id = cs.customer_id
group by cs.clean_state)
--main table showing the results + calculating return_rate
select rr.clean_state, total_completed_orders, official_completed_orders, vintage_completed_orders, total_rev,
returned_items, r.returned_items / rr.total_orders as return_rate
from completed_orders co join revenues_state rs on co.clean_state = rs.clean_state
join returned r on co.clean_state = r.clean_state
join r_rate rr on co.clean_state = rr.clean_state
order by rr.clean_state;