---
title: "Novel Randomized M&V Examples"
author:
  - name: Aoyu Zou
    email: [email protected]
    affiliation: CBE
  - name: Paul Raftery
    email: [email protected]
    affiliation: CBE
address:
  - code: CBE
    address: Center for the Built Environment, UC Berkeley, 390 Wurster Hall, Berkeley, CA, 94720, USA
date: "`r Sys.Date()`"
output:
  # prettydoc::html_pretty:
  bookdown::word_document2:
    toc: true
    toc_depth: 2
    toc_float: true
    number_sections: true
    theme: paper
    highlight: tango
editor_options:
  markdown:
    wrap: 72
knit: (function(input, ...) {
  rmarkdown::render(
    input,
    output_dir = "./docs"
  )
  })
---
```{r setup, include = FALSE, cache = FALSE}
# knitr setup
knitr::opts_chunk$set(message = F,
cache = F,
echo = F,
warning = F,
dev = "jpeg",
dpi = 300,
fig.show = "hold",
fig.path = "figs/",
fig.pos = "b")
# str(knitr::opts_chunk$get()) # see for all options
```
# Objective
The purpose of this document is to provide open-source code for analysts
who wish to apply the novel measurement and verification (M&V) procedure
described in the associated paper to their own projects or datasets. By
following the guidance below, users should be able to:
1. Replicate the novel M&V method proposed in the manuscript:
- Design a randomized and balanced switchback control
implementation schedule consisting of a baseline strategy and an
intervention strategy,
- Conduct sequential analysis to infer intervention effect on a
target metric (e.g. energy consumption, carbon emissions, or
thermal comfort).
2. Demonstrate the reliability of the novel randomized method and
compare it with the conventional method under:
- Scenario A: Discrete non-routine events
- Scenario B: Continuous baseline change
By replicating the method proposed in the manuscript, we aim to
demonstrate that the novel M&V approach can reach a conclusion faster
than the conventional method and provide a robust uncertainty
quantification. Additionally, by evaluating savings estimations under
various common scenarios, we showcase the reliability of this new method
in overcoming the limitations of conventional approaches (e.g. the
impact of changes in building performance that are wholly unrelated to
the intervention). This example code follows the flowchart depicted in
the manuscript and provides a minimal example for demonstration
purposes. The analyst should make reasonable adjustments to address
their specific use case.
```{r theme, message=FALSE}
require(pacman)
# load packages using pacman
pacman::p_load(tidyverse, lubridate, here, rmarkdown,
scales, patchwork, magrittr, janitor, qpcR, knitr, # general
ggpmisc, # linear regression
ggpubr,
sjstats, pwr, # anova results
nmecr, # m&v modeling package for TOWT
slider, # moving averages
sprtt, # sequential testing
base, blocksdesign, # blocking
rstatix, #pipe friendly stats
overlapping, # distribution overlapping percentage
effsize, bootES, dabestr) #effect size
# turn off scientific notation
options(scipen = 999, digits = 15)
# set directory
here::i_am("Reproducible-example.Rmd")
# set default theme for ggplot
theme_set(theme_minimal())
# define base ggplot theme
theme_update(plot.title = element_text(size = 14, colour = "grey20", face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 10, colour = "grey20", face = "italic", hjust = 0.5, margin = margin(b = 10)),
plot.caption = element_text(size = 10, colour = "grey20", face = "italic", hjust = 0.5),
plot.background = element_rect(fill = "white", colour = NA),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
axis.text = element_text(size = 10),
strip.text = element_text(size = 10, color = "grey20", face = "bold"),
strip.background = element_blank())
# define global color theme (from palette "Set2")
ls_colours <- c("Baseline" = "#99d8c9",
"Measured baseline" = "#99d8c9",
"Adjusted baseline" = "#99d8c9",
"Projected baseline\n(no change)" = "#1b9e77",
"Intervention" = "#fdbb84",
"Measured interv" = "#fdbb84",
"True savings" = "black",
"True normalized savings" = "black")
# Get case study example data
readfile_path <- "./readfiles/"
# Load defined functions
function_path <- "./functions/"
```
# M&V use case
We apply the same use case described in the manuscript where a
software-as-a-service control company offers a more efficient chiller
control algorithm. Here, the analyst aims to determine the amount of
HVAC energy saved by this control intervention. The overall M&V
protocol regarding sampling ratio, carryover effect, blocking design,
and stopping criteria remains the same as in the manuscript. To
replicate the analysis, we also fitted a time-of-week-and-temperature
(TOWT) model and normalized the savings using a typical meteorological
year (TMY) weather file for the same location. We compared the
randomized method with the conventional method using the average
treatment effect (ATE), calculated as the mean difference between
baseline samples and intervention samples, to estimate the energy saved
by the control algorithm. For both the TMY-normalized and
actual-weather savings, we also report fractional (percent) savings by
normalizing by the mean of the baseline measurements.
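As a minimal sketch of these two estimators (hypothetical helper
functions operating on vectors of measurements; not part of the original
analysis code):
```{r ate-sketch, echo=TRUE, eval=FALSE}
# ATE as the mean difference between intervention and baseline samples, and
# fractional savings normalized by the baseline mean (negative differences
# are savings, so the sign is flipped)
ate <- function(baseline, intervention) {
  mean(intervention, na.rm = TRUE) - mean(baseline, na.rm = TRUE)
}
fractional_savings <- function(baseline, intervention) {
  -ate(baseline, intervention) / mean(baseline, na.rm = TRUE)
}
```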
# Use case measurement dataset
For this demonstration, we simply create a hypothetical baseline and
intervention measurement set using a linear function with outdoor
temperature ($temp$) and peak and off-peak hour indicators
($\delta_{peak}$, $\delta_{off\text{-}peak}$) as independent variables:
$$
Energy = \beta_0 + \beta_1 \, temp + \beta_2 \, \delta_{peak} + \beta_3 \, \delta_{off\text{-}peak} + \epsilon
$$
We apply that to two years of measured, actual weather data in Chicago,
USA. The $\epsilon$ term accounts for random noise sampled from a normal
distribution $N(0, 10^2)$. The created measurement set for baseline and
intervention controls is shown in figure \@ref(fig:plotdata). If the
analyst follows the conventional M&V method described in ASHRAE
Guideline 14 (G14) or the International Performance Measurement and
Verification Protocol (IPMVP), then only baseline measurements are
measurable/observable during the pre-retrofit period and only
intervention measurements are measurable during the post-retrofit
period. In other words, the control strategy switches from baseline to
intervention only once during the entire study period.
```{r readfiles, message=FALSE, warning=FALSE}
# load holidays
list_holidays <- read_csv(paste0(readfile_path, "us_holidays_2022.csv"),
col_names = c("date", "weekday", "holiday")) %>%
mutate(date = mdy(date))
# load tmy weather file
df_tmy <- read_csv(paste0(readfile_path, "USA_IL_Chicago.Midway.Intl.AP.725340_TMY3.epw"),
skip = 8, col_types = "ddddd-d---------------------------------",
col_names = c("year", "month", "day", "hour", "min", "tmy")) %>%
mutate(year = 2022,
time = ymd_h(paste(paste(year, month, day, sep = "-"), hour, sep = " ")),
temp = tmy) %>%
dplyr::select(time, temp)
# load temperature from local weather station
df_weather <- read_csv(paste0(readfile_path, "weather.csv")) %>%
mutate(datetime = with_tz(datetime_UTC, tz = "America/Chicago")) %>%
select(c(datetime, t_out)) %>%
mutate(across(t_out, ~ zoo::na.approx(., na.rm = FALSE))) %>%
filter(datetime >= as.Date("2021-01-01")) %>%
filter(datetime < as.Date("2023-01-01")) %>%
unique() %>%
arrange(datetime)
# Create a sequence of hourly timestamps for the two-year study period
timestamps <- seq(from = as.POSIXct("2021-01-01 00:00:00"),
to = as.POSIXct("2022-12-31 23:00:00"),
by = "hour")
# Define peak hours (e.g., 12 PM to 6 PM)
peak_hours <- as.integer(format(timestamps, "%H") %in% c("12", "13", "14", "15", "16", "17"))
# Calculate non-peak hours
non_peak_hours <- 1 - peak_hours
# Baseline model parameters
intercept <- 45
beta_peak <- 20
beta_temp <- 0.9
beta_non_peak <- 5
# Compute the energy consumption (seed the RNG so the simulated noise is reproducible)
set.seed(390)
energy_consumption <- intercept +
beta_temp * df_weather$t_out +
beta_peak * peak_hours +
beta_non_peak * non_peak_hours +
rnorm(length(timestamps), mean = 0, sd = sqrt(100))
# Create a data frame
df_base <- data.frame(datetime = timestamps,
                      t_out = df_weather$t_out,
power = energy_consumption)
# Output the first few rows of the dataset
kable(head(df_base))
# Intervention model parameters
intercept <- 38
beta_temp <- 0.4
beta_peak <- 12
beta_non_peak <- 10
# Compute the energy consumption
energy_consumption <- intercept +
beta_temp * df_weather$t_out +
beta_peak * peak_hours +
beta_non_peak * non_peak_hours +
rnorm(length(timestamps), mean = 0, sd = sqrt(100))
# Create a data frame
df_interv <- data.frame(datetime = timestamps,
t_out = df_weather$t_out,
power = energy_consumption)
```
```{r plotdata, fig.cap="Generated hypothetical baseline and intervention measurements across the whole study period using a linear function (data points show hourly energy consumption and line fitted using the loess function)."}
df_base %>%
rename(Baseline = power,
base_t_out = t_out) %>%
left_join(df_interv %>% rename(Intervention = power), by = "datetime") %>%
pivot_longer(-c(datetime, base_t_out, t_out), names_to = "parameter", values_to = "value") %>%
ggplot(aes(x = t_out, y = value, color = parameter)) +
geom_point(size = 0.1, alpha = 0.1) +
  geom_smooth(method = "loess") +
scale_x_continuous(expand = c(0, 0),
limits = c(-15, 35),
breaks = breaks_pretty(n = 5),
labels = number_format(suffix = " °C")) +
scale_y_continuous(expand = c(0, 0),
labels = number_format(suffix = " kW")) +
scale_color_manual(values = ls_colours) +
labs(x = NULL,
y = NULL,
color = NULL,
title = "Created baseline and intervention measurements") +
theme(panel.grid.major.y = element_line(colour = "grey80", size = 0.25),
legend.direction = "horizontal",
legend.position = "bottom",
plot.margin = margin(t = 2, r = 7, b = 2, l = 2, unit = "mm"))
```
# Novel randomized M&V analysis
## Switchback experimental design
As most control intervention strategies can be implemented
interchangeably with the building baseline control, we argue in the
paper that randomizing the implementation sequence between intervention
and baseline control strategies can balance the impact of confounding
variables when estimating the intervention effect, find a result faster,
and be more robust to the occurrence of changes in performance that are
unrelated to the intervention. A simple way to get started is to use
the built-in function *'sample()'* from the R package *'base'* to
randomly select, at each sampling interval, which strategy to
implement. The advantage of this function is that it is easy to
implement for a short test run (e.g. one week) on a single unit, but it
is not recommended for more complicated tests; we address these
limitations below.
```{r simple_schedule, include=TRUE, message=FALSE}
# Define strategies and number of days
strategies <- c("Base", "Intervention")
num_weeks <- 4
days_per_week <- 7
total_days <- num_weeks * days_per_week
# Randomly assign strategies to each day (seeded so the example is reproducible)
set.seed(390)
assigned_strategies <- sample(strategies, total_days, replace = TRUE)
# Convert the linear structure into a weekly structure (one row per week) for easier interpretation
weekly_assignment <- matrix(assigned_strategies, nrow = num_weeks, ncol = days_per_week, byrow = TRUE)
# print(weekly_assignment[1, ])
```
In statistics, average treatment effect (ATE) estimation relies on
unbiased randomization and balanced sample sizes for statistical power.
This means that to accurately estimate the energy-saving effect of the
intervention control strategy, we should design an unbiased
randomization with an equal number of measurements at each level of the
confounding variables. A simple randomization can therefore still be
biased, for instance by yielding, purely by chance, a schedule in which
the baseline is sampled more often on weekends and the intervention
more often on weekdays. This can be avoided with block randomization, a
type of experimental design that first divides the testing units into
blocks and then randomly assigns a strategy to each unit within each
block. In practice, each block type corresponds to one confounding
variable, and since strategies are sampled within each block, every
level of the confounding variable is covered by all tested strategies.
For example, building energy consumption is known to be affected by
occupancy, time, and outdoor weather conditions. If the control service
company wishes to bill the customer based on the savings solely from
the control retrofit itself, the M&V analyst should decompose the
measured energy savings and adjust for the influence of differences in
weather.
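As a minimal sketch of this idea (base R only, not the *'blocksdesign'*
wrapper used below; here the block is simply a pair of consecutive
days, with the order of the two strategies randomized within each pair
so the schedule stays exactly balanced):
```{r pair-block-sketch, echo=TRUE, eval=FALSE}
# Pair consecutive days into two-day blocks and randomize the order of the
# two strategies within each pair; overall counts are exactly balanced.
set.seed(390)
n_days <- 84  # a 12-week blocking period
pairs <- replicate(n_days / 2, sample(c("Base", "Intervention")), simplify = FALSE)
df_pairs <- data.frame(
  date = seq(as.Date("2021-01-01"), by = "day", length.out = n_days),
  strategy = unlist(pairs)
)
table(df_pairs$strategy)  # 42 days of each strategy
```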
For example, in this document we assume two strategies are being
compared over a blocking period of 12 weeks. For the block design, we
use the function *'blocks()'* from the *'blocksdesign'* package, which
we wrapped together with other functions and inputs to make it more
versatile (see the code in the .Rmd file). Once successfully called,
the wrapper returns a list consisting of the schedule output and
schedule summaries. The detailed schedule can be viewed via the
`schedule` element.
```{r block, include=TRUE, message=FALSE}
source(paste0(function_path, "blocking.R")) # block design schedule generation function
schedule <- blocking(start_date = "2021-01-01",
n_weeks = 48,
n_seasons = 4,
seed = 390,
searches = 20,
jumps = 20,
treatments = 2,
consec = 1)
kable(head(schedule$schedule, n = 14))
```
To check whether the blocking experimental design is balanced, we can
call two built-in summary table outputs:
1. Weekday summary: counts how many days are sampled for each season,
weekday, and strategy combination.
2. Consecutive days summary: counts how many consecutive days are
sampled for each strategy.
```{r summaryweek}
kable(head(schedule$weekday_summary))
```
```{r summaryconsec}
kable(head(schedule$consec_summary))
```
## Sequential analysis of the intervention effect
This section replicates the sequential analysis described in the
manuscript using the newly created measurement dataset. The results are
shown in figure \@ref(fig:sprtresults).
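The core stopping rule is the sequential t-test from the *'sprtt'*
package. As a minimal standalone sketch (hypothetical toy data; the
`d`, `power`, and `alternative` settings mirror the analysis chunk
below):
```{r sprt-sketch, echo=TRUE, eval=FALSE}
# Sequential probability ratio t-test on toy daily means; the returned object
# carries the decision and the likelihood ratio that are compared against the
# two boundaries in the figures below.
library(sprtt)
set.seed(390)
toy <- data.frame(strategy = rep(c("Baseline", "Intervention"), each = 30),
                  value = c(rnorm(30, 100, 10), rnorm(30, 92, 10)))
res <- seq_ttest(value ~ strategy, data = toy, d = 0.5,
                 power = 0.9, alternative = "less", paired = FALSE)
res@decision          # accept H1, accept H0, or continue sampling
res@likelihood_ratio  # test statistic
```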
```{r sprtdata}
source(paste0(function_path, "ol_est.R")) # independent variable overlapping calculation
source(paste0(function_path, "saving_pred.R")) # annual saving estimation function
# Combine random sampling
df_schedule <- data.frame(strategy = schedule$schedule$strategy,
datetime = schedule$schedule$date)
df_base_sch <- df_base %>%
left_join(df_schedule, by = "datetime") %>%
fill(strategy, .direction = "down") %>%
filter(strategy == 1)
df_interv_sch <- df_interv %>%
left_join(df_schedule, by = "datetime") %>%
fill(strategy, .direction = "down") %>%
filter(strategy == 2)
df_hourly <- bind_rows(df_base_sch, df_interv_sch) %>%
arrange(datetime)
# make daily totals
df_daily <- df_hourly %>%
group_by(datetime = floor_date(datetime, unit = "day")) %>%
summarise(strategy = unique(strategy),
power_ave = mean(power, na.rm = TRUE),
power_peak = max(power, na.rm = TRUE),
t_out = mean(t_out, na.rm = TRUE)) %>%
ungroup()
```
```{r, sprtrun, cache=F}
# prepare sequential test dataset
sprt_hourly <- df_hourly %>%
mutate(week = interval(min(datetime), datetime) %>% as.numeric('weeks') %>% floor())
# set analysis parameters
param_sprt <- list(baseline = "Baseline",
strategy = "Intervention",
parameter = "power_ave", # choice: power_ave; mmoer; co2
label = "power", # choice: power; emissions
n_weeks = 48)
# prepare dataframe for analysis
df_sprt <- df_daily %>%
mutate(strategy = as.factor(strategy),
strategy = recode_factor(strategy, "1" = "Baseline", "2" = "Intervention")) %>%
filter(strategy %in% c(param_sprt$baseline, param_sprt$strategy)) %>%
pivot_longer(cols = -c(datetime, strategy),
names_to = "parameter",
values_to = "value") %>%
  mutate(week = interval(min(datetime), datetime) %>% as.numeric('weeks') %>% floor()) %>%
filter(str_detect(parameter, param_sprt$parameter)) %>%
droplevels()
# define list to store weekly means
df_means <- list()
# calculate weekly means
for (i in 1:param_sprt$n_weeks) {
# subset data by week
df_means[[i]] <- df_sprt %>%
filter(week <= i) %>%
group_by(strategy,
parameter) %>%
summarise(week = i,
value_ave = mean(value, na.rm = TRUE),
value_sd = sd(value, na.rm = TRUE),
.groups = "keep") %>%
ungroup()
}
# combine list into df
df_means <- bind_rows(df_means)
# define lists to store stopping criteria results
## SPRT results
sprt_res <- list()
## 80% independent variable
overlap_base <- list()
overlap_s2 <- list()
quantile_tmy <- quantile(df_tmy$temp, na.rm = TRUE, c(0, 0.99))
## annual saving estimation
annual_saving <- list()
# do sample
for (i in 2:param_sprt$n_weeks) {
# subset data by week
df_seq <- df_sprt %>%
filter(week <= i) %>%
droplevels()
# Calculate overlapping temperature range
overlap_base[[i]] <- tibble("n_weeks" = i,
overlap_base = sprt_hourly %>%
filter(week <= i & strategy == 1) %>%
ol_est(., quantile_tmy))
overlap_s2[[i]] <- tibble("n_weeks" = i,
overlap_s2 = sprt_hourly %>%
filter(week <= i & strategy == 2) %>%
ol_est(., quantile_tmy))
  # start estimation after 2 months to have a complete time-of-week profile
if (i/4 > 2) {
# Update user on the week of update
# print(paste0("updating TOWT model in week ", i))
annual_saving[[i]] <- tibble("n_weeks" = i,
annual_saving = saving_pred(sprt_hourly %>% filter(week <= i)))
}
# do test
results_seq <- seq_ttest(x = value ~ strategy,
data = df_seq,
d = 0.5,
power = 0.9,
alternative = "less", # greater
paired = FALSE,
verbose = TRUE)
# calculate effect size
results_ci <- effsize::cohen.d(d = df_seq$value,
f = df_seq$strategy,
conf.level = 0.90,
paired = FALSE,
na.rm = TRUE)
# update user
# print(paste0("Calculating effect size for week ", i))
# bootstrap effect size
results_bs <- bootES::bootES(data = df_seq, R = 1000,
contrast = c(param_sprt$baseline, param_sprt$strategy),
data.col = "value", group.col = "strategy")
# extract test statistic
sprt_res[[i]] <- tibble("n_weeks" = i,
"threshold_lower" = exp(results_seq@B_boundary_log),
"threshold_upper" = exp(results_seq@A_boundary_log),
"statistic" = results_seq@likelihood_ratio,
"decision" = results_seq@decision,
"cohens_d" = round(results_ci$estimate, digits = 2),
"ci_low" = round(results_ci$conf.int[[1]], digits = 2),
"ci_high" = round(results_ci$conf.int[[2]], digits = 2),
"ns_stat" = round(results_bs$t0, digits = 2),
"ns_ci_low" = round(results_bs$bounds[[1]], digits = 2),
"ns_ci_high" = round(results_bs$bounds[[2]], digits = 2))
}
# combine list and vectors into df
annual_saving <- bind_rows(annual_saving)
# work out when threshold is reached
sprt_res <- bind_rows(sprt_res)
sprt_res <- sprt_res %>%
left_join(., annual_saving, by = "n_weeks") %>%
mutate(flag = ifelse(str_detect(decision, "accept"), 1, 0))
sprt_overlap_base <- bind_rows(overlap_base) %>%
left_join(., annual_saving, by = "n_weeks") %>%
mutate(flag = ifelse(overlap_base >= 0.8, 1, 0),
flag = ifelse(flag == lag(flag, 1), 0, flag))
sprt_overlap_s2 <- bind_rows(overlap_s2) %>%
left_join(., annual_saving, by = "n_weeks") %>%
mutate(flag = ifelse(overlap_s2 >= 0.8, 1, 0),
flag = ifelse(flag == lag(flag, 1), 0, flag))
```
```{r truesaving}
df_week <- df_base %>%
select(datetime,
base_power = power) %>%
left_join(df_interv, by = "datetime") %>%
mutate(savings = power - base_power) %>%
mutate(week = interval(min(datetime), datetime) %>% as.numeric('weeks') %>% floor()) %>%
filter(week <= 48)
true_saving <- list()
for (i in 2:48){
saving <- df_week %>%
filter(week <= i)
  true_saving[[i]] <- tibble("n_weeks" = i,
                             savings = mean(saving$savings))
}
true_saving <- bind_rows(true_saving)
# Create a sequence of hourly timestamps for one year
timestamps <- seq(from = as.POSIXct("2022-01-01 01:00:00"),
to = as.POSIXct("2023-01-01 00:00:00"),
by = "hour")
# Define peak hours (e.g., 12 PM to 6 PM)
peak_hours <- as.integer(format(timestamps, "%H") %in% c("12", "13", "14", "15", "16", "17"))
# Calculate non-peak hours
non_peak_hours <- 1 - peak_hours
# Baseline model parameters
intercept <- 45
beta_peak <- 20
beta_temp <- 0.9
beta_non_peak <- 5
# Compute the energy consumption
energy_consumption <- intercept +
beta_temp * df_tmy$temp +
beta_peak * peak_hours +
beta_non_peak * non_peak_hours
# Create a data frame
df_base_tmy <- data.frame(datetime = timestamps,
t_out = df_tmy$temp,
power = energy_consumption)
# Intervention model parameters
intercept <- 38
beta_temp <- 0.4
beta_peak <- 12
beta_non_peak <- 10
# Compute the energy consumption
energy_consumption <- intercept +
beta_temp * df_tmy$temp +
beta_peak * peak_hours +
beta_non_peak * non_peak_hours
# Create a data frame
df_interv_tmy <- data.frame(datetime = timestamps,
t_out = df_tmy$temp,
power = energy_consumption)
df_hourly_tmy <- df_base_tmy %>%
select(datetime,
base_power = power) %>%
left_join(df_interv_tmy, by = "datetime")
tmy_saving <- (mean(df_hourly_tmy$base_power) - mean(df_hourly_tmy$power)) /
  mean(df_hourly_tmy$base_power)
```
```{r sprtresults, fig.height=10, fig.width=7.5, fig.cap="Comprehensive results summary of the proposed M&V method applied to the case study throughout a total of 48 weeks"}
p2 <- ggplot(sprt_res, aes(x = n_weeks, y = ns_stat)) +
geom_ribbon(aes(ymin = ns_ci_low, ymax = ns_ci_high), alpha = 0.5, fill = "#D0E3F1") +
geom_line(size = 1.25, colour = "grey20", alpha = 0.4) +
geom_line(data = true_saving, aes(x = n_weeks, y = savings, color = "True savings"),
linetype = "dashed") +
geom_text(data = sprt_res[2, ],
aes(x = n_weeks, y = ns_ci_high, label = "Upper 90% CI"),
position = position_nudge(x = 1),
colour = "grey20",
size = 2.5,
fontface = "italic") +
geom_text(data = sprt_res[2, ],
aes(x = n_weeks, y = ns_ci_low, label = "Lower 90% CI"),
position = position_nudge(x = 1),
colour = "grey20",
size = 2.5,
fontface = "italic") +
geom_point(data = sprt_res %>% filter(n_weeks%%12 == 0),
size = 8,
shape = 16,
colour = "#347EB3",
alpha = 0.5) +
geom_point(data = sprt_res %>% filter(n_weeks%%12 == 0),
size = 4,
shape = 16,
colour = "#347EB3",
alpha = 0.9) +
geom_text(data = sprt_res %>% filter(n_weeks%%12 == 0),
aes(x = n_weeks, y = ns_stat, label = paste0(round(ns_stat, 1L), " kW")),
position = position_nudge(x = 2.5),
colour = "grey20",
size = 3.0,
fontface = "italic") +
geom_errorbar(data = sprt_res %>% filter(n_weeks%%12 == 0),
aes(x = n_weeks, ymin = ns_ci_low, ymax = ns_ci_high),
width = 1,
colour = "#347EB3",
alpha = 0.5,
size = 0.8) +
geom_text(data = sprt_res %>% filter(n_weeks%%12 == 0),
aes(x = n_weeks, y = ns_ci_high, label = paste0(round(ns_ci_high, 1L), " kW")),
position = position_nudge(x = 2.5),
colour = "grey20",
size = 2.5,
fontface = "italic") +
geom_text(data = sprt_res %>% filter(n_weeks%%12 == 0),
aes(x = n_weeks, y = ns_ci_low, label = paste0(round(ns_ci_low, 1L), " kW")),
position = position_nudge(x = 2.5),
colour = "grey20",
size = 2.5,
fontface = "italic") +
geom_text(data = sprt_res %>% filter(n_weeks%%12 == 0 & flag == 1),
aes(x = n_weeks, y = -16, label = "Difference\nfound"),
colour = "grey20",
size = 3.0,
fontface = "italic") +
geom_text(data = slice_tail(sprt_res, n = 1),
aes(x = n_weeks, y = ns_stat, label = paste0(round(ns_stat, 1L), " kW")),
position = position_nudge(x = 0.5),
colour = "grey20",
size = 3.0,
check_overlap = TRUE,
hjust = 0) +
geom_text(data = slice_tail(sprt_res, n = 1),
aes(x = ifelse(is.na(first(flag)), 0, n_weeks),
y = ns_ci_high,
                y = ns_ci_high,
                label = ifelse(is.na(first(flag)), NA, paste0(round(ns_ci_high, 1L), " kW"))),
position = position_nudge(x = 0.5),
colour = "grey20",
size = 2.5,
check_overlap = TRUE,
hjust = 0) +
geom_text(data = slice_tail(sprt_res, n = 1),
aes(x = ifelse(is.na(first(flag)), 0, n_weeks),
y = ns_ci_low,
                y = ns_ci_low,
                label = ifelse(is.na(first(flag)), NA, paste0(round(ns_ci_low, 1L), " kW"))),
position = position_nudge(x = 0.5),
colour = "grey20",
size = 2.5,
check_overlap = TRUE,
hjust = 0) +
geom_vline(xintercept = seq(0, 50, by = 12),
linetype = "dashed",
color = "grey20",
alpha = 0.3,
size = 0.5) +
annotate(geom = "text",
x = seq(5, 50, by = 12),
y = 0,
label = paste0("12-week block"),
alpha = 0.5,
size = 3) +
scale_x_continuous(expand = c(0, 0),
limits = c(1, param_sprt$n_weeks + 0.5),
labels = number_format(accuracy = 1L, suffix = "\nweeks")) +
scale_y_continuous(expand = c(0, 0),
limits = c(-16, 0),
breaks = pretty_breaks(n = 3),
labels = number_format(suffix = " kW")) +
labs(title = NULL,
subtitle = "SPRT results and estimated difference in power consumption without weather normalization",
x = NULL,
y = NULL,
color = NULL) +
scale_color_manual(values = ls_colours) +
guides(alpha = "none") +
coord_cartesian(clip = "off") +
theme(panel.grid.major.y = element_line(colour = "grey80", size = 0.25),
legend.position = "bottom",
axis.text.x = element_blank(),
plot.margin = margin(3, 15, 3, 3, unit = "mm"))
p4 <- ggplot(sprt_overlap_base) +
geom_line(aes(x = n_weeks, y = overlap_base, color = "Baseline"),
size = 1.25) +
geom_line(data = sprt_overlap_s2,
aes(x = n_weeks, y = overlap_s2, color = "Intervention"),
size = 1.25) +
geom_text(data = sprt_overlap_base[nrow(sprt_overlap_base), ],
aes(x = n_weeks, y = 0.8, label = "80% of\nTMY range\nthreshold"),
position = position_nudge(x = 0.5),
color = "red",
size = 3.0,
check_overlap = TRUE,
hjust = 0) +
geom_hline(yintercept = 0.8, linetype = "dashed", color = "red") +
scale_color_manual(values = ls_colours) +
scale_x_continuous(expand = c(0, 0),
limits = c(1, param_sprt$n_weeks + 0.5),
labels = number_format(accuracy = 1L, suffix = "\nweeks")) +
scale_y_continuous(breaks = seq(0.2, 1.0, by = 0.2),
labels = c("20%", "40%", "60%", "80%", "100%")) +
labs(title = NULL,
subtitle = "Confounding variable (outdoor drybulb temperature) range coverage",
x = NULL,
y = NULL) +
coord_cartesian(clip = "off") +
theme(panel.grid.major.y = element_line(),
axis.text.x = element_blank(),
legend.position = "none",
plot.margin = margin(3, 15, 3, 3, unit = "mm"))
# do running mean plot
p1 <- ggplot(df_means, aes(x = week, y = value_ave)) +
geom_line(aes(colour = strategy), size = 1.25) +
geom_text(data = filter(df_means,
week == param_sprt$n_weeks,
parameter == param_sprt$parameter),
aes(x = param_sprt$n_weeks,
y = value_ave,
colour = strategy,
label = strategy),
position = position_nudge(x = 0.5),
size = 3.0,
check_overlap = TRUE,
hjust = 0) +
scale_x_continuous(expand = c(0, 0),
limits = c(1, param_sprt$n_weeks + 0.5)) +
scale_y_continuous(expand = c(0, 0),
breaks = breaks_pretty(n = 4),
labels = number_format(suffix = " kW")) +
scale_color_manual(values = ls_colours) +
labs(title = NULL,
subtitle = "Running average power consumption of the case study building",
x = NULL,
y = NULL) +
guides(alpha = "none", colour = "none") +
coord_cartesian(clip = "off") +
theme(panel.grid.major.y = element_line(colour = "grey80", size = 0.25),
axis.text.x = element_blank(),
plot.margin = margin(3, 15, 3, 3, unit = "mm"))
p3 <- ggplot(annual_saving) +
geom_line(aes(x = n_weeks, y = annual_saving),
size = 1.25, color="darkgrey") +
geom_hline(aes(yintercept = tmy_saving * 100, color = "True normalized savings"),
linetype = "dashed") +
geom_point(data = annual_saving[1, ],
aes(x = n_weeks, y = annual_saving),
size = 4,
alpha = 0.8,
shape = 16,
colour = "#347EB3") +
geom_text(data = annual_saving[1, ],
aes(x = n_weeks - 3,
y = annual_saving,
label = paste0("Estimation start\nweek ", n_weeks)),
colour = "grey20", size = 2.5, fontface = "italic") +
geom_point(data = sprt_res %>% filter(flag == 1),
aes(x = ifelse(is.null(first(flag)), max(sprt_res$n_weeks), n_weeks),
y = ifelse(is.null(first(flag)), max(sprt_res$annual_saving), annual_saving)),
size = 4,
alpha = 0.8,
shape = 16,
colour = "#347EB3") +
geom_point(data = sprt_overlap_base %>% filter(flag == 1),
aes(x = ifelse(is.null(first(flag)), max(sprt_overlap_base$n_weeks), n_weeks),
y = ifelse(is.null(first(flag)), max(sprt_overlap_base$annual_saving), annual_saving)),
size = 4,
alpha = 0.8,
shape = 16,
colour = "#347EB3") +
geom_text(data = sprt_overlap_base %>% filter(flag == 1),
aes(x = ifelse(is.null(first(flag)), max(sprt_overlap_base$n_weeks), n_weeks),
y = ifelse(is.null(first(flag)), max(sprt_overlap_base$annual_saving), annual_saving) - 3,
label = paste0("Temperature check\n(Baseline)\nweek ", n_weeks)),
colour = "grey20",
size = 2.5,
fontface = "italic",
check_overlap = TRUE) +
geom_point(data = sprt_overlap_s2 %>% filter(flag == 1),
aes(x = ifelse(is.null(first(flag)), max(sprt_overlap_s2$n_weeks), n_weeks),
y = ifelse(is.null(first(flag)), max(sprt_overlap_s2$annual_saving), annual_saving)),
size = 4,
alpha = 0.8,
shape = 16,
colour = "#347EB3") +
geom_text(data = sprt_overlap_s2 %>% filter(flag == 1),
aes(x = ifelse(is.null(first(flag)), max(sprt_overlap_s2$n_weeks), n_weeks),
y = ifelse(is.null(first(flag)), max(sprt_overlap_s2$annual_saving), annual_saving) + 3,
label = paste0("Temperature check\n(Intervention)\nweek ", n_weeks)),
colour = "grey20", size = 2.5, fontface = "italic") +
geom_point(data = annual_saving %>% filter(n_weeks == 12),
aes(x = n_weeks, y = annual_saving),
size = 4,
alpha = 0.8,
shape = 16,
colour = "#347EB3") +
geom_text(data = annual_saving %>% filter(n_weeks == 12),
aes(x = n_weeks, y = annual_saving + 4,label = paste0("Minimum\ntest length check\nweek ", n_weeks)),
colour = "grey20",
size = 2.5,
fontface = "italic")+
geom_point(data = annual_saving %>% filter(n_weeks == 24),
aes(x = n_weeks, y = annual_saving),
size = 4,
alpha = 0.8,
shape = 16,
colour = "#347EB3") +
geom_text(data = annual_saving %>% filter(n_weeks == 24),
aes(x = n_weeks, y = annual_saving - 3, label = paste0("End of block period\nall stopping criteria met\nweek ", n_weeks)),
colour = "grey20",
size = 2.5,
fontface = "italic")+
geom_text(data = sprt_res %>% filter(n_weeks%%12 == 0 & flag == 1) %>% slice(1),
aes(x = n_weeks, y = annual_saving - 4, label = paste0("SPRT check\nweek ", n_weeks)),
colour = "grey20",
size = 2.5,
fontface = "italic") +
geom_vline(xintercept = seq(0, 50, by = 12),
linetype = "dashed",
color = "grey20",
alpha = 0.3,
size = 0.5) +
annotate(geom = "text",
x = seq(5, 50, by = 12),
y = 24,
label = paste0("12-week block"),
alpha = 0.5,
size = 3) +
geom_text(data = annual_saving[nrow(annual_saving), ],
aes(x = n_weeks, y = annual_saving, label = paste0(round(annual_saving, 0), "%")),
position = position_nudge(x = 0.5),
colour = "grey20",
size = 3.0,
check_overlap = TRUE,
hjust = 0) +
scale_x_continuous(expand = c(0, 0),
breaks = seq(0, 48, by = 12),
limits = c(1, param_sprt$n_weeks + 0.5),
labels = number_format(accuracy = 1L, suffix = "\nweeks")) +
scale_y_continuous(breaks = seq(0, 25, by = 5),
limits = c(10, 24),
labels = c("0%", "5%", "10%", "15%", "20%", "25%")) +
scale_color_manual(values = ls_colours) +
coord_cartesian(clip = "off") +
labs(title = NULL,
subtitle = "Normalized annual energy savings estimate using TOWT model predictions for a typical meteorological year",
x = NULL,
y = NULL,
color = NULL) +
theme(panel.grid.major.y = element_line(),
legend.position = "bottom",
plot.margin = margin(3, 15, 3, 3, unit = "mm"))
# Overall result
p1 / p2 / p4 / p3 +
plot_annotation(title = "Overall sequential evaluation results using pre-defined stopping criteria") +
plot_annotation(tag_levels = c('a'), tag_suffix = ')') &
theme(plot.tag.position = c(0, 1),
plot.tag = element_text(color="black"))
```
As figure \@ref(fig:sprtresults) indicates, the running average of
power consumption in subplot a) shows, in general, more savings from
the intervention strategy during warmer outdoor weather. This also
corresponds to the sequential probability ratio test (SPRT) results
shown in subplot b). The energy savings, as the difference between
baseline samples and intervention samples, were estimated over the
actual weather encountered during the study. Subplot b) also shows that
a minimal detectable saving was observed early, in week 12. It also
highlights how consistently close the estimated intervention effect is
to the true effect (without the effect of noise) throughout the study
period. However, subplot c) indicates that outdoor weather had not yet
reached the 80% threshold range calculated with reference to the
typical meteorological year (TMY) weather file; we therefore need to
continue the M&V process in order to be able to normalize to a full
range of weather conditions. At week 16, the accumulated temperatures
measured on sampled baseline days covered 80% of the range, and at week
24 the sampled intervention days satisfied the criterion. Therefore, we
report a 10 kW saving at the end of this blocking period (i.e. week
24). If the analyst decides to continue sampling, the plot further
shows that the uncertainty associated with the savings estimate
decreased further throughout the remaining 24 weeks. Subplot d) plots
the sequentially estimated savings normalized to the TMY weather
conditions and outlines the checkpoints for satisfying each critical
stopping criterion. It shows that although the saving was detected
early by the statistical test at week 12, the normalized annual
estimate still fluctuates considerably at that point. After sufficient
weather conditions have been covered, the annualized savings stabilize
at 18%, which corresponds to the true TMY-normalized savings shown in
subplot d).
Consider now the scenario where the analyst had applied conventional
M&V following ASHRAE Guideline 14 or IPMVP: it would take two years to
reach the same result (12-month baseline, 12-month intervention). The
point at which the new method was able to accurately estimate
TMY-normalized savings was week 24. Using conventional M&V methods,
this would merely be the mid-point of baseline collection, assuming no
interruptions or delays, and the analyst would need to wait another
year and a half to estimate weather-normalized savings. Even if the
analyst performed conventional M&V using a 6-month baseline and a
6-month intervention period (one year total), the result would be less
accurate and still take longer than the method proposed above.
Attempting to perform conventional M&V over less than that (e.g., 24
weeks) would yield nonsensical results, as the baseline and
intervention periods would span very different weather ranges. Yet this
randomized M&V method was able to achieve a valid, robust result in
that time.
## Scenario A: A discrete non-routine event
One common and typical non-routine event in an energy-saving M&V
project is a change in occupancy, or a substantial change in occupant
behavior. Occupancy has a substantial influence on building energy
consumption, but it is typically unrelated to the intervention. In
reality, occupancy measurements are rarely available (unlike other
independent variables such as outdoor temperature) when an analyst fits
a regression model (e.g. TOWT); thus, the analyst normally must assume
that occupancy remains unchanged throughout the study. ASHRAE G14 and
IPMVP only require the analyst to check the coefficient of variation of
the root mean square error (CV(RMSE)) and the normalized mean bias
error (NMBE) of a fitted baseline model. Thus, if occupancy changes
after the retrofit implementation, the analyst can still obtain an
'accurate' baseline model with a qualifying CV(RMSE), but the savings
estimate is biased because the unrelated change event is overlooked.
The novel M&V method, on the other hand, resolves this by sampling
randomly and frequently across the full distribution of occupancy over
the full study period, and thus the results are robust to the change in
occupancy.
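For reference, a minimal sketch of these two goodness-of-fit checks (a
generic, hypothetical helper, not code from G14 or IPMVP; `measured`
and `predicted` are vectors from a fitted baseline model, and `n_par`
is the number of model parameters used to adjust the degrees of
freedom):
```{r fit-stats-sketch, echo=TRUE, eval=FALSE}
# CV(RMSE) and NMBE in percent for baseline model residuals
fit_stats <- function(measured, predicted, n_par = 1) {
  n <- length(measured)
  resid <- measured - predicted
  cvrmse <- 100 * sqrt(sum(resid^2) / (n - n_par)) / mean(measured)
  nmbe   <- 100 * sum(resid) / ((n - n_par) * mean(measured))
  c(cvrmse = cvrmse, nmbe = nmbe)
}
```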
Consider a scenario (Scenario A) where a tenant in a high-rise office
building moves out during the M&V study period, vacating one floor of
the building. We expect the impact of non-routine events such as these
to be balanced between sampled baseline days and intervention days if
we sample randomly with equal probability. This means if occupancy is
notably different over, say, a 4-week period, the randomized M&V will