Update Models with gfactor Speed #502

Merged
merged 3 commits on Dec 23, 2024
@@ -226,6 +226,15 @@ models:
tests:
- five_minute_daily_count:
group_by_columns: ["detector_id", "sample_date"]
columns:
- name: speed_weighted
description: |
If the detector reports a measured speed (miles/hour) in the lane, then that value will be used.
The reported value is weighted by the number of vehicles in each sample period. If no speed is
reported by the device, then the speed value calculated from the
int_clearinghouse__detector_g_factor_based_speed model will be placed in the corresponding
detector and timestamp row. If there is no device- or g-factor-provided speed, the value will
remain null and be populated using imputation in downstream models.
- name: int_clearinghouse__detector_g_factor_based_speed
description: |
This model calculates the g-factor-based smoothed speed. According to the PeMS documentation, the
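The fallback described in the speed_weighted column above can be summarized with a minimal SQL sketch. The column and CTE names are taken from the diff that follows; the fragment itself is illustrative, not the merged code:

```sql
-- Illustrative sketch of the speed fallback chain:
--   1. device-reported speed (speed_weighted),
--   2. g-factor based speed (imputed_speed),
--   3. otherwise null, left for downstream imputation.
select
    agg.detector_id,
    agg.sample_timestamp,
    coalesce(agg.speed_weighted, gs.imputed_speed) as speed_weighted
from agg
left join gfactor_speed as gs
    on
        agg.detector_id = gs.detector_id
        and agg.sample_timestamp = gs.sample_timestamp
```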
@@ -41,6 +41,15 @@ spine as (
on to_date(ts.timestamp_column) = dd.sample_date
),

-- Add the model where gfactor speed has been calculated
gfactor_speed as (
select
detector_id,
sample_timestamp,
imputed_speed
from {{ ref('int_clearinghouse__detector_g_factor_based_speed') }}
),

/* Join 5-minute aggregated data to the spine to get a table without missing rows */
base as (
select
@@ -59,7 +68,7 @@ base as (
agg.zero_occ_pos_vol_ct,
agg.high_volume_ct,
agg.high_occupancy_ct,
agg.speed_weighted,
coalesce(agg.speed_weighted, gs.imputed_speed) as speed_weighted,
agg.volume_observed,
coalesce(agg.state_postmile, dmeta.state_postmile) as state_postmile,
coalesce(agg.absolute_postmile, dmeta.absolute_postmile) as absolute_postmile,
@@ -88,6 +97,10 @@ base as (
to_date(spine.timestamp_column) < dmeta._valid_to
or dmeta._valid_to is null
)
left join gfactor_speed as gs
on
agg.detector_id = gs.detector_id
and agg.sample_timestamp = gs.sample_timestamp
Contributor

Note: adding another join to this already expensive model doesn't seem to hurt performance too much (see the query profile from CI here), but let's keep our eyes on it.

Contributor Author

The other alternatives we could investigate are creating a new model downstream of int_clearinghouse__detector_agg_five_minutes_with_missing_rows to add the g-factor speed, or possibly adding it to the int_imputation__detector_agg_five_minutes model.
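For concreteness, the first alternative could look roughly like the following hypothetical downstream model; everything except the two ref() targets is assumed for illustration and is not part of this PR:

```sql
-- Hypothetical sketch only: a separate model that layers the g-factor
-- speed onto the aggregated rows instead of joining inside them.
select
    base.detector_id,
    base.sample_timestamp,
    coalesce(base.speed_weighted, gs.imputed_speed) as speed_weighted
from {{ ref('int_clearinghouse__detector_agg_five_minutes_with_missing_rows') }} as base
left join {{ ref('int_clearinghouse__detector_g_factor_based_speed') }} as gs
    on
        base.detector_id = gs.detector_id
        and base.sample_timestamp = gs.sample_timestamp
```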

)

select * from base
@@ -20,7 +20,7 @@ detector_agg as (
occupancy_avg,
speed_weighted,
volume_observed
from {{ ref('int_clearinghouse__detector_agg_five_minutes_with_missing_rows') }}
from {{ ref('int_clearinghouse__detector_agg_five_minutes') }}
where {{ make_model_incremental('sample_date') }}
),

@@ -88,8 +88,7 @@ unimputed as (
-- If the detector_id in the join is not null, it means that the detector
-- is considered to be "good" for a given date.
(good_detectors.detector_id is not null) as detector_is_good,
coalesce(base.speed_weighted, (base.volume_sum * 22) / nullifzero(base.occupancy_avg) * (1 / 5280) * 12)
as speed_five_mins
base.speed_weighted as speed_five_mins
from base
left join good_detectors
on
@@ -23,7 +23,7 @@ five_minute_agg as (
sample_ct,
volume_sum,
occupancy_avg,
speed_five_mins as speed_weighted,
speed_five_mins,
station_type,
absolute_postmile,
volume_imputation_method,
@@ -35,27 +35,6 @@
where {{ make_model_incremental('sample_date') }}
),

aggregated_speed as (
select
*,
--A preliminary speed calculation was developed on 3/22/24
--using an effective vehicle length of 22 feet
--(16 ft vehicle + 6 ft detector zone) and using
--a conversion to get miles per hour (5280 ft / mile and 12
--5-minute intervals in an hour).
--The following code may be used if we want to use speed from raw data
--coalesce(speed_raw, ((volume * 22) / nullifzero(occupancy)
--* (1 / 5280) * 12))
--impute five minutes missing speed
coalesce(speed_weighted, (volume_sum * 22) / nullifzero(occupancy_avg) * (1 / 5280) * 12)
as speed_five_mins,
-- create a boolean flag to track whether speed is imputed or not
coalesce(speed_five_mins != speed_weighted or (speed_five_mins is not null and speed_weighted is null), false)
-- coalesce(speed_weighted is null, false)
as is_speed_calculated
from five_minute_agg
),

vmt_vht_metrics as (
select
*,
@@ -67,7 +46,7 @@
vmt / nullifzero(vht) as q_value,
-- travel time
60 / nullifzero(q_value) as tti
from aggregated_speed
from five_minute_agg
),

delay_metrics as (
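For reference, the fallback removed above multiplies the 5-minute volume by the 22 ft effective vehicle length, divides by occupancy to get feet traveled in the interval, then converts to miles per hour (5280 ft per mile, 12 five-minute intervals per hour). A worked example under assumed inputs:

```sql
-- Assumed inputs: volume_sum = 50 vehicles / 5 min, occupancy_avg = 0.1.
--   (50 * 22) / 0.1  = 11000 ft traveled in 5 minutes
--   11000 / 5280     ≈ 2.083 miles in 5 minutes
--   2.083 * 12       ≈ 25 mph
-- Note: this relies on Snowflake's decimal division; with integer
-- division, 1 / 5280 would truncate to 0.
select (50 * 22) / nullifzero(0.1) * (1 / 5280) * 12 as speed_five_mins;
```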