refactor: improve mapping of generators to buses #267

danielolsen · 2022-02-10T00:27:24Z

Purpose

In preliminary testing, the HIFLD grid model showed many infeasibilities caused by mismatches between inflexible generators (namely coal, nuclear, and hydro) and local transmission capacities. This PR aims to improve this in three primary ways:

- Aggregating multiple hydro generating units within one plant into a single generator with equivalent Pmax and Pmin sums
- Improving how generators are mapped to substations
- Improving how generators are mapped to buses within substations

Hydro aggregation is relevant to the third item (mapping generators to buses within substations), since we decide which bus within a substation to map to based on generator capacity.

The most complicated part by far is how generators are mapped to substations. The old logic was:

Are coordinates available for the generator?
- Yes: Are there any substations with ZIP codes that match the generator?
  - Yes: return the closest substation within that ZIP
  - No: return the closest substation within the 'closest' 200 ZIP codes
- No: Are there any substations with ZIP codes that match the generator?
  - Yes: return an arbitrarily chosen one
  - No: return nothing

The new logic is:

Are coordinates available for the generator?
- Yes: Find the closest substation within that generator's state & interconnection (with some edge-case handling)
- No: Are there any substations with ZIP codes that match the generator?
  - Yes: return an arbitrarily chosen one
  - No: return nothing

The no-coordinates case applies to only a fraction of a percent of generators; the big changes come from the cases where we do have coordinates and now use those as the primary substation mapping tool, rather than ZIP codes.

What the code is doing

Aggregating hydro units: a new aggregate_hydro_generators_by_plant_id function is added, which groups hydro units by plant ID, sums Pmin and Pmax, and returns all other attributes and index from the first unit in the group.
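
The aggregation described above can be sketched with a pandas groupby. This is a minimal illustration, not the code in this PR: the function name aggregate_hydro_by_plant and the column names plant_id and type are assumptions, and only Pmin and Pmax come from the description.

```python
import pandas as pd


def aggregate_hydro_by_plant(generators: pd.DataFrame) -> pd.DataFrame:
    """Collapse hydro units that share a plant ID into one generator (sketch).

    Column names "plant_id" and "type" are assumed for illustration.
    """
    is_hydro = generators["type"] == "hydro"
    hydro = generators.loc[is_hydro].copy()
    # Remember each unit's index so the first unit's index can be reused.
    hydro["original_index"] = hydro.index

    # Sum Pmin and Pmax; take every other attribute from the first unit.
    agg_rules = {col: "first" for col in hydro.columns if col != "plant_id"}
    agg_rules.update({"Pmin": "sum", "Pmax": "sum"})
    aggregated = hydro.groupby("plant_id", sort=False).agg(agg_rules).reset_index()
    aggregated = aggregated.set_index("original_index")
    aggregated.index.name = generators.index.name

    return pd.concat([generators.loc[~is_hydro], aggregated]).sort_index()


generators = pd.DataFrame(
    {
        "plant_id": [1, 1, 2, 3],
        "type": ["hydro", "hydro", "hydro", "coal"],
        "Pmin": [5.0, 10.0, 0.0, 100.0],
        "Pmax": [50.0, 60.0, 20.0, 400.0],
    },
    index=[10, 11, 12, 13],
)
result = aggregate_hydro_by_plant(generators)
print(result)
```

In this toy input, the two hydro units at plant 1 collapse into one row that keeps the first unit's index, while the coal unit passes through unchanged.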

Improving the mapping of generators to substations:

We remove the old map_generator_to_sub_by_location and add a replacement map_generators_to_sub_by_location. These function names differ because the previous one operated on one generator at a time as a part of an apply call, while the new one operates on the entire generators dataframe.
The new function translates lats & lons of the substations and generators dataframes into (x, y, z) triples in 3D space, where the center of the earth is (0, 0, 0) and the radius is 1. We use a new latlon_to_xyz function that we add to prereise, rather than the existing ll2uv implementation in powersimdata, to loosen the coupling of the two packages and to avoid ambiguity between (lat, lon) and (lon, lat). Then, a KDTree is instantiated for each combination of (interconnect, state). If there are one or more generators labelled with an (interconnect, state) for which there are no substations with the corresponding (interconnect, state):
- If we believe the generator is in that state and is physically connected to the other interconnection, then we create a KDTree for that entire interconnection (e.g. a plant that's physically located in Oklahoma may have a single transmission line that connects it to ERCOT).
- If we believe that the generator is in that state but we don't trust the interconnection information (e.g. plants in Virginia or Pennsylvania that claim to be connected to WECC), then we create a KDTree for the state.
- These assumptions are printed for the user to examine.
We have a small internal function which queries into the appropriate KDTree for each generator and translates the result to a substation ID.
For any generators whose locations can't be found this way (i.e. they don't have latitudes and longitudes defined), we fall back to the old ZIP-code matching logic, where we try to grab a random substation within a matching ZIP code.
Any remaining generators which can't be mapped to a substation will be filtered out of the final outputs.
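
The coordinate transformation and nearest-substation query described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: latlon_to_xyz_sketch is a hypothetical stand-in for the new latlon_to_xyz helper, the substation IDs and coordinates are made up, and a single tree is used where the PR builds one per group of candidate substations. Because straight-line (chord) distance between points on a unit sphere increases with great-circle distance, the nearest neighbor in (x, y, z) space is also the nearest on the sphere.

```python
import numpy as np
from scipy.spatial import KDTree


def latlon_to_xyz_sketch(lat_deg, lon_deg):
    """Convert latitudes/longitudes in degrees to points on a unit sphere."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    return np.column_stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)]
    )


# Made-up substations: metro Phoenix, west of Phoenix, and Seattle.
sub_ids = np.array([101, 102, 103])
sub_xyz = latlon_to_xyz_sketch([33.45, 33.39, 47.60], [-112.07, -112.86, -122.30])

# A single tree for illustration; the PR creates one per candidate group.
tree = KDTree(sub_xyz)

# Two made-up generators; query returns (distances, positions in sub_xyz).
gen_xyz = latlon_to_xyz_sketch([33.388, 33.50], [-112.865, -112.00])
_, positions = tree.query(gen_xyz)

# Translate tree positions back to substation IDs.
nearest_sub_ids = sub_ids[positions]
print(nearest_sub_ids)
```

Passing latitude and longitude as separate, named arguments is one way to avoid the (lat, lon) versus (lon, lat) ambiguity mentioned above.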

Improving how generators are mapped to buses within substations: map_generator_to_bus_by_sub is refactored from the previous logic (always map to the lowest-voltage bus within the substation) to branching logic:

If there's only one bus within the substation or the generator's Pmax is less than 200 MW, we connect to the lowest-voltage bus.
If the generator is between 200 and 500 MW, connect to the second-lowest voltage bus
If the generator is 500 MW or greater, connect to the highest-voltage bus
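
The branching rule above can be sketched as a small function. The name choose_bus_voltage is hypothetical, the function returns a voltage rather than a bus ID to keep the example short, and treating exactly 200 MW as part of the middle band is an assumption inferred from the "less than 200 MW" and "500 MW or greater" wording.

```python
def choose_bus_voltage(bus_voltages, pmax):
    """Pick the voltage of the bus a generator connects to (illustrative sketch).

    bus_voltages: non-empty list of bus voltages (kV) within one substation.
    pmax: generator capacity in MW.
    """
    voltages = sorted(bus_voltages)
    if len(voltages) == 1 or pmax < 200:
        return voltages[0]  # lowest-voltage bus
    if pmax < 500:
        return voltages[1]  # second-lowest-voltage bus
    return voltages[-1]  # highest-voltage bus


for capacity in (150, 300, 1300):
    print(capacity, choose_bus_voltage([69, 230, 500], capacity))
```

With only two buses in a substation, the second-lowest and highest-voltage buses are the same bus, so the middle and upper bands give the same answer.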

There's also a small unrelated fix to prevent duplicate 'interconnect' columns in some output CSVs, which was preventing the grid.mat files created from REISE.jl from being read back into PowerSimData for post-simulation analysis.
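
For readers unfamiliar with the failure mode, duplicate column labels in a pandas dataframe can be detected and dropped as shown below. This is a generic pandas idiom on made-up data, not necessarily how the fix in this PR is implemented.

```python
import pandas as pd

# Made-up frame with a repeated 'interconnect' column label.
df = pd.DataFrame(
    [[1, "Western", "Western"], [2, "Eastern", "Eastern"]],
    columns=["bus_id", "interconnect", "interconnect"],
)

# Keep only the first occurrence of each column label.
deduped = df.loc[:, ~df.columns.duplicated()]
print(list(deduped.columns))
```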

Testing

Tested manually. The printouts for the edge cases of mapping generators to substations look like:

no substations within (Western, MA), will map generators to substations within MA instead
no substations within (Western, MO), will map generators to substations within MO instead
no substations within (Eastern, CA), will map generators to substations within CA instead
no substations within (Western, VA), will map generators to substations within VA instead
no substations within (Western, PA), will map generators to substations within PA instead
no substations within (Western, KS), will map generators to substations within Western instead
no substations within (ERCOT, OK), will map generators to substations within ERCOT instead

With this change, about 22% of generators end up mapped to a different substation (2,837 out of 12,735). In addition, 29 generators can be mapped which were not mapped with the previous logic. The most impactful change may have been how Palo Verde's generators were mapped: previously they ended up at a 69 kV substation within metro Phoenix; now they're appropriately connected to the 500 kV substation at their true location. Many other large inflexible generators are relocated as well, and generators in WECC seem to be particularly affected.

Running powerflows with the results shows drastic improvement in WECC, where infeasibilities were initially the worst. Previously, the total amount of transmission line limit violation energy that was required was about 25% of the total demand. After the change, this is down to 0.45%. In addition, transmission violations occur at fewer than half as many lines as before. Eastern powerflows are still running but I expect to see significant improvements there as well. EDIT: Eastern is done. The improvements aren't quite as good as in Western, but are still big improvements. Transmission violation energy is down to 5.1% (from 18.2% before) and at 0.19% of branches (from 0.61% before). That puts the overall USA-wide number of violating branches down to around 200, less than 1 out of every 400 branches, and about a third as many as before the refactor.

Usage Example/Visuals

All code is still launched via:

from prereise.gather.griddata.hifld import create_csvs
create_csvs(output_folder_name)

Time estimate

1 hour. Most of the new code is pretty straightforward, but about half of it is designed to combat edge cases caused by strange data inputs from the original EIA Form 860 data.

prereise/gather/griddata/hifld/data_process/generators.py

danielolsen · 2022-02-18T23:28:21Z

I've refactored this so that it ignores states completely, and uses the voltage(s) available at each substation to help ensure that generators don't get mapped to substations with inadequate transmission capacity, based on their listed 'grid voltage' within Form 860. In a quick one-day test on Eastern, it seems to help: previously the transmission violations were > 17 GW in every hour of the year, but in the test day we have transmission violations as low as 1.3 GW. I'll run the full year over the weekend and report more detailed results.
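
A rough sketch of the voltage-aware candidate lookup, modeled on the subs_voltage_lookup excerpt quoted later in this thread. The voltage class names and bounds, the substation data, and the plain substations frame (standing in for substations_with_xyz) are assumptions made for illustration; only the interconnect and MAX_VOLT column names and the lower-bound comparison come from that excerpt.

```python
import pandas as pd

# Hypothetical voltage classes in kV; the PR's actual voltage_ranges may differ.
voltage_ranges = {
    "low": {"min": 0, "max": 100},
    "medium": {"min": 100, "max": 300},
    "high": {"min": 300, "max": 1000},
}

# Made-up substations, indexed by substation ID.
substations = pd.DataFrame(
    {
        "interconnect": ["Western", "Western", "Western", "Eastern"],
        "MAX_VOLT": [69, 230, 500, 345],
    },
    index=[101, 102, 103, 201],
)

# For each (interconnect, voltage class), keep substations whose highest
# voltage is at least the lower bound of that class.
subs_voltage_lookup = {
    (interconnect, level): substations.loc[
        (substations["interconnect"] == interconnect)
        & (substations["MAX_VOLT"] >= bounds["min"])
    ]
    for interconnect in substations["interconnect"].unique()
    for level, bounds in voltage_ranges.items()
}

print(list(subs_voltage_lookup[("Western", "high")].index))
```

A generator whose Form 860 grid voltage falls in a given class would then be matched only against the substations stored under its (interconnect, class) key.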

The nonsensical-interconnection listing is at least partially fixed by using the Balancing Authority column of each generator within Form 860 to map to interconnects, rather than the NERC region. For whatever reason, BA seems more reliable, and we only use the NERC region as a fall-back. The generators which end up getting mapped more than 50 miles away now all seem to be either:

very small generators
generators in very sparse areas (e.g. the Florida keys, southwest Texas)
wind generators in the Texas panhandle (which indicates that we probably need to revise how we draw the border between ERCOT/EI there)
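
The "BA first, NERC region as fall-back" rule can be sketched with two dictionary lookups. The column names and the contents of both mappings are illustrative assumptions, not the mapping data added in this PR.

```python
import pandas as pd

# Illustrative mappings only; the PR's actual tables are not reproduced here.
ba_to_interconnect = {"ERCO": "ERCOT", "CISO": "Western", "PJM": "Eastern"}
nerc_to_interconnect = {"TRE": "ERCOT", "WECC": "Western", "RFC": "Eastern"}

# Made-up plants: the first claims WECC but sits in an Eastern balancing authority.
plants = pd.DataFrame(
    {
        "balancing_authority": ["PJM", None, "CISO", None],
        "nerc_region": ["WECC", "TRE", "WECC", None],
    }
)

# Use the balancing authority where it maps; otherwise fall back to NERC region.
plants["interconnect"] = (
    plants["balancing_authority"]
    .map(ba_to_interconnect)
    .fillna(plants["nerc_region"].map(nerc_to_interconnect))
)
print(plants)
```

The last plant has neither field populated, so its interconnect stays missing and would need separate handling.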

danielolsen · 2022-02-22T18:52:49Z

Results for full-year runs with the new voltage-class and interconnection mapping of generators to substations:

Eastern interconnection: transmission violation energy reduced significantly (0.451% of annual demand, vs. 5.1% before) and at significantly fewer branches (0.10% of branches vs. 0.19% before)
Western interconnection: slight improvement in transmission violation energy (0.31% of annual demand, vs. 0.45% before) and number of branches (0.27% now vs. 0.29% before)
ERCOT: transmission violation energy reduced significantly (0.16% of annual demand, vs. 2.0% before) and slight improvement in number of branches (0.30% now vs. 0.43% before)

We're down to only 138 branches with violations across the whole USA. The next thing I'm going to try is revising the configuration of transformers within substations, which should hopefully increase the effective impedance between the higher-voltage buses at which large generators are connected and the lower-voltage buses connected to low-voltage branches with transmission violation.

danielolsen · 2022-02-24T23:51:12Z

The 'cascade' configuration (every bus within a substation is connected via a transformer to the next-highest bus) reduces transmission violation energy only barely compared to the previous configuration (every bus is connected to the substation's highest-voltage bus), but has a larger impact on reducing the number of branches at which transmission violations occur:

ERCOT: violation energy goes from 0.16% to 0.14%, and the number of branches goes from 0.29% to 0.26%
Western: violation energy goes from 0.305% to 0.304%, and the number of branches goes from 0.27% to 0.26%
Eastern: violation energy goes from 0.451% to 0.447%, and the number of branches goes from 0.10% to 0.08%
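
The difference between the two transformer configurations can be illustrated by listing which pairs of bus voltages get a transformer between them. The function name transformer_pairs and the voltage-pair representation are assumptions for this sketch; the 'tree' label follows the commit title for this change.

```python
def transformer_pairs(bus_voltages, configuration="cascade"):
    """Return (low, high) voltage pairs joined by a transformer (sketch)."""
    voltages = sorted(set(bus_voltages))
    if configuration == "cascade":
        # Each bus connects to the next-highest-voltage bus.
        return list(zip(voltages[:-1], voltages[1:]))
    if configuration == "tree":
        # Each bus connects directly to the highest-voltage bus.
        return [(v, voltages[-1]) for v in voltages[:-1]]
    raise ValueError(f"unknown configuration: {configuration}")


print(transformer_pairs([69, 230, 500], "cascade"))
print(transformer_pairs([69, 230, 500], "tree"))
```

In the cascade layout, power moving between the highest- and lowest-voltage buses passes through two transformers in series instead of one, which is the added impedance between large generators and low-voltage branches that the change is aiming for.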

The remaining transmission violations are pretty heavily concentrated among a few branches, with what I believe are a few common root causes:

There are some large generators for which there's a transmission line in the original HIFLD dataset originating within the plant and going to a nearby substation, but no substation listed for the plant itself. As a result, the plant end of this transmission line gets mapped to another nearby substation, potentially causing a connection where there shouldn't be one and/or resulting in the generator getting connected to the grid at a different substation with lower transfer capacity (although this is partially mitigated by the new code in this branch which looks at substation voltages and plant-level grid connection voltages). This is essentially issue #234 ("Improve transmission coverage of HIFLD dataset when source data are missing"). We could manually add these substations, or change how we build the transmission network topology so that transmission line endpoints with no substation in the vicinity create their own substations. Manually adding is a little less elegant, but has a lower potential for topological side-effects.
There is only a single assumed reactance and power rating for a line of a given voltage and length, ignoring the fact that some lines are double-circuit, triple-circuit, etc. This can result in a generator being connected to a transmission line whose assumed capacity is only a third of the generator's minimum generation (e.g. the Catawba Nuclear station). This also pops up for some of the longer transmission lines in the West and Texas. EDIT: We don't currently have any code that can take in a subset of lines and alter their parameters; it would be fairly trivial to do so by assuming two (or three, etc.) identical parallel lines, but properly modeling different sorts of towers/lines would mean we'd need to make some assumptions about designs and start using the new code introduced in #262 ("feat: add module to calculate transmission impedances from geometry") and #268 ("feat: translate per-length impedance values to whole-line parameters").
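
The "identical parallel lines" shortcut mentioned above amounts to simple arithmetic: n identical circuits in parallel have 1/n of the single-circuit series reactance and n times the rating. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def parallel_circuit_parameters(reactance, rating, n_circuits):
    """Equivalent parameters for n identical circuits in parallel (sketch).

    reactance: single-circuit series reactance (any consistent unit).
    rating: single-circuit power rating in MW.
    """
    if n_circuits < 1:
        raise ValueError("n_circuits must be at least 1")
    return reactance / n_circuits, rating * n_circuits


# A made-up single-circuit line treated as double-circuit.
x_equivalent, rating_equivalent = parallel_circuit_parameters(0.06, 400.0, 2)
print(x_equivalent, rating_equivalent)
```

This ignores mutual coupling between circuits on the same tower, which is part of what the geometry-based approach would capture.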

BainanXia · 2022-03-25T20:48:25Z

prereise/gather/griddata/hifld/data_process/generators.py

+    subs_voltage_lookup = {
+        (interconnect, voltage_level): substations_with_xyz.loc[
+            (substations_with_xyz["interconnect"] == interconnect)
+            & (substations_with_xyz["MAX_VOLT"] >= voltage_range["min"])


So we only care about lower bound here instead of the exact voltage_range defined in the dict voltage_ranges?

danielolsen · 2022-03-30

The primary goal of this PR was to avoid large generators getting mapped to substations without at least one high-enough voltage bus (and therefore probably too low a transfer capacity). I guess we could make it stricter by ensuring that there's at least one bus that's truly within the range, with the tradeoff that the distance to the connection location may sometimes increase. What do you think?

BainanXia · 2022-03-30

Yeah, you are right. Having a stricter filter will potentially give us a farther mapping if there is no match nearby. Which side of the trade-off is more important, the location or the voltage range? If the voltage range turns out to be more important, let's go with the stricter way; otherwise, let's keep what we have.

danielolsen · 2022-03-30

Hard to say really. Locations will give us more representative renewable profiles, but voltage ranges can help ensure that generators aren't hooked up to substations with more transmission capacity than they should be (will impact the renewable curtailment). I'm leaning towards locations, but could probably be convinced otherwise.
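
The two filtering options discussed in this thread can be compared side by side. This is a sketch on made-up data: the bus_voltages column and the candidate_substations helper are hypothetical, and only the MAX_VOLT lower-bound comparison mirrors the quoted code.

```python
import pandas as pd


def candidate_substations(substations, v_min, v_max, strict=False):
    """Filter substations for a generator voltage class (illustrative sketch).

    strict=False: the substation's highest voltage meets the lower bound.
    strict=True: at least one bus voltage lies within [v_min, v_max).
    """
    if strict:
        mask = substations["bus_voltages"].apply(
            lambda volts: any(v_min <= v < v_max for v in volts)
        )
    else:
        mask = substations["MAX_VOLT"] >= v_min
    return substations.loc[mask]


# Made-up substations with the voltages (kV) of the buses they contain.
substations = pd.DataFrame(
    {"bus_voltages": [[69], [115, 230], [230, 500]], "MAX_VOLT": [69, 230, 500]},
    index=[101, 102, 103],
)

loose = candidate_substations(substations, 100, 200)
strict = candidate_substations(substations, 100, 200, strict=True)
print(list(loose.index), list(strict.index))
```

The strict variant never returns more candidates than the lower-bound variant, which is why it can push the nearest match farther away.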

BainanXia

I think this is good to go. Thanks!

danielolsen requested review from danlivengood, BainanXia, jenhagg and YifanLi86 February 10, 2022 00:27

danielolsen self-assigned this Feb 10, 2022

danielolsen added the hifld Related to ingestion of the HIFLD data label Feb 14, 2022

danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch from 96a2164 to 727f654 Compare February 17, 2022 00:46

BainanXia reviewed Feb 17, 2022

danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch from 67377d5 to 39058fb Compare February 22, 2022 17:42

danielolsen force-pushed the hifld branch from a704d1f to 05f3ead Compare February 25, 2022 20:49

danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch 2 times, most recently from 616b948 to b7d62bb Compare February 25, 2022 20:59

danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch from 1cd3e14 to 8e7ec12 Compare March 9, 2022 23:33

danielolsen added a commit that referenced this pull request Mar 10, 2022

TESTING: composite for first 13 commits within PR #267

5c7bcf2

rouille reviewed Mar 10, 2022

danielolsen force-pushed the hifld branch from 2a5ed21 to 851098e Compare March 15, 2022 00:02

danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch from 8e7ec12 to 071ddf5 Compare March 15, 2022 00:03

danielolsen added a commit that referenced this pull request Mar 15, 2022

TESTING: composite for first 13 commits within PR #267

8ed6e66

danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch from 071ddf5 to ea66a5b Compare March 15, 2022 01:03

danielolsen mentioned this pull request Mar 17, 2022

Improve transmission coverage of HIFLD dataset when source data are missing #234

Open

1 task

BainanXia reviewed Mar 25, 2022

danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch 3 times, most recently from 1abbf65 to 6641a32 Compare March 30, 2022 23:39

BainanXia approved these changes Mar 31, 2022

danielolsen added 11 commits March 31, 2022 14:27

fix: avoid duplicate 'interconnect' columns

789f053

feat: add helper function for lat, lon to unit vector

c1a18d7

data: add balancing authority to interconnect mapping

03e07a9

refactor: use BA as primary interconnect mapper, NERC region as fallback

2f56d1f

refactor: override existing substation MIN_VOLT and MAX_VOLT whenever…

8861906

… possible

refactor: aggregate hydro generators by plant

7666200

refactor: map larger generators to higher-voltage buses

052f83c

refactor: map generators to substations using voltage information

facd3d3

refactor: connect transformers in 'cascade' rather than 'tree'

e7c6259

feat: add proxy substations

42b9be3

feat: add overrides to substation LINES filtering

ffb2d4d

danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch from 6641a32 to ffb2d4d Compare March 31, 2022 21:32

danielolsen merged commit 2dd569e into hifld Mar 31, 2022

danielolsen deleted the daniel/hifld_gen_bus_mapping branch March 31, 2022 21:40

danielolsen added a commit that referenced this pull request Apr 1, 2022

Merge pull request #267 from Breakthrough-Energy/daniel/hifld_gen_bus…

690d062

…_mapping refactor: improve mapping of generators to buses

danielolsen added a commit that referenced this pull request Apr 5, 2022

Merge pull request #267 from Breakthrough-Energy/daniel/hifld_gen_bus…

da83084

…_mapping refactor: improve mapping of generators to buses
