refactor: improve mapping of generators to buses #267

Merged
merged 11 commits into from
Mar 31, 2022

danielolsen
Contributor

@danielolsen danielolsen commented Feb 10, 2022

Pull Request doc

Purpose

In preliminary testing, the hifld grid model showed many infeasibilities caused by mismatches between inflexible generators (namely coal, nuclear, and hydro) and local transmission capacities. This PR aims to improve this in three primary ways:

  • Aggregating multiple hydro generating units within one plant into a single generator with equivalent Pmax and Pmin sums
  • Improving how generators are mapped to substations
  • Improving how generators are mapped to buses within substations

Hydro aggregation is relevant to the step of mapping generators to buses within substations, since we decide which bus within a substation to map to based on generator capacity.

The most complicated part by far is how generators are mapped to substations. The old logic was:

  • Are coordinates available for the generator?
    • Yes: Are there any substations with ZIP codes that match the generator?
      • Yes: return the closest substation within that ZIP
      • No: return the closest substation within the 'closest' 200 ZIP codes
    • No: Are there any substations with ZIP codes that match the generator?
      • Yes: return an arbitrarily chosen one
      • No: return nothing

The new logic is:

  • Are coordinates available for the generator?
    • Yes: Find the closest substation within that generator's state & interconnection (with some edge-case handling)
    • No: Are there any substations with ZIP codes that match the generator?
      • Yes: return an arbitrarily chosen one
      • No: return nothing

The no-coordinates case is a fraction of a percent; the big changes are from the cases where we do have coordinates and now use those as the primary substation mapping tool, rather than ZIP codes.

What the code is doing

Aggregating hydro units: a new aggregate_hydro_generators_by_plant_id function is added, which groups hydro units by plant ID, sums Pmin and Pmax, and returns all other attributes and index from the first unit in the group.
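
The aggregation described above can be sketched in plain Python. This is a minimal illustration, not the code in this PR: the real function operates on a dataframe, and the record layout and the name aggregate_hydro_units are assumptions made for the example.

```python
def aggregate_hydro_units(units):
    """Merge hydro units that share a plant ID into one generator.

    `units` is a list of dicts (hypothetical layout) with at least the keys
    "index", "plant_id", "type", "Pmin" and "Pmax". Non-hydro units pass
    through unchanged. For each hydro plant, Pmin and Pmax are summed and
    every other attribute (including "index") comes from the first unit.
    """
    first_by_plant = {}
    result = []
    for unit in units:
        if unit["type"] != "hydro":
            result.append(dict(unit))
            continue
        plant_id = unit["plant_id"]
        if plant_id in first_by_plant:
            # Later units of the same plant only contribute capacity.
            first_by_plant[plant_id]["Pmin"] += unit["Pmin"]
            first_by_plant[plant_id]["Pmax"] += unit["Pmax"]
        else:
            merged = dict(unit)  # copy, so the input records are not mutated
            first_by_plant[plant_id] = merged
            result.append(merged)
    return result


example_units = [
    {"index": 0, "plant_id": 10, "type": "hydro", "Pmin": 5.0, "Pmax": 50.0},
    {"index": 1, "plant_id": 10, "type": "hydro", "Pmin": 5.0, "Pmax": 40.0},
    {"index": 2, "plant_id": 10, "type": "ng", "Pmin": 0.0, "Pmax": 100.0},
    {"index": 3, "plant_id": 11, "type": "hydro", "Pmin": 1.0, "Pmax": 20.0},
]
aggregated = aggregate_hydro_units(example_units)
```

Here the two hydro units of plant 10 collapse into a single generator that keeps index 0, while the gas unit at the same plant and the lone hydro unit at plant 11 are left alone.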

Improving the mapping of generators to substations:

  • We remove the old map_generator_to_sub_by_location and add a replacement map_generators_to_sub_by_location. These function names differ because the previous one operated on one generator at a time as a part of an apply call, while the new one operates on the entire generators dataframe.
  • The new function translates lats & lons of the substations and generators dataframes into (x, y, z) points in 3D space, where the center of the earth is (0, 0, 0) and the radius is 1. We use a new latlon_to_xyz function that we add to prereise, rather than the existing ll2uv implementation in powersimdata, to loosen the coupling of the two packages and to avoid ambiguity between (lat, lon) and (lon, lat). Then, a KDTree is instantiated for each combination of (interconnect, state). If there are one or more generators labelled with (interconnect, state) for which there are no substations with corresponding (interconnect, state):
    • If we believe the generator is in that state and physically connected to the other interconnection, then we create a KDTree for that entire interconnection (e.g. a plant that's physically located in Oklahoma may have a single transmission line that connects it to ERCOT).
    • If we believe that the generator is in that state but we don't trust the interconnection information (e.g. plants in Virginia or Pennsylvania that claim to be connected to WECC), then we create a KDTree for the state.
    • These assumptions are printed for the user to examine.
  • We have a small internal function which queries into the appropriate KDTree for each generator and translates the result to a substation ID.
  • For any generators whose locations can't be found this way (i.e. they don't have latitudes and longitudes defined), we fall back to the old ZIP-code matching logic, where we try to grab a random substation within a matching ZIP code.
  • Any remaining generators which can't be mapped to a substation will be filtered out of the final outputs.
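
The coordinate transform and nearest-substation lookup can be sketched as follows. This is only an illustration under stated assumptions: the body of latlon_to_xyz is a standard unit-sphere conversion and not necessarily the one added to prereise, and a brute-force search stands in for the KDTree query so that the example needs nothing beyond the standard library. The substation IDs and coordinates are made up.

```python
import math


def latlon_to_xyz(lat, lon):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat_r, lon_r = math.radians(lat), math.radians(lon)
    return (
        math.cos(lat_r) * math.cos(lon_r),
        math.cos(lat_r) * math.sin(lon_r),
        math.sin(lat_r),
    )


def nearest_substation(gen_latlon, substations):
    """Return the ID of the substation closest to the generator.

    `substations` maps a (hypothetical) substation ID to (lat, lon).
    A KDTree query would replace this linear scan on real data.
    """
    gen_xyz = latlon_to_xyz(*gen_latlon)
    return min(
        substations,
        key=lambda sub_id: math.dist(gen_xyz, latlon_to_xyz(*substations[sub_id])),
    )


# Made-up coordinates: one substation near the plant, one about 70 km east.
subs = {"near_plant": (33.40, -112.90), "metro": (33.45, -112.10)}
closest = nearest_substation((33.39, -112.86), subs)  # "near_plant"
```

Straight-line (chord) distance between two points on the unit sphere increases monotonically with great-circle distance, so the nearest neighbor in (x, y, z) space is also the nearest on the surface. That is what makes a Euclidean KDTree usable here.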

Improving how generators are mapped to buses within substations: map_generator_to_bus_by_sub is refactored from the previous logic (always map to the lowest-voltage bus within the substation) to branching logic:

  • If there's only one bus within the substation or the generator's Pmax is less than 200 MW, we connect to the lowest-voltage bus.
  • If the generator is between 200 and 500 MW, we connect to the second-lowest-voltage bus.
  • If the generator is 500 MW or greater, we connect to the highest-voltage bus.
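
A minimal sketch of this branching, assuming a substation is represented simply as a list of bus voltages in kV (the function name and data layout are hypothetical, not those of map_generator_to_bus_by_sub):

```python
def choose_bus_voltage(bus_voltages, pmax):
    """Pick the voltage of the bus a generator connects to.

    bus_voltages: voltages (kV) of the buses within one substation.
    pmax: generator capacity in MW.
    """
    voltages = sorted(bus_voltages)
    if len(voltages) == 1 or pmax < 200:
        return voltages[0]  # lowest-voltage bus
    if pmax < 500:
        return voltages[1]  # second-lowest-voltage bus
    return voltages[-1]  # highest-voltage bus


choose_bus_voltage([69, 230, 500], pmax=150)   # 69
choose_bus_voltage([69, 230, 500], pmax=300)   # 230
choose_bus_voltage([69, 230, 500], pmax=1300)  # 500
```

In a two-bus substation the second-lowest and highest buses coincide, so any generator of 200 MW or more lands on the higher-voltage bus.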

There's also a small unrelated fix to prevent duplicate 'interconnect' columns in some output CSVs, which was preventing the grid.mat files created from REISE.jl from being read back into PowerSimData for post-simulation analysis.

Testing

Tested manually. The printouts for the edge cases of mapping generators to substations look like:

no substations within (Western, MA), will map generators to substations within MA instead
no substations within (Western, MO), will map generators to substations within MO instead
no substations within (Eastern, CA), will map generators to substations within CA instead
no substations within (Western, VA), will map generators to substations within VA instead
no substations within (Western, PA), will map generators to substations within PA instead
no substations within (Western, KS), will map generators to substations within Western instead
no substations within (ERCOT, OK), will map generators to substations within ERCOT instead

With this change, about 22% of generators end up mapped to a different substation (2,837 out of 12,735). In addition, 29 generators can be mapped which were not mapped with the previous logic. The most impactful change may have been how Palo Verde's generators were mapped: previously they ended up at a 69 kV substation within metro Phoenix; now they're appropriately connected to the 500 kV substation at their true location. Many other large inflexible generators are relocated as well, and generators in WECC seem to be particularly affected.

Running powerflows with the results shows drastic improvement in WECC, where infeasibilities were initially the worst. Previously, the total amount of transmission line limit violation energy that was required was about 25% of the total demand. After the change, this is down to 0.45%. In addition, transmission violations occur at fewer than half as many lines as before. Eastern powerflows are still running but I expect to see significant improvements there as well. EDIT: Eastern is done. The improvements aren't quite as good as in Western, but are still big improvements. Transmission violation energy is down to 5.1% (from 18.2% before) and at 0.19% of branches (from 0.61% before). That puts the overall USA-wide number of violating branches down to around 200, less than 1 out of every 400 branches, and about a third as many as before the refactor.

Usage Example/Visuals

All code is still launched via:

from prereise.gather.griddata.hifld import create_csvs
create_csvs(output_folder_name)

Time estimate

1 hour. Most of the new code is pretty straightforward, but about half of it is designed to combat edge cases caused by strange data inputs from the original EIA Form 860 data.

@danielolsen danielolsen self-assigned this Feb 10, 2022
@danielolsen danielolsen added the hifld Related to ingestion of the HIFLD data label Feb 14, 2022
@danielolsen danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch from 96a2164 to 727f654 Compare February 17, 2022 00:46
@danielolsen
Contributor Author

danielolsen commented Feb 18, 2022

I've refactored this so that it ignores states completely, and uses the voltage(s) available at each substation to help ensure that generators don't get mapped to substations with inadequate transmission capacity, based on their listed 'grid voltage' within Form 860. In a quick one-day test on Eastern, it seems to help: previously the transmission violations were > 17 GW in every hour of the year, but in the test day we have transmission violations as low as 1.3 GW. I'll run the full year over the weekend and report more detailed results.

The nonsensical-interconnection listing is at least partially fixed by using the Balancing Authority column of each generator within Form 860 to map to interconnects, rather than the NERC region. For whatever reason, BA seems more reliable, and we only use the NERC region as a fall-back. The generators which end up getting mapped more than 50 miles away now all seem to be either:

  • very small generators
  • generators in very sparse areas (e.g. the Florida keys, southwest Texas)
  • wind generators in the Texas panhandle (which indicates that we probably need to revise how we draw the border between ERCOT/EI there)
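
The BA-first, NERC-region-fallback lookup could look roughly like this. The two tables below are tiny illustrative excerpts chosen for the example, not the mappings used in this branch, and infer_interconnect is a hypothetical name.

```python
# Illustrative excerpts only; the real tables would cover every BA and region.
BA_TO_INTERCONNECT = {"ERCO": "ERCOT", "CISO": "Western", "PJM": "Eastern"}
NERC_TO_INTERCONNECT = {"TRE": "ERCOT", "WECC": "Western", "SERC": "Eastern"}


def infer_interconnect(balancing_authority, nerc_region):
    """Prefer the Balancing Authority; fall back to the NERC region.

    Returns None when neither value is recognized.
    """
    if balancing_authority in BA_TO_INTERCONNECT:
        return BA_TO_INTERCONNECT[balancing_authority]
    return NERC_TO_INTERCONNECT.get(nerc_region)


# A plant whose NERC region says WECC but whose BA is PJM resolves to Eastern.
infer_interconnect("PJM", "WECC")  # "Eastern"
```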

@danielolsen danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch from 67377d5 to 39058fb Compare February 22, 2022 17:42
@danielolsen
Contributor Author

Results for full-year runs with the new voltage-class and interconnection mapping of generators to substations:

  • Eastern interconnection: transmission violation energy reduced significantly (0.451% of annual demand, vs. 5.1% before) and at significantly fewer branches (0.10% of branches vs. 0.19% before)
  • Western interconnection: slight improvement in transmission violation energy (0.31% of annual demand, vs. 0.45% before) and number of branches (0.27% now vs. 0.29% before)
  • ERCOT: transmission violation energy reduced significantly (0.16% of annual demand, vs. 2.0% before) and slight improvement in number of branches (0.30% now vs. 0.43% before)

We're down to only 138 branches with violations across the whole USA. The next thing I'm going to try is revising the configuration of transformers within substations, which should hopefully increase the effective impedance between the higher-voltage buses at which large generators are connected and the lower-voltage buses connected to low-voltage branches with transmission violations.

@danielolsen
Contributor Author

danielolsen commented Feb 24, 2022

The 'cascade' configuration (every bus within a substation is connected via a transformer to the next-highest bus) reduces transmission violation energy only barely compared to the previous configuration (every bus is connected to the substation's highest-voltage bus), but has a larger impact on reducing the number of branches at which transmission violations occur:

  • ERCOT: violation energy goes from 0.16% to 0.14%, and the number of branches goes from 0.29% to 0.26%
  • Western: violation energy goes from 0.305% to 0.304%, and the number of branches goes from 0.27% to 0.26%
  • Eastern: violation energy goes from 0.451% to 0.447%, and the number of branches goes from 0.10% to 0.08%
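
To make the two transformer layouts concrete, here is a small sketch that lists the (low side, high side) voltage pairs each one would create within a single substation. The function names are hypothetical, and buses are identified only by their voltage for simplicity.

```python
def previous_transformers(bus_voltages):
    """Previous layout: every bus connects to the highest-voltage bus."""
    voltages = sorted(set(bus_voltages))
    return [(low, voltages[-1]) for low in voltages[:-1]]


def cascade_transformers(bus_voltages):
    """Cascade layout: every bus connects to the next-highest bus."""
    voltages = sorted(set(bus_voltages))
    return list(zip(voltages[:-1], voltages[1:]))


previous_transformers([69, 230, 500])  # [(69, 500), (230, 500)]
cascade_transformers([69, 230, 500])   # [(69, 230), (230, 500)]
```

Both layouts use the same number of transformers; the cascade simply puts more transformers in series between the highest and lowest buses, which is where the added effective impedance comes from.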

The remaining transmission violations are pretty heavily concentrated among a few branches, with what I believe are a few common root causes:

  • There are some large generators for which there's a transmission line in the original HIFLD dataset originating within the plant and going to a nearby substation, but no substation listed for the plant itself. As a result, the plant-end of this transmission line gets mapped to another nearby substation, potentially causing a connection where there shouldn't be one and/or resulting in the generator getting connected to the grid at a different substation with lower transfer capacity (although this is partially mitigated by the new code in this branch which looks at substation voltages and plant-level grid connection voltages). This is essentially #234 ("Improve transmission coverage of HIFLD dataset when source data are missing"). We could manually add these substations, or change how we build the transmission network topology so that transmission line endpoints with no substation in the vicinity create their own substations. Manually adding is a little less elegant, but has a lower potential for topological side-effects.
  • There is only a single assumed reactance and power rating for a line of a given voltage and length, ignoring the fact that some lines are double-circuit, triple-circuit, etc. This can result in a generator being connected at a transmission line whose capacity is only a third of the generator's minimum generation (e.g. the Catawba Nuclear station). This also pops up for some of the longer transmission lines in the West and Texas. EDIT: We don't currently have any code that can take in a subset of lines and alter their parameters; it would be fairly trivial to do so by assuming two (or three, etc.) identical parallel lines, but properly modeling different sorts of towers/lines would mean we'd need to make some assumptions about designs and start using the new code introduced in #262 ("feat: add module to calculate transmission impedances from geometry") and #268 ("feat: translate per-length impedance values to whole-line parameters").
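
The "identical parallel lines" shortcut mentioned in the EDIT amounts to simple arithmetic: n identical circuits in parallel have 1/n of the series reactance and n times the rating of one circuit. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def parallel_line_params(reactance, rating, n_circuits):
    """Equivalent parameters for n identical circuits in parallel.

    reactance: series reactance of a single circuit (any consistent unit).
    rating: power rating of a single circuit in MW.
    """
    if n_circuits < 1:
        raise ValueError("n_circuits must be at least 1")
    return reactance / n_circuits, rating * n_circuits


# A triple-circuit line built from 400 MW circuits is rated at 1200 MW.
x_eq, rating_eq = parallel_line_params(reactance=0.06, rating=400, n_circuits=3)
```

This ignores mutual coupling between circuits on the same towers, which is part of what the geometry-based approach would capture.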

@danielolsen danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch 2 times, most recently from 616b948 to b7d62bb Compare February 25, 2022 20:59
@danielolsen danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch from 1cd3e14 to 8e7ec12 Compare March 9, 2022 23:33
subs_voltage_lookup = {
(interconnect, voltage_level): substations_with_xyz.loc[
(substations_with_xyz["interconnect"] == interconnect)
& (substations_with_xyz["MAX_VOLT"] >= voltage_range["min"])
Collaborator

So we only care about lower bound here instead of the exact voltage_range defined in the dict voltage_ranges?

Contributor Author

The primary goal of this PR was to avoid large generators getting mapped to substations without at least one high-enough voltage bus (and therefore probably too low of a transfer capacity). I guess we could make it more strict by ensuring that there's at least one bus that's truly within the range, which will have the tradeoff that the distance to the connection location may sometimes increase. What do you think?

Collaborator

Yeah, you are right. Having a stricter filter will potentially give us a farther mapping if there is no match nearby. Which side of the trade-off is more important, the location or the voltage range? If the voltage range turns out to be more important, let's go with the stricter way; otherwise, let's keep what we have.

Contributor Author

Hard to say really. Locations will give us more representative renewable profiles, but voltage ranges can help ensure that generators aren't hooked up to substations with more transmission capacity than they should be (will impact the renewable curtailment). I'm leaning towards locations, but could probably be convinced otherwise.
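
For reference, the two filters under discussion can be compared side by side. This is a sketch only: substations are reduced to lists of bus voltages, and the "max" key on voltage_range is an assumption (the excerpt above shows only "min" being used).

```python
def eligible_substations(substations, voltage_range, strict=False):
    """Return IDs of substations that a generator may be mapped to.

    substations: dict of (hypothetical) substation ID -> list of bus voltages (kV).
    voltage_range: dict with "min" and, as an assumption here, "max" (kV).
    strict=False: the highest-voltage bus only has to reach the lower bound.
    strict=True: at least one bus must fall inside the whole range.
    """
    eligible = []
    for sub_id, voltages in substations.items():
        if strict:
            ok = any(
                voltage_range["min"] <= v <= voltage_range["max"] for v in voltages
            )
        else:
            ok = max(voltages) >= voltage_range["min"]
        if ok:
            eligible.append(sub_id)
    return eligible


subs = {"a": [69], "b": [115, 230], "c": [500]}
eligible_substations(subs, {"min": 200, "max": 300})               # ["b", "c"]
eligible_substations(subs, {"min": 200, "max": 300}, strict=True)  # ["b"]
```

The stricter variant shrinks the candidate set, which is exactly why the nearest eligible substation can end up farther away.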

@danielolsen danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch 3 times, most recently from 1abbf65 to 6641a32 Compare March 30, 2022 23:39
Collaborator

@BainanXia BainanXia left a comment

I think this is good to go. Thanks!

@danielolsen danielolsen force-pushed the daniel/hifld_gen_bus_mapping branch from 6641a32 to ffb2d4d Compare March 31, 2022 21:32
@danielolsen danielolsen merged commit 2dd569e into hifld Mar 31, 2022
@danielolsen danielolsen deleted the daniel/hifld_gen_bus_mapping branch March 31, 2022 21:40
danielolsen added a commit that referenced this pull request Apr 1, 2022
…_mapping

refactor: improve mapping of generators to buses
danielolsen added a commit that referenced this pull request Apr 5, 2022
…_mapping

refactor: improve mapping of generators to buses
Labels
hifld Related to ingestion of the HIFLD data
3 participants