diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 39429a76..c1c1e4e4 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.9.3","generation_timestamp":"2023-09-21T15:12:05","documenter_version":"1.0.1"}} \ No newline at end of file +{"documenter":{"julia_version":"1.9.3","generation_timestamp":"2023-09-23T19:49:49","documenter_version":"1.0.1"}} \ No newline at end of file diff --git a/dev/artifact/index.html b/dev/artifact/index.html index e25c8f9d..6c6bd931 100644 --- a/dev/artifact/index.html +++ b/dev/artifact/index.html @@ -1,2 +1,2 @@ -ExaData Artifact · ExaPF.jl

ExaData Artifact

The ExaData artifact contains test cases relevant to the Exascale Computing Project. It is built from the git repository available at ExaData. Apart from the standard MATPOWER files it additionally contains demand scenarios and contingencies used in multiperiod security constrained optimal power flow settings.

+ExaData Artifact · ExaPF.jl

ExaData Artifact

The ExaData artifact contains test cases relevant to the Exascale Computing Project. It is built from the git repository available at ExaData. Apart from the standard MATPOWER files it additionally contains demand scenarios and contingencies used in multiperiod security constrained optimal power flow settings.

diff --git a/dev/index.html b/dev/index.html index bafaa83f..bc6a6c17 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,2 +1,2 @@ -Home · ExaPF.jl

ExaPF

ExaPF.jl is a package to solve the power flow problem on upcoming exascale architectures. The code has been designed to be:

  1. Portable: Targeting exascale architectures implies a focus on graphics processing units (GPUs) as these systems lack substantial computational performance through classical CPUs.
  2. Differentiable: All the expressions implemented in ExaPF are fully compatible with ForwardDiff.jl, and routines are provided to extract first- and second-order derivatives to solve efficiently power flow and optimal power flow problems.

ExaPF implements a vectorized modeler for power systems, which allows to manipulate basic expressions. All expressions are fully differentiable : their first and second-order derivatives can be extracted efficiently using automatic differentiation. In addition, we leverage the packages CUDA.jl and KernelAbstractions.jl to make ExaPF portable across GPU architectures.

Table of contents

Manual

Library

Artifact

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative.

+Home · ExaPF.jl

ExaPF

ExaPF.jl is a package to solve the power flow problem on upcoming exascale architectures. The code has been designed to be:

  1. Portable: Targeting exascale architectures implies a focus on graphics processing units (GPUs) as these systems lack substantial computational performance through classical CPUs.
  2. Differentiable: All the expressions implemented in ExaPF are fully compatible with ForwardDiff.jl, and routines are provided to extract first- and second-order derivatives to solve efficiently power flow and optimal power flow problems.

ExaPF implements a vectorized modeler for power systems, which allows to manipulate basic expressions. All expressions are fully differentiable : their first and second-order derivatives can be extracted efficiently using automatic differentiation. In addition, we leverage the packages CUDA.jl and KernelAbstractions.jl to make ExaPF portable across GPU architectures.

Table of contents

Manual

Library

Artifact

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative.

diff --git a/dev/lib/autodiff/index.html b/dev/lib/autodiff/index.html index 973678c4..18cbaf36 100644 --- a/dev/lib/autodiff/index.html +++ b/dev/lib/autodiff/index.html @@ -10,4 +10,4 @@ )

Seed the duals with the coloring based seeds to compute the Jacobian or Hessian $M$.

source
ExaPF.AutoDiff.partials!Function
partials!(jac::AbstractJacobian)

Extract partials from Jacobian jac in jac.J.

source
partials!(hess::AbstractFullHessian)

Extract partials from Hessian hess into hess.H.

source
ExaPF.AutoDiff.set_value!Function
set_value!(
     jac,
     primals::AbstractVector{T}
-) where {T}

Set values of ForwardDiff.Dual numbers in jac to primals.

source
+) where {T}

Set values of ForwardDiff.Dual numbers in jac to primals.

source diff --git a/dev/lib/formulations/index.html b/dev/lib/formulations/index.html index d6ba936c..7b8c6bd7 100644 --- a/dev/lib/formulations/index.html +++ b/dev/lib/formulations/index.html @@ -237,4 +237,4 @@ 1-element Vector{Float64}: 4509.0275 source

Composition of expressions

The different expressions can be combined together in several different ways.

ExaPF.MultiExpressionsType
MultiExpressions <: AbstractExpression

Implement expressions concatenation. Takes as input a vector of expressions [expr1,...,exprN] and concatenate them in a single expression mexpr, such that

    mexpr(x) = [expr1(x) ; expr2(x) ; ... ; exprN(x)]
-
source
ExaPF.ComposedExpressionsType
ComposedExpressions{Expr1<:PolarBasis, Expr2} <: AbstractExpression

Implement expression composition. Takes as input two expressions expr1 and expr2 and returns a composed expression cexpr such that ``` cexpr(x) = expr2 ∘ expr1(x)

Notes

Currently, only PolarBasis is supported for expr1.

source
+source
ExaPF.ComposedExpressionsType
ComposedExpressions{Expr1<:PolarBasis, Expr2} <: AbstractExpression

Implement expression composition. Takes as input two expressions expr1 and expr2 and returns a composed expression cexpr such that ``` cexpr(x) = expr2 ∘ expr1(x)

Notes

Currently, only PolarBasis is supported for expr1.

source
diff --git a/dev/lib/linearsolver/index.html b/dev/lib/linearsolver/index.html index a0d1fb9a..be2c42b7 100644 --- a/dev/lib/linearsolver/index.html +++ b/dev/lib/linearsolver/index.html @@ -8,4 +8,4 @@ tol=1e-8, maxiter=size(A, 1), verbose=false, - maxtol=1e20)

BiCGSTAB implementation according to

Van der Vorst, Henk A. "Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems." SIAM Journal on scientific and Statistical Computing 13, no. 2 (1992): 631-644.

source

Available linear solvers can be queried with

ExaPF.LinearSolvers.list_solversFunction
list_solvers(::KernelAbstractions.Device)

List linear solvers available on current device.

source

Preconditioning

To solve linear systems with iterative methods, ExaPF provides an implementation of a block-Jacobi preconditioner, portable on GPU.

ExaPF.LinearSolvers.AbstractPreconditionerType
AbstractPreconditioner

Preconditioners for the iterative solvers mostly focused on GPUs

source

Block-Jacobi preconditioner

ExaPF.LinearSolvers.BlockJacobiPreconditionerType
BlockJacobiPreconditioner

Overlapping-Schwarz preconditioner.

Attributes

  • nblocks::Int64: Number of partitions or blocks.
  • blocksize::Int64: Size of each block.
  • partitions::Vector{Vector{Int64}}:npart` partitions stored as lists
  • cupartitions: partitions transfered to the GPU
  • lpartitions::Vector{Int64}`: Length of each partitions.
  • culpartitions::Vector{Int64}`: Length of each partitions, on the GPU.
  • blocks: Dense blocks of the block-Jacobi
  • cublocks: Js transfered to the GPU
  • map: The partitions as a mapping to construct views
  • cumap: cumap transferred to the GPU`
  • part: Partitioning as output by Metis
  • cupart: part transferred to the GPU
source
ExaPF.LinearSolvers.updateFunction
function update(J::CuSparseMatrixCSR, p)

Update the preconditioner p from the sparse Jacobian J in CSR format for the GPU

  1. The dense blocks cuJs are filled from the sparse Jacobian J
  2. To a batch inversion of the dense blocks using CUBLAS
  3. Extract the preconditioner matrix p.P from the dense blocks cuJs
source
function update(J::SparseMatrixCSC, p)

Update the preconditioner p from the sparse Jacobian J in CSC format for the CPU

Note that this implements the same algorithm as for the GPU and becomes very slow on CPU with growing number of blocks.

source
+ maxtol=1e20)

BiCGSTAB implementation according to

Van der Vorst, Henk A. "Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems." SIAM Journal on scientific and Statistical Computing 13, no. 2 (1992): 631-644.

source

Available linear solvers can be queried with

ExaPF.LinearSolvers.list_solversFunction
list_solvers(::KernelAbstractions.Device)

List linear solvers available on current device.

source

Preconditioning

To solve linear systems with iterative methods, ExaPF provides an implementation of a block-Jacobi preconditioner, portable on GPU.

ExaPF.LinearSolvers.AbstractPreconditionerType
AbstractPreconditioner

Preconditioners for the iterative solvers mostly focused on GPUs

source

Block-Jacobi preconditioner

ExaPF.LinearSolvers.BlockJacobiPreconditionerType
BlockJacobiPreconditioner

Overlapping-Schwarz preconditioner.

Attributes

  • nblocks::Int64: Number of partitions or blocks.
  • blocksize::Int64: Size of each block.
  • partitions::Vector{Vector{Int64}}:npart` partitions stored as lists
  • cupartitions: partitions transfered to the GPU
  • lpartitions::Vector{Int64}`: Length of each partitions.
  • culpartitions::Vector{Int64}`: Length of each partitions, on the GPU.
  • blocks: Dense blocks of the block-Jacobi
  • cublocks: Js transfered to the GPU
  • map: The partitions as a mapping to construct views
  • cumap: cumap transferred to the GPU`
  • part: Partitioning as output by Metis
  • cupart: part transferred to the GPU
source
ExaPF.LinearSolvers.updateFunction
function update(J::CuSparseMatrixCSR, p)

Update the preconditioner p from the sparse Jacobian J in CSR format for the GPU

  1. The dense blocks cuJs are filled from the sparse Jacobian J
  2. To a batch inversion of the dense blocks using CUBLAS
  3. Extract the preconditioner matrix p.P from the dense blocks cuJs
source
function update(J::SparseMatrixCSC, p)

Update the preconditioner p from the sparse Jacobian J in CSC format for the CPU

Note that this implements the same algorithm as for the GPU and becomes very slow on CPU with growing number of blocks.

source
diff --git a/dev/lib/powersystem/index.html b/dev/lib/powersystem/index.html index 1688a876..81720c57 100644 --- a/dev/lib/powersystem/index.html +++ b/dev/lib/powersystem/index.html @@ -9,4 +9,4 @@ Buses: 1354 (Slack: 1. PV: 259. PQ: 1094) Generators: 260.source

API Reference

Network elements

ExaPF.PowerSystem.AbstractNetworkElementType
AbstractNetworkElement

Abstraction for all physical elements being parts of a AbstractPowerSystem. Elements are divided in

  • transmission lines (Lines)
  • buses (Buses)
  • generators (Generators)
source

List of elements:

ExaPF.PowerSystem.BusesType
Buses <: AbstractNetworkElement

Buses of a transmission network.

source
ExaPF.PowerSystem.LinesType
Lines <: AbstractNetworkElement

Lines of a transmission network.

source
ExaPF.PowerSystem.GeneratorsType
Generators <: AbstractElement

Generators in a transmission network

source

Network attributes

ExaPF.PowerSystem.AbstractNetworkAttributeType
AbstractNetworkAttribute

Attribute of a AbstractPowerSystem.

source

List of attributes:

ExaPF.PowerSystem.NumberOfBusesType
NumberOfBuses <: AbstractNetworkAttribute

Number of buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.NumberOfLinesType
NumberOfLines <: AbstractNetworkAttribute

Number of lines in a AbstractPowerSystem.

source
ExaPF.PowerSystem.NumberOfGeneratorsType
NumberOfGenerators <: AbstractNetworkAttribute

Number of generators in a AbstractPowerSystem.

source
ExaPF.PowerSystem.NumberOfPVBusesType
NumberOfPVBuses <: AbstractNetworkAttribute

Number of PV buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.NumberOfPQBusesType
NumberOfPQBuses <: AbstractNetworkAttribute

Number of PQ buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.NumberOfSlackBusesType
NumberOfSlackBuses <: AbstractNetworkAttribute

Number of slack buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.BaseMVAType
BaseMVA <: AbstractNetworkAttribute

Base MVA of the network.

source
ExaPF.PowerSystem.BusAdmittanceMatrixType
BusAdmittanceMatrix <: AbstractNetworkAttribute

Bus admittance matrix associated with the topology of the network.

source

Query the indexing of the different elements in a given network:

ExaPF.PowerSystem.PVIndexesType
PVIndexes <: AbstractIndexing

Indexes of the PV buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.PQIndexesType
PQIndexes <: AbstractIndexing

Indexes of the PQ buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.SlackIndexesType
SlackIndexes <: AbstractIndexing

Indexes of the slack buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.GeneratorIndexesType
GeneratorIndexes <: AbstractIndexing

Indexes of the generators in a AbstractPowerSystem.

source

Network values

ExaPF.PowerSystem.AbstractNetworkValuesType
AbstractNetworkValues

Numerical values attached to the different attributes of the network.

source

List of values:

ExaPF.PowerSystem.VoltageMagnitudeType
VoltageMagnitude <: AbstractNetworkValues

Magnitude |v| of the voltage v = |v| exp(i θ).

source
ExaPF.PowerSystem.VoltageAngleType
VoltageAngle <: AbstractNetworkValues

Angle θ of the voltage v = |v| exp(i θ).

source
ExaPF.PowerSystem.ActivePowerType
ActivePower <: AbstractNetworkValues

Active power P of the complex power S = P + iQ.

source
ExaPF.PowerSystem.ReactivePowerType
ReactivePower <: AbstractNetworkValues

Reactive power Q of the complex power S = P + iQ.

source
ExaPF.PowerSystem.ActiveLoadType
ActiveLoad <: AbstractNetworkValues

Active load Pd at buses.

source
ExaPF.PowerSystem.ReactiveLoadType
ReactiveLoad <: AbstractNetworkValues

Reactive load Qd at buses.

source

Function to get the range of a given value:

ExaPF.PowerSystem.boundsFunction
bounds(pf::AbstractPowerSystem, attr::AbstractNetworkAttribute, val::AbstractNetworkValues)

Return lower and upper bounds corresponding to the admissible values of the AbstractNetworkAttribute attr.

Examples

p_min, p_max = bounds(pf, Generator(), ActivePower())
 v_min, v_max = bounds(pf, Buses(), VoltageMagnitude())
-
source
+source diff --git a/dev/man/autodiff/index.html b/dev/man/autodiff/index.html index a17ae397..a9d2f198 100644 --- a/dev/man/autodiff/index.html +++ b/dev/man/autodiff/index.html @@ -35,4 +35,4 @@ F[npq + i] += coef_cos * sin_val - coef_sin * cos_val end end -end

These two abstractions are a powerful tool that allow us to implement the forward mode in vectorized form where the number of directions or tangent components of a tangent variable are the number of Jacobian colors. We illustrate this in the figure below with a point-wise vector product x .* y

SIMD AD for point-wise vector product \\label{fig:simd}

This natural way of computing the compressed Jacobian yields a very high performing code that is portable to any vector architecture, given that a similar package like CUDA.jl exists. We note that similar packages for the Intel Compute Engine and AMD ROCm are in development called oneAPI.jl and AMDGPU.jl, respectively. We expect our package to be portable to AMD and Intel GPUs in the future.

+end

These two abstractions are a powerful tool that allow us to implement the forward mode in vectorized form where the number of directions or tangent components of a tangent variable are the number of Jacobian colors. We illustrate this in the figure below with a point-wise vector product x .* y

SIMD AD for point-wise vector product \\label{fig:simd}

This natural way of computing the compressed Jacobian yields a very high performing code that is portable to any vector architecture, given that a similar package like CUDA.jl exists. We note that similar packages for the Intel Compute Engine and AMD ROCm are in development called oneAPI.jl and AMDGPU.jl, respectively. We expect our package to be portable to AMD and Intel GPUs in the future.

diff --git a/dev/man/benchmark/index.html b/dev/man/benchmark/index.html index a4e04d67..4cf8579c 100644 --- a/dev/man/benchmark/index.html +++ b/dev/man/benchmark/index.html @@ -2,4 +2,4 @@ Benchmark · ExaPF.jl

Benchmark

For the purpose of performance regression testing, ExaPF provides a lightweight benchmark script. It allows to test the various configurations for the linear solvers used in the Newton-Raphson algorithm, and run them on a specific hardware. The main julia script benchmark/benchmarks.jl takes all its options from the command line. The benchmark script takes as input a linear solver (e.g. KrylovBICGSTAB), a target architecture as a KernelAbstractions object (CPU or CUDABackend), and a case filename which is included in the ExaData artifact. An exhaustive list of all available linear solvers can be obtained via ExaPF.LinearSolvers.list_solvers.

Running

julia --project benchmark/benchmarks.jl KrylovBICGSTAB CUDABackend case300.m

yields

KrylovBICGSTAB, CUDABackend, case300.m,  69.0,  3.57,  43.7, true

The first three fields are the settings of the benchmark run. They are followed by three timings in milliseconds:

  1. the time taken by the Newton-Raphson algorithm to solve the power flow,
  2. the timings for the Jacobian accumulation using AutoDiff,
  3. and the time for the linear solver, including the preconditioner.

To acquire these timings the code is run three times to avoid any precompilation effects. The last field confirms the Newton-Raphson convergence. In case more verbose output is desired, one has to manually set the verbosity in benchmark/benchmarks.jl by changing

powerflow_solver = NewtonRaphson(tol=ntol)

to one of the following options:

powerflow_solver = NewtonRaphson(tol=ntol, verbose=VERBOSE_LEVEL_NONE)
 powerflow_solver = NewtonRaphson(tol=ntol, verbose=VERBOSE_LEVEL_LOW)
 powerflow_solver = NewtonRaphson(tol=ntol, verbose=VERBOSE_LEVEL_MEDIUM)
-powerflow_solver = NewtonRaphson(tol=ntol, verbose=VERBOSE_LEVEL_HIGH)

A shell script benchmark/benchmarks.sh is provided to gather timings with various canonical configurations and storing them in a file cpu_REV.log and gpu_REF.log, where REV is the sha1 hash of the current checked out ExaPF version.

+powerflow_solver = NewtonRaphson(tol=ntol, verbose=VERBOSE_LEVEL_HIGH)

A shell script benchmark/benchmarks.sh is provided to gather timings with various canonical configurations and storing them in a file cpu_REV.log and gpu_REF.log, where REV is the sha1 hash of the current checked out ExaPF version.

diff --git a/dev/man/formulations/index.html b/dev/man/formulations/index.html index db9841f8..a79b79af 100644 --- a/dev/man/formulations/index.html +++ b/dev/man/formulations/index.html @@ -173,4 +173,4 @@ 0.0 0.02340899999999987 0.007743999999999858 - + diff --git a/dev/man/linearsolver/index.html b/dev/man/linearsolver/index.html index 6f4e9aeb..9def6693 100644 --- a/dev/man/linearsolver/index.html +++ b/dev/man/linearsolver/index.html @@ -1,3 +1,3 @@ Linear Solvers · ExaPF.jl

Linear Solver

Overview

As mentioned before, a linear solver is required to compute the Newton step in

dx .= jacobian(x)\f(x)

Our package supports the following linear solvers:

  • cuSOLVER with csrlsvqr (GPU),
  • Krylov.jl with dqgmres and bicgstab (CPU/GPU),
  • UMFPACK through the default Julia \ operator (CPU),
  • generic BiCGSTAB implementation [Vorst1992] (CPU/GPU),
  • or any linear solver wrapped in LinearAlgebra.

Preconditioning

Using only an iterative solver leads to divergence and bad performance due to ill-conditioning of the Jacobian. This is a known phenomenon in power systems. That's why this package comes with a block Jacobi preconditioner that is tailored towards GPUs and is proven to work well with power flow problems.

The Jacobian is partitioned into a dense block diagonal structure using Metis.jl, where each block is inverted to build our preconditioner P.

Dense block Jacobi preconditioner \\label{fig:preconditioner}

Compared to incomplete Cholesky and incomplete LU this preconditioner is easily portable to the GPU if the number of blocks is high enough. ExaPF.jl uses the batch BLAS calls from cuBLAS to invert the single blocks.

CUDA.@sync pivot, info = CUDA.CUBLAS.getrf_batched!(blocks, true)
-CUDA.@sync pivot, info, p.cuJs = CUDA.CUBLAS.getri_batched(blocks, pivot)

Assuming that other vendors will provide such batched BLAS APIs, this code is portable to other GPU architectures.

  • Vorst1992Vorst, H. A. van der. 1992. “Bi-Cgstab: A Fast and Smoothly Converging Variant of Bi-Cg for the Solution of Nonsymmetric Linear Systems.”SIAM Journal on Scientific and Statistical Computing 13 (2): 631–44
+CUDA.@sync pivot, info, p.cuJs = CUDA.CUBLAS.getri_batched(blocks, pivot)

Assuming that other vendors will provide such batched BLAS APIs, this code is portable to other GPU architectures.

diff --git a/dev/man/powersystem/index.html b/dev/man/powersystem/index.html index 8e17b5ff..17eaa734 100644 --- a/dev/man/powersystem/index.html +++ b/dev/man/powersystem/index.html @@ -16,4 +16,4 @@ 2 julia> PS.get(ps, PS.NumberOfSlackBuses()) -1 +1 diff --git a/dev/quickstart/index.html b/dev/quickstart/index.html index 785ef2a0..d0071635 100644 --- a/dev/quickstart/index.html +++ b/dev/quickstart/index.html @@ -18,9 +18,9 @@ #it 5: 1.52178e-11 Power flow has converged: true * #iterations: 5 - * Time Jacobian (s) ........: 0.4023 - * Time linear solver (s) ...: 0.0168 - * Time total (s) ...........: 0.7970

Implicitly, ExaPF has just proceed to the following operations:

This compact syntax allows to solve quickly any powerflow equations in a few lines a code. However, in most case, the user may want more coarse grained control on the different objects manipulated.

Detailed version

In what follows, we detail step by step the detailed procedure to solve the powerflow equations.

How to load a MATPOWER instance as a PowerNetwork object?

We start by importing a MATPOWER instance to a ExaPF.PowerSystem.PowerNetwork object:

julia> pf = PS.PowerNetwork(datafile)PowerNetwork object with:
+  * Time Jacobian (s) ........: 0.4158
+  * Time linear solver (s) ...: 0.0410
+  * Time total (s) ...........: 0.8467

Implicitly, ExaPF has just proceed to the following operations:

This compact syntax allows to solve quickly any powerflow equations in a few lines a code. However, in most case, the user may want more coarse grained control on the different objects manipulated.

Detailed version

In what follows, we detail step by step the detailed procedure to solve the powerflow equations.

How to load a MATPOWER instance as a PowerNetwork object?

We start by importing a MATPOWER instance to a ExaPF.PowerSystem.PowerNetwork object:

julia> pf = PS.PowerNetwork(datafile)PowerNetwork object with:
     Buses: 1354 (Slack: 1. PV: 259. PQ: 1094)
     Generators: 260.

The different fields of the object pf specify the characteristics of the network. For instance, we can retrieve the number of buses or get the indexing of the PV buses with

julia> nbus = PS.get(pf, PS.NumberOfBuses())1354
julia> pv_indexes = pf.pv;

However, a ExaPF.PowerSystem.PowerNetwork object stores only the physical attributes of the network. To choose a given mathematical formulation, we need to pass the object pf to an ExaPF.AbstractFormulation layer. Currently, only the polar formulation is provided with the ExaPF.PolarForm structure. In the future, other formulations (e.g. RectangularForm) may be implemented as well.

How to solve the powerflow equations?

To solve the powerflow equations, we need to choose a given mathematical formulation for the equations of the network. To each formulation corresponds a given state $x$ and control $u$. Using polar representation of the voltage vector, such as $\bm{v} = |v|e^{j \theta}$, each bus $i=1, \cdots, N_B$ must satisfy the power balance equations:

\[\begin{aligned} p_i &= v_i \sum_{j}^{n} v_j (g_{ij}\cos{(\theta_i - \theta_j)} + b_{ij}\sin{(\theta_i - \theta_j})) \,, \\ @@ -61,9 +61,9 @@ #it 5: 1.52178e-11 Power flow has converged: true * #iterations: 5 - * Time Jacobian (s) ........: 0.0045 - * Time linear solver (s) ...: 0.0168 - * Time total (s) ...........: 0.0216

Here, the algorithm solves the powerflow equations in 5 iterations. The algorithm modifies the values of stack inplace, to avoid any unnecessary memory allocations.

How to deport the computation on the GPU?

Now, how can we deport the resolution on the GPU? The procedure looks exactly the same. It suffices to initiate a new ExaPF.PolarForm object, but on the GPU:

julia> polar_gpu = ExaPF.PolarForm(pf, CUDABackend())Polar formulation (instantiated on device CUDA.CUDAKernels.CUDABackend(false, false))
+  * Time Jacobian (s) ........: 0.0048
+  * Time linear solver (s) ...: 0.0398
+  * Time total (s) ...........: 0.0449

Here, the algorithm solves the powerflow equations in 5 iterations. The algorithm modifies the values of stack inplace, to avoid any unnecessary memory allocations.

How to deport the computation on the GPU?

Now, how can we deport the resolution on the GPU? The procedure looks exactly the same. It suffices to initiate a new ExaPF.PolarForm object, but on the GPU:

julia> polar_gpu = ExaPF.PolarForm(pf, CUDABackend())Polar formulation (instantiated on device CUDA.CUDAKernels.CUDABackend(false, false))
 Network characteristics:
     #buses:      1354  (#slack: 1  #PV: 259  #PQ: 1094)
     #generators: 260
@@ -79,9 +79,9 @@
 #it 5: 9.96622e-12
 Power flow has converged: true
   * #iterations: 5
-  * Time Jacobian (s) ........: 2.6187
-  * Time linear solver (s) ...: 0.3118
-  * Time total (s) ...........: 3.7690

Note that we get exactly the same iterations as when we solve the power flow equations on the CPU.

How to solve the linear system with BICGSTAB?

By default, the algorithm runs with a direct solver, which might be inefficient for large problems. To overcome this issue, ExaPF implements a wrapper for different iterative algorithms (GMRES, BICGSTAB).

The performance of iterative solvers is usually improved if we use a preconditioner. ExaPF.jl implements an overlapping Schwarz preconditioner, tailored for GPU usage. To build an instance with 8 blocks, just write

julia> npartitions = 8;
julia> jac_gpu = jx_gpu.J;
julia> precond = LS.BlockJacobiPreconditioner(jac_gpu, npartitions, CUDABackend());

You can attach the preconditioner to an BICGSTAB algorithm simply as

julia> linear_solver = ExaPF.KrylovBICGSTAB(jac_gpu; P=precond);

(this will use the BICGSTAB algorithm implemented in Krylov.jl).

We need to update accordingly the tolerance of the Newton-Raphson algorithm (the iterative solver is less accurate than the direct solver):

julia> pf_algo = NewtonRaphson(; verbose=1, tol=1e-7)NewtonRaphson(20, 1.0e-7, 1)

We reset the variables to their initial values:

julia> ExaPF.init!(polar_gpu, stack_gpu)

Then, solving the power flow with the iterative solvers directly translates to one call to nlsolve!:

julia> convergence = ExaPF.nlsolve!(pf_algo, jx_gpu, stack_gpu; linear_solver=linear_solver)#it 0: 1.15103e+02
+  * Time Jacobian (s) ........: 2.7291
+  * Time linear solver (s) ...: 0.3145
+  * Time total (s) ...........: 3.9130

Note that we get exactly the same iterations as when we solve the power flow equations on the CPU.

How to solve the linear system with BICGSTAB?

By default, the algorithm runs with a direct solver, which might be inefficient for large problems. To overcome this issue, ExaPF implements a wrapper for different iterative algorithms (GMRES, BICGSTAB).

The performance of iterative solvers is usually improved if we use a preconditioner. ExaPF.jl implements an overlapping Schwarz preconditioner, tailored for GPU usage. To build an instance with 8 blocks, just write

julia> npartitions = 8;
julia> jac_gpu = jx_gpu.J;
julia> precond = LS.BlockJacobiPreconditioner(jac_gpu, npartitions, CUDABackend());

You can attach the preconditioner to an BICGSTAB algorithm simply as

julia> linear_solver = ExaPF.KrylovBICGSTAB(jac_gpu; P=precond);

(this will use the BICGSTAB algorithm implemented in Krylov.jl).

We need to update accordingly the tolerance of the Newton-Raphson algorithm (the iterative solver is less accurate than the direct solver):

julia> pf_algo = NewtonRaphson(; verbose=1, tol=1e-7)NewtonRaphson(20, 1.0e-7, 1)

We reset the variables to their initial values:

julia> ExaPF.init!(polar_gpu, stack_gpu)

Then, solving the power flow with the iterative solvers directly translates to one call to nlsolve!:

julia> convergence = ExaPF.nlsolve!(pf_algo, jx_gpu, stack_gpu; linear_solver=linear_solver)#it 0: 1.15103e+02
 #it 1: 1.50328e+01
 #it 2: 5.88242e-01
 #it 3: 4.88493e-03
@@ -90,5 +90,5 @@
 Power flow has converged: true
   * #iterations: 5
   * Time Jacobian (s) ........: 0.0031
-  * Time linear solver (s) ...: 6.2972
-  * Time total (s) ...........: 6.3016
+ * Time linear solver (s) ...: 6.3513 + * Time total (s) ...........: 6.3557 diff --git a/dev/tutorials/batch_evaluation/index.html b/dev/tutorials/batch_evaluation/index.html index b51d798f..7e92a575 100644 --- a/dev/tutorials/batch_evaluation/index.html +++ b/dev/tutorials/batch_evaluation/index.html @@ -21,26 +21,26 @@ #states : 14

Then, ExaPF can also instantiate a NetworkStack object, with the memory required to store the variables of the different scenarios:

blk_stack = ExaPF.NetworkStack(blk_polar)
210-elements NetworkStack{Vector{Float64}}

We can pass the scenarios manually using the function set_params!:

ploads = rand(nbus, nscen);
 qloads = rand(nbus, nscen);
 ExaPF.set_params!(blk_stack, ploads, qloads)
90-element Vector{Float64}:
- 0.5941674359224426
- 0.8522137692943615
- 0.898905599393412
- 0.7706263989403442
- 0.10242552305040731
- 0.1093694659440686
- 0.24382786436620574
- 0.2752153853589945
- 0.38808902745269813
- 0.010479031681545958
+ 0.8559617137259788
+ 0.3523470433277963
+ 0.6542128581116299
+ 0.3935661696262659
+ 0.9582549503601299
+ 0.887634035398972
+ 0.03380000409131956
+ 0.5064801070009395
+ 0.3930399440732132
+ 0.9987776865280941
  ⋮
- 0.43651484710096244
- 0.33871529924162136
- 0.04994653600628185
- 0.004424390430899927
- 0.2871564676452214
- 0.14405974048525894
- 0.8920382389401823
- 0.47054505640394984
- 0.4780947294409694

The structure blk_stack stores $N$ different realizations for the variables stored in the field input (vmag, vang and pgen). By default, the initial values are set according to the values specified in blk_polar (usually defined when importing the data from the instance file):

reshape(blk_stack.vmag, nbus, nscen)
9×10 Matrix{Float64}:
+ 0.8592911762557289
+ 0.7207032819592781
+ 0.41789405160400883
+ 0.7682046111725708
+ 0.36016705254828374
+ 0.9546987186525899
+ 0.9421158379310158
+ 0.7397453193432354
+ 0.33591601602255794

The structure blk_stack stores $N$ different realizations for the variables stored in the field input (vmag, vang and pgen). By default, the initial values are set according to the values specified in blk_polar (usually defined when importing the data from the instance file):

reshape(blk_stack.vmag, nbus, nscen)
9×10 Matrix{Float64}:
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
@@ -50,32 +50,32 @@
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

Only the parameters are varying according to the scenarios we passed as input in the constructor:

reshape(blk_stack.pload, nbus, nscen)
9×10 Matrix{Float64}:
- 0.81872   0.58149   0.993387  0.296823   …  0.985247   0.390188  0.233188
- 0.176536  0.79199   0.362258  0.0995252     0.871698   0.373152  0.295096
- 0.705494  0.891877  0.942154  0.562401      0.807404   0.499716  0.348183
- 0.749389  0.368638  0.161537  0.209402      0.112298   0.452152  0.501078
- 0.292929  0.378692  0.345064  0.829092      0.412382   0.198977  0.616138
- 0.504299  0.95563   0.736151  0.793169   …  0.765123   0.163487  0.995522
- 0.616872  0.836865  0.494355  0.370883      0.0303673  0.986578  0.878993
- 0.96446   0.881766  0.488783  0.295786      0.505388   0.94591   0.454449
- 0.629922  0.426857  0.660948  0.0796323     0.631545   0.637177  0.269932

Evaluate expressions in block

ExaPF takes advantage of the block structure when using a BlockPolarForm.

As an example, suppose we want to evaluate the power flow balances in block form with a PowerFlowBalance expression:

powerflow = ExaPF.PowerFlowBalance(blk_polar) ∘ ExaPF.PolarBasis(blk_polar);
ExaPF.ComposedExpressions{ExaPF.PolarBasis{Vector{Int64}, SparseArrays.SparseMatrixCSC{Float64, Int64}}, ExaPF.PowerFlowBalance{Vector{Float64}, SparseArrays.SparseMatrixCSC{Float64, Int64}}}(PolarBasis (AbstractExpression), PowerFlowBalance (AbstractExpression))

A block evaluation takes as input the NetworkStack blk_stack structure:

m = div(length(powerflow), nscen);
+ 0.155886  0.126517   0.81435    0.89088    …  0.731609  0.0511128  0.260856
+ 0.324172  0.237615   0.0301523  0.634825      0.96086   0.264304   0.544441
+ 0.101134  0.81371    0.71187    0.618062      0.503013  0.438363   0.605646
+ 0.921298  0.686628   0.0307813  0.275111      0.113424  0.386383   0.244243
+ 0.896607  0.391815   0.521849   0.0917468     0.105306  0.220718   0.994517
+ 0.817869  0.215029   0.946439   0.877059   …  0.107703  0.113814   0.886688
+ 0.774943  0.368762   0.520698   0.123852      0.472091  0.179356   0.338556
+ 0.121014  0.0335929  0.709168   0.437472      0.314495  0.0391468  0.405176
+ 0.816785  0.157054   0.738976   0.804088      0.634106  0.977827   0.251487

Evaluate expressions in block

ExaPF takes advantage of the block structure when using a BlockPolarForm.

As an example, suppose we want to evaluate the power flow balances in block form with a PowerFlowBalance expression:

powerflow = ExaPF.PowerFlowBalance(blk_polar) ∘ ExaPF.PolarBasis(blk_polar);
ExaPF.ComposedExpressions{ExaPF.PolarBasis{Vector{Int64}, SparseArrays.SparseMatrixCSC{Float64, Int64}}, ExaPF.PowerFlowBalance{Vector{Float64}, SparseArrays.SparseMatrixCSC{Float64, Int64}}}(PolarBasis (AbstractExpression), PowerFlowBalance (AbstractExpression))

A block evaluation takes as input the NetworkStack blk_stack structure:

m = div(length(powerflow), nscen);
 blk_output = zeros(m * nscen);
 powerflow(blk_output, blk_stack); # inplace evaluation
 reshape(blk_output, m, nscen)
14×10 Matrix{Float64}:
- -1.45346    -0.83801    -1.26774    …  -0.758302   -1.25685     -1.3349
- -0.144506    0.0418768   0.0921542     -0.0425958  -0.350284    -0.501817
-  0.749389    0.368638    0.161537       0.112298    0.452152     0.501078
-  0.292929    0.378692    0.345064       0.412382    0.198977     0.616138
-  0.504299    0.95563     0.736151       0.765123    0.163487     0.995522
-  0.616872    0.836865    0.494355   …   0.0303673   0.986578     0.878993
-  0.96446     0.881766    0.488783       0.505388    0.94591      0.454449
-  0.629922    0.426857    0.660948       0.631545    0.637177     0.269932
-  0.603626    0.107701    0.620703       0.400412    0.0744612   -0.162576
- -0.155574   -0.254843    0.498071       0.397487    0.361444     0.0291565
- -0.174131    0.715848    0.36658    …   0.294529    0.543651    -0.13944
-  0.0648279  -0.164528    0.0793475      0.0566474  -0.159946     0.713038
-  0.0477154  -0.0382624   0.647422       0.214784    0.115698     0.243045
-  0.147089    0.123457    0.352825       0.638275   -0.00369888   0.237095

We get $N$ different results for the power flow balance equations, depending on which scenario we are on.

Solve power flow in block on the CPU

Once the different structures used for block evaluation instantiated, one is able to solve the power flow in block on the CPU using the same function nlsolve!. The block Jacobian is evaluated with automatic differentiation using a ArrowheadJacobian structure:

blk_jx = ExaPF.ArrowheadJacobian(blk_polar, powerflow, State());
+ -1.30583   -1.39239    -1.59985    …  -0.66914    -1.3657     -1.08556
+ -0.748866  -0.0362896  -0.13813       -0.346987   -0.411637   -0.244354
+  0.921298   0.686628    0.0307813      0.113424    0.386383    0.244243
+  0.896607   0.391815    0.521849       0.105306    0.220718    0.994517
+  0.817869   0.215029    0.946439       0.107703    0.113814    0.886688
+  0.774943   0.368762    0.520698   …   0.472091    0.179356    0.338556
+  0.121014   0.0335929   0.709168       0.314495    0.0391468   0.405176
+  0.816785   0.157054    0.738976       0.634106    0.977827    0.251487
+  0.226566  -0.0438189   0.661271       0.396779    0.150221    0.601205
+  0.700255   0.023036   -0.241488       0.555347    0.291852    0.102167
+  0.604134   0.211697    0.671528   …  -0.219698    0.386255    0.671199
+ -0.1452     0.513068    0.150179      -0.0637118   0.0719483   0.763116
+  0.27898   -0.187753    0.457293       0.470065    0.273789    0.512245
+  0.15204    0.399831   -0.212529       0.501052    0.0226803   0.094916

We get $N$ different results for the power flow balance equations, depending on which scenario we are on.

Solve power flow in block on the CPU

Once the different structures used for block evaluation instantiated, one is able to solve the power flow in block on the CPU using the same function nlsolve!. The block Jacobian is evaluated with automatic differentiation using a ArrowheadJacobian structure:

blk_jx = ExaPF.ArrowheadJacobian(blk_polar, powerflow, State());
 blk_jx.J
140×140 SparseArrays.SparseMatrixCSC{Float64, Int64} with 820 stored entries:
 ⎡⡱⣮⡲⣞⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
 ⎢⣸⢮⣻⣾⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
@@ -125,17 +125,17 @@
   * #iterations: 4
   * Time Jacobian (s) ........: 0.0002
   * Time linear solver (s) ...: 0.0004
-  * Time total (s) ...........: 0.0007
+  * Time total (s) ...........: 0.0008
 

At the solution, we get different values for the voltage magnitudes at the PQ nodes:

reshape(blk_stack.vmag, nbus, nscen)
9×10 Matrix{Float64}:
  1.0       1.0       1.0       1.0       …  1.0       1.0       1.0
  1.0       1.0       1.0       1.0          1.0       1.0       1.0
  1.0       1.0       1.0       1.0          1.0       1.0       1.0
- 0.966309  0.972902  0.921606  0.964459     0.932831  0.975283  0.984083
- 0.979567  0.964036  0.88175   0.957589     0.902597  0.943513  0.966502
- 0.992338  0.948066  0.937069  0.957202  …  0.952171  0.962881  0.969966
- 0.979551  0.964292  0.925649  0.924367     0.949875  0.973275  0.920203
- 0.9821    0.973861  0.933158  0.932569     0.955048  0.97987   0.954013
- 0.956651  0.956936  0.894466  0.901908     0.891719  0.970809  0.957078

Solve power flow in batch on the GPU

When the BlockPolarForm model is instantiated on the GPU, the expressions are evaluated in batch. The syntax to solve the power flow equations is exactly the same as on the CPU, using cusolverRF to solve the different linear systems:

using CUDA, CUSOLVERRF
+ 0.938158  0.974327  0.966341  0.955018     0.939675  0.962022  0.936623
+ 0.874072  0.964998  0.964821  0.906585     0.914236  0.937921  0.903355
+ 0.942826  0.970611  0.948198  0.946349  …  0.982096  0.967596  0.918776
+ 0.955353  0.948228  0.94191   0.952082     0.968001  0.965162  0.880006
+ 0.963892  0.978364  0.958151  0.975502     0.958567  0.971792  0.924435
+ 0.927653  0.944859  0.96838   0.934585     0.906917  0.946672  0.922845

Solve power flow in batch on the GPU

When the BlockPolarForm model is instantiated on the GPU, the expressions are evaluated in batch. The syntax to solve the power flow equations is exactly the same as on the CPU, using cusolverRF to solve the different linear systems:

using CUDA, CUSOLVERRF
 polar_gpu = ExaPF.load_polar("case9.m", CUDABackend());
 blk_polar_gpu = ExaPF.BlockPolarForm(polar_gpu, nscen); # load model on GPU
 blk_stack_gpu = ExaPF.NetworkStack(blk_polar_gpu);
@@ -153,7 +153,7 @@
     linear_solver=rf_solver,
 )
Power flow has converged: true
   * #iterations: 4
-  * Time Jacobian (s) ........: 0.0012
-  * Time linear solver (s) ...: 0.0567
-  * Time total (s) ...........: 0.0590
-
+ * Time Jacobian (s) ........: 0.0013 + * Time linear solver (s) ...: 0.0579 + * Time total (s) ...........: 0.0603 + diff --git a/dev/tutorials/direct_solver/index.html b/dev/tutorials/direct_solver/index.html index fc51469e..fc908ebc 100644 --- a/dev/tutorials/direct_solver/index.html +++ b/dev/tutorials/direct_solver/index.html @@ -15,16 +15,16 @@ jx = ExaPF.Jacobian(polar, func, State()) # init AD ExaPF.nlsolve!(pf_solver, jx, stack)
Power flow has converged: true
   * #iterations: 7
-  * Time Jacobian (s) ........: 1.1515
-  * Time linear solver (s) ...: 0.3227
-  * Time total (s) ...........: 1.4793
+  * Time Jacobian (s) ........: 1.1632
+  * Time linear solver (s) ...: 0.3274
+  * Time total (s) ...........: 1.4957
 

KLU (CPU)

KLU is an efficient sparse linear solver, initially designed for circuit simulation problems. It is often considered as one of the state-of-the-art linear solver to solve power flow problems. Conveniently, KLU is wrapped in Julia with the package KLU.jl. KLU.jl implements a proper interface to use KLU. We just have to implement a forgiving function for LinearAlgebra.lu!

LinearAlgebra.lu!(K::KLU.KLUFactorization, J) = KLU.klu!(K, J)

Then, we are ready to solve a power flow with KLU using our current abstraction. One has just to create a new instance of LS.DirectSolver:

klu_factorization = KLU.klu(jx.J)
-klu_solver = LS.DirectSolver(klu_factorization)
ExaPF.LinearSolvers.DirectSolver{KLU.KLUFactorization{Float64, Int64}}(KLU.KLUFactorization{Float64, Int64}(KLU.LibKLU.klu_l_common_struct(0.001, 1.2, 1.2, 10.0, 0.0, 1, 0, 2, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000, 1, 0, 0, 17036, 17036, 17036, 0, -1.0, -1.0, -1.0, -1.0, 0.0, 0x00000000005a0030, 0x0000000000760150), Ptr{Nothing} @0x0000000051ef7890, Ptr{Nothing} @0x000000004b89b190, 17036, [0, 4, 12, 19, 24, 27, 37, 40, 44, 54  …  129350, 129359, 129363, 129369, 129380, 129384, 129390, 129402, 129408, 129412], [0, 383, 5548, 13344, 1, 458, 4027, 5306, 6117, 11823  …  6009, 6175, 9238, 13805, 13971, 17034, 8879, 9239, 16675, 17035], [1050.4551624894714, -629.5373899630948, -421.251258755022, 88.40062041224645, 995.2128749049234, -479.4136240982223, -138.29482949968113, -145.959631463846, -230.919755492703, 22.972728768573532  …  -480.4784394426574, -39.711370520688675, 520.164296001656, -3360.900457400469, -208.11090119999253, 3568.9822586698524, -39.40403545216682, 39.40522422897138, -242.80215386396614, 242.80196093598838]))

and pass it to nlsolve!:

ExaPF.init!(polar, stack) # reinit stack
+klu_solver = LS.DirectSolver(klu_factorization)
ExaPF.LinearSolvers.DirectSolver{KLU.KLUFactorization{Float64, Int64}}(KLU.KLUFactorization{Float64, Int64}(KLU.LibKLU.klu_l_common_struct(0.001, 1.2, 1.2, 10.0, 0.0, 1, 0, 2, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000, 1, 0, 0, 17036, 17036, 17036, 0, -1.0, -1.0, -1.0, -1.0, 0.0, 0x00000000005a0030, 0x0000000000760150), Ptr{Nothing} @0x000000004bb4b390, Ptr{Nothing} @0x000000004ca83420, 17036, [0, 4, 12, 19, 24, 27, 37, 40, 44, 54  …  129350, 129359, 129363, 129369, 129380, 129384, 129390, 129402, 129408, 129412], [0, 383, 5548, 13344, 1, 458, 4027, 5306, 6117, 11823  …  6009, 6175, 9238, 13805, 13971, 17034, 8879, 9239, 16675, 17035], [1050.4551624894714, -629.5373899630948, -421.251258755022, 88.40062041224645, 995.2128749049234, -479.4136240982223, -138.29482949968113, -145.959631463846, -230.919755492703, 22.972728768573532  …  -480.4784394426574, -39.711370520688675, 520.164296001656, -3360.900457400469, -208.11090119999253, 3568.9822586698524, -39.40403545216682, 39.40522422897138, -242.80215386396614, 242.80196093598838]))

and pass it to nlsolve!:

ExaPF.init!(polar, stack) # reinit stack
 ExaPF.nlsolve!(pf_solver, jx, stack; linear_solver=klu_solver)
Power flow has converged: true
   * #iterations: 7
-  * Time Jacobian (s) ........: 0.2700
-  * Time linear solver (s) ...: 0.0335
-  * Time total (s) ...........: 0.3084
+  * Time Jacobian (s) ........: 0.2816
+  * Time linear solver (s) ...: 0.0336
+  * Time total (s) ...........: 0.3202
 

We observe KLU reduces considerably the time spent in the linear solver.

cusolverRF (CUDA)

cusolverRF is an efficient LU refactorization routine implemented in CUDA. It is wrapped in Julia inside the package CUSOLVERRF.jl:

using CUSOLVERRF

The principle is the following: the initial symbolic factorization is computed on the CPU with the routine chosen by the user. Then, each time we have to refactorize a matrix with the same sparsity pattern, we can recompute the numerical factorization entirely on the GPU. In practice, this solver is efficient at refactorizing a given matrix if the sparsity is significant.

This is of direct relevance for us, as (i) the sparsity of the power flow Jacobian doesn't change along the Newton iterations and (ii) the Jacobian is super-sparse. In ExaPF, it is the linear solver of choice when it comes to solve the power flow entirely on the GPU.

CUSOLVERRF.jl follows the LinearAlgebra's interface, so we can use it directly in ExaPF. We first have to instantiate everything on the GPU:

using CUDA
 polar_gpu = ExaPF.load_polar("case9241pegase.m", CUDABackend())
 stack_gpu = ExaPF.NetworkStack(polar_gpu)
@@ -34,7 +34,7 @@
 rf_solver = LS.DirectSolver(rf_fac)
ExaPF.LinearSolvers.DirectSolver{CUSOLVERRF.RFLU{Float64, CUSOLVERRF.CuSparseSVDeprecated, CUSOLVERRF.CuSparseSMDeprecated}}(CUSOLVERRF.RFLU{Float64, CUSOLVERRF.CuSparseSVDeprecated, CUSOLVERRF.CuSparseSMDeprecated}
 )

Then, we are able to solve the power flow entirely on the GPU, simply as

ExaPF.nlsolve!(pf_solver, jx_gpu, stack_gpu; linear_solver=rf_solver)
Power flow has converged: true
   * #iterations: 7
-  * Time Jacobian (s) ........: 1.3949
-  * Time linear solver (s) ...: 0.0824
-  * Time total (s) ...........: 1.4788
-
+ * Time Jacobian (s) ........: 1.4186 + * Time linear solver (s) ...: 0.0851 + * Time total (s) ...........: 1.5054 +