From f05a0f4c805d89682f790f1a94bda30e4dce9ded Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Sat, 17 Feb 2024 18:53:30 -0600 Subject: [PATCH] build based on bf91811 --- dev/.documenter-siteinfo.json | 2 +- dev/artifact/index.html | 2 +- dev/index.html | 2 +- dev/lib/autodiff/index.html | 2 +- dev/lib/formulations/index.html | 2 +- dev/lib/linearsolver/index.html | 2 +- dev/lib/powersystem/index.html | 2 +- dev/man/autodiff/index.html | 2 +- dev/man/benchmark/index.html | 2 +- dev/man/formulations/index.html | 2 +- dev/man/linearsolver/index.html | 2 +- dev/man/powersystem/index.html | 2 +- dev/quickstart/index.html | 24 ++--- dev/tutorials/batch_evaluation/index.html | 102 +++++++++++----------- dev/tutorials/direct_solver/index.html | 34 ++++---- 15 files changed, 92 insertions(+), 92 deletions(-) diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index a4222c67..5bd33a4d 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2024-02-10T18:50:48","documenter_version":"1.2.1"}} \ No newline at end of file +{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2024-02-17T18:53:25","documenter_version":"1.2.1"}} \ No newline at end of file diff --git a/dev/artifact/index.html b/dev/artifact/index.html index 13b6f9cf..1bae0edf 100644 --- a/dev/artifact/index.html +++ b/dev/artifact/index.html @@ -1,2 +1,2 @@ -ExaData Artifact · ExaPF.jl

ExaData Artifact

The ExaData artifact contains test cases relevant to the Exascale Computing Project. It is built from the git repository available at ExaData. Apart from the standard MATPOWER files it additionally contains demand scenarios and contingencies used in multiperiod security constrained optimal power flow settings.

+ExaData Artifact · ExaPF.jl

ExaData Artifact

The ExaData artifact contains test cases relevant to the Exascale Computing Project. It is built from the git repository available at ExaData. Apart from the standard MATPOWER files it additionally contains demand scenarios and contingencies used in multiperiod security constrained optimal power flow settings.

diff --git a/dev/index.html b/dev/index.html index 49cbf368..d65e981b 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,2 +1,2 @@ -Home · ExaPF.jl

ExaPF

ExaPF.jl is a package to solve the power flow problem on upcoming exascale architectures. The code has been designed to be:

  1. Portable: Targeting exascale architectures implies a focus on graphics processing units (GPUs) as these systems lack substantial computational performance through classical CPUs.
  2. Differentiable: All the expressions implemented in ExaPF are fully compatible with ForwardDiff.jl, and routines are provided to extract first- and second-order derivatives to solve efficiently power flow and optimal power flow problems.

ExaPF implements a vectorized modeler for power systems, which allows to manipulate basic expressions. All expressions are fully differentiable : their first and second-order derivatives can be extracted efficiently using automatic differentiation. In addition, we provide extensions that leverage the packages CUDA.jl, [AMDGPU.jl]((https://github.com/JuliaGPU/AMDGPU.jl), and KernelAbstractions.jl to make ExaPF portable across GPU architectures.

Table of contents

Manual

Library

Artifact

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative.

+Home · ExaPF.jl

ExaPF

ExaPF.jl is a package to solve the power flow problem on upcoming exascale architectures. The code has been designed to be:

  1. Portable: Targeting exascale architectures implies a focus on graphics processing units (GPUs) as these systems lack substantial computational performance through classical CPUs.
  2. Differentiable: All the expressions implemented in ExaPF are fully compatible with ForwardDiff.jl, and routines are provided to extract first- and second-order derivatives to solve efficiently power flow and optimal power flow problems.

ExaPF implements a vectorized modeler for power systems, which allows to manipulate basic expressions. All expressions are fully differentiable : their first and second-order derivatives can be extracted efficiently using automatic differentiation. In addition, we provide extensions that leverage the packages CUDA.jl, [AMDGPU.jl]((https://github.com/JuliaGPU/AMDGPU.jl), and KernelAbstractions.jl to make ExaPF portable across GPU architectures.

Table of contents

Manual

Library

Artifact

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative.

diff --git a/dev/lib/autodiff/index.html b/dev/lib/autodiff/index.html index e4f6ab18..d426fd6b 100644 --- a/dev/lib/autodiff/index.html +++ b/dev/lib/autodiff/index.html @@ -10,4 +10,4 @@ )

Seed the duals with the coloring based seeds to compute the Jacobian or Hessian $M$.

source
ExaPF.AutoDiff.partials!Function
partials!(jac::AbstractJacobian)

Extract partials from Jacobian jac in jac.J.

source
partials!(hess::AbstractFullHessian)

Extract partials from Hessian hess into hess.H.

source
ExaPF.AutoDiff.set_value!Function
set_value!(
     jac,
     primals::AbstractVector{T}
-) where {T}

Set values of ForwardDiff.Dual numbers in jac to primals.

source
+) where {T}

Set values of ForwardDiff.Dual numbers in jac to primals.

source diff --git a/dev/lib/formulations/index.html b/dev/lib/formulations/index.html index 2a1daeff..6425e97e 100644 --- a/dev/lib/formulations/index.html +++ b/dev/lib/formulations/index.html @@ -237,4 +237,4 @@ 1-element Vector{Float64}: 4509.0275 source

Composition of expressions

The different expressions can be combined together in several different ways.

ExaPF.MultiExpressionsType
MultiExpressions <: AbstractExpression

Implement expressions concatenation. Takes as input a vector of expressions [expr1,...,exprN] and concatenate them in a single expression mexpr, such that

    mexpr(x) = [expr1(x) ; expr2(x) ; ... ; exprN(x)]
-
source
ExaPF.ComposedExpressionsType
ComposedExpressions{Expr1<:PolarBasis, Expr2} <: AbstractExpression

Implement expression composition. Takes as input two expressions expr1 and expr2 and returns a composed expression cexpr such that ``` cexpr(x) = expr2 ∘ expr1(x)

Notes

Currently, only PolarBasis is supported for expr1.

source
+source
ExaPF.ComposedExpressionsType
ComposedExpressions{Expr1<:PolarBasis, Expr2} <: AbstractExpression

Implement expression composition. Takes as input two expressions expr1 and expr2 and returns a composed expression cexpr such that ``` cexpr(x) = expr2 ∘ expr1(x)

Notes

Currently, only PolarBasis is supported for expr1.

source
diff --git a/dev/lib/linearsolver/index.html b/dev/lib/linearsolver/index.html index f8b6e13f..65435f47 100644 --- a/dev/lib/linearsolver/index.html +++ b/dev/lib/linearsolver/index.html @@ -2,4 +2,4 @@ Linear Solvers · ExaPF.jl

Linear solvers

Description

ExaPF allows to solve linear systems with either direct and indirect linear algebra, both on CPU and on GPU. To solve a linear system $Ax = b$, ExaPF uses the function ldiv!.

ExaPF.LinearSolvers.ldiv!Function
ldiv!(solver, x, A, y)
 ldiv!(solver, x, y)
  • solver::AbstractLinearSolver: linear solver to solve the system
  • x::AbstractVector: Solution
  • A::AbstractMatrix: Input matrix
  • y::AbstractVector: RHS

Solve the linear system $A x = y$ using the algorithm specified in solver. If A is not specified, the function will used directly the factorization stored inside solver.

source

Direct solvers

ExaPF wraps UMFPACK (shipped with SuiteSparse.jl) on the CPU, and CUSPARSE on CUDA device.

ExaPF.LinearSolvers.DirectSolverType
DirectSolver <: AbstractLinearSolver

Solve linear system $A x = y$ with direct linear algebra.

  • On the CPU, DirectSolver uses UMFPACK to solve the linear system
  • On CUDA GPU, DirectSolver redirects the resolution to the method CUSOLVER.csrlsvqr
source

Iterative solvers

ExaPF.LinearSolvers.BicgstabType
Bicgstab <: AbstractIterativeLinearSolver
 Bicgstab(precond; verbose=0, rtol=1e-10, atol=1e-10)

Wrap Krylov.jl's BICGSTAB algorithm to solve iteratively the linear system $A x = y$.

source
ExaPF.LinearSolvers.DqgmresType
Dqgmres <: AbstractIterativeLinearSolver
-Dqgmres(precond; verbose=false, memory=4)

Wrap Krylov.jl's Dqgmres algorithm to solve iteratively the linear system $A x = y$.

source

Available linear solvers can be queried with

A default solver is provided for each vendor backend.

+Dqgmres(precond; verbose=false, memory=4)

Wrap Krylov.jl's Dqgmres algorithm to solve iteratively the linear system $A x = y$.

source

Available linear solvers can be queried with

ExaPF.LinearSolvers.list_solversFunction
list_solvers(::KernelAbstractions.Device)

List linear solvers available on current device.

source

A default solver is provided for each vendor backend.

ExaPF.LinearSolvers.default_linear_solverFunction
default_linear_solver(A, ::KA.CPU)

Default linear solver on the CPU.

source
diff --git a/dev/lib/powersystem/index.html b/dev/lib/powersystem/index.html index d1efe0b4..3c8ad801 100644 --- a/dev/lib/powersystem/index.html +++ b/dev/lib/powersystem/index.html @@ -9,4 +9,4 @@ Buses: 1354 (Slack: 1. PV: 259. PQ: 1094) Generators: 260.source

API Reference

Network elements

ExaPF.PowerSystem.AbstractNetworkElementType
AbstractNetworkElement

Abstraction for all physical elements being parts of a AbstractPowerSystem. Elements are divided in

  • transmission lines (Lines)
  • buses (Buses)
  • generators (Generators)
source

List of elements:

ExaPF.PowerSystem.BusesType
Buses <: AbstractNetworkElement

Buses of a transmission network.

source
ExaPF.PowerSystem.LinesType
Lines <: AbstractNetworkElement

Lines of a transmission network.

source
ExaPF.PowerSystem.GeneratorsType
Generators <: AbstractElement

Generators in a transmission network

source

Network attributes

ExaPF.PowerSystem.AbstractNetworkAttributeType
AbstractNetworkAttribute

Attribute of a AbstractPowerSystem.

source

List of attributes:

ExaPF.PowerSystem.NumberOfBusesType
NumberOfBuses <: AbstractNetworkAttribute

Number of buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.NumberOfLinesType
NumberOfLines <: AbstractNetworkAttribute

Number of lines in a AbstractPowerSystem.

source
ExaPF.PowerSystem.NumberOfGeneratorsType
NumberOfGenerators <: AbstractNetworkAttribute

Number of generators in a AbstractPowerSystem.

source
ExaPF.PowerSystem.NumberOfPVBusesType
NumberOfPVBuses <: AbstractNetworkAttribute

Number of PV buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.NumberOfPQBusesType
NumberOfPQBuses <: AbstractNetworkAttribute

Number of PQ buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.NumberOfSlackBusesType
NumberOfSlackBuses <: AbstractNetworkAttribute

Number of slack buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.BaseMVAType
BaseMVA <: AbstractNetworkAttribute

Base MVA of the network.

source
ExaPF.PowerSystem.BusAdmittanceMatrixType
BusAdmittanceMatrix <: AbstractNetworkAttribute

Bus admittance matrix associated with the topology of the network.

source

Query the indexing of the different elements in a given network:

ExaPF.PowerSystem.PVIndexesType
PVIndexes <: AbstractIndexing

Indexes of the PV buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.PQIndexesType
PQIndexes <: AbstractIndexing

Indexes of the PQ buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.SlackIndexesType
SlackIndexes <: AbstractIndexing

Indexes of the slack buses in a AbstractPowerSystem.

source
ExaPF.PowerSystem.GeneratorIndexesType
GeneratorIndexes <: AbstractIndexing

Indexes of the generators in a AbstractPowerSystem.

source

Network values

ExaPF.PowerSystem.AbstractNetworkValuesType
AbstractNetworkValues

Numerical values attached to the different attributes of the network.

source

List of values:

ExaPF.PowerSystem.VoltageMagnitudeType
VoltageMagnitude <: AbstractNetworkValues

Magnitude |v| of the voltage v = |v| exp(i θ).

source
ExaPF.PowerSystem.VoltageAngleType
VoltageAngle <: AbstractNetworkValues

Angle θ of the voltage v = |v| exp(i θ).

source
ExaPF.PowerSystem.ActivePowerType
ActivePower <: AbstractNetworkValues

Active power P of the complex power S = P + iQ.

source
ExaPF.PowerSystem.ReactivePowerType
ReactivePower <: AbstractNetworkValues

Reactive power Q of the complex power S = P + iQ.

source
ExaPF.PowerSystem.ActiveLoadType
ActiveLoad <: AbstractNetworkValues

Active load Pd at buses.

source
ExaPF.PowerSystem.ReactiveLoadType
ReactiveLoad <: AbstractNetworkValues

Reactive load Qd at buses.

source

Function to get the range of a given value:

ExaPF.PowerSystem.boundsFunction
bounds(pf::AbstractPowerSystem, attr::AbstractNetworkAttribute, val::AbstractNetworkValues)

Return lower and upper bounds corresponding to the admissible values of the AbstractNetworkAttribute attr.

Examples

p_min, p_max = bounds(pf, Generator(), ActivePower())
 v_min, v_max = bounds(pf, Buses(), VoltageMagnitude())
-
source
+source diff --git a/dev/man/autodiff/index.html b/dev/man/autodiff/index.html index 573f0858..aee1b83a 100644 --- a/dev/man/autodiff/index.html +++ b/dev/man/autodiff/index.html @@ -35,4 +35,4 @@ F[npq + i] += coef_cos * sin_val - coef_sin * cos_val end end -end

These two abstractions are a powerful tool that allow us to implement the forward mode in vectorized form where the number of directions or tangent components of a tangent variable are the number of Jacobian colors. We illustrate this in the figure below with a point-wise vector product x .* y

SIMD AD for point-wise vector product \\label{fig:simd}

This natural way of computing the compressed Jacobian yields a very high performing code that is portable to any vector architecture, given that a similar package like CUDA.jl exists. We note that similar packages for the Intel Compute Engine and AMD ROCm are in development called oneAPI.jl and AMDGPU.jl, respectively. We expect our package to be portable to AMD and Intel GPUs in the future.

+end

These two abstractions are a powerful tool that allow us to implement the forward mode in vectorized form where the number of directions or tangent components of a tangent variable are the number of Jacobian colors. We illustrate this in the figure below with a point-wise vector product x .* y

SIMD AD for point-wise vector product \\label{fig:simd}

This natural way of computing the compressed Jacobian yields a very high performing code that is portable to any vector architecture, given that a similar package like CUDA.jl exists. We note that similar packages for the Intel Compute Engine and AMD ROCm are in development called oneAPI.jl and AMDGPU.jl, respectively. We expect our package to be portable to AMD and Intel GPUs in the future.

diff --git a/dev/man/benchmark/index.html b/dev/man/benchmark/index.html index 819af84d..0674c0c9 100644 --- a/dev/man/benchmark/index.html +++ b/dev/man/benchmark/index.html @@ -2,4 +2,4 @@ Benchmark · ExaPF.jl

Benchmark

For the purpose of performance regression testing, ExaPF provides a lightweight benchmark script. It allows to test the various configurations for the linear solvers used in the Newton-Raphson algorithm, and run them on a specific hardware. The main julia script benchmark/benchmarks.jl takes all its options from the command line. The benchmark script takes as input a linear solver (e.g. Bicgstab), a target architecture as a KernelAbstractions object (CPU or CUDABackend), and a case filename which is included in the ExaData artifact. An exhaustive list of all available linear solvers can be obtained via ExaPF.LinearSolvers.list_solvers.

Running

julia --project benchmark/benchmarks.jl Bicgstab CUDABackend case300.m

yields

Bicgstab, CUDABackend, case300.m,  69.0,  3.57,  43.7, true

The first three fields are the settings of the benchmark run. They are followed by three timings in milliseconds:

  1. the time taken by the Newton-Raphson algorithm to solve the power flow,
  2. the timings for the Jacobian accumulation using AutoDiff,
  3. and the time for the linear solver, including the preconditioner.

To acquire these timings the code is run three times to avoid any precompilation effects. The last field confirms the Newton-Raphson convergence. In case more verbose output is desired, one has to manually set the verbosity in benchmark/benchmarks.jl by changing

powerflow_solver = NewtonRaphson(tol=ntol)

to one of the following options:

powerflow_solver = NewtonRaphson(tol=ntol, verbose=VERBOSE_LEVEL_NONE)
 powerflow_solver = NewtonRaphson(tol=ntol, verbose=VERBOSE_LEVEL_LOW)
 powerflow_solver = NewtonRaphson(tol=ntol, verbose=VERBOSE_LEVEL_MEDIUM)
-powerflow_solver = NewtonRaphson(tol=ntol, verbose=VERBOSE_LEVEL_HIGH)

A shell script benchmark/benchmarks.sh is provided to gather timings with various canonical configurations and storing them in a file cpu_REV.log and gpu_REF.log, where REV is the sha1 hash of the current checked out ExaPF version.

+powerflow_solver = NewtonRaphson(tol=ntol, verbose=VERBOSE_LEVEL_HIGH)

A shell script benchmark/benchmarks.sh is provided to gather timings with various canonical configurations and storing them in a file cpu_REV.log and gpu_REF.log, where REV is the sha1 hash of the current checked out ExaPF version.

diff --git a/dev/man/formulations/index.html b/dev/man/formulations/index.html index 97b0134f..2c5fe337 100644 --- a/dev/man/formulations/index.html +++ b/dev/man/formulations/index.html @@ -173,4 +173,4 @@ 0.0 0.02340899999999987 0.007743999999999858 - + diff --git a/dev/man/linearsolver/index.html b/dev/man/linearsolver/index.html index 8ce24b69..976e8acf 100644 --- a/dev/man/linearsolver/index.html +++ b/dev/man/linearsolver/index.html @@ -1,3 +1,3 @@ Linear Solvers · ExaPF.jl

Linear Solver

Overview

As mentioned before, a linear solver is required to compute the Newton step in

dx .= jacobian(x)\f(x)

Our package supports the following linear solvers:

  • cuSOLVER with csrlsvqr (GPU),
  • Krylov.jl with dqgmres and bicgstab (CPU/GPU),
  • UMFPACK through the default Julia \ operator (CPU),
  • generic BiCGSTAB implementation [Vorst1992] (CPU/GPU),
  • or any linear solver wrapped in LinearAlgebra.

Preconditioning

Using only an iterative solver leads to divergence and bad performance due to ill-conditioning of the Jacobian. This is a known phenomenon in power systems. That's why this package comes with a block Jacobi preconditioner that is tailored towards GPUs and is proven to work well with power flow problems.

The block-Jacobi preconditioner used in ExaPF has been added to KrylovPreconditioners.jl

The Jacobian is partitioned into a dense block diagonal structure using Metis.jl, where each block is inverted to build our preconditioner P.

Dense block Jacobi preconditioner \\label{fig:preconditioner}

Compared to incomplete Cholesky and incomplete LU this preconditioner is easily portable to the GPU if the number of blocks is high enough. ExaPF.jl uses the batch BLAS calls from cuBLAS to invert the single blocks.

CUDA.@sync pivot, info = CUDA.CUBLAS.getrf_batched!(blocks, true)
-CUDA.@sync pivot, info, p.cuJs = CUDA.CUBLAS.getri_batched(blocks, pivot)

Assuming that other vendors will provide such batched BLAS APIs, this code is portable to other GPU architectures.

  • Vorst1992Vorst, H. A. van der. 1992. “Bi-Cgstab: A Fast and Smoothly Converging Variant of Bi-Cg for the Solution of Nonsymmetric Linear Systems.”SIAM Journal on Scientific and Statistical Computing 13 (2): 631–44
+CUDA.@sync pivot, info, p.cuJs = CUDA.CUBLAS.getri_batched(blocks, pivot)

Assuming that other vendors will provide such batched BLAS APIs, this code is portable to other GPU architectures.

diff --git a/dev/man/powersystem/index.html b/dev/man/powersystem/index.html index 6205c27d..a143512d 100644 --- a/dev/man/powersystem/index.html +++ b/dev/man/powersystem/index.html @@ -16,4 +16,4 @@ 2 julia> PS.get(ps, PS.NumberOfSlackBuses()) -1 +1 diff --git a/dev/quickstart/index.html b/dev/quickstart/index.html index d2b0d3c2..2aada7cd 100644 --- a/dev/quickstart/index.html +++ b/dev/quickstart/index.html @@ -18,11 +18,11 @@ #it 5: 1.57368e-11 Power flow has converged: true * #iterations: 5 - * Time Jacobian (s) ........: 0.3820 - * Time linear solver (s) ...: 0.0018 + * Time Jacobian (s) ........: 0.3815 + * Time linear solver (s) ...: 0.0017 * update (s) ............: 0.0015 * ldiv (s) ..............: 0.0002 - * Time total (s) ...........: 0.7674

Implicitly, ExaPF has just proceed to the following operations:

This compact syntax allows to solve quickly any powerflow equations in a few lines a code. However, in most case, the user may want more coarse grained control on the different objects manipulated.

Detailed version

In what follows, we detail step by step the detailed procedure to solve the powerflow equations.

How to load a MATPOWER instance as a PowerNetwork object?

We start by importing a MATPOWER instance to a ExaPF.PowerSystem.PowerNetwork object:

julia> pf = PS.PowerNetwork(datafile)PowerNetwork object with:
+  * Time total (s) ...........: 0.7664

Implicitly, ExaPF has just proceed to the following operations:

This compact syntax allows to solve quickly any powerflow equations in a few lines a code. However, in most case, the user may want more coarse grained control on the different objects manipulated.

Detailed version

In what follows, we detail step by step the detailed procedure to solve the powerflow equations.

How to load a MATPOWER instance as a PowerNetwork object?

We start by importing a MATPOWER instance to a ExaPF.PowerSystem.PowerNetwork object:

julia> pf = PS.PowerNetwork(datafile)PowerNetwork object with:
     Buses: 1354 (Slack: 1. PV: 259. PQ: 1094)
     Generators: 260.

The different fields of the object pf specify the characteristics of the network. For instance, we can retrieve the number of buses or get the indexing of the PV buses with

julia> nbus = PS.get(pf, PS.NumberOfBuses())1354
julia> pv_indexes = pf.pv;

However, a ExaPF.PowerSystem.PowerNetwork object stores only the physical attributes of the network. To choose a given mathematical formulation, we need to pass the object pf to an ExaPF.AbstractFormulation layer. Currently, only the polar formulation is provided with the ExaPF.PolarForm structure. In the future, other formulations (e.g. RectangularForm) may be implemented as well.

How to solve the powerflow equations?

To solve the powerflow equations, we need to choose a given mathematical formulation for the equations of the network. To each formulation corresponds a given state $x$ and control $u$. Using polar representation of the voltage vector, such as $\bm{v} = |v|e^{j \theta}$, each bus $i=1, \cdots, N_B$ must satisfy the power balance equations:

\[\begin{aligned} p_i &= v_i \sum_{j}^{n} v_j (g_{ij}\cos{(\theta_i - \theta_j)} + b_{ij}\sin{(\theta_i - \theta_j})) \,, \\ @@ -63,11 +63,11 @@ #it 5: 1.57368e-11 Power flow has converged: true * #iterations: 5 - * Time Jacobian (s) ........: 0.0044 + * Time Jacobian (s) ........: 0.0047 * Time linear solver (s) ...: 0.0017 * update (s) ............: 0.0015 * ldiv (s) ..............: 0.0002 - * Time total (s) ...........: 0.0065

Here, the algorithm solves the powerflow equations in 5 iterations. The algorithm modifies the values of stack inplace, to avoid any unnecessary memory allocations.

How to deport the computation on the GPU?

Now, how can we deport the resolution on the GPU? The procedure looks exactly the same. It suffices to initiate a new ExaPF.PolarForm object, but on the GPU:

julia> polar_gpu = ExaPF.PolarForm(pf, CUDABackend())Polar formulation (instantiated on device CUDA.CUDAKernels.CUDABackend(false, false))
+  * Time total (s) ...........: 0.0067

Here, the algorithm solves the powerflow equations in 5 iterations. The algorithm modifies the values of stack inplace, to avoid any unnecessary memory allocations.

How to deport the computation on the GPU?

Now, how can we deport the resolution on the GPU? The procedure looks exactly the same. It suffices to initiate a new ExaPF.PolarForm object, but on the GPU:

julia> polar_gpu = ExaPF.PolarForm(pf, CUDABackend())Polar formulation (instantiated on device CUDA.CUDAKernels.CUDABackend(false, false))
 Network characteristics:
     #buses:      1354  (#slack: 1  #PV: 259  #PQ: 1094)
     #generators: 260
@@ -83,11 +83,11 @@
 #it 5: 9.96622e-12
 Power flow has converged: true
   * #iterations: 5
-  * Time Jacobian (s) ........: 2.3092
-  * Time linear solver (s) ...: 0.2299
+  * Time Jacobian (s) ........: 2.3125
+  * Time linear solver (s) ...: 0.2204
      * update (s) ............: 0.0000
-     * ldiv (s) ..............: 0.2299
-  * Time total (s) ...........: 3.3925

Note that we get exactly the same iterations as when we solve the power flow equations on the CPU.

How to solve the linear system with BICGSTAB?

By default, the algorithm runs with a direct solver, which might be inefficient for large problems. To overcome this issue, ExaPF implements a wrapper for different iterative algorithms (GMRES, BICGSTAB).

The performance of iterative solvers is usually improved if we use a preconditioner. ExaPF.jl implements an overlapping Schwarz preconditioner, tailored for GPU usage. To build an instance with 8 blocks, just write

julia> npartitions = 8;
julia> jac_gpu = jx_gpu.J;
julia> precond = BlockJacobiPreconditioner(jac_gpu, npartitions, CUDABackend());ERROR: UndefVarError: `BlockJacobiPreconditioner` not defined

You can attach the preconditioner to an BICGSTAB algorithm simply as

julia> linear_solver = ExaPF.Bicgstab(jac_gpu; P=precond);ERROR: UndefVarError: `precond` not defined

(this will use the BICGSTAB algorithm implemented in Krylov.jl).

We need to update accordingly the tolerance of the Newton-Raphson algorithm (the iterative solver is less accurate than the direct solver):

julia> pf_algo = NewtonRaphson(; verbose=1, tol=1e-7)NewtonRaphson(20, 1.0e-7, 1)

We reset the variables to their initial values:

julia> ExaPF.init!(polar_gpu, stack_gpu)

Then, solving the power flow with the iterative solvers directly translates to one call to nlsolve!:

julia> convergence = ExaPF.nlsolve!(pf_algo, jx_gpu, stack_gpu; linear_solver=linear_solver)#it 0: 1.15103e+02
+     * ldiv (s) ..............: 0.2204
+  * Time total (s) ...........: 3.3274

Note that we get exactly the same iterations as when we solve the power flow equations on the CPU.

How to solve the linear system with BICGSTAB?

By default, the algorithm runs with a direct solver, which might be inefficient for large problems. To overcome this issue, ExaPF implements a wrapper for different iterative algorithms (GMRES, BICGSTAB).

The performance of iterative solvers is usually improved if we use a preconditioner. ExaPF.jl implements an overlapping Schwarz preconditioner, tailored for GPU usage. To build an instance with 8 blocks, just write

julia> npartitions = 8;
julia> jac_gpu = jx_gpu.J;
julia> precond = BlockJacobiPreconditioner(jac_gpu, npartitions, CUDABackend());ERROR: UndefVarError: `BlockJacobiPreconditioner` not defined

You can attach the preconditioner to an BICGSTAB algorithm simply as

julia> linear_solver = ExaPF.Bicgstab(jac_gpu; P=precond);ERROR: UndefVarError: `precond` not defined

(this will use the BICGSTAB algorithm implemented in Krylov.jl).

We need to update accordingly the tolerance of the Newton-Raphson algorithm (the iterative solver is less accurate than the direct solver):

julia> pf_algo = NewtonRaphson(; verbose=1, tol=1e-7)NewtonRaphson(20, 1.0e-7, 1)

We reset the variables to their initial values:

julia> ExaPF.init!(polar_gpu, stack_gpu)

Then, solving the power flow with the iterative solvers directly translates to one call to nlsolve!:

julia> convergence = ExaPF.nlsolve!(pf_algo, jx_gpu, stack_gpu; linear_solver=linear_solver)#it 0: 1.15103e+02
 #it 1: 1.50328e+01
 #it 2: 5.88242e-01
 #it 3: 4.88493e-03
@@ -96,7 +96,7 @@
 Power flow has converged: true
   * #iterations: 5
   * Time Jacobian (s) ........: 0.0025
-  * Time linear solver (s) ...: 0.1149
+  * Time linear solver (s) ...: 0.1150
      * update (s) ............: 0.0000
-     * ldiv (s) ..............: 0.1149
-  * Time total (s) ...........: 0.1188
+ * ldiv (s) ..............: 0.1150 + * Time total (s) ...........: 0.1190 diff --git a/dev/tutorials/batch_evaluation/index.html b/dev/tutorials/batch_evaluation/index.html index 6e5ba6cd..99ebc984 100644 --- a/dev/tutorials/batch_evaluation/index.html +++ b/dev/tutorials/batch_evaluation/index.html @@ -21,26 +21,26 @@ #states : 14

Then, ExaPF can also instantiate a NetworkStack object, with the memory required to store the variables of the different scenarios:

blk_stack = ExaPF.NetworkStack(blk_polar)
210-elements NetworkStack{Vector{Float64}}

We can pass the scenarios manually using the function set_params!:

ploads = rand(nbus, nscen);
 qloads = rand(nbus, nscen);
 ExaPF.set_params!(blk_stack, ploads, qloads)
90-element Vector{Float64}:
- 0.14906570380091866
- 0.916078672443881
- 0.3452067732900478
- 0.5122402116722938
- 0.04797847888600548
- 0.5552597863277124
- 0.7420975289636497
- 0.3550323728359359
- 0.3090731547242763
- 0.16368547344861284
+ 0.06115666991576285
+ 0.502065528891404
+ 0.3339408813040945
+ 0.7219913674880951
+ 0.12445420941498209
+ 0.26918201755040105
+ 0.4172409129974677
+ 0.9209864384929705
+ 0.47880557731939366
+ 0.8348758446128132
  ⋮
- 0.8426919541630601
- 0.6608431104725754
- 0.7283907353617278
- 0.05751973655546372
- 0.5327766781164297
- 0.8616534046977098
- 0.844986963635221
- 0.0291422141189297
- 0.5533440805962

The structure blk_stack stores $N$ different realizations for the variables stored in the field input (vmag, vang and pgen). By default, the initial values are set according to the values specified in blk_polar (usually defined when importing the data from the instance file):

reshape(blk_stack.vmag, nbus, nscen)
9×10 Matrix{Float64}:
+ 0.9844509029095023
+ 0.7533158758178876
+ 0.7596363313900393
+ 0.6161447011959152
+ 0.8492899243561641
+ 0.01886692161232595
+ 0.7606774984644377
+ 0.11118542727504122
+ 0.19517326050805472

The structure blk_stack stores $N$ different realizations for the variables stored in the field input (vmag, vang and pgen). By default, the initial values are set according to the values specified in blk_polar (usually defined when importing the data from the instance file):

reshape(blk_stack.vmag, nbus, nscen)
9×10 Matrix{Float64}:
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
@@ -50,32 +50,32 @@
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

Only the parameters are varying according to the scenarios we passed as input in the constructor:

reshape(blk_stack.pload, nbus, nscen)
9×10 Matrix{Float64}:
- 0.171561   0.613955   0.0626431  …  0.806531   0.7        0.212864
- 0.967174   0.50392    0.849147      0.0752495  0.312361   0.229171
- 0.392074   0.83311    0.196928      0.849684   0.234693   0.799418
- 0.410796   0.741859   0.731973      0.862394   0.292173   0.0418581
- 0.482179   0.0588294  0.061672      0.0310203  0.488948   0.0640871
- 0.185275   0.237144   0.440699   …  0.235886   0.575537   0.239417
- 0.222044   0.903887   0.865432      0.474783   0.307437   0.963479
- 0.209467   0.482393   0.136672      0.350595   0.0585677  0.342826
- 0.0829163  0.23021    0.755033      0.756468   0.384565   0.00963085

Evaluate expressions in block

ExaPF takes advantage of the block structure when using a BlockPolarForm.

As an example, suppose we want to evaluate the power flow balances in block form with a PowerFlowBalance expression:

powerflow = ExaPF.PowerFlowBalance(blk_polar) ∘ ExaPF.PolarBasis(blk_polar);
ExaPF.ComposedExpressions{ExaPF.PolarBasis{Vector{Int64}, SparseArrays.SparseMatrixCSC{Float64, Int64}}, ExaPF.PowerFlowBalance{Vector{Float64}, SparseArrays.SparseMatrixCSC{Float64, Int64}}}(PolarBasis (AbstractExpression), PowerFlowBalance (AbstractExpression))

A block evaluation takes as input the NetworkStack blk_stack structure:

m = div(length(powerflow), nscen);
+ 0.819993   0.83336    0.801638  …  0.981554   0.230333    0.816545
+ 0.458523   0.751841   0.582271     0.977505   0.711284    0.740138
+ 0.848787   0.0603864  0.260817     0.251241   0.326592    0.889074
+ 0.0577224  0.543626   0.871812     0.221338   0.598004    0.224391
+ 0.818846   0.48596    0.14153      0.381251   0.18589     0.17312
+ 0.130555   0.662285   0.386015  …  0.0646367  0.494029    0.864839
+ 0.356007   0.348216   0.337405     0.413113   0.684744    0.338236
+ 0.637388   0.397926   0.445512     0.431177   0.676264    0.564115
+ 0.317828   0.51729    0.089645     0.888848   0.00281554  0.276826

Evaluate expressions in block

ExaPF takes advantage of the block structure when using a BlockPolarForm.

As an example, suppose we want to evaluate the power flow balances in block form with a PowerFlowBalance expression:

powerflow = ExaPF.PowerFlowBalance(blk_polar) ∘ ExaPF.PolarBasis(blk_polar);
ExaPF.ComposedExpressions{ExaPF.PolarBasis{Vector{Int64}, SparseArrays.SparseMatrixCSC{Float64, Int64}}, ExaPF.PowerFlowBalance{Vector{Float64}, SparseArrays.SparseMatrixCSC{Float64, Int64}}}(PolarBasis (AbstractExpression), PowerFlowBalance (AbstractExpression))

A block evaluation takes as input the NetworkStack blk_stack structure:

m = div(length(powerflow), nscen);
 blk_output = zeros(m * nscen);
 powerflow(blk_output, blk_stack); # inplace evaluation
 reshape(blk_output, m, nscen)
14×10 Matrix{Float64}:
- -0.662826   -1.12608    -0.780853  -0.894575   …  -1.31764    -1.40083
- -0.457926   -0.0168903  -0.653072  -0.102736      -0.615307   -0.050582
-  0.410796    0.741859    0.731973   0.0478081      0.292173    0.0418581
-  0.482179    0.0588294   0.061672   0.74038        0.488948    0.0640871
-  0.185275    0.237144    0.440699   0.0103921      0.575537    0.239417
-  0.222044    0.903887    0.865432   0.545402   …   0.307437    0.963479
-  0.209467    0.482393    0.136672   0.92424        0.0585677   0.342826
-  0.0829163   0.23021     0.755033   0.473805       0.384565    0.00963085
-  0.34524     0.597464    0.58964    0.527898       0.160513   -0.10948
- -0.210022   -0.093281    0.220613   0.453514       0.540625    0.274777
-  0.27176     0.0532537   0.64156    0.136159   …  -0.129377    0.578153
-  0.563098    0.332741    0.391244   0.0309061      0.781151    0.665987
-  0.127532    0.217216    0.271194   0.212826       0.309269   -0.198358
-  0.0680732   0.234133    0.152078   0.130422       0.505881    0.312344

We get $N$ different results for the power flow balance equations, depending on which scenario we are on.

Solve power flow in block on the CPU

Once the different structures used for block evaluation instantiated, one is able to solve the power flow in block on the CPU using the same function nlsolve!. The block Jacobian is evaluated with automatic differentiation using a ArrowheadJacobian structure:

blk_jx = ExaPF.ArrowheadJacobian(blk_polar, powerflow, State());
+ -1.17148     -0.878159  -1.04773    …  -0.652495   -0.918716    -0.889862
+ -0.00121318  -0.789614  -0.589183      -0.598759   -0.523408     0.039074
+  0.0577224    0.543626   0.871812       0.221338    0.598004     0.224391
+  0.818846     0.48596    0.14153        0.381251    0.18589      0.17312
+  0.130555     0.662285   0.386015       0.0646367   0.494029     0.864839
+  0.356007     0.348216   0.337405   …   0.413113    0.684744     0.338236
+  0.637388     0.397926   0.445512       0.431177    0.676264     0.564115
+  0.317828     0.51729    0.089645       0.888848    0.00281554   0.276826
+  0.554991     0.400953   0.758048       0.0846076   0.594388     0.449145
+ -0.133546     0.523903   0.0466533      0.555023    0.729903     0.59129
+ -0.014318    -0.129496   0.655417   …   0.641513    0.180036    -0.264633
+  0.238241     0.255567   0.78968        0.635082    0.291005     0.581677
+  0.693486    -0.170814   0.679487       0.645       0.486525    -0.116315
+  0.237806     0.519285   0.231406      -0.0233186  -0.0684419   -0.0458267

We get $N$ different results for the power flow balance equations, depending on which scenario we are on.

Solve power flow in block on the CPU

Once the different structures used for block evaluation instantiated, one is able to solve the power flow in block on the CPU using the same function nlsolve!. The block Jacobian is evaluated with automatic differentiation using a ArrowheadJacobian structure:

blk_jx = ExaPF.ArrowheadJacobian(blk_polar, powerflow, State());
 blk_jx.J
140×140 SparseArrays.SparseMatrixCSC{Float64, Int64} with 820 stored entries:
 ⎡⡱⣮⡲⣞⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
 ⎢⣸⢮⣻⣾⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
@@ -132,12 +132,12 @@
  1.0       1.0       1.0       1.0       …  1.0       1.0       1.0
  1.0       1.0       1.0       1.0          1.0       1.0       1.0
  1.0       1.0       1.0       1.0          1.0       1.0       1.0
- 0.976321  0.959984  0.942147  0.94504      0.956357  0.935348  0.973855
- 0.980165  0.968362  0.924375  0.908177     0.924582  0.895753  0.943732
- 0.972023  0.9734    0.94075   0.966424  …  0.957215  0.956603  0.944316
- 0.943823  0.945793  0.921951  0.961013     0.953908  0.905278  0.918854
- 0.970245  0.96276   0.950407  0.965811     0.96627   0.938024  0.966609
- 0.967524  0.944186  0.92673   0.938454     0.928087  0.893468  0.951263

Solve power flow in batch on the GPU

When the BlockPolarForm model is instantiated on the GPU, the expressions are evaluated in batch. The syntax to solve the power flow equations is exactly the same as on the CPU, using cusolverRF to solve the different linear systems:

using CUDA, CUSOLVERRF
+ 0.958105  0.940054  0.927832  0.97383      0.952648  0.937812  0.954586
+ 0.959723  0.908805  0.919622  0.977451     0.899347  0.887168  0.912083
+ 0.975837  0.976274  0.927506  0.980944  …  0.930225  0.951303  0.966808
+ 0.945303  0.962773  0.879709  0.958978     0.895597  0.931095  0.940025
+ 0.947233  0.976549  0.917059  0.980425     0.929261  0.948081  0.971816
+ 0.935884  0.913759  0.904618  0.946843     0.935486  0.944465  0.960046

Solve power flow in batch on the GPU

When the BlockPolarForm model is instantiated on the GPU, the expressions are evaluated in batch. The syntax to solve the power flow equations is exactly the same as on the CPU, using cusolverRF to solve the different linear systems:

using CUDA, CUSOLVERRF
 polar_gpu = ExaPF.load_polar("case9.m", CUDABackend());
 blk_polar_gpu = ExaPF.BlockPolarForm(polar_gpu, nscen); # load model on GPU
 blk_stack_gpu = ExaPF.NetworkStack(blk_polar_gpu);
@@ -156,8 +156,8 @@
 )
Power flow has converged: true
   * #iterations: 4
   * Time Jacobian (s) ........: 0.0013
-  * Time linear solver (s) ...: 0.0480
+  * Time linear solver (s) ...: 0.0482
      * update (s) ............: 0.0241
-     * ldiv (s) ..............: 0.0240
+     * ldiv (s) ..............: 0.0241
   * Time total (s) ...........: 0.0513
-
+ diff --git a/dev/tutorials/direct_solver/index.html b/dev/tutorials/direct_solver/index.html index 7408e498..95628eab 100644 --- a/dev/tutorials/direct_solver/index.html +++ b/dev/tutorials/direct_solver/index.html @@ -15,20 +15,20 @@ jx = ExaPF.Jacobian(polar, func, State()) # init AD ExaPF.nlsolve!(pf_solver, jx, stack)
Power flow has converged: true
   * #iterations: 7
-  * Time Jacobian (s) ........: 1.1296
-  * Time linear solver (s) ...: 0.0335
-     * update (s) ............: 0.0292
-     * ldiv (s) ..............: 0.0043
-  * Time total (s) ...........: 1.1691
+  * Time Jacobian (s) ........: 1.1326
+  * Time linear solver (s) ...: 0.0340
+     * update (s) ............: 0.0296
+     * ldiv (s) ..............: 0.0044
+  * Time total (s) ...........: 1.1725
 

KLU (CPU)

KLU is an efficient sparse linear solver, initially designed for circuit simulation problems. It is often considered as one of the state-of-the-art linear solver to solve power flow problems. Conveniently, KLU is wrapped in Julia with the package KLU.jl. KLU.jl implements a proper interface to use KLU. We just have to implement a forgiving function for LinearAlgebra.lu!

LinearAlgebra.lu!(K::KLU.KLUFactorization, J) = KLU.klu!(K, J)

Then, we are ready to solve a power flow with KLU using our current abstraction. One has just to create a new instance of LS.DirectSolver:

klu_factorization = KLU.klu(jx.J)
-klu_solver = LS.DirectSolver(klu_factorization)
ExaPF.LinearSolvers.DirectSolver{KLU.KLUFactorization{Float64, Int64}}(KLU.KLUFactorization{Float64, Int64}(KLU.LibKLU.klu_l_common_struct(0.001, 1.2, 1.2, 10.0, 0.0, 1, 0, 2, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000, 1, 0, 0, 17036, 17036, 17036, 0, -1.0, -1.0, -1.0, -1.0, 0.0, 0x00000000005a0030, 0x0000000000760150), Ptr{Nothing} @0x0000000012db70c0, Ptr{Nothing} @0x0000000018169df0, 17036, [0, 4, 12, 19, 24, 27, 37, 40, 44, 54  …  129350, 129359, 129363, 129369, 129380, 129384, 129390, 129402, 129408, 129412], [0, 383, 5548, 13344, 1, 458, 4027, 5306, 6117, 11823  …  6009, 6175, 9238, 13805, 13971, 17034, 8879, 9239, 16675, 17035], [1050.4551624894714, -629.5373899630948, -421.251258755022, 88.40062041224643, 995.2128749049238, -479.4136240982223, -138.29482949968133, -145.95963146384614, -230.91975549270308, 22.972728768572996  …  -480.47843944268664, -39.71137052069115, 520.1642960016878, -3360.900457400674, -208.11090120000475, 3568.982258670069, -39.40403545216686, 39.40522422897141, -242.8021538639664, 242.80196093598863]))

and pass it to nlsolve!:

ExaPF.init!(polar, stack) # reinit stack
+klu_solver = LS.DirectSolver(klu_factorization)
ExaPF.LinearSolvers.DirectSolver{KLU.KLUFactorization{Float64, Int64}}(KLU.KLUFactorization{Float64, Int64}(KLU.LibKLU.klu_l_common_struct(0.001, 1.2, 1.2, 10.0, 0.0, 1, 0, 2, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000, 1, 0, 0, 17036, 17036, 17036, 0, -1.0, -1.0, -1.0, -1.0, 0.0, 0x00000000005a0030, 0x0000000000760150), Ptr{Nothing} @0x0000000010dc8320, Ptr{Nothing} @0x0000000011e2b3a0, 17036, [0, 4, 12, 19, 24, 27, 37, 40, 44, 54  …  129350, 129359, 129363, 129369, 129380, 129384, 129390, 129402, 129408, 129412], [0, 383, 5548, 13344, 1, 458, 4027, 5306, 6117, 11823  …  6009, 6175, 9238, 13805, 13971, 17034, 8879, 9239, 16675, 17035], [1050.4551624894714, -629.5373899630948, -421.251258755022, 88.40062041224643, 995.2128749049238, -479.4136240982223, -138.29482949968133, -145.95963146384614, -230.91975549270308, 22.972728768572996  …  -480.47843944268664, -39.71137052069115, 520.1642960016878, -3360.900457400674, -208.11090120000475, 3568.982258670069, -39.40403545216686, 39.40522422897141, -242.8021538639664, 242.80196093598863]))

and pass it to nlsolve!:

ExaPF.init!(polar, stack) # reinit stack
 ExaPF.nlsolve!(pf_solver, jx, stack; linear_solver=klu_solver)
Power flow has converged: true
   * #iterations: 7
-  * Time Jacobian (s) ........: 0.2740
-  * Time linear solver (s) ...: 0.0333
-     * update (s) ............: 0.0291
-     * ldiv (s) ..............: 0.0042
-  * Time total (s) ...........: 0.3132
+  * Time Jacobian (s) ........: 0.2761
+  * Time linear solver (s) ...: 0.0336
+     * update (s) ............: 0.0293
+     * ldiv (s) ..............: 0.0043
+  * Time total (s) ...........: 0.3156
 

We observe KLU reduces considerably the time spent in the linear solver.

cusolverRF (CUDA)

cusolverRF is an efficient LU refactorization routine implemented in CUDA. It is wrapped in Julia inside the package CUSOLVERRF.jl:

using CUSOLVERRF

The principle is the following: the initial symbolic factorization is computed on the CPU with the routine chosen by the user. Then, each time we have to refactorize a matrix with the same sparsity pattern, we can recompute the numerical factorization entirely on the GPU. In practice, this solver is efficient at refactorizing a given matrix if the sparsity is significant.

This is of direct relevance for us, as (i) the sparsity of the power flow Jacobian doesn't change along the Newton iterations and (ii) the Jacobian is super-sparse. In ExaPF, it is the linear solver of choice when it comes to solve the power flow entirely on the GPU.

CUSOLVERRF.jl follows the LinearAlgebra's interface, so we can use it directly in ExaPF. We first have to instantiate everything on the GPU:

using CUDA
 polar_gpu = ExaPF.load_polar("case9241pegase.m", CUDABackend())
 stack_gpu = ExaPF.NetworkStack(polar_gpu)
@@ -38,9 +38,9 @@
 rf_solver = LS.DirectSolver(rf_fac)
ExaPF.LinearSolvers.DirectSolver{CUSOLVERRF.RFLU{Float64, CUSOLVERRF.CuSparseSVDeprecated, CUSOLVERRF.CuSparseSMDeprecated}}(CUSOLVERRF.RFLU{Float64, CUSOLVERRF.CuSparseSVDeprecated, CUSOLVERRF.CuSparseSMDeprecated}
 )

Then, we are able to solve the power flow entirely on the GPU, simply as

ExaPF.nlsolve!(pf_solver, jx_gpu, stack_gpu; linear_solver=rf_solver)
Power flow has converged: true
   * #iterations: 7
-  * Time Jacobian (s) ........: 1.6922
-  * Time linear solver (s) ...: 0.1096
-     * update (s) ............: 0.1080
-     * ldiv (s) ..............: 0.0016
-  * Time total (s) ...........: 1.8047
-
+ * Time Jacobian (s) ........: 1.6491 + * Time linear solver (s) ...: 0.1026 + * update (s) ............: 0.1008 + * ldiv (s) ..............: 0.0018 + * Time total (s) ...........: 1.7546 +