
Begin merging production changes #197

Open · wants to merge 18 commits into base: main

Conversation

@ludgerpaehler (Collaborator) commented Oct 27, 2024

No description yet.

TODO:

  • Rewrite commits to actually be associated with Christian Lagemann (the author) @cl126162
  • Resolve merge conflicts

@ludgerpaehler self-assigned this Oct 27, 2024
@ludgerpaehler added the priority (High-priority core feature), Paper-Release, and python (Pull requests that update Python code) labels Oct 27, 2024

@ReHoss left a comment

Sorry, wrong button... trying to abandon the "review".

X, Y = np.meshgrid(xp, yp)
DEFAULT_VEL_PROBES = [(x, y) for x, y in zip(X.ravel(), Y.ravel())]

# Pressure probes (spaced equally around the cylinder)
RADIUS = 0.5

@ReHoss Oct 27, 2024

RADIUS does not seem to be used anymore. It was used by the previous pressure-probe definition.
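For reference, a minimal sketch of how RADIUS could have been used by the previous pressure-probe definition (equally spaced points on the cylinder surface; everything except RADIUS is a hypothetical name or count):

```python
import numpy as np

RADIUS = 0.5
N_PRESSURE_PROBES = 8  # hypothetical probe count
angles = np.linspace(0, 2 * np.pi, N_PRESSURE_PROBES, endpoint=False)
# Points spaced equally around a cylinder of radius RADIUS centered at the origin
DEFAULT_PRES_PROBES = [(RADIUS * np.cos(a), RADIUS * np.sin(a)) for a in angles]
```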

self.initialize_state()

self.reset()

if config.get("restart"):
- self.load_checkpoint(config["restart"])
+ self.load_checkpoint(config["restart"][0])

Why is the checkpoint an iterable now?

@@ -391,6 +489,7 @@ def reset(self, t=0.0) -> Union[ArrayLike, Tuple[ArrayLike, dict]]:
self.flow.reset(q0=self.q0, t=t)
self.solver.reset()

# Previously: return self.flow.get_observations(), info

master still returns only self.flow.get_observations() too.
(Linked to the question of the transition to Gymnasium.)

@@ -242,7 +242,7 @@ def cyl_velocity_field(self):
theta_lo = -0.5 * pi
A_lo = ufl.conditional(
abs(theta - theta_lo) < omega / 2,
- pi / (2 * omega * self.rad**2) * ufl.cos(
+ -pi / (2 * omega * self.rad**2) * ufl.cos(

@ReHoss Oct 27, 2024

Critical fix that was not considered in #195? @jcallaham

Collaborator

I don't know, did I get this wrong? I think this is separate from the fixes in #195


Hello @jcallaham, if you have a moment to sort this out I would be very grateful, since I am writing my PhD thesis and am about to launch experiments.
Thank you :)

Collaborator

I'm comparing with Appendix B in the referenced paper and it looks to me like the original was correct. @cl126162 @ludgerpaehler can anyone elaborate on this change? I implemented this as zero net mass flux, which is fairly typical of CFD actuation models, though that reference actually implemented it as two independent actuators, which is not really great in terms of boundary conditions and physicality. I guess changing the sign would make it so that both jets are blowing or in suction at once, rather than in opposition (as in the ZNMF configuration).
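For context, a generic statement of the zero-net-mass-flux (ZNMF) condition referred to above (not taken from this code): the flux through the two jet boundaries cancels at every instant,

```math
\int_{\Gamma_{\mathrm{up}}} \mathbf{u} \cdot \mathbf{n} \, \mathrm{d}s
  + \int_{\Gamma_{\mathrm{lo}}} \mathbf{u} \cdot \mathbf{n} \, \mathrm{d}s = 0 ,
```

so one jet blows while the other sucks. Per the comment above, flipping the sign of A_lo would instead put both jets in blowing or in suction at the same time.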


thank you very much @jcallaham :)!

@@ -13,6 +13,36 @@


class Pinball(FlowConfig):
rad = 0.5

rad is already defined below; it would be good to pass an unused-variables check (shipped with standard static checkers).

@@ -292,6 +298,43 @@ def solve(
PDEBase: The state of the PDE at the end of the solve
"""
for iter, t in enumerate(np.arange(*t_span, self.dt)):
iter = iter + start_iteration_value

It looks like solve and solve_multistep do almost the same thing. For a "public" release it may be better to merge them. :)

Collaborator

I agree - what's the distinction supposed to be here?

assert self.rewardLogCallback is not None,\
f"Error: If num_sim_substeps_per_actuation ({self.num_sim_substeps_per_actuation}) " \
"is set a reward callback function must be given, {self.rewardLogCallback}"
self.reward_aggreation_rule = env_config.get("actuation_config", {}).get(

@ReHoss Oct 27, 2024

What is the purpose of reward_aggregation_rule for a gym interface where only instantaneous rewards/costs are considered? Is it the aggregation of the values in between decision times (the times at which the user calls step)?


(there is a typo in the variable name reward_aggreation_rule)

Collaborator

Could this all happen in some kind of Wrapper? All of this seems to add a lot of complexity to an otherwise straightforward class.


Wrappers can deal with those aggregation rules and also action scaling.
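As an illustration of what such a wrapper could look like, a minimal sketch (assuming a Gymnasium-style 5-tuple step API; the class and parameter names here are hypothetical, not the hydrogym API):

```python
import gymnasium as gym
import numpy as np


class SubstepRewardWrapper(gym.Wrapper):
    """Repeat each action for several solver sub-steps, scale the action,
    and aggregate the per-substep rewards into one decision-time reward."""

    def __init__(self, env, num_substeps=1, control_scaling=1.0, aggregation="mean"):
        super().__init__(env)
        self.num_substeps = num_substeps
        self.control_scaling = control_scaling
        self.aggregate = {"mean": np.mean, "sum": np.sum}[aggregation]

    def step(self, action):
        rewards = []
        for _ in range(max(1, self.num_substeps)):
            # Same (scaled) action is applied for every sub-step
            obs, reward, terminated, truncated, info = self.env.step(
                self.control_scaling * np.asarray(action))
            rewards.append(reward)
            if terminated or truncated:
                break
        return obs, self.aggregate(rewards), terminated, truncated, info
```

This would keep the FlowConfig/FlowEnv classes free of RL-specific aggregation and scaling logic, as suggested.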

MAX_CONTROL = 0.1
MAX_CONTROL_LOW = 0.0
MAX_CONTROL_UP = 0.03
CONTROL_SCALING = 1.0

What does CONTROL_SCALING now introduce?

@ReHoss Oct 27, 2024

Answer: it is a multiplicative scaling of the input action used in the step method.

Collaborator

Why is this used? Why wouldn't you just scale the input action externally to the physics model of the flow?


It also appears that the Pinball FlowConfig is missing a CONTROL_SCALING parameter, so it is not possible to step through the pinball environment with the current PR.
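A minimal sketch of the kind of fix implied here (the default value is an assumption, chosen only to match the other flow configs):

```python
# Sketch only: give Pinball the same attribute as the other configs,
# so that step() can rescale the action without raising an AttributeError.
class Pinball(FlowConfig):  # FlowConfig as already defined in hydrogym
    CONTROL_SCALING = 1.0   # assumed default value
```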

t = self.iter * self.solver.dt
action = action * self.flow.CONTROL_SCALING

if self.num_sim_substeps_per_actuation is not None and self.num_sim_substeps_per_actuation > 1:

@ReHoss Oct 27, 2024

If I understand correctly, num_sim_substeps_per_actuation allows stretching the duration of the actions uniformly (the actions given to the damped controller are piecewise constant). For this piecewise-constant action signal, num_sim_substeps_per_actuation adjusts the length of the intervals on which the action is constant?

@@ -11,6 +11,18 @@


class Cavity(FlowConfig):
# Velocity probes

Probes are used now and the old setting of A. Barbagallo, D. Sipp, P. J. Schmid - Closed-loop control of an open cavity flow using reduced-order models (2014) is abandoned?

Collaborator

Looks like it's still supported (as I think it should be), but not the default measurement.


Indeed, probes can be set optionally and the old setting can still be used.

self.iter += 1
self.t += self.dt
reward = self.get_reward()


Is the else statement above redundant, since num_sim_substeps_per_actuation == 1 should be equivalent? (Not critical, and if things are in "production" it may be dangerous to modify the code too much...)
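A sketch of the simplification being suggested (illustrative only, not tested against this PR): with a default of one sub-step, a single loop covers both cases and the separate else branch disappears.

```python
# Treat the plain case as num_substeps == 1 so one code path handles both.
num_substeps = self.num_sim_substeps_per_actuation or 1
rewards = []
for _ in range(num_substeps):
    self.solver.step(self.iter, control=action)
    self.iter += 1
    self.t += self.dt
    rewards.append(self.get_reward())
reward = sum(rewards) / len(rewards)  # or whichever aggregation rule is configured
```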

@@ -200,7 +196,9 @@ def render(self, mode="human", clim=None, levels=None, cmap="RdBu", **kwargs):


class RotaryCylinder(CylinderBase):
MAX_CONTROL = 0.5 * np.pi
MAX_CONTROL_LOW = -1.0

Contradicts the recent MAX_CONTROL update from #195.

Collaborator

Yeah, is this based on anything in particular? If not, having it just match the pinball would probably make sense.

else:
CL, CD = averaged_objective_values
# return np.square(CD) + lambda_value * np.square(CL)
return np.abs(CD) + lambda_value * np.abs(CL)

Could you give details/references on this new loss at some point?
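For reference, the objective as written in this diff (the commented-out line above appears to be an alternative squared form), with λ denoting lambda_value:

```math
J = |C_D| + \lambda \, |C_L|
\qquad \text{vs.} \qquad
J = C_D^2 + \lambda \, C_L^2
```

The absolute-value form penalizes drag and lift proportionally to their magnitudes rather than quadratically.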

@ReHoss commented Oct 27, 2024

A brief summary (correct me if I am wrong)

Hydrogym update

  • Updates the environments to accept a mean/average reward (evaluate_objective) when stretching the action signal (there is now a decision time vs. a physical time)
  • Implements a series of default probes
  • Makes reward aggregation mandatory (otherwise an exception is raised). Could it be defaulted to average?
Cavity
  • The control bounds are updated
  • Probes are added to the cavity
  • Nothing removed, evaluate_objective extended to support aggregating objective values
CylinderBase
  • Move and reduce the number of probes
  • Add vorticity probes
  • Cylinder: sign fix for A_lo
  • RotaryCylinder: change MAX_CONTROL bounds (different from Jared's #195, "Fix BCs and geometry for pinball"); change CONTROL_SCALING
  • New objective function that combines CD and CL with a penalty
Pinball
  • Add probes (no probes before)

Code review

  • Unused variables check
  • Type hinting?
  • solve and solve_multistep seem redundant

"is set a reward callback function must be given, {self.rewardLogCallback}"
self.reward_aggreation_rule = env_config.get("actuation_config", {}).get(
"reward_aggreation_rule", None)
assert self.reward_aggreation_rule in [

A reward aggregation rule is implemented and mandatory now; could a default be set for backward compatibility?
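A minimal sketch of what such a backward-compatible default could look like (the "mean" default and the corrected spelling are assumptions, not part of this PR):

```python
self.reward_aggregation_rule = env_config.get("actuation_config", {}).get(
    "reward_aggregation_rule", "mean")  # fall back to averaging instead of asserting
```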

@jcallaham (Collaborator) left a comment

Lots of good stuff, but I think we'll have to do some cleanup before it's really release-ready. Let me know if I can help with any of this. Are these basically the changes that made the baseline RL training happen?

High-level comments:

  1. Timesteppers don't use the actual time directly - that's only part of the ODE RHS, so in my opinion it makes more sense to continue having t be owned by flow. If needed for callbacks or controllers or whatever it can always be accessed by flow.t.
  2. The reward aggregation stuff looks useful, but the source code is fairly confusing. I would say it should either be cleaned up and clearly documented or (preferably) moved into a gym.Wrapper unless this is typical as part of a gym.Env implementation.
  3. I really like the idea of having the 1D PDEs as cheap environments, but these implementations are completely independent of everything else in Hydrogym, meaning they don't follow the PDEBase/TransientSolver/FlowEnv design. If the goal is a "production" release, I think it might be better to hold off on these and then later implement them as part of a full "torch" backend or something. Otherwise it seems like these could just be in a totally separate repo.
  4. Generally my opinion is that the FlowConfig should be a physics model of the flow configuration and should not have anything related to reinforcement learning. That makes it so that the package could also be used for more traditional fluid dynamics and controls research. It should be possible to put all of that stuff either in some kind of callback or at the Env level.

@@ -272,12 +276,14 @@ def __init__(self, flow: PDEBase, dt: float = None):
if dt is None:
dt = flow.DEFAULT_DT
self.dt = dt
self.t = 0.0

Collaborator

Currently the time is owned by the flow, not the solver. I guess you could do it the other way around, but since the "state" (velocity, pressure, etc) is owned by the flow class I figured it made more sense to have time there as well. Then it is advanced with flow.advance_time(dt) here. On the other hand, the solver doesn't directly depend on time, only the timestep, so my feeling is that it doesn't naturally belong there.

Actually I think the only place it's needed is for the "callbacks" or "controller", but now that time is a property of the flow it doesn't need to be there at all (or the callback could be passed flow.t instead of calculating it within the loop).
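A minimal sketch of the pattern described above, with time living on the flow and advanced from the solver loop (illustrative only; only flow.t and advance_time(dt) are taken from the comment, the rest is assumed):

```python
class Flow:
    def __init__(self):
        self.t = 0.0  # simulation time owned by the flow, alongside the state

    def advance_time(self, dt):
        self.t += dt


# Inside the solver's time loop, the solver never stores t itself:
#     flow.advance_time(self.dt)
#     for cb in callbacks:
#         cb(iter, flow.t, flow)  # callbacks read the time from the flow
```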

@ReHoss Oct 31, 2024

  • I agree, a solver object can be viewed as a mapping (implemented by a solve method) taking as input the current time $t$ (for non-autonomous systems), the current state $x_t$, and a time duration $T$, and returning the solution $x_{t+T}$ a duration $T$ later (this mapping is called a flow in maths). But the mapping itself is fixed and independent of time! Hence, initialising it with self.t = 0.0 may not really be needed.

  • Practically, I don't see any use of self.t in this class, so it seems safe to remove it?

@@ -292,6 +298,43 @@ def solve(
PDEBase: The state of the PDE at the end of the solve
"""
for iter, t in enumerate(np.arange(*t_span, self.dt)):
iter = iter + start_iteration_value

Collaborator

I agree - what's the distinction supposed to be here?

@@ -317,20 +360,25 @@ def step(self, iter: int, control: Iterable[float] = None, **kwargs):

def reset(self):
"""Reset variables for the timestepper"""
pass
self.t = 0.0

Collaborator

See above - time should be owned by the flow, and the reset method for the flow can also reset the time.

self.solver: TransientSolver = env_config.get("solver")(
self.flow, **env_config.get("solver_config", {}))
self.callbacks: Iterable[CallbackBase] = env_config.get("callbacks", [])
self.rewardLogCallback: Iterable[CallbackBase] = env_config.get(

Collaborator

Should use snake case for consistent style

# if ((iter + 1) * self.dt * 10e6) % (self.DT * 10e6) == 0:
if (iter + 1) % self.num_sim_substeps_per_actuation == 0:
self.data = np.array(self.data)
flow.reward_array = self.data

Collaborator

Is there another way to do this? The flow is a physics model and so RL-specific stuff should be abstracted


Basically, transforming this class into a gym.Wrapper.

q=None,
qB=None,
averaged_objective_values=None,
return_objective_values=False):

Collaborator

Same comments on evaluate_objective changes for the cylinder

@@ -345,14 +355,20 @@ def velocity_probe(self, probes, q: fd.Function = None) -> list[float]:
if q is None:
q = self.q
u = q.subfunctions[0]
return np.stack(u.at(probes)).flatten("F")

# probe_values = np.stack([u.at(probe) for probe in probes]).flatten("F")

Collaborator

Lots of commented-out code here (and throughout the PR).

@@ -47,12 +49,13 @@ class PDEBase(metaclass=abc.ABCMeta):

def __init__(self, **config):
self.mesh = self.load_mesh(name=config.get("mesh", self.DEFAULT_MESH))
self.reward_lambda = config.get("reward_lambda", 0.0)

Collaborator

This doesn't seem to be used for anything. Also not related to the physics modeled by the PDEBase.

MAX_CONTROL = 0.1
MAX_CONTROL_LOW = 0.0
MAX_CONTROL_UP = 0.03
CONTROL_SCALING = 1.0

Collaborator

Why is this used? Why wouldn't you just scale the input action externally to the physics model of the flow?

@@ -70,13 +70,21 @@ seaborn = "^0.12.1"
[tool.yapf]
based_on_style = "yapf"

[tool.yapfignore]
ignore_patterns = [
"hydrogym/experimental/*"

Collaborator

Why does this get a pass on formatting?

@ReHoss commented Oct 31, 2024

Lots of good stuff, but I think we'll have to do some cleanup before it's really release-ready. Let me know if I can help with any of this. Are these basically the changes that made the baseline RL training happen?

High-level comments:

  1. Timesteppers don't use the actual time directly - that's only part of the ODE RHS, so in my opinion it makes more sense to continue having t be owned by flow. If needed for callbacks or controllers or whatever it can always be accessed by flow.t.

I agree.

  2. The reward aggregation stuff looks useful, but the source code is fairly confusing. I would say it should either be cleaned up and clearly documented or (preferably) moved into a gym.Wrapper unless this is typical as part of a gym.Env implementation.

I think a gym.Wrapper can do the job.

  3. I really like the idea of having the 1D PDEs as cheap environments, but these implementations are completely independent of everything else in Hydrogym, meaning they don't follow the PDEBase/TransientSolver/FlowEnv design. If the goal is a "production" release, I think it might be better to hold off on these and then later implement them as part of a full "torch" backend or something. Otherwise, it seems like these could just be in a totally separate repo.

I agree!

  4. Generally, my opinion is that the FlowConfig should be a physics model of the flow configuration and should not have anything related to reinforcement learning. That makes it so that the package could also be used for more traditional fluid dynamics and controls research. It should be possible to put all of that stuff either in some kind of callback or at the Env level.

I agree.

From an outside point of view and from reading @jcallaham's comments, the new code looks like a fork devoted to research experiments and publication. Indeed, it does not fully fit the hydrogym package design at this stage (where I assume the hydrogym package is intended for public usage). However, the fixes are rather easy, I think:

  • Create a wrapper for the inter-decision time and the reward aggregation
  • Probe-location global variables can be stored somewhere or made uniform
  • Unify solve and solve_multistep
  • Typing, formatting, etc.

I could help with this task by the end of Nov.

In the end, the "features" provided by this new code are the introduction of an inter-decision time and reward aggregation, probes, and a modification of the magnitudes of the control bounds, plus a sign fix in the dynamics.

It would be interesting to have an explanation of why those changes are needed and how they impact the system.
