Parallel Tuning #241

Status: Open — wants to merge 105 commits into base: master

Commits (105):
e827e75
created remoteRunner class
MiloLurati Jan 22, 2024
deab579
added remote actor class
Jan 23, 2024
52287b9
update remote runner
Jan 23, 2024
40cc888
added remote_mode function argument to tune_kernel and related remote…
Jan 23, 2024
b14aaf0
added parallel tuning test
Jan 23, 2024
1a55a5c
added pool of actors
Jan 23, 2024
4fef594
clean up remote runner and actor
Jan 24, 2024
3f3b9e6
updates on remote code
Jan 30, 2024
fe5da39
changed naming from remote to parallel
Apr 2, 2024
a43dc84
added get_num_devices function
Apr 4, 2024
ab3aa24
added ensemble and parallel runner related stuff
Apr 4, 2024
e8a7228
switched to new naming of parallel remote and some clean up
Apr 4, 2024
3dd748c
added class instances needed down the line in the execution of the en…
Apr 4, 2024
e743bec
changed naming due to ensemble implementation, this was the original …
Apr 4, 2024
df949d0
started ensemble implementation, very basic functionality works
Apr 4, 2024
45a1747
updated tests
Apr 4, 2024
5fb5927
clean up in parallel runner
Apr 5, 2024
a96ef43
moved to sub directory ray
Apr 5, 2024
c831f5f
added subdirectory ray with all 3 actor classes
Apr 5, 2024
0cc2a6e
integrated calls to cache manager functions when running in ensemble
Apr 5, 2024
b816f3d
added cache manager logic
Apr 5, 2024
781839a
added instances needed for the ensemble down the line of execution
Apr 8, 2024
9f8d212
added strategy option to get_options function
Apr 8, 2024
d08b5d4
added ignore_reinit_error to ray init
Apr 10, 2024
903c981
added ignore_reinit_error to ray init
Apr 10, 2024
1a2219a
added cache manager to parallel tuning
Apr 10, 2024
a476585
re-assign tuning options to final version from the cache manager at t…
Apr 11, 2024
6233e09
small bug fix in execute
MiloLurati Apr 11, 2024
cde62ae
Merge branch 'KernelTuner:master' into parallelTuning
MiloLurati Apr 14, 2024
0722629
Merge pull request #2 from KernelTuner/simulation-searchspace-improve…
MiloLurati Apr 14, 2024
14e5f0b
updates to run ensemble in simulation mode on CPUs
MiloLurati Apr 16, 2024
a963dac
fixed problem with ray resources and stalling actors
MiloLurati Apr 23, 2024
c55b870
added setup_resources and new impl of costfunc (not yet tested and st…
MiloLurati Apr 25, 2024
d8541a0
added ensemble and memetic to strategy map and import
MiloLurati Apr 25, 2024
c755254
rearranged how parallel runner deals with cache manager and actor's l…
MiloLurati Apr 25, 2024
a23ef94
initial adaptions for memetic and cleaned up logic of ensemble
MiloLurati Apr 25, 2024
697ead0
returning tuning_options for memetic logic
MiloLurati Apr 25, 2024
b247ed0
init impl of memetic strategy
MiloLurati Apr 25, 2024
948ab7f
initial adaption for memetic strategy
MiloLurati Apr 25, 2024
9e40d4e
removed brute_force from strategy map and import
MiloLurati Apr 25, 2024
3cb428d
fixes of new costfunc and stop criterion is checked retrospectively
MiloLurati Apr 25, 2024
2d13fc3
fixed bug with tuning options cache manager
MiloLurati Apr 29, 2024
1a2ba53
fixed some bugs for memetic algo functioning
MiloLurati Apr 29, 2024
2aba6f5
removed debug prints
MiloLurati Apr 29, 2024
cd3f212
fixed problem with single config input and final results data structure
MiloLurati Apr 29, 2024
af9bd5e
added progress prints of memetic algo and kill statement for cache ma…
MiloLurati Apr 29, 2024
d382f05
sort results for retrospective stop criterion check
MiloLurati Apr 29, 2024
218b8ac
added comments
MiloLurati Apr 29, 2024
79b7a50
updated returning results logic in _evaluate_configs()
MiloLurati Apr 29, 2024
88f63b4
added comments
MiloLurati Apr 29, 2024
a2afd1d
updates to run more strategies than devices available
MiloLurati Apr 30, 2024
d950b2d
returning last two lists of candidates for memetic algo
MiloLurati May 3, 2024
980777f
returning last two candidates for memetic algo
MiloLurati May 3, 2024
95a2f0f
returning last two populations for memetic algo
MiloLurati May 3, 2024
89c499b
implemented adaptive local search depth logic and fix few issues, wor…
MiloLurati May 3, 2024
babba0b
modifications related to last iteration of memetic algo
MiloLurati May 6, 2024
e0e1e61
updates related to old population logic
MiloLurati May 6, 2024
6305782
unified two actors into one
MiloLurati May 7, 2024
0f2b7e4
updates related to actors unification and memetic algo development
MiloLurati May 7, 2024
63ddedb
added create_actor_on_device and initialize_ray
MiloLurati May 7, 2024
d7fe9b4
updates related to unification of actors, memetic algo, and reutiliza…
MiloLurati May 7, 2024
46fcde1
returning 80% of cpus for simulation mode in get_num_devices
MiloLurati May 7, 2024
d543848
updates related to actor unification and reutilization of actors for …
MiloLurati May 7, 2024
15df6ea
updates on feval counting and distributing
MiloLurati May 7, 2024
96d03b8
Merge branch 'KernelTuner:master' into parallelTuning
MiloLurati May 8, 2024
18ce214
Merge branch 'parallelTuning' of https://github.com/MiloLurati/kernel…
MiloLurati May 8, 2024
ec719a2
added logic for time limit stop
MiloLurati May 10, 2024
6c2a62b
debug prints clean up
MiloLurati May 10, 2024
c7fd2af
unified parallel tuning and parallel ensemble logic in ParallelRunner
MiloLurati May 10, 2024
af532c5
added self.init_arguments for parallel runner execution
MiloLurati May 28, 2024
82d9886
fix about non-pickleable observers and other small adjustments
MiloLurati May 28, 2024
c6a2f36
now the cache manager deals only with the cache and not with the enti…
MiloLurati May 28, 2024
5fe2e56
fix related to non-pickleable observers
MiloLurati May 28, 2024
3b3317c
update related to new cache manager
MiloLurati May 28, 2024
1593806
added cleanup at the end of the ensemble
MiloLurati May 28, 2024
efd5be2
changes to hyperparameters
MiloLurati May 28, 2024
bc66244
changes related to non-pickleable observers
MiloLurati May 28, 2024
c5cfd05
Merge branch 'master' into parallelTuning
MiloLurati May 28, 2024
9e9f1af
updated init_arguments to a dict
MiloLurati May 31, 2024
3fed66c
updates for searchspace split, ensemble related fix, and observer exe…
MiloLurati May 31, 2024
86a9b67
small corrections related to stop criterion for memetic
MiloLurati Jun 5, 2024
de5fc49
added logic to check if all GPUs are of the same type
MiloLurati Jun 7, 2024
1b0adb0
deleted split searchspace function
MiloLurati Jun 7, 2024
5130286
changed place where ray is initialized
MiloLurati Jun 7, 2024
5b9d817
setting BO to random sampling if needed
MiloLurati Jun 7, 2024
8b1e57f
Merge branch 'KernelTuner:master' into parallelTuning
MiloLurati Jun 7, 2024
040a57e
added num_gpus option
MiloLurati Jun 7, 2024
acaaeb1
removed debug print
MiloLurati Jun 10, 2024
63d9f65
added check_and_retrive strategy option
MiloLurati Jun 10, 2024
e604510
moved reinitialization of actor observers to execute method, before w…
MiloLurati Jun 18, 2024
5933a69
changes related to re-initialization of observers in actor init and d…
MiloLurati Jun 18, 2024
4e4c47b
removed unnecessary blocking ray.get
MiloLurati Jun 21, 2024
104205d
removed debug prints
MiloLurati Jul 1, 2024
123fba5
added greedy ils ensemble instead of default
MiloLurati Jul 1, 2024
d381011
added check on strategy_options
MiloLurati Jul 1, 2024
7e832e3
removed all memetic algo related stuff
MiloLurati Jul 1, 2024
e976bf8
Merge branch 'KernelTuner:master' into parallelTuning
MiloLurati Jul 1, 2024
65d32c1
added ray to pyproject.toml
MiloLurati Jul 1, 2024
a841f2a
Merge branch 'parallelTuning' of https://github.com/MiloLurati/kernel…
MiloLurati Jul 1, 2024
503df1b
updated toml file with ray dashboard
MiloLurati Jul 1, 2024
c126a01
fix small bug in _evaluate_configs
MiloLurati Jul 1, 2024
4df1b0d
adapted test for ensemble
MiloLurati Jul 1, 2024
29a507c
cleaned up unused imports
MiloLurati Jul 1, 2024
7c49a29
added comments
MiloLurati Jul 1, 2024
eb5db41
added documentation and related fixes
MiloLurati Jul 4, 2024
1 change: 1 addition & 0 deletions doc/source/optimization.rst
@@ -25,6 +25,7 @@ the ``strategy=`` optional argument of ``tune_kernel()``. Kernel Tuner currently
* "pso" particle swarm optimization
* "random_sample" takes a random sample of the search space
* "simulated_annealing" simulated annealing strategy
* "ensemble" ensemble strategy

Most strategies have some mechanism built in to detect when to stop tuning, which may be controlled through specific
parameters that can be passed to the strategies using the ``strategy_options=`` optional argument of ``tune_kernel()``. You
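The docs hunk above adds "ensemble" to the strategy names accepted by ``tune_kernel()``. A hypothetical selection of the new strategy might look like the sketch below; the vector-add kernel and tuning parameters are placeholders, and the ``tune_kernel()`` call itself is left commented out because it requires a CUDA device and this PR's branch (whether ``parallel_mode=True`` is required alongside the ensemble strategy is an assumption).

```python
# Hypothetical usage sketch of the new "ensemble" strategy.
import numpy as np

kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { c[i] = a[i] + b[i]; }
}
"""

def build_args(n):
    """Assemble the argument list in kernel-signature order."""
    a = np.random.randn(n).astype(np.float32)
    b = np.random.randn(n).astype(np.float32)
    c = np.zeros_like(a)
    return [c, a, b, np.int32(n)]

tune_params = {"block_size_x": [64, 128, 256, 512]}

# Not executed here (needs a GPU and the PR branch):
# import kernel_tuner
# results, env = kernel_tuner.tune_kernel(
#     "vector_add", kernel_string, 1_000_000, build_args(1_000_000),
#     tune_params, strategy="ensemble", parallel_mode=True)
```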
16 changes: 14 additions & 2 deletions kernel_tuner/interface.py
Expand Up @@ -34,6 +34,7 @@
from kernel_tuner.integration import get_objective_defaults
from kernel_tuner.runners.sequential import SequentialRunner
from kernel_tuner.runners.simulation import SimulationRunner
from kernel_tuner.runners.parallel import ParallelRunner
from kernel_tuner.searchspace import Searchspace

try:
@@ -57,6 +58,7 @@
pso,
random_sample,
simulated_annealing,
ensemble
)

strategy_map = {
@@ -75,6 +77,7 @@
"simulated_annealing": simulated_annealing,
"firefly_algorithm": firefly_algorithm,
"bayes_opt": bayes_opt,
"ensemble": ensemble,
}


@@ -384,6 +387,7 @@ def __deepcopy__(self, _):
* "pso" particle swarm optimization
* "random_sample" takes a random sample of the search space
* "simulated_annealing" simulated annealing strategy
* "ensemble" ensemble strategy

Strategy-specific parameters and options are explained under strategy_options.

@@ -463,6 +467,7 @@ def __deepcopy__(self, _):
),
("metrics", ("specifies user-defined metrics, please see :ref:`metrics`.", "dict")),
("simulation_mode", ("Simulate an auto-tuning search from an existing cachefile", "bool")),
("parallel_mode", ("Run the auto-tuning on multiple devices (brute-force execution)", "bool")),
("observers", ("""A list of Observers to use during tuning, please see :ref:`observers`.""", "list")),
]
)
@@ -574,6 +579,7 @@ def tune_kernel(
cache=None,
metrics=None,
simulation_mode=False,
parallel_mode=False,
observers=None,
objective=None,
objective_higher_is_better=None,
@@ -611,6 +617,8 @@
tuning_options["max_fevals"] = strategy_options["max_fevals"]
if strategy_options and "time_limit" in strategy_options:
tuning_options["time_limit"] = strategy_options["time_limit"]
if strategy_options and "num_gpus" in strategy_options:
tuning_options["num_gpus"] = strategy_options["num_gpus"]

logging.debug("tune_kernel called")
logging.debug("kernel_options: %s", util.get_config_string(kernel_options))
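The option-forwarding lines in the hunk above copy selected keys from ``strategy_options`` into ``tuning_options`` so the runners can read them. A minimal re-statement of that logic (the helper name is illustrative, not the PR's actual code):

```python
# Toy version of the option forwarding added in tune_kernel(): only a
# whitelist of keys is copied from strategy_options into tuning_options.
def forward_options(strategy_options, tuning_options,
                    keys=("max_fevals", "time_limit", "num_gpus")):
    for key in keys:
        if strategy_options and key in strategy_options:
            tuning_options[key] = strategy_options[key]
    return tuning_options

# "popsize" is not in the whitelist, so it is not forwarded.
print(forward_options({"num_gpus": 4, "popsize": 20}, {}))  # {'num_gpus': 4}
```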
@@ -650,9 +658,13 @@
strategy = brute_force

# select the runner for this job based on input
selected_runner = SimulationRunner if simulation_mode else SequentialRunner
selected_runner = SimulationRunner if simulation_mode else (ParallelRunner if parallel_mode else SequentialRunner)
tuning_options.simulated_time = 0
runner = selected_runner(kernelsource, kernel_options, device_options, iterations, observers)
if parallel_mode:
num_gpus = tuning_options['num_gpus'] if 'num_gpus' in tuning_options else None
runner = selected_runner(kernelsource, kernel_options, device_options, iterations, observers, num_gpus=num_gpus)
else:
runner = selected_runner(kernelsource, kernel_options, device_options, iterations, observers)

# the user-specified function may or may not have an optional atol argument;
# we normalize it so that it always accepts atol.
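The runner selection in the hunk above gives ``simulation_mode`` precedence over ``parallel_mode``. A stand-in sketch of just that decision (the real runners take kernel source, options, iteration counts, and observers in their constructors):

```python
# Stand-in classes re-stating the runner-selection ternary from the diff.
class SequentialRunner: ...
class SimulationRunner: ...
class ParallelRunner: ...

def select_runner(simulation_mode=False, parallel_mode=False):
    # Mirrors: SimulationRunner if simulation_mode else
    #          (ParallelRunner if parallel_mode else SequentialRunner)
    return SimulationRunner if simulation_mode else (
        ParallelRunner if parallel_mode else SequentialRunner)

print(select_runner().__name__)                    # SequentialRunner
print(select_runner(parallel_mode=True).__name__)  # ParallelRunner
print(select_runner(True, True).__name__)          # SimulationRunner
```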
17 changes: 17 additions & 0 deletions kernel_tuner/observers/nvml.py
@@ -315,6 +315,15 @@ def __init__(
continous_duration=1,
):
"""Create an NVMLObserver."""
# needed for re-initializing observer on ray actor
self.init_arguments = {
"observables": observables,
"device": device,
"save_all": save_all,
"nvidia_smi_fallback": nvidia_smi_fallback,
"use_locked_clocks": use_locked_clocks,
"continous_duration": continous_duration
}
if nvidia_smi_fallback:
self.nvml = nvml(
device,
@@ -424,6 +433,14 @@ def __init__(self, observables, parent, nvml_instance, continous_duration=1):
self.parent = parent
self.nvml = nvml_instance

# needed for re-initializing observer on ray actor
self.init_arguments = {
"observables": observables,
"parent": parent,
"nvml_instance": nvml_instance,
"continous_duration": continous_duration
}

supported = ["power_readings", "nvml_power", "nvml_energy"]
for obs in observables:
if obs not in supported:
5 changes: 5 additions & 0 deletions kernel_tuner/observers/pmt.py
@@ -33,6 +33,11 @@ class PMTObserver(BenchmarkObserver):
def __init__(self, observable=None):
if not pmt:
raise ImportError("could not import pmt")

# needed for re-initializing observer on ray actor
self.init_arguments = {
"observable": observable
}

User specifies a dictionary of platforms and corresponding device
if type(observable) is dict:
6 changes: 6 additions & 0 deletions kernel_tuner/observers/powersensor.py
@@ -27,6 +27,12 @@ class PowerSensorObserver(BenchmarkObserver):
def __init__(self, observables=None, device=None):
if not powersensor:
raise ImportError("could not import powersensor")

# needed for re-initializing observer on ray actor
self.init_arguments = {
"observables": observables,
"device": device
}

supported = ["ps_energy", "ps_power"]
for obs in observables:
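The observer hunks above all record a ``self.init_arguments`` dict so that a non-pickleable observer can be rebuilt on a Ray actor from its class plus constructor arguments instead of being serialized directly. A minimal sketch of that pattern, with illustrative helper names (``to_spec``/``from_spec`` are not the PR's actual API):

```python
# Sketch of the "init_arguments" pattern: ship (class, kwargs) to the
# worker and reconstruct the observer there, rather than pickling an
# instance that may hold device handles.
class ExampleObserver:
    """Stand-in for a BenchmarkObserver that records its constructor args."""
    def __init__(self, observables=None, device=None):
        # mirrors what the PR adds to NVMLObserver, PMTObserver, etc.
        self.init_arguments = {"observables": observables, "device": device}
        self.observables = observables
        self.device = device

def to_spec(observer):
    """Turn a live observer into a picklable (class, kwargs) pair."""
    return type(observer), observer.init_arguments

def from_spec(spec):
    """Rebuild the observer on the remote side (e.g. inside a Ray actor)."""
    cls, kwargs = spec
    return cls(**kwargs)

original = ExampleObserver(observables=["ps_energy"], device=1)
rebuilt = from_spec(to_spec(original))
print(rebuilt.device)  # 1
```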