Utilities for generating `monty_config` and `dataset_args` for an arbitrary number of LMs. #94

scottcanoe · 2024-12-06T04:32:48Z

This adds a few functions to config_args.py and make_dataset_configs.py that allow users to create multi-LM experiment configs without writing and using custom config data classes.

Added to config_args.py

make_multi_lm_monty_config: The top-level utility function. Builds a MontyConfig object given some the number of LMs and Monty/SM/LM classes with their associated arguments.
make_multi_lm_connectivity: generates sm_to_agent_dict, sm_to_lm_matrix, lm_to_lm_matrix, and lm_to_lm_vote_matrix for "typical" multi-LM setups (i.e., 1 agent, 1:1 SM -> LM connectivity, all-to-all voting, and no hierarchy). Used by make_multi_lm_monty_config and may be useful if you write your MontyConfig generator.

Added to make_dataset_configs.py

make_multi_lm_habitat_dataset_args: The top-level utility function. Builds a MultiLMMountHabitatDatasetArgs object given the number of SMs and optionally some mount parameters (such as sensor positions, zooms, etc.).
make_multi_lm_mount_config: Builds a dictionary of mount parameters (positions, resolutions, etc.).
make_multi_lm_sensor_positions: Generates sensor positions. See my comment for more info.

Here's a usage example:

# Set up arguments for `make_multi_lm_monty_config`.
monty_config_args = [
    MontyForGraphMatching,  # monty_class
    DisplacementGraphLM,  # learning_module_class
    dict(k=5, match_attribute="displacement"),  # learning_module_args
    HabitatDistantPatchSM,  # sensor_module_class
    dict(features=["pose_vectors, "on_object, ...]),  # sensor_module_args
    NaiveScanPolicy,  # motor_system_class
    make_naive_scan_policy_config(step_size=5),  # motor_system_args
    dict(num_exploratory_steps=500),  # monty_args
]

pretrain_dist_agent_16lm = dict(
    experiment_class=MontySupervisedObjectPretrainingExperiment,
    experiment_args=ExperimentArgs(
        n_train_epochs=len(TRAIN_ROTATIONS),
        do_eval=False,
    ),
    logging_config=PretrainLoggingConfig(run_name="dist_agent_16lm"),
    monty_config=make_multi_lm_monty_config(16, *monty_config_args),  # <----------
    # Set up environment and agent.
    dataset_class=ED.EnvironmentDataset,
    dataset_args=make_multi_lm_habitat_dataset_args(16),  # <----------
    # Set up training dataloader.
    train_dataloader_class=ED.InformedEnvironmentDataLoader,
    train_dataloader_args=EnvironmentDataloaderPerObjectArgs(
        object_names=SHUFFLED_YCB_OBJECTS,
        object_init_sampler=PredefinedObjectInitializer(rotations=TRAIN_ROTATIONS),
    ),
)

I'm currently using a pattern like this in dmc_pretraining_experiments.py and dmc_eval_experiments.py (though I haven't submitted a monty_lab PR yet). It made it easy to add 4, 8, and 16 LM experiments and removed the need for a bunch of hand-made dataclasses (farewell toTenLMMountHabitatDatasetArgs, EnvInitArgsTenLMMount, and TenLMMountConfig).

Testing: I re-wrote DMC experiments to use these config generators. I then randist_agent_5lm and dist_agent_5lm_randrot_noise, and the results were identical. I've also pretrained models with 2, 4, 5, 8, 9, 10, and 16 LMs, and the object models look good.

Reworked ordering for 'distant' ordery_by, and cleaned up docs.

scottcanoe · 2024-12-06T21:39:41Z

A note on generating sensor module positions: I've implemented two choices for determining how to place sensor modules. The function make_multi_lm_sensor_positions has the parameters n_sms (number of sensor modules), delta (grid spacing, default is 1 cm), and order_by. order_by can be either "distance" or "spiral".

"spiral" is pretty straightforward. It starts at the center of a grid and spirals outward.
"distance" also spirals, but it prioritizes sensor module positions that are closest to the center.

For example, here are arrangements generated for 5 sensor modules:

Numbers indicate sensor module ID, and black squares don't have sensor modules ("view_finder" just indicates the last generated position which is always the same as sensor module 0).

Here's another for 16 sensor modules:

Though the "distance" method is a bit more jagged or aesthetically weird, I think it's the better choice since, in principal, it should reduce the occurrence of off-object observations for peripheral sensor modules. So I let "distance" by the default argument for make_multi_lm_sensor_positions.

I should note that you can always supply your own positions to make_multi_lm_habitat_dataset_args. For example, if you wanted to use the "spiral" method or shorter spacing between sensor modules, you can generate those with make_multi_lm_sensor_positions or create them manually and pass those positions into make_multi_lm_habitat_dataset_args.

…/tbp.monty into multi_lm_config_utils

vkakerbeck

Nice, this looks very useful! I left a few comments, mostly around variable and function names.
On a more general note, it would be nice if you can include the usage example from your PR description either in the documentation or in the docstring. In the usage example it would be good to specify the monty_config_args as a dictionary with keys instead of a list where it matters in which order they are specified and it is a lot less verbose (hence why you probably added comments)

src/tbp/monty/frameworks/config_utils/config_args.py

src/tbp/monty/frameworks/config_utils/make_dataset_configs.py

scottcanoe · 2024-12-11T22:34:33Z

I'm currently scrubbing references to learning modules in my make_dataset_configs.py functions in favor of multi_sensor..., but it occurs to me that the existing code uses multi-LM language. For example, MultiLMMountConfig and FiveLMMountConfig. Maybe we want to settle on the language and change these names as well.

Changed `make_mult_lm_monty_config` to take (required) keyword arguments. Also changed it to accept a `connectivity_func` that produces connectivity matrices.

scottcanoe · 2024-12-11T23:42:49Z

I'm ready for another round of review. Here's a brief summary of changes:

Some minor changes to variable names within functions to make them more descriptive.
I've replaced language related to learning modules and sensor modules with multi-sensor language in make_dataset_configs.py. This includes function names, parameters, and documentation. This is meant to prepare for future development where we may have more complex setups, but as I mentioned in a previous comment, this does clash somewhat with the multi-LM naming conventions used elsewhere in make_dataset_configs.py.
I've removed the required keyword-only pattern in make_multi_sensor_mount_config's signature.
In config_args.py, make_multi_lm_connectivity has been renamed to make_multi_lm_flat_dense_connectivity.
I've added a connectivity_func parameter in make_lm_monty_config to support the generation of non-flat, dense connectivity types.
I've also added the keyword-only pattern in make_multi_lm_monty_config to support function use with (unpacked) dictionaries instead of lists. However, I don't think many of the arguments have obvious default values, so I have made most of them required keyword arguments.

I haven't added a usage example to a docstring yet. I'm undecided as to where that should go (either make_multi_sensor_habitat_dataset_args, make_multi_lm_monty_config, or both?).

In general, I'm finding it a bit tricky to find the balance between a more generic approach that can support future use cases and functions that are useful now. For example, some functions (e.g., make_sensor_positions_on_grid) may be useful down the road in a multi-agent setting. However, pretty much every other function has certain (documented) constraints and assumptions baked in, such as 1:1 sensor -> sensor module relationships, 1:1 sensor module -> learning module relationships, a single agent, and the inclusion of a view finder.

I expect there will be some more back and forth about further changes.

nielsleadholm

Really nice PR, thanks for the super clear documentation. Just added a few comments.

Re. the names of the functions, I agree it's a hard balance to strike - personally I'm ok with how they are now, i.e. avoiding making the names super specific, but then also very long-winded.

src/tbp/monty/frameworks/config_utils/config_args.py

src/tbp/monty/frameworks/config_utils/make_dataset_configs.py

vkakerbeck

Thanks for all the updates! Looks very nice :) Happy to merge after you go through Niels comments.

Added optional `view_finder_config` parameter to `make_multi_lm_monty_config`.

Set `semantics` default to `False` except for the last entry which must be `True` for the view finder.

scottcanoe · 2024-12-12T22:16:14Z

Update: defaulting to False values in the semantics mount config caused unexpected problems. I played around with a few things, such as

Changing use_semantic_sensor=False in transforms and anywhere else relevant since those will be the defaults soon.
Setting the semantics value for the view finder to always be true (i.e., semantics[-1] = True).
Different combinations of values for semantics and use_semantic_sensor.

One of two errors was thrown, depending on the changes I made. Since these problems tie into our planned changes to use_semantic_sensor defaults, I thought it best to downgrade this PR until those changes are complete. It should be a bit easier to tackle these errors when I have a better idea of how the semantic sensor logic works, and it's probably best to test this code against the new defaults anyways.

scottcanoe added 3 commits December 5, 2024 13:02

Added multi_lm config helper functions

3b5fdd8

Update make_dataset_configs.py

95ca7ae

Reworked ordering for 'distant' ordery_by, and cleaned up docs.

Docstring updates

59c2347

Merge branch 'main' into multi_lm_config_utils

095f8f8

scottcanoe marked this pull request as ready for review December 6, 2024 22:40

scottcanoe requested review from vkakerbeck, nielsleadholm and hlee9212 December 6, 2024 22:40

scottcanoe added the enhancement New feature or request label Dec 6, 2024

Merge branch 'multi_lm_config_utils' of https://github.com/scottcanoe…

89b2df7

…/tbp.monty into multi_lm_config_utils

vkakerbeck reviewed Dec 9, 2024

View reviewed changes

tristanls assigned vkakerbeck, nielsleadholm and hlee9212 Dec 9, 2024

tristanls added the triaged This issue or pull request was triaged label Dec 9, 2024

scottcanoe added 2 commits December 11, 2024 13:49

Renamed variables and function names in config_args.py

681cafe

Renamed variables and function names in make_dataset_configs.py

ec8f1e6

scottcanoe added 2 commits December 11, 2024 14:49

Use multi_sensor language instead of multi_lm.

fad4635

Update config_args.py

6983167

Changed `make_mult_lm_monty_config` to take (required) keyword arguments. Also changed it to accept a `connectivity_func` that produces connectivity matrices.

nielsleadholm reviewed Dec 12, 2024

View reviewed changes

vkakerbeck approved these changes Dec 12, 2024

View reviewed changes

scottcanoe added 6 commits December 12, 2024 08:33

Improve docstrings, remove unecessary module-level comment

4e84d61

temp

aa4282c

Update make_dataset_configs.py

ffef1e1

Update config_args.py

db389fb

Added optional `view_finder_config` parameter to `make_multi_lm_monty_config`.

Update make_dataset_configs.py

f32f38b

Update make_dataset_configs.py

ba8ac87

Set `semantics` default to `False` except for the last entry which must be `True` for the view finder.

scottcanoe added 3 commits December 12, 2024 13:07

Fix assertion condition

67e9418

semantics = True... for now

1b54a98

add a TODO comment about np.bool vs bool

72ba298

scottcanoe marked this pull request as draft December 12, 2024 22:04

scottcanoe removed the triaged This issue or pull request was triaged label Dec 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Utilities for generating `monty_config` and `dataset_args` for an arbitrary number of LMs. #94

Utilities for generating `monty_config` and `dataset_args` for an arbitrary number of LMs. #94

scottcanoe commented Dec 6, 2024 •

edited

Loading

scottcanoe commented Dec 6, 2024 •

edited

Loading

vkakerbeck left a comment

scottcanoe commented Dec 11, 2024

scottcanoe commented Dec 11, 2024

nielsleadholm left a comment

vkakerbeck left a comment

scottcanoe commented Dec 12, 2024

Utilities for generating monty_config and dataset_args for an arbitrary number of LMs. #94

Are you sure you want to change the base?

Utilities for generating monty_config and dataset_args for an arbitrary number of LMs. #94

Conversation

scottcanoe commented Dec 6, 2024 • edited Loading

scottcanoe commented Dec 6, 2024 • edited Loading

vkakerbeck left a comment

Choose a reason for hiding this comment

scottcanoe commented Dec 11, 2024

scottcanoe commented Dec 11, 2024

nielsleadholm left a comment

Choose a reason for hiding this comment

vkakerbeck left a comment

Choose a reason for hiding this comment

scottcanoe commented Dec 12, 2024

Utilities for generating `monty_config` and `dataset_args` for an arbitrary number of LMs. #94

Utilities for generating `monty_config` and `dataset_args` for an arbitrary number of LMs. #94

scottcanoe commented Dec 6, 2024 •

edited

Loading

scottcanoe commented Dec 6, 2024 •

edited

Loading