Add future work documentation #87
base: main
Conversation
@@ -1,3 +1,5 @@
---
title: Use Models with Less Points
I hate to be that guy, but "Fewer" is also an option here.
Haha fair point, thanks
Looks great, I'm excited to get to work on these!
Very nice! Thanks a lot for outlining all those :)
I left a bunch of detailed comments, some are definitely open to discussion. I just thought this would be a good place to have those discussions and nail down what exactly we mean by these items.
Overall, reading all these made me excited to jump into research again!
@@ -1,4 +1,9 @@
---
title: Add Associative Connections
For this one I was actually thinking of associative connections like between the vision model of a car, the sound a car makes, and the word "car", etc. I was thinking these would be analogous to lateral voting connections. What you describe here would go under "Add Top-Down Connections".
Ah ok, that makes sense; I was confused by the potential duplication (I think because I focused on the term "hierarchy" in the cmp-hierarchy grouping).
With that cleared up, I wonder if it's a bit of a duplication of "Generalize Voting to Associative Connections" --> my temptation would be to keep that one, and add the point that this should enable associating e.g. sound objects with physical objects (i.e. where their models may not both be 3D), and get rid of "Add Associative Connections" under cmp-hierarchy. What do you think @vkakerbeck ?
Wow, yes! It only now clicked for me that those two are basically the same. It's kind of cool that we can solve both of these with the same solution. I think I had added this one under hierarchy because the first time I thought about these was in the context of modeling language and grounding it in physical models of objects. But I think we should just remove this one and expand on the one under voting like you suggest. Maybe add the "abstract" or "num_steps" label to it.
Ok nice, yeah sounds good!
@@ -1,3 +1,5 @@
---
title: Add Top-Down Connections
---

One of the main roles of top-down connections is the associative recall and prediction outlined in [Associative Connections](add-associative-connections.md). However, top-down projections can also support decomposing goal-states into specific sub-goals, as discussed in [Decomposing Goal States](../motor-system-improvements/decompose-goals-into-subgoals-communicate.md).
Related to the comment above, I would use the description you wrote out for the previous topic here. I wouldn't think of the goal states as the top-down connections. Those belong in the motor section, specifically "Decompose Goals into Subgoals & Communicate"
As we introduce hierarchy and leverage more unsupervised learning, representations will emerge at different levels of the system that may not correspond to any labels present in our datasets. For example, handles, or the head of a spoon, may emerge as object-representations in low-level LMs, even though the dataset only recognizes labels like "mug" and "spoon".

One approach to measure the "correctness" of representations in this setting might be how well a predicted representation aligns with the outside world. For example, while LMs are not designed to be used as generative models, we could visualize how well an inferred object graph maps onto the object actually present in the world. Quantifying such alignment might leverage measures such as differences in point-clouds. This would provide some evidence of how well the learned decomposition of objects corresponds to the actual objects present in the world.
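To make "differences in point-clouds" concrete, one plausible measure is a symmetric Chamfer distance; the sketch below is illustrative only (the function name and the use of SciPy are assumptions, not part of Monty):

```python
# Minimal sketch of one possible point-cloud alignment measure (a symmetric
# Chamfer distance). Names are illustrative; this is not a Monty API.
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(graph_points: np.ndarray, world_points: np.ndarray) -> float:
    """Average nearest-neighbor distance, computed in both directions."""
    graph_to_world = cKDTree(world_points).query(graph_points)[0].mean()
    world_to_graph = cKDTree(graph_points).query(world_points)[0].mean()
    return 0.5 * (graph_to_world + world_to_graph)
```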
I'm not sure we actually need to measure this. If we model and recognize compositional objects I would assume that just the outputs of the highest level LMs would be enough to judge how well the system does on those compositional datasets. Maybe we would want to measure additional things like number of graphs learned at lower levels etc (which we already do). We can leave it here as an additional suggestion but I think when we start taking a crack at the compositional dataset this wouldn't be the first thing I would start with.
Another point would be that in our compositional dataset we know what the sub-objects are (forks, knives, spoons, ...) and we know the compositional objects (set dinner table, ...). Somehow we want the system to learn these. That's what I meant by "Figure out supervision". So for instance, should we show the sub-objects first and give labels for those to all LMs, then show the compositional scenes and give labels to all? What is the desired outcome? Do we want lower-level LMs to learn rough models of the scenes? Do we want higher-level LMs to learn models of the cutlery as well? I would add a lot more around that in this section.
Good point, I've added those items to the start.
@@ -1,3 +1,11 @@
---
title: Send Similarity Encoding Object ID to Next Level & Test
---

We have implemented the ability to encode object IDs using sparse-distributed representations (SDRs), and in particular can use this as a way of capturing similarity and dissimilarity between objects. Using such encodings in learned [Associative Connections](add-associative-connections.md), we should observe a degree of natural generalization when recognizing compositional objects.
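As a rough illustration of how SDR overlap can express graded similarity between object IDs (the sizes and encoding scheme below are placeholder assumptions, not Monty's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(42)
n_bits, n_active = 2048, 40  # illustrative SDR dimensions

def random_sdr() -> np.ndarray:
    sdr = np.zeros(n_bits, dtype=np.uint8)
    sdr[rng.choice(n_bits, n_active, replace=False)] = 1
    return sdr

def overlap(a: np.ndarray, b: np.ndarray) -> int:
    """Shared active bits; higher overlap encodes greater similarity."""
    return int(np.sum(a & b))

modern_spoon = random_sdr()
# A similar object is encoded by reusing most of the active bits.
medieval_spoon = modern_spoon.copy()
off = rng.choice(np.flatnonzero(medieval_spoon), 8, replace=False)
on = rng.choice(np.flatnonzero(medieval_spoon == 0), 8, replace=False)
medieval_spoon[off], medieval_spoon[on] = 0, 1
print(overlap(modern_spoon, medieval_spoon))   # 32 of 40 bits shared
print(overlap(modern_spoon, random_sdr()))     # near-zero chance overlap
```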
I'm not sure we are interpreting the term "associative connections" in Monty the same way. When I wrote that I meant associations between object IDs that co-occur (basically voting), not hierarchical connections. Since those are spatially a lot more constrained, I wouldn't think of them the same way. Why would we need learned associative connections to see the effect of similarity encodings?
Yeah I've changed this to Hierarchical Connections, per the earlier discussion.
For example, assume a Monty system learns a dinner table setting with normal cutlery and plates. Separately, the system learns about medieval instances of cutlery and plates, but never sees them arranged in a dinner table setting. Based on the similarity of the medieval cutlery objects to their modern counterparts, the objects should have considerable overlap in their SDR encodings.

If the system were then to see a medieval dinner table setting for the first time, it should be able to recognize the arrangement as a dinner-table setting with reasonable confidence, even if the constituent objects are somewhat different from those present when the compositional object was first learned.
Could be nice to include images of these two scenes here for better visualization
Good point! Adding
@@ -1,3 +1,9 @@
---
title: Detect Local and Global Flow
---

Our general view is that there are two sources of flow processed by cortical columns. A larger receptive-field sensor helps to estimate global flow, where flow will be particularly pronounced if the whole object is moving, or the sensor itself is moving. A small receptive-field sensor patch corresponds to the channel by which the primary sensory features (e.g. point-normal, color) arrive. If flow is detected here, but not in the more global channel, then it is likely that just part of the object is moving.
I would make the distinction more clear:
local flow - object is moving
global flow - sensor is moving
We should also mention that these may not be detectable with the same sensor (a small patch can't distinguish between object and sensor movement, since for the patch both would be global flow).
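For concreteness, a toy sketch of that combination logic (the two-patch setup and names are assumptions for illustration, not proposed Monty code):

```python
import numpy as np

def classify_motion(small_patch_flow: np.ndarray,
                    large_patch_flow: np.ndarray,
                    eps: float = 1e-3) -> str:
    """Combine flow from a small and a large receptive field.

    Both inputs are (H, W, 2) optic-flow fields. A small patch alone cannot
    tell object motion from self-motion; the large patch disambiguates.
    """
    small_mag = np.linalg.norm(small_patch_flow.mean(axis=(0, 1)))
    large_mag = np.linalg.norm(large_patch_flow.mean(axis=(0, 1)))
    if large_mag > eps:
        # Global flow: the sensor (or the whole object) is moving.
        return "global flow: sensor / whole-object motion"
    if small_mag > eps:
        # Local flow only: likely just part of the object is moving.
        return "local flow: part of the object is moving"
    return "no motion detected"
```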
Ok yeah that's what I was trying to get at (the uncertainty depending on the size), will try rewording it.
In the short term, we would like to extract richer features, such as using HTM's spatial-pooler or Local Binary Patterns for visual features, or processing depth information within a patch to approximate tactile texture.
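For instance, a basic 8-neighbor Local Binary Pattern can be computed per pixel of a patch; this is just a textbook sketch, not a feature extractor we have committed to:

```python
import numpy as np

def lbp_code(patch: np.ndarray, r: int, c: int) -> int:
    """8-bit LBP code: threshold the 8 neighbors of (r, c) at the center value."""
    center = patch[r, c]
    neighbors = [patch[r - 1, c - 1], patch[r - 1, c], patch[r - 1, c + 1],
                 patch[r, c + 1], patch[r + 1, c + 1], patch[r + 1, c],
                 patch[r + 1, c - 1], patch[r, c - 1]]
    return sum(int(n >= center) << i for i, n in enumerate(neighbors))

patch = np.random.rand(8, 8)
print(lbp_code(patch, 4, 4))  # an integer in [0, 255] describing local texture
```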
In the longer term, given the "sub-cortical" nature of this sensory processing, we might also consider neural-network-based feature extraction, such as shallow convolutional neural networks; however, please see [our FAQ on why Monty does not currently use deep learning](../../how-monty-works/faq-monty.md#why-does-monty-not-make-use-of-deep-learning).
It would also be worth mentioning that extracted features should be rotation invariant. So if we look at the same location on an object from different angles, the extracted feature should be the same. This is not a given with neural networks or many other approaches.
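One classic way to get such invariance for an LBP-style descriptor (continuing the earlier sketch; in-plane rotation only, in 45-degree steps) is to take the minimum over all circular bit-rotations of the code:

```python
def rotation_invariant(code: int, bits: int = 8) -> int:
    """Map an LBP code to the minimum over its circular bit-rotations."""
    mask = (1 << bits) - 1
    return min(((code >> i) | (code << (bits - i))) & mask for i in range(bits))
```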
However, a more general formulation might be to use displacements as the core spatial information in the CMP, such that a specific location (in body-centric coordinates or otherwise) is never communicated outside of an LM or sensor module.

Such an approach might align well with adding information about flow (see [Detect Local and Global Flow](../sensor-module-improvements/detect-local-and-global-flow.md)), modeling moving objects (see [Deal With Moving Objects](../learning-module-improvements/deal-with-moving-objects.md)), and supporting abstract movements like the transition from grandchild to grandparent.
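As a rough sketch of what a displacement-centric message could look like (the dictionary structure and names here are hypothetical, not the current CMP format):

```python
import numpy as np

def displacement_message(prev_location: np.ndarray,
                         curr_location: np.ndarray,
                         features: dict) -> dict:
    """Emit only relative movement; absolute locations stay module-internal."""
    return {"displacement": curr_location - prev_location,
            "features": features}

msg = displacement_message(np.array([0.0, 0.0, 0.1]),
                           np.array([0.0, 0.02, 0.1]),
                           {"point_normal": [0.0, 0.0, 1.0]})
print(msg["displacement"])  # only the relative movement leaves the module
```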
This approach would be very tricky: if we don't get the location of the sensor relative to the body, it is almost impossible to output anything in a common reference frame and therefore vote or send other outputs. We could do basic voting on object ID and rely on colocation of receptive fields in the hierarchy, but motor commands will also be pretty much impossible this way. We could mention that this could allude to the difference between the where and what pathways, but also that this is not something we plan to implement but merely a possibility we want to investigate further.
Currently, voting relies on all learning modules sharing the same object ID for any given object, as a form of supervised learning signal. Thanks to this, they can vote on this particular ID when communicating with one another.

However, in the setting of unsupervised learning, the object ID that is associated with any given model is unique to the parent LM. As such, we need to organically learn the mapping between the object IDs that occur together across different LMs, such that voting can function without any supervised learning signal.
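One conceivable mechanism, sketched below, is a plain co-occurrence tally between LM-local IDs, Hebbian in spirit; the class and method names are made up for illustration:

```python
from collections import defaultdict
from typing import Optional

class IdAssociator:
    """Learn which IDs in another LM tend to be active alongside our own."""

    def __init__(self) -> None:
        self.counts: dict[tuple[str, str], int] = defaultdict(int)

    def observe(self, my_id: str, other_lm_id: str) -> None:
        # Strengthen the association between IDs that are active together.
        self.counts[(my_id, other_lm_id)] += 1

    def best_match(self, my_id: str) -> Optional[str]:
        candidates = {other: n for (mine, other), n in self.counts.items()
                      if mine == my_id}
        return max(candidates, key=candidates.get) if candidates else None
```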
You could also mention that the brain has to do the same thing. It could not use a globally consistent SDR representation for each object. The neurons just do associative learning, and a cortical column has no idea what the incoming spikes mean.
Oh, lastly, it would be good if you could update this sheet https://docs.google.com/spreadsheets/d/10b0FR9YdFYqfhIiGMpZsjmN2OAbNAjp4m_hLBCV161I/edit?gid=0#gid=0 with the topics you added/renamed so they match exactly.
Just jotting down here that I'm interested in this direction. :)
This always reminded me of the problem of multi-label classification (https://paperswithcode.com/task/multi-label-classification). It might be worth looking into some off-the-shelf model that can attach multiple labels, or even multiple attributes / affordances.
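For reference, an off-the-shelf multi-label baseline takes only a few lines with scikit-learn (random placeholder data below, not one of our datasets):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.random((100, 16))                 # feature vectors
Y = rng.integers(0, 2, size=(100, 3))     # 3 independent binary labels each

clf = MultiOutputClassifier(LogisticRegression()).fit(X, Y)
print(clf.predict(X[:2]))                 # one 0/1 prediction per label
```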
Just wondering, why do we want fewer points? As in, is it for speed purposes only?
I'm asking because I think Viviane said that we chose to use graphs to model objects, but we don't necessarily believe it is how the brain stores them.
A thought I had around this was modeling objects as a set of representation vectors of sub-objects, like:
cup = {vector for handle, vector for body}
spoon = {vector for grabbing part, vector for curved part}
fork = {vector for grabbing part (identical to spoon), vector for curved part}
But if this has a related goal that we are trying to do, I'm all down. :)
Do you mean like we will do when we have hierarchy? Basically, in the higher-level LM, the objects modeled in the lower-level LM become features in that graph? So the switch, light bulb, and lamp shade modeled in the LLLM become features in the lamp model in the HLLM?
One thing I just want to make sure you are not forgetting (you probably aren't, but since you didn't mention it in your comment) is that we always model features AT LOCATIONS. And the relative locations of features are actually more important than the features themselves (think of how you can easily recognize a face made of fruits but wouldn't call a random assortment of noses and eyes a face). We always use reference frames, not just a bag of features.
As for using fewer points: yes, this would be an efficiency and generalization point. At the very beginning we would store every point in our models, which gave us perfect accuracy but was slow. So over time we figured out several ways to use fewer points (feature-change SM, graph_delta_thresholds, ...). These give us significant efficiency gains with little loss in accuracy, and there are still a few more ways we can make our models more efficient by using even fewer points. This also relates a bit to hierarchy, since in the larger objects composed of sub-objects we would likely want to store fewer points.
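To illustrate the kind of point reduction this refers to (loosely in the spirit of graph_delta_thresholds; the function and threshold below are made up for this sketch):

```python
import numpy as np

def sparsify(points: np.ndarray, delta: float) -> np.ndarray:
    """Keep a point only if it is farther than delta from all kept points."""
    kept: list[np.ndarray] = []
    for p in points:
        if all(np.linalg.norm(p - q) > delta for q in kept):
            kept.append(p)
    return np.array(kept)

dense = np.random.rand(2000, 3)
sparse = sparsify(dense, delta=0.1)
print(len(dense), "->", len(sparse))  # far fewer points, similar coverage
```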
Wish I could give a 👍 like when we are requesting features, haha. 🤣 This one would be very useful!
Thanks so much Niels! I'm really amazed at how much you have added explanations to all future directions; it's mind-boggling. 🤯
LGTM on the PR. Will the next step be trying to prioritize these somehow?
We have all of them listed out here https://docs.google.com/spreadsheets/d/10b0FR9YdFYqfhIiGMpZsjmN2OAbNAjp4m_hLBCV161I/edit?gid=0#gid=0 and some are already grouped into our next milestones (below the table). We will likely create 1-2 new, intermediate milestones to prepare for the Heterarchy pt. 2 milestone (and eventually hierarchical goal policies).
…bservations.md Co-authored-by: vclay <[email protected]>
…bservations.md Co-authored-by: vclay <[email protected]>
Co-authored-by: vclay <[email protected]>
…-saccades-driven-by-model-free-and-model-based-signals.md Co-authored-by: vclay <[email protected]>
…' into Update-future-work-documentation
Thanks for the helpful comments @vkakerbeck and @hlee9212! Those should all be addressed now, but let me know if there are any further changes you want. I'll also now double-check the overview spreadsheet and make sure all the cells are updated to match here.
@vkakerbeck I've also updated some of the hashtags where I felt some were missing.
Lastly, I've gone through and made sure the names for future-work sections are consistent across the individual articles, the header .md files, and the overview sheet.
Adds descriptions for many of the outstanding "future-work" areas of our documentation. This includes a few new sections:
I've also added these to the overview document.
"Bottom-up distant agent policies" was removed because it was a duplicate, while a few others appear "removed" because of changes to their names/fixes to typos.
A few of the ones that I haven't done, as I was a bit unsure what we had previously discussed, are:
@scottcanoe and @hlee9212 tagging you for any thoughts you want to add and as this also gives an overview of some of the things we can work on soon.