Add future work documentation #87
@@ -1,4 +1,9 @@
---
title: Add Associative Connections
---

> **Reviewer comment:** Are those the same as voting connections? I think so.

In Monty systems, low-level LMs project to high-level LMs, where this projection occurs if their sensory receptive fields are co-aligned. Associative connections should be able to learn a mapping between objects represented in these low-level LMs and objects represented in the high-level LMs that frequently co-occur. Such learning would be similar to that required for [Generalizing Voting To Associative Connections](../voting-improvements/generalize-voting-to-associative-connections.md).

For example, a high-level LM of a dinner set might have learned that the fork is present at a particular location in its internal reference frame. When at that location, it would therefore predict that the low-level LM should be sensing a fork, enabling the perception of a fork in the low-level LM even when there is a degree of noise or another source of uncertainty in the low-level LM's representation.

In the brain, these top-down projections correspond to L6 to L1 connections, where the synapses at L1 would support predictions about object ID. However, these projections also form local synapses en route through the L6 layer of the lower-level cortical column. In a Monty LM, this would correspond to the associative connection predicting not just the object that the low-level LM should be sensing, but also the specific location at which it should be sensing it. This could be complemented with predicting a particular pose of the low-level object (see [Bias Rotation Hypotheses](../learning-module-improvements/bias-rotation-hypotheses.md)).
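To make the idea concrete, here is a minimal numpy sketch of such an associative connection, assuming a simple co-occurrence-count learning rule; the class and method names are hypothetical, not Monty's API.

```python
import numpy as np

class AssociativeConnection:
    """Hypothetical sketch: learn which low-level object IDs co-occur with
    which high-level object IDs, then use the mapping for top-down prediction."""

    def __init__(self, n_low_ids, n_high_ids):
        # Co-occurrence counts between high-level and low-level object IDs.
        self.counts = np.zeros((n_high_ids, n_low_ids))

    def learn(self, high_id, low_id):
        # Hebbian-style update: strengthen the association each time the
        # two representations are active at the same time.
        self.counts[high_id, low_id] += 1.0

    def predict_low_level(self, high_id):
        # Top-down prediction: a distribution over low-level object IDs
        # given the currently recognized high-level object.
        row = self.counts[high_id]
        total = row.sum()
        return row / total if total > 0 else np.full_like(row, 1.0 / len(row))

# Usage: after recognizing a "dinner set", bias the low-level LM toward "fork".
conn = AssociativeConnection(n_low_ids=10, n_high_ids=3)
conn.learn(high_id=0, low_id=4)            # dinner set co-occurs with fork
prior = conn.predict_low_level(high_id=0)  # top-down bias for the low-level LM
```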
@@ -1,3 +1,5 @@
---
title: Add Top-Down Connections
---

One of the main roles of top-down connections is the associative recall and prediction outlined in [Associative Connections](add-associative-connections.md). However, top-down projections can also support decomposing goal-states into specific sub-goals, as discussed in [Decomposing Goal States](../motor-system-improvements/decompose-goals-into-subgoals-communicate.md).

> **Reviewer comment:** Related to the comment above, I would use the description you wrote out for the previous topic here. I wouldn't think of the goal states as the top-down connections. Those belong in the motor section, specifically "Decompose Goals into Subgoals & Communicate".
@@ -1,3 +1,9 @@
---
-title: Figure out Performance Measure and Supervision in Heterarchy
+title: Figure out Performance Measures and Supervision in Heterarchy
---

As we introduce hierarchy and leverage more unsupervised learning, representations will emerge at different levels of the system that may not correspond to any labels present in our datasets. For example, handles, or the head of a spoon, may emerge as object representations in low-level LMs, even though the dataset only contains labels like "mug" and "spoon".

One approach to measuring the "correctness" of representations in this setting might be how well a predicted representation aligns with the outside world. For example, while LMs are not designed to be used as generative models, we could visualize how well an inferred object graph maps onto the object actually present in the world. Quantifying such alignment might leverage measures such as differences in point clouds. This would provide some evidence of how well the learned decomposition of objects corresponds to the actual objects present in the world.
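As a sketch of what quantifying such alignment might look like, the following computes a symmetric Chamfer distance between an inferred object graph's node locations and a ground-truth point cloud; this is one possible measure, not a committed design.

```python
import numpy as np

def chamfer_distance(pred_points, true_points):
    """Symmetric Chamfer distance between two point clouds (N x 3, M x 3).
    A hypothetical alignment measure: lower values mean the inferred object
    graph maps more closely onto the object actually present in the world."""
    # Pairwise squared distances between all predicted and true points.
    diff = pred_points[:, None, :] - true_points[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # For each point, distance to its nearest neighbor in the other cloud.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Usage with a toy cloud: identical clouds give a distance of zero.
cloud = np.random.rand(100, 3)
assert np.isclose(chamfer_distance(cloud, cloud), 0.0)
```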
> **Reviewer comment:** I'm not sure we actually need to measure this. If we model and recognize compositional objects, I would assume that just the outputs of the highest-level LMs would be enough to judge how well the system does on those compositional datasets. Maybe we would want to measure additional things like the number of graphs learned at lower levels, etc. (which we already do). We can leave it here as an additional suggestion, but I think when we start taking a crack at the compositional dataset this wouldn't be the first thing I would start with.
>
> **Author reply:** Good point, I've added those items to the start.

See also [Make Dataset to Test Compositional Objects](../environment-improvements/make-dataset-to-test-compositional-objects.md) and [Metrics to Evaluate Categories and Generalization](../environment-improvements/create-dataset-and-metrics-to-evaluate-categories-and-generalization.md).

> **Reviewer comment:** Just jotting down here that I'm interested in this direction. :)
@@ -1,3 +1,11 @@
---
title: Send Similarity Encoding Object ID to Next Level & Test
---

We have implemented the ability to encode object IDs using sparse distributed representations (SDRs), and in particular can use this as a way of capturing similarity and dissimilarity between objects. Using such encodings in learned [Associative Connections](add-associative-connections.md), we should observe a degree of natural generalization when recognizing compositional objects.

> **Reviewer comment:** I'm not sure we are interpreting the term "associative connections" in Monty the same way. When I wrote that, I meant associations between object IDs that co-occur (basically voting), not hierarchical connections. Since those are spatially a lot more constrained, I wouldn't think of them the same way. Why would we need learned associative connections to see the effect of similarity encodings?
>
> **Author reply:** Yeah, I've changed this to Hierarchical Connections, per the earlier discussion.

For example, assume a Monty system learns a dinner-table setting with normal cutlery and plates. Separately, the system learns about medieval instances of cutlery and plates, but never sees them arranged in a dinner-table setting. Based on the similarity of the medieval cutlery objects to their modern counterparts, the objects should have considerable overlap in their SDR encodings.

If the system were then to see a medieval dinner-table setting for the first time, it should be able to recognize the arrangement as a dinner-table setting with reasonable confidence, even if the constituent objects are somewhat different from those present when the compositional object was first learned.

> **Reviewer comment:** Could be nice to include images of these two scenes here for better visualization.
>
> **Author reply:** Good point! Adding.

We should note that we are still determining whether overlapping bits between SDRs is the best way to encode object similarity. As such, we are also open to exploring this task with alternative approaches, such as directly making use of values in the evidence-similarity matrix (from which SDRs are currently derived).
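A minimal sketch of how overlapping bits between SDR object IDs could capture similarity, assuming boolean SDR vectors; the toy data below is purely illustrative.

```python
import numpy as np

def sdr_overlap(sdr_a, sdr_b):
    """Number of shared active bits between two SDRs (boolean arrays).
    Hypothetical sketch: higher overlap encodes greater object similarity."""
    return int(np.sum(sdr_a & sdr_b))

# Toy example: a modern fork and a medieval fork share most active bits,
# so a compositional "dinner set" LM receiving either SDR sees a similar input.
n_bits, n_active = 2048, 40
rng = np.random.default_rng(0)
modern_fork = np.zeros(n_bits, dtype=bool)
modern_fork[rng.choice(n_bits, n_active, replace=False)] = True

# Medieval fork: copy the modern fork's SDR, then perturb a few bits.
medieval_fork = modern_fork.copy()
off = rng.choice(np.flatnonzero(medieval_fork), 8, replace=False)
medieval_fork[off] = False
on = rng.choice(np.flatnonzero(~medieval_fork), 8, replace=False)
medieval_fork[on] = True

print(sdr_overlap(modern_fork, medieval_fork))  # high overlap -> similar objects
```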
@@ -1,3 +1,11 @@
---
title: Test Learning at Different Speeds Depending on Level in Hierarchy
---

Our general view is that episodic memory and working memory in the brain leverage similar representations to those in learning modules, i.e. structured reference frames of discrete objects.

For example, the brain has a specialized region for episodic memory (the hippocampal complex), due to the large number of synapses required to rapidly form novel binding associations. However, we believe the core algorithms of the hippocampal complex follow the same principles as a cortical column (and therefore a learning module), with learning simply occurring on a faster time scale.

As such, we would like to explore adding forms of episodic and working memory by introducing high-level learning modules that learn information on extremely fast time scales relative to lower-level LMs. These should be particularly valuable in settings such as recognizing multi-object arrangements in a scene, and providing memory when a Monty system is performing a multi-step task. Note that because of the overlap in the core algorithms, LMs can be used largely as-is for these memory systems, with the only change being the learning rate.

> **Reviewer comment:** It could be worth noting that the …

As an additional note, varying the learning rate across learning modules will likely play an important role in dealing with representational drift, and the impact it can have on continual learning. For example, we expect that low-level LMs, which partly form the representations in higher-level LMs, will change their representations more slowly.
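A minimal sketch of what varying learning rates across LMs might look like; the `GraphMemory` class and its blending rule are assumptions for illustration, not Monty's implementation.

```python
import numpy as np

class GraphMemory:
    """Hypothetical sketch of per-LM learning rates. High-level LMs acting as
    episodic/working memory learn near-instantly (learning_rate ~ 1.0), while
    low-level LMs update slowly, limiting representational drift."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.features = {}  # location (tuple) -> feature vector

    def update(self, location, observed_features):
        old = self.features.get(location)
        if old is None:
            self.features[location] = observed_features
        else:
            # Blend old and new features; a rate of 1.0 overwrites in one step.
            lr = self.learning_rate
            self.features[location] = (1 - lr) * old + lr * observed_features

low_level_lm = GraphMemory(learning_rate=0.05)  # slow, stable representations
episodic_lm = GraphMemory(learning_rate=1.0)    # one-shot binding, hippocampus-like
episodic_lm.update((0.0, 0.0, 0.1), np.array([0.2, 0.9, 0.4]))
```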
> **Reviewer comment:** This always reminded me of the problem of [multi-label classification](https://paperswithcode.com/task/multi-label-classification). It might be worth looking into some off-the-shelf model that can attach multiple labels, or even multiple attributes/affordances.
@@ -1,3 +1,11 @@
---
title: Create Dataset and Metrics to Evaluate Categories and Generalization
---

Datasets do not typically capture the flexibility of object labels, such as whether an object belongs to a broad class (e.g. cans) vs. a specific instance of a class (e.g. a can of tomato soup).

Labeling a dataset with "hierarchical" labels, such that an object might be both a "can" as well as a "can of tomato soup", would be one approach to capturing this flexibility. Once available, classification accuracy could be assessed both at the level of individual object instances and at the level of categories, as sketched below.

We might leverage crowd-sourced labels to ensure that this labeling is reflective of human perception, and not biased by our beliefs as designers of Monty.

Initially, such labels should focus on morphology, as this is the current focus of Monty's recognition system. However, we would eventually want to also account for affordances, such as an object being a chair, a vessel, etc.
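Here is a hedged sketch of how instance- and category-level accuracy could be computed once hierarchical labels exist; the label names and scoring rule are illustrative assumptions.

```python
# Hypothetical hierarchical labels: each object carries both an instance
# label and a category label.
labels = {
    "obj_001": {"instance": "can_tomato_soup", "category": "can"},
    "obj_002": {"instance": "mug_blue", "category": "mug"},
}

def hierarchical_accuracy(predictions, labels):
    """predictions: object id -> predicted label string. A prediction is
    instance-correct only for the exact instance, and category-correct if
    it matches at either level of the hierarchy."""
    inst = sum(predictions[o] == l["instance"] for o, l in labels.items())
    cat = sum(predictions[o] in (l["instance"], l["category"])
              for o, l in labels.items())
    n = len(labels)
    return inst / n, cat / n

preds = {"obj_001": "can", "obj_002": "mug_blue"}
print(hierarchical_accuracy(preds, labels))  # (0.5, 1.0)
```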
> **Reviewer comment:** When you go into affordances, you could also mention that the ultimate measure of the system will be how well it can interact with the world. So in a sense, measuring categorization performance is just an experimental stepping stone, and affordances would probably best be tested by measuring actual object manipulation performance.
@@ -1,3 +1,9 @@
---
title: Make Dataset to Test Compositional Objects
---

We have developed an initial dataset based on setting a dinner table with a variety of objects. For example, the objects can be arranged in a normal setting, or aligned in a row (i.e. not a typical dinner-table setting). Similarly, the component objects can be those of a modern dining table, or those from a "medieval" time period. As such, this dataset can be used to test the ability of Monty systems to recognize compositional objects based on the specific arrangement of objects, and to test generalization to novel compositions.

> **Reviewer comment:** I would phrase this a bit more passively ("setting a dinner-table" -> "recognizing a variety of dinner table sets with different arrangements of plates and cutlery").

By using explicit objects to compose multi-part objects, this dataset has the advantage that we can learn the component objects in isolation, using supervised learning signals if necessary.

> **Reviewer comment:** Maybe specify what we mean by component objects. It could also be good to highlight that this is usually how we learn. Children (and adults) interact with objects in isolation and then recognize them in compositional scenes. It's hard to know where an object begins and ends without interacting with it and seeing it in isolation or on a variety of backgrounds.

However, we would eventually expect compositional objects to be learned in an unsupervised manner. When this is consistently possible, we can consider more diverse datasets where the component objects may not be as explicit. At that time, the challenges described in [Figure out Performance Measure and Supervision in Heterarchy](../cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md) will become more relevant.
@@ -1,3 +1,5 @@
---
title: Set up Environment that Allows for Object Manipulation
---

See [Decompose Goals Into Subgoals & Communicate](../motor-system-improvements/decompose-goals-into-subgoals-communicate.md) for a discussion of the kind of tasks we are considering for early object-manipulation experiments. An even simpler task that we have recently considered is pressing a switch to turn a lamp on or off. We will provide further details on what these tasks might look like soon.

> **Reviewer comment:** Maybe add here that an important aspect of this task is to find a good simulator and figure out how we best set up an environment and agent for such a task (avoiding objects falling into the void, resetting the environment, modeling friction, ...).
@@ -1,3 +1,9 @@
---
title: Add Infrastructure for Multiple Agents that Move Independently
---

Currently, Monty's infrastructure only supports a single agent that moves around the scene, where that agent can be associated with a plurality of sensors and LMs. We would like to add support for multiple agents that move independently.

For example, a hand-like surface agent might explore the surface of an object, where each of its "fingers" can move in a semi-independent manner. At the same time, a distant agent might observe the object, saccading across its surface independently of the surface agent. At other times they might coordinate, such that they perceive the same location on an object at the same time, which would be useful while voting connections are still being learned (see [Generalize Voting to Associative Connections](../voting-improvements/generalize-voting-to-associative-connections.md)).

An example of a first task that could make use of this infrastructure is [Implement a Simple Cross-Modal Policy for Sensory Guidance](../motor-system-improvements/simple-cross-modal-policy.md).
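As a rough sketch of what such infrastructure could look like (all names hypothetical, not Monty's API), each agent could own its motor state and propose its next action independently within a shared stepping loop.

```python
import numpy as np

class IndependentAgent:
    """Hypothetical sketch: one independently moving agent among several."""

    def __init__(self, name, position):
        self.name = name
        self.position = np.asarray(position, dtype=float)

    def propose_action(self):
        # Placeholder policy: a small random displacement. A real policy
        # would be model-based, e.g. hypothesis-testing or cross-modal
        # guidance from another agent's observations.
        return np.random.uniform(-0.01, 0.01, size=3)

    def step(self, action):
        self.position += action

# A surface agent's "fingers" and a distant agent in the same loop,
# each moving semi-independently.
agents = [IndependentAgent(f"finger_{i}", [0.1 * i, 0.0, 0.0]) for i in range(3)]
agents.append(IndependentAgent("distant_camera", [0.0, 0.0, 1.0]))

for agent in agents:
    agent.step(agent.propose_action())
```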
> **Reviewer comment:** It would be nice to mention here that we are thinking of introducing "motor modules" instead of a singular "motor system".
@@ -0,0 +1,9 @@
---
title: Bias Rotation Hypotheses by Common Object Poses
---

> **Reviewer comment:** I think this one is not added to the TOC or sheet yet.
>
> **Reviewer comment:** We could also formulate this a bit broader as "Use Better Priors for Hypothesis Initialization".
>
> **Author reply:** Good idea, and thanks for catching that.

Currently, all object poses are treated as equally likely, because stimuli exist in a void and are typically rotated randomly at test time. However, as we move towards compositional and scene-like datasets where certain object poses are more common, we would like to account for this information in our hypothesis testing.

A simple way to do this is to store frequently encountered object poses in long-term memory, and bias these with more evidence during initialization. A consequence of this is that objects should be recognized more quickly when they are in a typical pose, consistent with human behavior.

> **Reviewer comment:** We could add a reference here.

In terms of implementation, this could be done either relative to a body-centric coordinate frame, through hierarchical biasing, or both. With the former, the object would have an inherent bias towards a pose relative to the observer, or some more abstract reference frame like gravity (e.g. a right-side-up coffee mug). With the latter, the pose would be biased with respect to a higher-level, compositional object. For example, in a dinner-table setup, the orientation of the fork and knife would be biased relative to the plate, even though in and of themselves, the fork and knife do not have any inherent bias in their pose. This information would be stored in the compositional dinner-set object in the higher-level LM, and the bias in pose implemented by top-down feedback to the low-level LM.
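A minimal sketch of biasing initial pose evidence toward stored typical poses, using scipy rotations; the function name, bonus value, and angular threshold are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def initialize_pose_evidence(candidate_rotations, common_poses,
                             prior_bonus=0.5, angle_threshold_deg=20.0):
    """candidate_rotations, common_poses: lists of scipy Rotation objects.
    Returns an evidence vector with a bonus for candidates near a pose
    stored in long-term memory as frequently encountered."""
    evidence = np.zeros(len(candidate_rotations))
    for i, cand in enumerate(candidate_rotations):
        for typical in common_poses:
            # Angular distance between the candidate and a stored typical pose.
            angle = np.degrees((cand * typical.inv()).magnitude())
            if angle < angle_threshold_deg:
                evidence[i] += prior_bonus
    return evidence

# Usage: a right-side-up mug gets a head start over an upside-down one.
upright = Rotation.identity()
candidates = [upright, Rotation.from_euler("x", 180, degrees=True)]
print(initialize_pose_evidence(candidates, common_poses=[upright]))  # -> [0.5, 0.0]
```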
@@ -1,3 +1,9 @@
---
title: Deal with Moving Objects
---

This work relates to first being able to [Detect Local and Global Flow](../../future-work/sensor-module-improvements/detect-local-and-global-flow.md).

Our current idea is to then use this information to model the state of the object, such that beyond its current pose, we also capture how it is moving as a function of time. This information can then be made available to other learning modules for voting and hierarchical processing.

> **Reviewer comment:** Maybe explicitly say that part of the object state could be features like "velocity".
>
> **Author reply:** Yeah, I guess it just feels like velocity could be something more fundamental, i.e. something that might be part of all CMP messages once we have flow etc. working. I.e. that things like color are optional, but whether something is moving or not is as central as its pose? But it might not end up being so.

This work also relates to [Modeling Object Behaviors and States](../../future-work/learning-module-improvements/implement-test-gnns-to-model-object-behaviors-states.md), as an object state might be quite simple (the object is moving in a straight line at a constant velocity) or more complex (e.g. in a "spinning" or "dancing" state). To pass such information via the Cortical Messaging Protocol, the former would likely be treated similarly to pose (i.e. specific information shared, but limited in scope), while the latter would be shared more like object ID, i.e. via a summary representation that can be learned via association.
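To illustrate, here is a hypothetical sketch of how a CMP-style message might carry both kinds of state; the field names are assumptions for illustration, not the actual Cortical Messaging Protocol specification.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class StateMessage:
    """Hypothetical CMP-style message extended with object state."""
    object_id: str                          # summary representation (e.g. an SDR key)
    location: np.ndarray                    # 3D location in a common reference frame
    pose: np.ndarray                        # orientation, e.g. a quaternion
    velocity: Optional[np.ndarray] = None   # simple motion state, shared pose-like
    behavior_id: Optional[str] = None       # learned summary of a complex state
                                            # (e.g. "spinning"), shared like object ID

msg = StateMessage(
    object_id="ball",
    location=np.zeros(3),
    pose=np.array([0.0, 0.0, 0.0, 1.0]),
    velocity=np.array([0.1, 0.0, 0.0]),  # moving in a straight line
)
```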
@@ -1,3 +1,11 @@
---
title: Implement & Test GNNs to Model Object Behaviors & States
---

We would like to test using local functions between nodes of an LM's graph to model object behaviors. In particular, we would like to model how an object evolves over time due to external and internal influences, by learning how nodes within the object impact one another based on these factors. This relates to graph neural networks, and [graph networks more generally](https://arxiv.org/pdf/1806.01261); however, learning should rely on sensory and motor information local to the LM. Ideally, learned relations will generalize across different edges, e.g. the understanding that two nodes are connected by a rigid edge vs. a spring.

As noted, all learning should happen locally within the graph, so although gradient descent can be used, we should not back-propagate error signals through other LMs. Please see our related policy on [using Numpy rather than Pytorch for contributions](../../contributing/style-guide#numpy-preferred-over-pytorch).
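A minimal numpy sketch of one local message-passing step consistent with these constraints; the update rule, weight shapes, and names are illustrative assumptions, not a committed design.

```python
import numpy as np

def message_passing_step(node_states, edges, edge_features, w_msg, w_update):
    """node_states: (N, D); edges: list of (src, dst); edge_features: (E, F);
    w_msg: (D + F, D); w_update: (2 * D, D). The same learned weights are
    shared across edges, so an edge type like "rigid" vs "spring" can
    generalize to new node pairs."""
    messages = np.zeros_like(node_states)
    for e, (src, dst) in enumerate(edges):
        # A message depends only on the sender's state and the edge features:
        # all computation stays local to the graph, with no back-propagation
        # of error signals through other LMs.
        inp = np.concatenate([node_states[src], edge_features[e]])
        messages[dst] += np.tanh(inp @ w_msg)
    # Update each node from its own state and its aggregated messages.
    combined = np.concatenate([node_states, messages], axis=1)
    return np.tanh(combined @ w_update)

# Toy usage: 3 nodes with 4-dim states, 2 edges with 2-dim features.
rng = np.random.default_rng(0)
states = rng.normal(size=(3, 4))
edges = [(0, 1), (1, 2)]
edge_feats = rng.normal(size=(2, 2))
new_states = message_passing_step(states, edges, edge_feats,
                                  rng.normal(size=(6, 4)),
                                  rng.normal(size=(8, 4)))
```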
|
||
We have a dataset that should be useful for testing approaches to this task, which can be found in [Monty Labs](https://github.com/thousandbrainsproject/monty_lab/tree/main/object_behaviors). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe also link explicitly to this section where we reviewed some literature on this and wrote down our thoughts on how exactly GNNs could be used in Monty https://github.com/thousandbrainsproject/monty_lab/tree/main/object_behaviors#implementation-routes-for-the-relational-inference-model |
||
|
||
At a broader level, we are also investigating alternative methods for modeling object behaviors, including sequence-based methods similar to HTM, however we believe it is worth exploring graph network approaches as one (potentially complementary) approach. In particular, we may find that such learned edges are useful for frequently encountered node-interactions like basic physics, while sequence-based methods are best suited for idiosyncratic behaviors. |
@@ -0,0 +1,11 @@
---
title: Improve Handling of Symmetry
---

LMs currently recognize symmetry by making multiple observations in a row that are all consistent with a set of multiple poses. That is, if new observations of an object do not eliminate any of a set of poses, then it is likely that these poses are equivalent/symmetric.

To make this more efficient and robust, we might store symmetric poses in long-term memory, updating them over time. In particular:
- Whenever symmetry is detected, the poses associated with the state could be stored for that object.
- Over time, we can reduce or expand this list of symmetric poses, enabling the LM to establish with reasonable confidence that an object is in a symmetric pose as soon as the hypothesized poses fall within the list.

By developing an established list of symmetric poses, we might also improve voting on such symmetric poses - see [Using Pose for Voting](../voting-improvements/use-pose-for-voting.md).
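A minimal sketch of checking remaining pose hypotheses against a stored symmetry list, using scipy rotations; the function name and tolerance are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def all_within_symmetry_set(hypothesized, symmetric_poses, tol_deg=10.0):
    """True if every remaining pose hypothesis matches some pose in the
    object's stored symmetry list, letting the LM declare symmetry without
    waiting for further observations to eliminate hypotheses."""
    for hyp in hypothesized:
        angles = [np.degrees((hyp * sym.inv()).magnitude())
                  for sym in symmetric_poses]
        if min(angles) > tol_deg:
            return False
    return True

# Usage: a mug body is symmetric about its vertical axis, so rotations of
# 0/90/180/270 degrees about z might all be stored as equivalent poses.
stored = [Rotation.from_euler("z", a, degrees=True) for a in (0, 90, 180, 270)]
remaining = [Rotation.from_euler("z", 91, degrees=True)]
print(all_within_symmetry_set(remaining, stored))  # True, within tolerance
```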
@@ -1,3 +1,7 @@
---
-title: Re-Anchor Hypotheses
+title: Re-Anchor Hypotheses for Robustness to Noise and Distortions
---

One aspect that we believe may contribute to dealing with object distortions, such as perceiving Dali's melted clocks for the first time, or being robust to the way a logo follows the surface of a mug, is the re-anchoring of hypotheses. More concretely, as the system moves over the object and path-integrates, the estimate of where the sensor is in space might lend greater weight to sensory landmarks, resulting in a re-assessment of the current location. Such re-anchoring is required even without distortions, due to the fact that path integration in the real world is imperfect.

> **Reviewer comment:** You could also mention that this would help relieve some of the over-reliance on the first observation. Currently, the first observation initializes the hypothesis space, so if that observation is noisy or doesn't resemble any of the points in the model, it has an over-proportional impact on performance.

Such an approach would likely be further supported by hierarchical, top-down connections (see also [Add Top-Down Connections](../cmp-hierarchy-improvements/add-top-down-connections.md)). This will be relevant where the system has previously learned how a low-level object is associated with a high-level object at multiple locations, and where the low-level object is in some way distorted. In this instance, the system can re-instate where it is on the low-level object based on where it is on the high-level object. Depending on the degree of distortion of the object, we would expect more such location-location associations to be learned in order to capture the relationship between the two. For example, a logo on a flat surface with a single 90-degree bend in it might just need two location associations to be learned and represented, while a heavily distorted logo would require more.
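A minimal sketch of the re-anchoring idea: blend the path-integrated location with a landmark-derived location according to landmark confidence. The blending rule and names are assumptions for illustration.

```python
import numpy as np

def reanchor(path_integrated_loc, landmark_loc, landmark_confidence):
    """Pull the path-integrated location estimate toward the location implied
    by a matched sensory landmark. landmark_confidence in [0, 1]: 0 keeps
    pure path integration, 1 snaps fully to the landmark."""
    return ((1 - landmark_confidence) * path_integrated_loc
            + landmark_confidence * landmark_loc)

# Usage: drift from imperfect path integration is partially corrected.
estimate = np.array([0.10, 0.02, 0.00])  # where path integration says we are
landmark = np.array([0.12, 0.00, 0.00])  # where a distinctive feature says we are
print(reanchor(estimate, landmark, landmark_confidence=0.8))
```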
> **Reviewer comment:** For this one I was actually thinking of associative connections like between the vision model of a car and the sound a car makes, and the word "car", etc. I was thinking these would be analogous to lateral voting connections. What you describe here would go under "Add Top-Down Connections".
>
> **Author reply:** Ah ok, that makes sense; I was confused by the potential duplication (I think because I focused on the term "hierarchy" in the `cmp-hierarchy` grouping). With that cleared up, I wonder if it's a bit of a duplication of "Generalize Voting to Associative Connections" --> my temptation would be to keep that one, add the point that this should enable associating e.g. sound objects with physical objects (i.e. where their models may not both be 3D), and get rid of "Add Associative Connections" under cmp-hierarchy. What do you think @vkakerbeck?
>
> **Reviewer comment:** Wow, yes! It only now clicked for me that those two are basically the same. It's kind of cool that we can solve both of these with the same solution. I think I had added this one under hierarchy because the first time I thought about these was in the context of modeling language and grounding it in physical models of objects. But I think we should just remove this one and expand on the one under voting like you suggest. Maybe add the "abstract" or "num_steps" label to it.
>
> **Author reply:** Ok nice, yeah sounds good!