
Are there missing objects in GT segmentation? #131

Open
TopCoder2K opened this issue Jan 17, 2023 · 5 comments

Comments

@TopCoder2K

Hi, @MohitShridhar!

When I was debugging my model, I noticed that it can't pick up the Knife here:
[image: frame 57]
although the mask seems to be correct:
[image: mask_57]
I checked that the distance is correct and that the Knife's 'visible' property is True, but the interaction fails with CounterTop|+00.09|+00.89|-01.52 is not visible. Then I decided to visualize the GT segmentation:
[image: gt_sem_seg_57]
and there is no knife! One might think that it has the same color as the CounterTop, but I checked that instance_counter inside thor_env.py indeed finds only one object there --- the CounterTop...
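For reference, here is roughly the check I ran (a minimal sketch, not the exact ALFRED code; note that parameter names vary across ai2thor versions: older releases use renderObjectImage instead of renderInstanceSegmentation, and the scene number here is taken from the task folder name):

```python
# Sketch: compare AI2-THOR's visibility metadata with the instance
# segmentation masks for the current frame.
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan30", renderInstanceSegmentation=True)
event = controller.last_event

# Object ids the simulator reports as visible from the current pose.
visible_ids = {o["objectId"] for o in event.metadata["objects"] if o["visible"]}

# Object ids that actually received pixels in the instance segmentation frame.
segmented_ids = set(event.instance_masks.keys())

# Visible objects with no mask at all -- the Knife shows up here.
print(sorted(visible_ids - segmented_ids))
```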

Is this a real issue, or is there something I don't understand? Because if it is real, we should somehow estimate the number of such cases and maybe even recalculate the leaderboard results after fixing it.

@TopCoder2K
Author

Also, by the way, it is strange that, judging by the color, the bottom of the frying pan does not belong to the frying pan's mask.

@thomason-jesse
Collaborator

Can you identify the trajectory in the ALFRED dataset to which this frame belongs? We can confirm, using the replay scripts and the original video, whether the knife is interactable in that case. There is some known stochasticity in the AI2-THOR simulator that can cause objects to "blink" like this, but it's not always reproducible.
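For reference, a replay essentially re-steps the recorded api_action dicts through the simulator. A simplified sketch (the actual replay scripts also restore object poses and states first; field names follow the traj_data.json layout):

```python
# Simplified replay sketch: re-step a trajectory's recorded low-level
# actions and watch whether the Knife keeps its segmentation mask.
import json
from ai2thor.controller import Controller

with open("traj_data.json") as f:
    traj = json.load(f)

controller = Controller(scene=traj["scene"]["floor_plan"],
                        renderInstanceSegmentation=True)

for low_action in traj["plan"]["low_actions"]:
    event = controller.step(low_action["api_action"])
    # Flag frames in which no object id containing "Knife" has a mask.
    if not any("Knife" in oid for oid in event.instance_masks):
        print("no Knife mask after:", low_action["api_action"]["action"])
```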

@TopCoder2K
Author

TopCoder2K commented Jan 18, 2023

@thomason-jesse, thank you for your quick reply!

Can you identify the trajectory in the ALFRED dataset to which this frame belongs?

Sorry, what do you mean by 'identify'? Should I send the trajectory in the format of the evaluation server, or is it enough to send the actions the model took?
It is the 10th episode of the val_seen split ('pick_clean_then_place_in_recep-ButterKnife-None-Drawer-30/trial_T20190908_052007_212776'). The exact trajectory can be found in the 'Action' column of the attached log file 10.txt. I can also send the trajectory video and the trajectory data.

There is some stochasticity in the AI2-THOR simulator

Wow, I didn't know that! How does this manifest itself, and how often does it happen? Can it also affect rendering? I haven't managed to achieve deterministic model execution: I fixed all the seeds, set torch to fully deterministic mode, and even fixed 'PYTHONHASHSEED', but episode execution is still not deterministic.
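For completeness, this is the determinism setup I tried (a sketch; note that PYTHONHASHSEED only takes effect if set before the Python process starts, and none of this touches AI2-THOR's own physics, which runs inside the Unity process):

```python
import os
import random

import numpy as np
import torch

os.environ["PYTHONHASHSEED"] = "0"  # only effective if set before interpreter start
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

# Force deterministic torch kernels where available.
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```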

@thomason-jesse
Copy link
Collaborator

To clarify: are these actions prescribed in the training data trajectory or actions your model has inferred separately? If you check out the execution video for the trajectory you named above (https://askforalfred.com/?vid=21032), it looks like the PDDL-planner-generated actions went for a different knife that might not exhibit this blinking/disappeared segmentation issue.

AI2THOR has a few non-deterministic quirks, as we note in a few of our FAQs and in the paper's discussion of why even perfect replay of the PDDL-generated actions doesn't always result in a 100% success rate. The idea of "fixing this" and re-doing the leaderboard calculations is definitely out of scope.

Anyway, short answer: the segmentation mask on that knife in that scene configuration might just be bad and there's not much we can do about it 🤷.

@TopCoder2K
Author

TopCoder2K commented Jan 24, 2023

To clarify: are these actions prescribed in the training data trajectory or actions your model has inferred separately?

These are actions the model inferred on its own.

If you check out the execution video for the trajectory you named above (https://askforalfred.com/?vid=21032) <...>

Unfortunately, I can't see the video (I don't know why):
[image: Screenshot from 2023-01-24 15-49-55]
But the trajectory may be different anyway, since the model predicted these actions itself rather than taking them from ALFRED. The problem is that the knife increased the number of the agent's failed actions and confused it.

as we note in a few of our FAQs and paper discussion on why even perfect replay from the PDDL-generated actions doesn't always result in 100% success rate

Hmmm, the end of that sentence seems familiar, but I don't remember seeing it in the ALFRED paper... Anyway, I had already forgotten about it, so thank you for pointing it out 👍

The idea of "fixing this" and re-doing leaderboard calculations is definitely out of scope.

I see. But can we guarantee that the number of such objects is very small for the test splits (e.g., that they occur in only 2-3 episodes)? If not, the leaderboard results could be biased...
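If it helps, such a check could be automated along these lines (a hypothetical sketch that reuses the visible-vs-segmented comparison from my first message on every frame of a replayed episode):

```python
# Hypothetical audit sketch: count frames in which some visible object has
# no instance mask, to estimate how often objects "blink" within a split.
def count_blink_frames(events):
    """events: per-frame ai2thor events recorded while replaying an episode."""
    blink_frames = 0
    for event in events:
        visible = {o["objectId"] for o in event.metadata["objects"] if o["visible"]}
        segmented = set(event.instance_masks.keys())
        if visible - segmented:  # at least one visible object lacks a mask
            blink_frames += 1
    return blink_frames
```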
