Are there missing objects in GT segmentation? #131
Also, by the way, it is strange that, judging by the color, the bottom of the frying pan does not belong to the frying pan.
Can you identify the trajectory in the ALFRED dataset to which this frame belongs? We can confirm using the replay scripts and the original video whether the knife is interactable in that case. There is some stochasticity in the AI2-THOR simulator that we are aware of, which can cause objects to kind of "blink" like this, but it's not always replicable.
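As a quick sanity check on your side, something like the sketch below, run against the event returned by the step that fails, prints the simulator-side `visible`/`distance` flags for the knife. This is only a sketch against the stock AI2-THOR event metadata (not part of the ALFRED replay scripts), and field names may vary slightly by AI2-THOR version.

```python
# Sketch: inspect the simulator-side flags for a target object after a step.
# Assumes a standard ai2thor Event; entries in metadata['objects'] carry
# objectType / objectId / visible / distance.
def report_object_state(event, object_type="Knife"):
    for obj in event.metadata['objects']:
        if obj['objectType'] == object_type:
            print(obj['objectId'],
                  'visible:', obj['visible'],
                  'distance:', obj['distance'])
```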
@thomason-jesse, thank you for your quick answer!
Sorry, what do you mean by 'identify'? Should I send the trajectory in the format of the evaluation server or will it be enough to send the actions the model took?
Wow, I didn't know that! How does this manifest itself and how often does it happen? Can it also affect rendering? I haven't managed to achieve deterministic model execution. I fixed all the seeds, set ...
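A minimal sketch of the kind of seeding I mean (the exact set of libraries is an assumption on my side; I assume Python's `random`, NumPy and PyTorch are the relevant sources of randomness):

```python
# Minimal seeding sketch; assumes the model uses Python's random, NumPy and PyTorch.
import random
import numpy as np
import torch

def fix_seeds(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN deterministic where possible (may slow training down).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

fix_seeds(42)
```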
To clarify: are these actions prescribed in the training-data trajectory, or actions your model has inferred separately? If you check out the execution video for the trajectory you named above (https://askforalfred.com/?vid=21032), it looks like the PDDL-planner-generated actions went for a different knife that might not exhibit this blinking/disappearing segmentation issue. AI2THOR has a few non-deterministic quirks, as we note in a few of our FAQs and in the paper's discussion of why even perfect replay of the PDDL-generated actions doesn't always result in a 100% success rate. The idea of "fixing this" and re-doing leaderboard calculations is definitely out of scope. Anyway, short answer: the segmentation mask on that knife in that scene configuration might just be bad, and there's not much we can do about it 🤷.
These are actions the model inferred separately.
Unfortunately, I can't see the video (I don't know why).
Hmmm, the end of that sentence seems familiar, but I don't remember seeing it in the ALFRED paper... Anyway, I had already forgotten about it, so thank you for pointing it out 👍
I see. But can we guarantee that the number of such objects is very small for the test splits (e.g. that they appear in only 2-3 episodes)? If not, the leaderboard results could be biased...
Hi, @MohitShridhar!
When I was debugging my model, I noticed that it can't take the `Knife` here, although the mask seems to be correct:

I checked that the distance is correct: the `Knife` has the `'visible'` property equal to `True`, but the interaction fails with `CounterTop|+00.09|+00.89|-01.52 is not visible`. Then I decided to visualize the GT segmentation, and there is no knife! One might think that it has the same color as the `CounterTop`, but I checked that `instance_counter` inside `thor_env.py` indeed finds only one object --- the `CounterTop`...

Is this real, or is there something I don't understand? Because if it is, we have to somehow check the number of such cases and maybe even recalculate the leaderboard results after fixing this.
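For reference, the check I did looks roughly like the sketch below. It is only a sketch against the stock AI2-THOR event attributes (`instance_segmentation_frame`, `color_to_object_id`); the exact attribute names may differ slightly in the AI2-THOR version bundled with ALFRED, and the coordinates in the usage comment are hypothetical.

```python
import numpy as np

# Sketch: list which object ids actually own pixels inside a region of the GT
# instance segmentation. Assumes instance segmentation rendering is enabled on the
# controller and that the event exposes instance_segmentation_frame (H x W x 3)
# and color_to_object_id (color tuple -> objectId).
def objects_in_region(event, xmin, ymin, xmax, ymax):
    seg = event.instance_segmentation_frame
    region = seg[ymin:ymax, xmin:xmax].reshape(-1, 3)
    found = set()
    for color in np.unique(region, axis=0):
        obj_id = event.color_to_object_id.get(tuple(int(c) for c in color))
        if obj_id is not None:
            found.add(obj_id)
    return found

# Example usage (hypothetical bounding box of the knife mask):
# print(objects_in_region(env.last_event, 120, 200, 180, 260))
```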