Bad Softbot's calibration results with perfect simulation #950

Closed · brunofavs opened this issue May 9, 2024 · 16 comments
Labels: bug (Something isn't working)

@brunofavs
Collaborator

brunofavs commented May 9, 2024

Problems

  • When calibrating the softbot system without any perturbations, we are getting errors of about 1 px instead of something near 0.

  • Adding noise also behaves unreliably: increasing the noise increases the error up to a certain threshold, beyond which the error magically disappears.

What was tried already:

  • Commenting out the noise functions doesn't change anything (see the sketch after this list for a quick way to confirm they are effectively no-ops):
    addNoiseToInitialGuess(dataset, args, selected_collection_key)
    addBiasToJointParameters(dataset, args)
    addNoiseFromNoisyTFLinks(dataset, args, selected_collection_key)
  • Testing with a smaller prior dataset doesn't help
  • Calibrating a simpler system like the RRbot works well
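
A minimal sketch of the no-op check mentioned above, assuming the usual ATOM dataset layout (collections -> transforms, each with 'trans' and 'quat'); the import path for the noise functions is only a guess, so the call under test is left commented out:

# Check whether the noise functions actually modify the dataset.
import copy
import json

# from atom_calibration.calibration import addNoiseToInitialGuess  # hypothetical import path

def snapshot_transforms(dataset):
    # Flatten every transform into a comparable dict: (collection, transform) -> pose.
    return {(c, t): (tuple(tf['trans']), tuple(tf['quat']))
            for c, collection in dataset['collections'].items()
            for t, tf in collection['transforms'].items()}

with open('dataset_corrected.json', 'r') as f:
    dataset = json.load(f)

before = snapshot_transforms(copy.deepcopy(dataset))
# addNoiseToInitialGuess(dataset, args, selected_collection_key)  # call under test
after = snapshot_transforms(dataset)

changed = [key for key in before if before[key] != after[key]]
print(f'{len(changed)} transforms changed')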

What I need to test

  • Another dataset on softbot
  • Limiting the sensors with -ssf; the issue might have something to do with the LiDAR

If those don't work

I'll post a branch here with a bagfile + dataset to make an MWE (minimum working example) so other people can help out.

Tagging @miguelriemoliveira @manuelgitgomes for visibility

Updates

Using just the RGB cameras, as in RRbot, doesn't help either, which leads me to believe it has to be either a faulty dataset or a bug in calibrate.

Table -> calibration results with the previous, simpler dataset used in atom_examples

+------------+------------------------+-------------------------+
| Collection | front_left_camera [px] | front_right_camera [px] |
+------------+------------------------+-------------------------+
|    000     |         2.7308         |          0.5883         |
|    001     |         1.2917         |          0.6410         |
|    002     |         0.7326         |          0.6489         |
|    003     |         0.6055         |          0.4272         |
|  Averages  |         1.3401         |          0.5764         |
+------------+------------------------+-------------------------+
brunofavs added the bug (Something isn't working) label May 9, 2024
brunofavs self-assigned this May 9, 2024
brunofavs added a commit that referenced this issue May 9, 2024
New minimal dataset, issue persists
@brunofavs
Collaborator Author

The branch issue950 has an MWE to test this issue.

The configuration file is updated with the new bagfile.

From this link you can download the dataset and bagfile.

To recreate:

Calibrating odometry:

rosrun atom_calibration calibrate -json $ATOM_DATASETS/softbot/issue950/dataset_corrected.json -v -ss 1

Without calibrating odometry:

rosrun atom_calibration calibrate -json $ATOM_DATASETS/softbot/issue950/dataset_corrected.json -v -ss 1 -atsl 'lambda x : x in []'

This is a completely perfect simulation.

We start with reprojection errors of around 3 px and finish at around 0.3 px.
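
For clarity, a minimal sketch of how a string lambda of this form acts as a filter; this is only illustrative, not necessarily how atom_calibration evaluates the -atsl argument internally, and the transform name below is hypothetical:

# Illustrative only: the -atsl expression used above selects no additional transforms.
additional_tf_links = ['world-base_footprint']  # hypothetical odometry transform

selector = eval("lambda x : x in []")  # the -atsl expression passed above
selected = [tf for tf in additional_tf_links if selector(tf)]
print(selected)  # [] -> no additional transforms selected, so odometry stays fixed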

@brunofavs
Collaborator Author

brunofavs commented May 9, 2024

I found out that removing the problematic collection 31 from the long dataset solves the first issue. With a perfect simulation, calibrating odometry (this is 264 parameters):

| Averages | 2.5161 | 1.9251 | 0.0060 |

Without calibrating odometry (this is 24 parameters):

| Averages | 0.4000 | 0.3731 | 0.0057 |

The optimizer might be falling into a local minimum with this many parameters, but I'm not sure.

@manuelgitgomes told me this does make sense due to overfitting.
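
In case it helps reproduce this, a minimal sketch for dropping a collection from the dataset JSON before calibrating, assuming the standard ATOM layout where collections live under dataset['collections'] keyed by strings (the key padding below is a guess):

# Drop a problematic collection (e.g. 31) from an ATOM dataset JSON.
import json

with open('dataset_corrected.json', 'r') as f:
    dataset = json.load(f)

bad = {'031', '31'}  # cover both possible key paddings
dataset['collections'] = {k: v for k, v in dataset['collections'].items() if k not in bad}

with open('dataset_without_31.json', 'w') as f:
    json.dump(dataset, f, indent=2)

In practice, the -csf 'lambda x: ...' collection selection function used further down in this thread achieves the same thing without editing the file.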

@brunofavs
Collaborator Author

It seems that, when not calibrating odometry, the issue of the noise disappearing after a threshold is also gone.

(attached image: WhatsApp Image 2024-05-09 at 15 57 10)

What I still don't get is why some results with noise in the odometry, while calibrating the odometry, are great, whereas others skyrocket to absurd values.

@brunofavs
Collaborator Author

#952 was likely another thing throwing us off.

@brunofavs
Collaborator Author

@miguelriemoliveira I will use the script I made today in #951 to automate all sorts of insightful plots so we can better analyze the data.

Are you available to meet on Monday to talk about it? I should have all the plots ready by then.

@brunofavs
Collaborator Author

Here are all the plots computed from the batch.

plots.zip

I couldn't plot anything like "without calibrating odom vs calibrating odom" because I don't have enough data, as I didn't run many experiments without calibrating odometry. The few I have do show that calibrating a system with noisy odometry leads to pretty awful results.

I'm not sure if I should rerun all these experiments without calibrating odometry.

brunofavs added a commit that referenced this issue May 12, 2024
This is better because if the fold lists are defined with strings, it's
limiting on the lambdas. The user has to use whichever string
identifier the fold lists aren't using, and has no idea until the
program crashes when evaluating the lambdas.
@miguelriemoliveira
Member

Hi @brunofavs ,

we can talk Monday morning if you want.

9h or later?

@brunofavs
Collaborator Author

Sounds good to me.

Zoom or at Lar?

@miguelriemoliveira
Member

miguelriemoliveira commented May 12, 2024 via email

@brunofavs
Collaborator Author

Ok, Zoom it is.
See ya tomorrow then :)

@brunofavs
Collaborator Author

Hey @miguelriemoliveira @manuelgitgomes

I've been running a lot of experiments in the terminal trying to figure this out. The suspicion that we might be on the "flat" part of the curve, which led to seeing apparently static results in plots such as the RGB calibration results, was confirmed.

The thing is, I'm getting odd behaviors.

When running this command (with x and y as placeholders for the nig and ntfv values, respectively):

clear && rosrun atom_calibration calibrate \
-json $ATOM_DATASETS/softbot/long_train_dataset1/dataset_corrected.json \
-v \
-ss 1 -nig x x \
-ntfv y y \
-ntfl "world:base_footprint" \
-csf 'lambda x: int(x) in [0, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 17, 18]' \
-ctgt \
-atsf "lambda x : x not in []" \
-ftol 1e-4 -xtol 1e-4 -gtol 1e-4

I'm getting this table for y (ntfv) = 0 and x (nig) varying:

Not calibrating odometry here

| nig | ctgt    | residuals |
|-----|---------|-----------|
| 0.1 | 0.00251 | 1.00      |
| 0.2 | 0.00565 | 1.13      |
| 0.4 | 0.00530 | 1.11      |
| 0.6 | 0.00578 | 1.05      |
| 1.0 | 0.00680 | 1.17      |
| 2   | 0.00581 | 1.07      |
| 5   | 0.00573 | 1.08      |
| 10  | 0.009   | 0.16      |
| 15  | 0.0099  | 0.09      |

It's a weak claim to say it converges for all feasible nig values. This behavior is very odd to me. We won't see any plot other than either a horizontal line for the ctgt or an even weirder one for the residuals, which would be a negatively sloped line, not making a lot of sense.


Adding a fixed noise to the odometry and again varying the sensor pose noise (nig), I get consistently similar behavior:

Calibrating odometry here

| nig | ctgt    | residuals |
|-----|---------|-----------|
| 0.1 | 0.013   | 0.88      |
| 0.2 | 0.0095  | 0.6       |
| 0.4 | 0.0092  | 0.38      |
| 0.6 | 0.00985 | 0.38      |
| 1.0 | 0.00987 | 0.24      |
| 10  | 0.010   | 0.17      |

ctgt is somewhat constant and the residuals are declining.

Note: this experiment isn't the same as the plot we discussed in the morning (varying ntfv for a fixed nig); here I am varying nig for a given ntfv value. I'll talk about that case in the next section.


Varying ntfv for a given nig = 0.3:

Calibrating odometry here

| ntfv | ctgt    | residuals |
|------|---------|-----------|
| 0.1  | 0.0085  | 0.45      |
| 0.2  | 0.0098  | 0.8       |
| 0.4  | 0.00906 | 1.58      |
| 0.6  | 0.00988 | 1.3       |
| 0.65 | 0.09    | 2.25      |
| 0.7  | 0.1     | 1.77      |
| 0.75 | 0.1     | 1.97      |
| 1    | 0.3     | 15000     |

Here the limit is more visible. At ntfv ≈ 0.7 for nig = 0.3, the results start to deteriorate quickly. The residuals paint a similar story.


For different values of nig other than 0.3, does the optimizer also break at ntfv = 0.7?

Once the threshold is hit, the error curves seem to skyrocket.

  • For nig = 0.1, ntfv = 0.77 showed a perfect calibration
  • For nig = 0.1, ntfv = 0.79, the error was already 0.46, almost 5 times bigger than the initial guess.

I do see that data with higher nig reaches, earlier, the ntfv point where the optimizer can't handle it anymore. That's expected. However, I expected this behavior change to be more linear and less abrupt.

I was hoping, in this plot:

(attached image)

to make the point that, for a certain value of ntfv, we could actually see the lines in the correct order, meaning higher-nig lines would be higher on the graph. If I could catch the zone of the behavior change in the plot, I could make that point. I'm afraid it won't be very visible, as the behavior is so abrupt that it will just spike upwards and that's it.

My worries

I could eventually use a logarithmic scale on my plots, but that would require a lot more data in the threshold zone. I would need increments like 0.77, 0.772, 0.774, ..., 0.79 to actually have a chance of seeing something (see the sketch below).

That would take an absurd amount of data and time to reach meaningful conclusions.
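
Just to illustrate the idea, a minimal sketch of such a log-scale plot with dense sampling near the threshold; the ntfv values and errors below are placeholders, not real results:

# Log-scale plot sketch with dense sampling near the spike (made-up data).
import numpy as np
import matplotlib.pyplot as plt

ntfv = np.concatenate([np.linspace(0.1, 0.75, 8),        # coarse sweep
                       np.arange(0.77, 0.791, 0.002)])   # dense sampling near the spike
errors = np.where(ntfv < 0.77, 0.01, 0.01 * 10 ** ((ntfv - 0.77) * 100))  # fake behavior

plt.plot(ntfv, errors, marker='o')
plt.yscale('log')                 # a log scale keeps the pre-spike values readable
plt.xlabel('ntfv [m/rad]')
plt.ylabel('ctgt error [m]')
plt.show()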

I could also just make the assumption that for all feasible noise values the optimizer can deal with the problem.

I'm not sure what to do.

@brunofavs
Collaborator Author

brunofavs commented May 13, 2024

I will run a batch with -ftol 1e-4 -xtol 1e-4 -gtol 1e-4, one run of a stratified shuffle split (70/30, 1 split), with:

  • Data near thresholds
  • All results with and without calibrating odometry

Will post the plots later.
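
For reference, a minimal sketch of a single 70/30 shuffle split over the dataset collections; this is only illustrative (the actual batch execution scripts may do this differently, and a truly stratified split would also need per-collection labels):

# Single 70/30 shuffle split over the collection keys of an ATOM dataset.
import json
from sklearn.model_selection import ShuffleSplit

with open('dataset_corrected.json', 'r') as f:
    dataset = json.load(f)

keys = sorted(dataset['collections'].keys())
splitter = ShuffleSplit(n_splits=1, test_size=0.30, random_state=0)
train_idx, test_idx = next(splitter.split(keys))

train_keys = [keys[i] for i in train_idx]
test_keys = [keys[i] for i in test_idx]
print('train:', train_keys)
print('test :', test_keys)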

@miguelriemoliveira
Member

Going step by step through your comments:

It's a weak claim to say it converges for all feasible nig values. This behavior is very odd to me. We won't see any plot other than either a horizontal line for the ctgt or an even weirder one for the residuals, which would be a negatively sloped line, not making a lot of sense.

Well, the ctgt is increasing as it should. The residuals are overfitting; I would not worry about those.
Also, nig 5 or 10 is too much. For angles it is more than 360 degrees!

Adding a fixed noise to the odometry and again varying the sensor pose noise (nig), I get consistently similar behavior:

I don't think I agree with this comparison; it's not apples to apples, because before you were not calibrating odometry and now you are.

In any case, also here I see nothing wrong. CTGT is increasing as it should.

Varying ntfv for a given nig = 0.3

Also nothing strange here.

However I expected this behavior change to be more linear and less abrupt.

Not sure you have any grounds to be expecting that. It could be abrupt or progressive; there is really no way of saying one is wrong and the other correct.

To make the point that for a certain value of ntfv we could actually see the lines in the correct order, meaning higher-nig lines would be higher on the graph. If I could catch the zone of the behavior change in the plot, I could make that point. I'm afraid it won't be very visible, as the behavior is so abrupt that it will just spike upwards and that's it.

This is still a bit strange to me, but remember that we are looking at residuals, and those are not reliable.
If you do the graph with nig, I think the results would make sense.

I could eventually use a logarithmic scale on my plots, but that would require a lot more data in the threshold zone. I would need increments like 0.77, 0.772, 0.774, ..., 0.79 to actually have a chance of seeing something.

Too complicated.

That would take an absurd amount of data and time to reach meaningful conclusions.

Right, do not do it.

I could also just make the assumption that for all feasible noise values the optimizer can deal with the problem.
I'm not sure what to do.

I think the graph with ctgt values would make sense.

The conclusion to draw from these is that we should not trust the residuals, only the ctgt or, if we do not have ground truth, the evaluations (hopefully).

@brunofavs
Collaborator Author

Well, the ctgt is increasing as it should. The residuals are overfitting; I would not worry about those.
Also, nig 5 or 10 is too much. For angles it is more than 360 degrees!

I mean, they do increase as they should. But they increase so little that it makes me unconfident. Yep, I know anything past 2π is "useless"; I was really just exhausting possibilities because everything seemed to converge.

I don't think I agree with this comparison; it's not apples to apples, because before you were not calibrating odometry and now you are.
In any case, also here I see nothing wrong. CTGT is increasing as it should.

Is it increasing? It starts at 0.013 and finishes at 0.010. From 0.013 it went down twice, then up three times to 0.010.

also nothing strange here

Here I do agree: for a fixed nig (while calibrating odom), increasing ntfv increases CTGT as expected, regardless of the residuals being overfitted. This is a great conclusion.

Not sure you have any grounds to be expecting that. It could be abrupt or progressive; there is really no way of saying one is wrong and the other correct.

To be fair, I really have none. The model is a black box, and I was expecting some linearity because there is some at lower values. That expectation was incorrect.

I think the graph with ctgt values would make sense.
The conclusion to draw from these is that we should not trust the residuals, only the ctgt or, if we do not have ground truth, the evaluations (hopefully).

I agree.

@brunofavs
Collaborator Author

brunofavs commented May 14, 2024

Update on this:

I will run a batch with -ftol 1e-4 -xtol 1e-4 -gtol 1e-4, one run of a stratified shuffle split (70/30, 1 split), with:

I ran this batch. There was an odd error in the LiDAR evaluation, so I took it off (will create an issue for it --> #957).

The main things to remark are:

  1. The plots with error relative to the ground truth (ctgt)
  2. The noise plots in the zone where the optimizer fails

Starting with (1)

Note for all the plots below: the Y label should've said "Error (m)"; that was my bad.

Calibrating Odometry

More nig, more error -> makes total sense.

More odometry noise (ntfv), more error, up until the point where the optimizer can't handle it anymore; this also makes sense.

Not calibrating Odometry

Without the extra noise on the odom, the optimizer can take a lot more nig noise. For all the data points on this plot the calibration results are great. Even for huge noise values it wouldn't fail; it failed only at 20 m/rad nig, which is outrageous and unrealistic.

The oscillation is amplified by the y-axis limits; the values only vary by about 1 mm.

Here the results seem a little more odd, even though they also spike upwards at 0.7. I will elaborate more in the next section. Here the error is an entire order of magnitude larger. I would argue that, from the point where the error is way bigger than the starting point, the optimizer is already lost and the results aren't reliable. I would ask @manuelgitgomes to help me with a more scientific explanation for this. We discussed it today, but I'm failing to find the right words to properly explain it.

Optimizer spiking at 0.7

If we zoom in on the second plot of the previous section:

We can see that, generally, the curves with higher noise spike sooner. I'm not sure whether this is a useful conclusion or not.

But what I find rather odd is that every curve spikes at roughly the same point (ntfv = 0.7 m/rad). I don't have a clear explanation for this.

The 'ctgt' tables printed at the end of the calibration always seemed to match.

There is something that I will correct for the final version though. Right now the ctgt is saving a file with only the averages.

This is a little problematic for two reasons: for one, there is an anchored sensor with error 0, and there is also the odom error weighing in, so I'm not exactly getting the final error of each sensor.

Example table to clarify my point (see also the sketch after it).

+------------------------------------------------------+-------------------------+---------+---------+-------------+------------+
|                      Transform                       |       Description       | Et0 [m] |  Et [m] | Erot0 [rad] | Erot [rad] |
+------------------------------------------------------+-------------------------+---------+---------+-------------+------------+
|  front_left_camera_link-front_left_camera_rgb_frame  |    front_left_camera    | 0.00000 | 0.00000 |   0.00000   |  0.00000   |
| front_right_camera_link-front_right_camera_rgb_frame |    front_right_camera   | 0.20000 | 0.20000 |   0.20000   |  0.20000   |
|         lidar3d_plate_link-lidar3d_base_link         |         lidar3d         | 0.20000 | 0.20000 |   0.20000   |  0.20000   |
|                 world-base_footprint                 | world_to_base_footprint | 0.75000 | 0.75000 |   0.75000   |  0.75000   |
|                       Averages                       |           ---           | 0.28750 | 0.28750 |   0.28750   |  0.28750   |
+------------------------------------------------------+-------------------------+---------+---------+-------------+------------+
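
To make the averaging problem concrete, a minimal sketch using the values from the example table above; the dictionary layout and the 'anchored' flag are just for illustration:

# The table average mixes the anchored sensor (error 0) and the odometry
# transform with the actual sensor errors.
results = {
    'front_left_camera':       {'Et': 0.00, 'anchored': True},
    'front_right_camera':      {'Et': 0.20, 'anchored': False},
    'lidar3d':                 {'Et': 0.20, 'anchored': False},
    'world_to_base_footprint': {'Et': 0.75, 'anchored': False},
}

def average(keys):
    return sum(results[k]['Et'] for k in keys) / len(keys)

all_keys = list(results)
sensor_keys = [k for k in all_keys
               if not results[k]['anchored'] and k != 'world_to_base_footprint']

print(f'average over everything  : {average(all_keys):.5f}')     # 0.28750, as in the table
print(f'average over sensors only: {average(sensor_keys):.5f}')  # 0.20000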

Final results

Adding up the calibrations with and without odom, this will easily take 2 to 3 days to compute on my PC.
@miguelriemoliveira, do you think it's worth stripping down the data yaml, now that we know roughly which plots are more important, or investing the time in doing everything in case it proves useful later on?

Also, I'm not sure whether to do something before this big batch. I wanted it ideally to be the final one, so I can focus the rest of the time on the writing.

@miguelriemoliveira
Member

But what I find rather odd is that every curve spikes at roughly the same point (ntfv = 0.7 m/rad). I don't have a clear explanation for this.

I think there is a bug with the nig or something. This very well-defined limit suggests it.

This is a little problematic for two reasons: for one, there is an anchored sensor with error 0, and there is also the odom error weighing in, so I'm not exactly getting the final error of each sensor.

Right, it would be nice to show separately for each sensor.

Also, I'm not sure whether to do something before this big batch. I wanted it ideally to be the final one, so I can focus the rest of the time on the writing.

My suggestion: wait a bit, let's have the meeting tomorrow, and then talk on Friday.
In the meantime you can start writing ...

Great job by the way...
