Dataset question #15

Closed · kk6398 opened this issue Aug 22, 2024 · 27 comments

@kk6398 commented Aug 22, 2024

Hi, thanks for your excellent work.
May I ask what the specific settings are for selecting the sparse views (training and test views) for the Tanks and Temples dataset in the paper, e.g. n=3, n=6, and n=12?

@RaymondJiangkw (Owner)

Hi, it is implemented in this repo. The views are selected such that the interval between each consecutive pair is fixed. For example, for 3 views, we select the first frame, the middle frame and the last frame.

@kk6398 (Author) commented Aug 22, 2024

Thanks for your reply. I see the corresponding code in dataset_readers.py at line 187.
For example:
I am trying to use TT_family as the dataset, which has 200 images in total. Then we set num_images=3. As the code runs, length=200, interval=98, and train_cam_infos contains cam_infos[0], cam_infos[99], and cam_infos[198]. Does "train_cam_infos[-1] = cam_infos[-1]" mean that we append the last image (199) to train_cam_infos?
And does the last line mean that test_cam_infos contains the remaining cam_infos (200 - num_images)?

```python
elif eval and num_images > 0:
    # Use num_images to specify train/test split
    from math import floor  # floor(x) returns the largest integer <= x, i.e. rounds a float down
    length = len(cam_infos)  # TT_family: 200
    interval = floor((length - num_images) / (num_images - 1))  # (200 - 3) / (3 - 1) = 98
    train_cam_infos = [c for idx, c in enumerate(cam_infos) if idx % (interval + 1) == 0]  # 0, 99, 198
    train_cam_infos[-1] = cam_infos[-1]  # Ensure last frame is covered  # 0, 99, 198, 199 ???
    assert len(train_cam_infos) == num_images
    train_cam_image_name_s = {c.image_name: 1 for c in train_cam_infos}
    test_cam_infos = [c for idx, c in enumerate(cam_infos) if not (c.image_name in train_cam_image_name_s)]  # remaining cam_infos
```

@RaymondJiangkw (Owner)

Hi, for a list, assigning a value to its last index does not append; it replaces the existing element, so the list still has num_images entries.
The last line means that all remaining views are used for testing.
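
To make this concrete, here is a minimal standalone sketch of the same selection logic (the helper name select_train_indices is hypothetical, not from the repo); it also shows that the last assignment replaces rather than appends:

```python
from math import floor

def select_train_indices(length, num_images):
    # Mirrors the snippet quoted above: fixed interval, then force the last frame in.
    interval = floor((length - num_images) / (num_images - 1))
    indices = [i for i in range(length) if i % (interval + 1) == 0]
    indices[-1] = length - 1  # replacement, not append: the list keeps num_images entries
    return indices

print(select_train_indices(200, 3))   # [0, 99, 199]
print(select_train_indices(200, 6))   # [0, 39, 78, 117, 156, 199]
print(select_train_indices(200, 12))  # 12 evenly spaced indices, the last one replaced by 199
```

So for TT_family with 200 images and num_images=3, the training views are frames 0, 99, and 199, and the other 197 frames become testing views.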

@kk6398 (Author) commented Aug 22, 2024

> Hi, for a list, assigning a value to its last index does not append; it replaces the existing element, so the list still has num_images entries. The last line means that all remaining views are used for testing.

Amazing. I noticed that other works such as FSGS and InstantSplat choose a specific number of test views (like 12). I have to admire your excellent work again.
So, are the experimental results of the other methods in Table 1 of the paper obtained under the same experimental setting, i.e. the same number of test views?

@RaymondJiangkw (Owner)

Yes, they are tested under the same evaluation protocol.

@kk6398 (Author) commented Aug 22, 2024

Thanks a lot.
One more question: I'm curious how many frames you used to initialize the point cloud.
Since the construct_coarse_solution() function in train.py is a bit long and complex, could you briefly tell me the approximate or exact number of frames used to initialize the point cloud? I will read the function carefully later.

@RaymondJiangkw (Owner)

We only use the training frames when constructing the coarse solution.

@kk6398 (Author) commented Aug 27, 2024

> We only use the training frames when constructing the coarse solution.

Hi,
As I understand it, please point out which step I have wrong:

  1. We need to take all frames as input.
  2. However, in the construct_coarse_solution() stage, only the sparse views (3/6/12) are used to initialize the point cloud and camera parameters, and opt.align_steps (400) iterations are used to optimize the poses and align the depth.
  3. In the refinement stage, the camera parameters and point cloud returned by construct_coarse_solution() are used for Gaussian splatting rendering. Only the geometric parameters are optimized, and only the training views are used in refinement.
  4. The rendering of the test views happens only in render.py.

Looking forward to your reply.

@RaymondJiangkw (Owner)

Hi, yes. Actually, you don't need to pass all the frames into the training; that's only because the training and evaluation share the same data loader. If you only have 3 or more views and don't want to evaluate the performance, it's fine to pass only those views into the training.
For more details, it is recommended to read the README.md and the paper. :) If you want to evaluate the performance on the testing views, you also need to register the camera poses of the testing views.

@kk6398 (Author) commented Aug 28, 2024

> Hi, yes. Actually, you don't need to pass all the frames into the training; that's only because the training and evaluation share the same data loader. If you only have 3 or more views and don't want to evaluate the performance, it's fine to pass only those views into the training. For more details, it is recommended to read the README.md and the paper. :) If you want to evaluate the performance on the testing views, you also need to register the camera poses of the testing views.

Thanks a lot.
To add: ① The testing views are rendered in training_report() during the refinement stage, even though the cameras of the testing views have not been registered yet. Is that right?
② We need to calculate and register the cameras of the testing views before rendering them. Is that right? Because we only calculate and register the cameras of the training views during the training stage (construct_coarse_solution()).
③ As for the question of passing the training frames or all frames: I notice that, as in 3DGS, train.py needs the "--eval" flag. Afterwards, "eval = True" in cfg_args.txt causes the testing views to be detected during render.py.
So, do we need to change the code in dataset_readers.py and change the dataset folder if we want to pass only the training frames? That looks weird. lol....

@RaymondJiangkw (Owner)

Hi, 1. No. 2. Yes. 3. I don't quite understand what you mean. The split of training and testing views is enabled by adding the eval flag, which is consistent with the original 3DGS implementation.

@kk6398 (Author) commented Aug 28, 2024

Oh, sorry, I didn't read the code carefully.

I got it.
In conclusion, we only register and optimize the training view poses at the training stage through the correspondence loss, RGB loss, and depth loss. Then we register and optimize the testing view poses at the post-training stage (i.e., eval.py) using the RGB loss only.

@RaymondJiangkw (Owner)

Notice that, in your so-called 'post-training' stage, only the extrinsics are optimized.

@kk6398 (Author) commented Aug 28, 2024

Okay! Thank you for your patience. As for the registration of the testing views, "The camera pose of the next unregistered testing view is initialized with the corresponding value for the last registered testing view," as described in the paper.
But what about aligning the testing views? Is that done with the RGB loss only?

@RaymondJiangkw (Owner)

There is no alignment for testing views. Testing views are only for testing, and we only want to know their camera positions.

@kk6398 (Author) commented Aug 28, 2024

> There is no alignment for testing views. Testing views are only for testing, and we only want to know their camera positions.

① For methods that rely on off-the-shelf poses: if we only initialize the testing view poses through SfM without BA, don't we need to align the testing view poses with the training view poses?

② For methods that do not rely on off-the-shelf poses: "post-training optimization of the camera poses of the testing views, based on the RGB loss only." Don't we need to align the testing view poses with the training view poses? When we run render.py, don't we need to render the testing views using the point cloud obtained from the training views? I think the testing poses and the training poses are not in the same coordinate system.

Looking forward to your reply.

@RaymondJiangkw (Owner)

There is no alignment between the testing poses and the training poses; there is only registration of the testing poses. A training view doesn't possess its own point cloud.

@kk6398 (Author) commented Aug 29, 2024

> There is no alignment between the testing poses and the training poses; there is only registration of the testing poses. A training view doesn't possess its own point cloud.

Sorry,

  1. In Lines 372-376 of refinement() in train.py, we save point_cloud.ply according to the iteration when we run train.py. This process is the same as in the original 3DGS.
  2. Then, in Line 74 of eval.py and Lines 64-68 of scene/__init__.py, we load point_cloud.ply from the train.py outputs and pass in the Gaussian parameters, so that Line 119 of eval.py, test_out = renderApprSurface(test_view, gaussians, pipe, background), can be executed when we run eval.py.
  3. As for the alignment between the training poses and the testing poses: shouldn't the training poses be in the same coordinate system as the testing poses? In my opinion, the training poses are in the coordinate system of the training views, while the testing poses are in the coordinate system of the testing views, which are different coordinate systems.
  4. As for the initialization of the testing poses in render.py: ① For methods that rely on off-the-shelf estimated camera poses, we use SfM without BA to output a cameras file, then read and optimize it. ② For methods that do not rely on off-the-shelf estimated camera poses, we initialize the cameras as qvec=np.array([1., 0., 0., 0.]), tvec=np.array([0., 0., 0.]) and optimize them with the RGB loss.
  5. In Line 41 of render.py, do we need to load the .ply file from train.py? This process is the same as in the original 3DGS.

Thank you. I have thought about these questions very carefully. I hope you can point out where I am wrong and resolve my confusion. Looking forward to your reply.

@RaymondJiangkw (Owner)

Be careful with terms. Alignment refers to something specific in this work. I believe your understanding is fine, but your description is not rigorous.

@kk6398 (Author) commented Aug 30, 2024

The purpose of running COLMAP on all images in the original 3DGS is to put the cameras of all images in a unified coordinate system. For few-shot work, where our training and testing views are separate, don't we need to put them in a unified coordinate system as well?

Reference: (NVlabs/InstantSplat#11)

@RaymondJiangkw (Owner)

Hi, you are correct. The testing views need to be separately registered after the training. But you should not call it ‘alignment’ to avoid confusion because it refers to something else in this work.

@kk6398 (Author) commented Aug 30, 2024

> Hi, you are correct. The testing views need to be separately registered after the training. But you should not call it ‘alignment’ to avoid confusion because it refers to something else in this work.

Thank you for the correction.
What specific method do we use to put the training view poses and the testing view poses in a unified coordinate system?

@RaymondJiangkw (Owner)

Check out 'eval.py' and the supplementary of the paper.

@kk6398 (Author) commented Sep 6, 2024

Sorry, I find a discrepancy between the code and the supplementary description.
Supplementary Section 2.2 says: "For our method, we optimize quaternion with a learning rate of 0.001 and translation with a learning rate of 0.01 for the testing views based on the RGB loss only."
However, in Lines 117-132 (especially Lines 131-132) of eval.py:

```python
loss = torch.nn.L1Loss()(test_view.image, test_out["render"].clamp(0, 1)) + \
       1e2 * torch.nn.L1Loss()(xy0, xy1)
```

This includes not only the RGB loss but also the correspondence loss.
Looking forward to your reply.

@RaymondJiangkw (Owner)

Yeah, you can remove the correspondence loss there. It should make little difference. I cleaned up the original messy implementation by rewriting most parts. I think I added it there at the time because it helped the metrics in some cases during verification.
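
For reference, here is a minimal sketch of what the RGB-only variant might look like once the correspondence term is dropped. The symbols test_view, gaussians, pipe, background, and renderApprSurface follow the snippet quoted above; the optimizer setup and the names prev_qvec/prev_tvec are assumptions based on the learning rates stated in supplementary Section 2.2, not the exact eval.py code:

```python
import torch

# Assumed setup: the testing view's pose is parameterized by a quaternion and a translation,
# initialized from the last registered testing view (prev_qvec / prev_tvec are hypothetical names).
test_view.qvec = torch.nn.Parameter(prev_qvec.clone())
test_view.tvec = torch.nn.Parameter(prev_tvec.clone())
optimizer = torch.optim.Adam([
    {"params": [test_view.qvec], "lr": 1e-3},  # quaternion lr from the supplementary
    {"params": [test_view.tvec], "lr": 1e-2},  # translation lr from the supplementary
])

for _ in range(num_steps):  # num_steps is a placeholder for the registration schedule
    test_out = renderApprSurface(test_view, gaussians, pipe, background)  # as in eval.py
    # RGB loss only: the 1e2 * L1(xy0, xy1) correspondence term is dropped.
    loss = torch.nn.L1Loss()(test_view.image, test_out["render"].clamp(0, 1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```

Only the extrinsics are optimized here; the Gaussians themselves stay frozen, as noted earlier in the thread.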

@RaymondJiangkw (Owner)

If you have further questions, you are welcome to send your contact to my email (can be found in the paper). I will contact you through that.

@kk6398 (Author) commented Sep 6, 2024

> If you have further questions, you are welcome to send your contact to my email (can be found in the paper). I will contact you through that.

Thanks a lot. Much appreciated!
