
Dataset question #11

Open
kk6398 opened this issue Aug 21, 2024 · 7 comments

Comments


kk6398 commented Aug 21, 2024

Hi, thanks for your excellent work.
For the TT dataset, when there are 12 training views, the test views are the remaining 12.
When the number of training views is n=3 or n=6, are all of the remaining 21 or 18 views used as test views?


kk6398 commented Aug 22, 2024

> Hi, thanks for your excellent work. For the TT dataset, when there are 12 training views, the test views are the remaining 12. When the number of training views is n=3 or n=6, are all of the remaining 21 or 18 views used as test views?

Sorry, I have figured it out as described in the paper.
However, the test images are uniformly sampled from the 22 images excluding the first and last one, which makes 11 images, right?
In addition, how should I handle this in the code when n=3 or 6? I found that there are only 3 training views in "...\InstantSplat\data\TT\Family\3_views\images". How can I get the corresponding test views?

@kairunwen

Hi, when the number of training views is n=3/6/12, the number of test views is 12.
You can get the initial test-view poses here: https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L85-L141
Take training view number = 3 as an example:
(1) input the 3 train imgs to dust3r --> get 3 point clouds (defined as train_pcd)
(2) input the 3 train imgs and 12 test imgs to dust3r --> get 15 point clouds (3 pcd1 & 12 pcd2) and 15 poses (3 pose1 & 12 pose2)
(3) use the 3 train_pcd & 3 pcd1 for point-cloud registration and calculate the transform matrix --> get transform_matrix M
(4) use transform_matrix M to transform the 12 pose2 into 12 test_pose --> get initial test_pose = 12 test_pose
(5) optimize test_pose to achieve a more precise alignment for evaluation: https://github.com/NVlabs/InstantSplat/blob/main/render.py#L45
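
As a rough illustration of steps (3) and (4), here is a minimal, hypothetical sketch: it estimates a similarity transform (Umeyama-style point-cloud registration) that aligns the train-view points of the joint run (pcd1) to the train-only run (train_pcd), and then maps the test poses (pose2) into that frame. The variable names follow the description above, the toy data is made up, and this is not the repository's actual implementation (see init_test_pose.py for that):

    import numpy as np

    def register_similarity(src, dst):
        """Umeyama-style registration: find (s, R, t) with s * R @ src_i + t ~= dst_i."""
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        src_c, dst_c = src - mu_s, dst - mu_d
        cov = dst_c.T @ src_c / len(src)
        U, D, Vt = np.linalg.svd(cov)
        S = np.eye(3)
        if np.linalg.det(U) * np.linalg.det(Vt) < 0:
            S[2, 2] = -1.0                              # avoid mirror solutions
        R = U @ S @ Vt
        s = np.trace(np.diag(D) @ S) / src_c.var(axis=0).sum()
        t = mu_d - s * R @ mu_s
        return s, R, t

    def transform_poses(poses, s, R, t):
        """Map (N, 4, 4) camera-to-world poses into the registered coordinate frame."""
        out = []
        for P in poses:
            Q = np.eye(4)
            Q[:3, :3] = R @ P[:3, :3]                   # re-orient the camera (scale dropped)
            Q[:3, 3] = s * R @ P[:3, 3] + t             # move the camera center
            out.append(Q)
        return np.stack(out)

    # Toy stand-ins for the quantities named in the steps above; in practice they
    # come from the two DUSt3R runs.
    rng = np.random.default_rng(0)
    train_pcd = rng.normal(size=(1000, 3))              # step (1): train-only run
    pcd1 = 0.5 * train_pcd + np.array([1.0, 0.0, 0.0])  # step (2): same points, different frame
    pose2 = np.repeat(np.eye(4)[None], 12, axis=0)      # step (2): 12 test poses (placeholders)

    s, R, t = register_similarity(pcd1, train_pcd)      # step (3): the transform matrix M
    test_poses = transform_poses(pose2, s, R, t)        # step (4): initial test poses for step (5)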


kk6398 commented Aug 27, 2024

Thank you for your reply.
Do we need to explicitly split all images into train and test folders first, so that we can proceed with "(1) input the 3 train imgs to dust3r"?
Also, how are the 12 test-view images selected? The test images are uniformly sampled from the 22 images excluding the first and last one (as described in the paper), which makes 11 images, right?

@kairunwen

> Do we need to explicitly split all images into train and test folders first, so that we can proceed with "(1) input the 3 train imgs to dust3r"?

No, we split train_imgs here: https://github.com/NVlabs/InstantSplat/blob/main/coarse_init_eval.py#L56-L64
and split test_imgs for evaluation here: https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L62-L72

> Also, how are the 12 test-view images selected? The test images are uniformly sampled from the 22 images excluding the first and last one (as described in the paper), which makes 11 images, right?

The test images are uniformly sampled from the 22 images excluding the first and last one, which makes 12 images.
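
For concreteness, here is a small hypothetical sketch of how a 12/12 split like this can arise from 24 frames, following the hold-out rule visible in the snippet quoted later in this thread. The value llffhold = 2 and the file names are assumptions for illustration, not the repository's stated defaults; the authoritative logic is in coarse_init_eval.py and init_test_pose.py:

    # Hypothetical worked example: 24 frames, hold out every llffhold-th image.
    all_img_list = [f"{i:06d}.jpg" for i in range(24)]
    llffhold = 2  # assumed value for this example

    # Every llffhold-th image (counting from 1) goes to the test split; the rest train.
    train_img_list = [c for idx, c in enumerate(all_img_list) if (idx + 1) % llffhold != 0]
    test_img_list = [c for idx, c in enumerate(all_img_list) if (idx + 1) % llffhold == 0]

    print(len(train_img_list), len(test_img_list))  # -> 12 12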


kk6398 commented Aug 27, 2024

> > Do we need to explicitly split all images into train and test folders first, so that we can proceed with "(1) input the 3 train imgs to dust3r"?
>
> No, we split train_imgs here: https://github.com/NVlabs/InstantSplat/blob/main/coarse_init_eval.py#L56-L64 and split test_imgs for evaluation here: https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L62-L72
>
> > Also, how are the 12 test-view images selected? The test images are uniformly sampled from the 22 images excluding the first and last one (as described in the paper), which makes 11 images, right?
>
> The test images are uniformly sampled from the 22 images excluding the first and last one, which makes 12 images.

So, do we need to change "llffhold" when we change the number of training views? For example, llffhold=4 when n_views=6 and llffhold=8 when n_views=3.
As for "The test images are uniformly sampled from the 22 images excluding the first and last one, which makes 12 images": specifically, the frames are numbered 0-23 (24 frames in total). When excluding the first and last one, we select numbers 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, which is 11 frames in total, right? In addition, [init_test_pose.py#L62-L72](https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L62-L72) indicates that we split the dataset into training views (0,2,4,6,8,10,12,14,16,18,20,22) and test views (1,3,5,7,9,11,13,15,17,19,21,23).


kairunwen commented Sep 11, 2024

> So, do we need to change "llffhold" when we change the number of training views? For example, llffhold=4 when n_views=6 and llffhold=8 when n_views=3.

No.

> As for "The test images are uniformly sampled from the 22 images excluding the first and last one, which makes 12 images": specifically, the frames are numbered 0-23 (24 frames in total). When excluding the first and last one, we select numbers 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, which is 11 frames in total, right? In addition, [init_test_pose.py#L62-L72](https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L62-L72) indicates that we split the dataset into training views (0,2,4,6,8,10,12,14,16,18,20,22) and test views (1,3,5,7,9,11,13,15,17,19,21,23).

Train view idx = (0 3 5 7 9 11 13 15 17 19 21 23)
Test view idx = (1 2 4 6 8 10 12 14 16 18 20 22)
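
To illustrate why llffhold does not need to change with n_views: the train split stays fixed, and n_views only controls how many of those candidates are kept via uniform sampling. A small hypothetical sketch, assuming the 12-candidate train split discussed above (the sampling line mirrors the snippet quoted in the next comment):

    import numpy as np

    # Hypothetical: the train split always contains 12 candidate views; n_views only
    # selects a uniform subset of them, so llffhold itself never needs to change.
    train_img_list = [f"train_{i:02d}.jpg" for i in range(12)]

    for n_views in (3, 6, 12):
        indices = np.linspace(0, len(train_img_list) - 1, n_views, dtype=int)
        print(n_views, indices.tolist())
    # 3  -> [0, 5, 11]
    # 6  -> [0, 2, 4, 6, 8, 11]
    # 12 -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]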

@Master-cai

@kairunwen Hi! I think there is a discrepancy between the code and your description.
I added some code to init_test_pose.py to print the indices and image names for training and testing:

    # ---------------- (1) Prepare Train & Test images list ----------------
    all_img_list = sorted(os.listdir(os.path.join(img_base_path, "images")))
    if args.llffhold > 0:
        # images whose 1-based index is a multiple of llffhold go to the test split,
        # all remaining images go to the train split
        train_img_list = [c for idx, c in enumerate(all_img_list) if (idx+1) % args.llffhold != 0]
        train_img_idx = [idx for idx, c in enumerate(all_img_list) if (idx+1) % args.llffhold != 0]
        test_img_list = [c for idx, c in enumerate(all_img_list) if (idx+1) % args.llffhold == 0]
        test_img_idx = [idx for idx, c in enumerate(all_img_list) if (idx+1) % args.llffhold == 0]
    # sample sparse view: pick n_views indices uniformly from the train split
    indices = np.linspace(0, len(train_img_list) - 1, n_views, dtype=int)
    print(indices)
    print(f"trn idx {train_img_idx}, name {train_img_list}")
    print(f"tst idx {test_img_idx}, name {test_img_list}")

And the result is:

trn idx [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22], name ['000291.jpg', '000301.jpg', '000312.jpg', '000322.jpg', '000332.jpg', '000343.jpg', '000353.jpg', '000363.jpg', '000374.jpg', '000384.jpg', '000394.jpg', '000405.jpg']
tst idx [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23], name ['000296.jpg', '000307.jpg', '000317.jpg', '000327.jpg', '000338.jpg', '000348.jpg', '000358.jpg', '000369.jpg', '000379.jpg', '000389.jpg', '000400.jpg', '000410.jpg']

But you stated:

> Train view idx = (0 3 5 7 9 11 13 15 17 19 21 23)
> Test view idx = (1 2 4 6 8 10 12 14 16 18 20 22)

I'm confused. Is anything wrong here?
