
What shape should data be? #22

Open
GlKz13 opened this issue Sep 25, 2024 · 3 comments

@GlKz13 commented Sep 25, 2024

Hello! Thank you for your model!
Could you clarify one more thing for me?
Your forward method's docstring says:

"""Forward function of EvTexture

    Args:
        imgs: Input frames with shape (b, n, c, h, w). b is batch size. n is the number of frames, and c equals 3 (RGB channels).
        voxels_f: forward event voxel grids with shape (b, n-1, Bins, h, w). n-1 is intervals between n frames.
        voxels_b: backward event voxel grids with shape (b, n-1, Bins, h, w).

    Output:
        out_l: output frames with shape (b, n, c, 4h, 4w)
    """

Can you explain how I should organize my data from, for example, calendar.h5 to feed the model?
In calendar.h5 there are "images" ([H, W]) and "voxels" ([Bins, H, W]).
I took 2 images and stacked them (torch.stack([image1, image2])),
then I took the voxels between these 2 images (that is, one forward voxel grid and one backward voxel grid),
then unsqueezed everything to get a single batch ("b" in the forward function).
Finally I got these shapes:
images: [1, 2, 3, H, W]
voxels: [1, 1, 5, H, W]
Then I called the model: forward(images, voxels_f, voxels_b).

I did get an upscaled image, but with awful quality.
So what did I do wrong? I used the test data published in this repo, so I suspect I got the shapes wrong or organized the data incorrectly.
But how exactly should I use the h5 files with the forward method? I want to know how to call "forward" manually.
Thank you!
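
(For reference, a minimal way to inspect the layout of such an h5 file, assuming the keys named in this thread: "images", "voxels_f", "voxels_b":)

```python
import h5py

# Print every group/dataset path in the file, plus the shape of each dataset.
with h5py.File("calendar.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))
```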

@GlKz13 (Author) commented Sep 25, 2024

Here is my code, by the way:

```python
import h5py as h5
import numpy as np
import torch

# Assumption: adjust this import to wherever EvTexture lives in your checkout
# (e.g. basicsr/archs/evtexture_arch.py in this repo).
from basicsr.archs.evtexture_arch import EvTexture

device = "cuda"

with h5.File("preproccessed/events/Vid4_h5/LRx4/test/calendar.h5", "r") as h:
    print("All frames:", len(list(h["images"])))
    print(h.keys())
    print(list(h["images"]))
    # take 2 consecutive images
    image1 = np.array(h["images"]["000000"])
    image2 = np.array(h["images"]["000001"])
    # take the voxel grids between them
    vf = np.array(h["voxels_f"]["000000"])
    vb = np.array(h["voxels_b"]["000000"])

# stack the images to get n = 2: (b, n, c, h, w) = (1, 2, 3, H, W)
image1 = torch.tensor(image1).to(torch.float32).permute(2, 0, 1)
image2 = torch.tensor(image2).to(torch.float32).permute(2, 0, 1)
images = torch.stack([image1, image2]).unsqueeze(0)

# add batch and interval dims: (b, n-1, Bins, h, w) = (1, 1, 5, H, W)
vf = torch.tensor(vf).to(torch.float32).unsqueeze(0).unsqueeze(0)
vb = torch.tensor(vb).to(torch.float32).unsqueeze(0).unsqueeze(0)

model = EvTexture()
model_path = "experiments/pretrained_models/EvTexture_Vimeo90K_BIx4.pth"
weights = torch.load(model_path, map_location=device)
model.load_state_dict(weights["params"])

model = model.to(device)
images = images.to(device)
vf = vf.to(device)
vb = vb.to(device)

model.eval()
with torch.inference_mode():
    res = model(images, vf, vb)

# res shape: (1, 2, 3, 576, 704)
```
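
(One thing worth checking, as a hedged aside not from the thread: BasicSR-style models typically operate on float frames in [0, 1], so if the h5 file stores uint8 images in [0, 255], they may need scaling before the forward pass:)

```python
# Hedged sketch: scale only if the h5 frames are stored as uint8 in [0, 255].
images = images / 255.0

# After running the model as above, save one upscaled frame
# (values outside [0, 1] are clamped when writing).
import torchvision.utils as vutils
vutils.save_image(res[0, 0], "000000_sr.png")
```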

@DachunKai (Owner) commented

Thank you for your interesting question about using only two frames as input and obtaining high-resolution output frames. The shapes you've mentioned seem correct:

  • images: (1, 2, 3, H, W)
  • voxels_f: (1, 1, 5, H, W)
  • voxels_b: (1, 1, 5, H, W)

However, I have a question: have you successfully tested the script `./scripts/dist_test.sh [num_gpus] options/test/EvTexture/test_EvTexture_Vid4_BIx4.yml` and obtained the results posted in the release?

I can suggest a simple way for you to quickly test this. You just need to modify the meta_info_file referenced in the config file, specifically `basicsr/data/meta_info/meta_info_Vid4_h5_test.txt`, replacing its content with `calendar.h5 2`. After that, run the test script with `options/test/EvTexture/test_EvTexture_Vid4_BIx4.yml`, which will test only the first two images of the calendar sequence and output the results.
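
(Concretely, that amounts to something like the following, using one GPU as an example:)

```bash
# Overwrite the meta_info file so only calendar.h5 (first 2 frames) is tested.
echo "calendar.h5 2" > basicsr/data/meta_info/meta_info_Vid4_h5_test.txt
# Run the test script from the repo root.
./scripts/dist_test.sh 1 options/test/EvTexture/test_EvTexture_Vid4_BIx4.yml
```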

I tested this and received the following results:

For 000000.png the PSNR is 23.64, and for 000001.png it is approximately 23.60. The PSNR results in our release for the calendar images 000000/000001 are 25.26/25.40, respectively.

I believe that inferring with only two frames leads to lower PSNR compared to using the entire video: our model employs a recurrent structure, and two frames provide limited temporal information, resulting in slightly poorer outcomes.
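
(For completeness, a hedged sketch of building full-sequence inputs from the same h5 file, reusing the variable names from the code above, so the recurrent propagation sees every frame:)

```python
import h5py as h5
import numpy as np
import torch

# Stack every frame and every interval so the recurrent propagation
# can use the whole sequence instead of just two frames.
with h5.File("preproccessed/events/Vid4_h5/LRx4/test/calendar.h5", "r") as f:
    frame_keys = sorted(f["images"].keys())
    interval_keys = sorted(f["voxels_f"].keys())
    images = torch.stack([
        torch.from_numpy(np.array(f["images"][k])).float().permute(2, 0, 1)
        for k in frame_keys
    ]).unsqueeze(0)                                  # (1, N, 3, h, w)
    vf = torch.stack([
        torch.from_numpy(np.array(f["voxels_f"][k])).float()
        for k in interval_keys
    ]).unsqueeze(0)                                  # (1, N-1, Bins, h, w)
    vb = torch.stack([
        torch.from_numpy(np.array(f["voxels_b"][k])).float()
        for k in interval_keys
    ]).unsqueeze(0)                                  # (1, N-1, Bins, h, w)
```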

Hope this helps!

@GlKz13 (Author) commented Sep 25, 2024

Thank you, I'll try!
