
What shape should data be? #22

Open
GlKz13 opened this issue Sep 25, 2024 · 3 comments

@GlKz13 commented Sep 25, 2024

Hello! Thank you for your model!
Could you clarify one more thing for me?
Your forward method's docstring says:

"""Forward function of EvTexture

    Args:
        imgs: Input frames with shape (b, n, c, h, w). b is batch size. n is the number of frames, and c equals 3 (RGB channels).
        voxels_f: forward event voxel grids with shape (b, n-1, Bins, h, w). n-1 is intervals between n frames.
        voxels_b: backward event voxel grids with shape (b, n-1, Bins, h, w).

    Output:
        out_l: output frames with shape (b, n, c, 4h, 4w)
    """

Can you explain how I should organize my data from, for example, calendar.h5 to feed the model?
In calendar.h5 there are "images" ([H, W]) and "voxels" ([Bins, H, W]).
I took 2 images and stacked them (torch.stack([image1, image2])),
then I took the voxels between these 2 images (that is, one forward voxel grid and one backward voxel grid),
then unsqueezed everything to get a single batch ("b" in the forward function).
Finally I got these shapes:
images: [1, 2, 3, H, W]
voxels: [1, 1, 5, H, W]
Then I called the model: forward(images, voxels_f, voxels_b).

I did get an upscaled image, but with awful quality.
So what did I do wrong? I used the test data published in this repo, so I suspect I got the shapes wrong or organized the data incorrectly.
But how exactly should I use the h5 files with the forward method? I want to know how to call "forward" manually.
Thank you!
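
(For reference, a minimal way to inspect the layout of such an h5 file, assuming the keys named in this thread: "images", "voxels_f", "voxels_b":)

```python
import h5py

# Print every group/dataset path in the file, plus the shape of each dataset.
with h5py.File("calendar.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))
```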

@GlKz13 (Author) commented Sep 25, 2024

Here is my code, by the way:

```python
import h5py as h5
import numpy as np
import torch

# Assumption: adjust this import to wherever EvTexture lives in your checkout
# (e.g. basicsr/archs/evtexture_arch.py in this repo).
from basicsr.archs.evtexture_arch import EvTexture

device = "cuda"

with h5.File("preproccessed/events/Vid4_h5/LRx4/test/calendar.h5", "r") as h:
    print("All frames:", len(list(h["images"])))
    print(h.keys())
    print(list(h["images"]))
    # take 2 consecutive images
    image1 = np.array(h["images"]["000000"])
    image2 = np.array(h["images"]["000001"])
    # take the voxel grids between them
    vf = np.array(h["voxels_f"]["000000"])
    vb = np.array(h["voxels_b"]["000000"])

# stack the images to get n = 2: (b, n, c, h, w) = (1, 2, 3, H, W)
image1 = torch.tensor(image1).to(torch.float32).permute(2, 0, 1)
image2 = torch.tensor(image2).to(torch.float32).permute(2, 0, 1)
images = torch.stack([image1, image2]).unsqueeze(0)

# add batch and interval dims: (b, n-1, Bins, h, w) = (1, 1, 5, H, W)
vf = torch.tensor(vf).to(torch.float32).unsqueeze(0).unsqueeze(0)
vb = torch.tensor(vb).to(torch.float32).unsqueeze(0).unsqueeze(0)

model = EvTexture()
model_path = "experiments/pretrained_models/EvTexture_Vimeo90K_BIx4.pth"
weights = torch.load(model_path, map_location=device)
model.load_state_dict(weights["params"])

model = model.to(device)
images = images.to(device)
vf = vf.to(device)
vb = vb.to(device)

model.eval()
with torch.inference_mode():
    res = model(images, vf, vb)

# res shape: (1, 2, 3, 576, 704)
```
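
(One thing worth checking, as a hedged aside not from the thread: BasicSR-style models typically operate on float frames in [0, 1], so if the h5 file stores uint8 images in [0, 255], they may need scaling before the forward pass:)

```python
# Hedged sketch: scale only if the h5 frames are stored as uint8 in [0, 255].
images = images / 255.0

# After running the model as above, save one upscaled frame
# (values outside [0, 1] are clamped when writing).
import torchvision.utils as vutils
vutils.save_image(res[0, 0], "000000_sr.png")
```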

@DachunKai (Owner) commented

Thank you for your interesting question about using only two frames as input and obtaining high-resolution output frames. The shapes you've mentioned seem correct:

  • images: (1, 2, 3, H, W)
  • voxels_f: (1, 1, 5, H, W)
  • voxels_b: (1, 1, 5, H, W)

However, I have a question: have you successfully tested the script `./scripts/dist_test.sh [num_gpus] options/test/EvTexture/test_EvTexture_Vid4_BIx4.yml` and obtained the results posted in the release?

I can suggest a simple way for you to quickly test this. You just need to modify the meta_info_file referenced in the config file, specifically `basicsr/data/meta_info/meta_info_Vid4_h5_test.txt`, replacing its content with `calendar.h5 2`. After that, run the test script with `options/test/EvTexture/test_EvTexture_Vid4_BIx4.yml`, which will test only the first two images of the calendar sequence and output the results.
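
(Concretely, that amounts to something like the following, using one GPU as an example:)

```bash
# Overwrite the meta_info file so only calendar.h5 (first 2 frames) is tested.
echo "calendar.h5 2" > basicsr/data/meta_info/meta_info_Vid4_h5_test.txt
# Run the test script from the repo root.
./scripts/dist_test.sh 1 options/test/EvTexture/test_EvTexture_Vid4_BIx4.yml
```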

I tested this and received the following results:

For 000000.png the PSNR is 23.64, and for 000001.png it is approximately 23.60. The PSNR results in our release for the calendar images 000000/000001 are 25.26/25.40, respectively.

I believe that inferring with only two frames leads to lower PSNR compared to using the entire video: our model employs a recurrent structure, and two frames provide limited temporal information, resulting in slightly poorer outcomes.
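
(For completeness, a hedged sketch of building full-sequence inputs from the same h5 file, reusing the variable names from the code above, so the recurrent propagation sees every frame:)

```python
import h5py as h5
import numpy as np
import torch

# Stack every frame and every interval so the recurrent propagation
# can use the whole sequence instead of just two frames.
with h5.File("preproccessed/events/Vid4_h5/LRx4/test/calendar.h5", "r") as f:
    frame_keys = sorted(f["images"].keys())
    interval_keys = sorted(f["voxels_f"].keys())
    images = torch.stack([
        torch.from_numpy(np.array(f["images"][k])).float().permute(2, 0, 1)
        for k in frame_keys
    ]).unsqueeze(0)                                  # (1, N, 3, h, w)
    vf = torch.stack([
        torch.from_numpy(np.array(f["voxels_f"][k])).float()
        for k in interval_keys
    ]).unsqueeze(0)                                  # (1, N-1, Bins, h, w)
    vb = torch.stack([
        torch.from_numpy(np.array(f["voxels_b"][k])).float()
        for k in interval_keys
    ]).unsqueeze(0)                                  # (1, N-1, Bins, h, w)
```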

Hope this helps!

@GlKz13 (Author) commented Sep 25, 2024

Thank you, I'll try!
