
Visual Reconstruction with Latent Diffusion through Linear Mapping

Check out the preprint: Image Reconstruction from Electroencephalography Using Latent Diffusion.

Example reconstructions for subject 1, ordered from best to worst by the pairwise correlation of the final CLIP embedding. Rows of images with blue frames are the ground-truth images; the green-framed rows directly under them are the corresponding reconstructions.
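
The best-to-worst ordering is based on the pairwise correlation between the final CLIP embedding of each ground-truth image and that of its reconstruction. A minimal sketch of how such a ranking could be computed; the file paths and array names here are illustrative, not the repo's actual cache layout:

import numpy as np

# Hypothetical files: one final CLIP embedding per test image, flattened to (n_images, d).
gt_clip = np.load("cache/test_clip_groundtruth.npy")
rec_clip = np.load("cache/test_clip_reconstructed.npy")

def rowwise_pearson(a, b):
    # Pearson correlation between corresponding rows of a and b.
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

corrs = rowwise_pearson(gt_clip, rec_clip)
order = np.argsort(corrs)[::-1]  # indices of test images from best to worst reconstruction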

UMAP Mapping

UMAP of the final CLIP embeddings for ground-truth (blue) and reconstructed (green) images from subject 1. The transparency and size of the green images indicate the CLIP-vector correlation between each reconstructed image and its ground-truth counterpart. The ground-truth images form two clusters, animals and food, which are also the two most prominent clusters among the reconstructions.
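
A minimal sketch of how such a joint UMAP could be produced with the umap-learn package; this is a generic illustration under assumed file paths, not the repo's plot_umap_CLIP.py:

import numpy as np
import umap                      # pip install umap-learn
import matplotlib.pyplot as plt

# Hypothetical (n_images, d) CLIP embeddings; paths are illustrative.
gt_clip = np.load("cache/test_clip_groundtruth.npy")
rec_clip = np.load("cache/test_clip_reconstructed.npy")

# Fit a single 2-D UMAP so ground truth and reconstructions share one embedding space.
emb_2d = umap.UMAP(n_components=2, random_state=0).fit_transform(
    np.concatenate([gt_clip, rec_clip], axis=0)
)
gt_2d, rec_2d = emb_2d[: len(gt_clip)], emb_2d[len(gt_clip):]

# Size of the green points scales with the per-pair CLIP correlation.
corrs = np.array([np.corrcoef(g, r)[0, 1] for g, r in zip(gt_clip, rec_clip)])
plt.scatter(gt_2d[:, 0], gt_2d[:, 1], c="blue", s=12, label="ground truth")
plt.scatter(rec_2d[:, 0], rec_2d[:, 1], c="green", s=12 + 40 * np.clip(corrs, 0, 1),
            alpha=0.6, label="reconstructed")
plt.legend()
plt.savefig("umap_clip.png", dpi=200)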

Feature transfer through narrow time segment swapping

Examples of data-segment swapping. Each pair of rows shows two images whose EEG responses have part of their time course swapped. From top to bottom the pairs are: "gorilla_18s.jpg" and "gopher_09s.jpg"; "chaps_18s.jpg" and "headscarf_03s.jpg"; "cat_01b.jpg" and "coverall_06s.jpg"; "sausage_04s.jpg" and "piglet_02s.jpg"; "caterpillar_03s.jpg" and "possum_05s.jpg"; "cart_09s.jpg" and "elephant_11n.jpg". Each image in a row is the result of swapping a 50 ms time window (5 samples), and the next image shifts that window by 10 ms (1 sample). The last image of each row is a reference reconstruction without any swapping.
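
A minimal sketch of the swapping operation, assuming preprocessed EEG trials shaped (channels, samples) at the 100 Hz implied by the caption (5 samples = 50 ms); variable names are illustrative and not taken from the repo's scripts:

import numpy as np

def swap_segment(eeg_a, eeg_b, start, width=5):
    # Swap the time window [start, start + width) between two (channels, samples) EEG trials.
    # width=5 samples corresponds to 50 ms at 100 Hz.
    a, b = eeg_a.copy(), eeg_b.copy()
    a[:, start:start + width] = eeg_b[:, start:start + width]
    b[:, start:start + width] = eeg_a[:, start:start + width]
    return a, b

# Slide the window in 1-sample (10 ms) steps; each swapped pair is then passed through
# the regression + diffusion pipeline to produce one image per window position.
# eeg_gorilla and eeg_gopher would be hypothetical (channels, samples) arrays:
# for start in range(0, eeg_gorilla.shape[1] - 5):
#     swapped_a, swapped_b = swap_segment(eeg_gorilla, eeg_gopher, start)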

Performance

EEG visual reconstruction

This section covers visual reconstruction using the THINGS-EEG2 dataset.

Getting started

For macOS and Linux:

  1. Follow the instructions from brain-diffusor to create the Python environment
    Note: make sure tokenizers==0.12.1 and transformers==4.19.2. For the diffusion environment, you may use requirements.txt.
  • For macOS and Linux:
virtualenv pyenv --python=3.10.12
source pyenv/bin/activate
pip install -r requirements.txt
  • For Windows:
virtualenv pyenv --python=3.10.12
pyenv\Scripts\activate
pip install -r requirements.txt
  2. Download the preprocessed EEG data and unzip "sub-01", "sub-02", etc. under data/thingseeg2_preproc.
cd data/
wget https://files.de-1.osf.io/v1/resources/anp5v/providers/osfstorage/?zip=
mv index.html?zip= thingseeg2_preproc.zip
unzip thingseeg2_preproc.zip -d thingseeg2_preproc
cd thingseeg2_preproc/
unzip sub-01.zip
unzip sub-02.zip
unzip sub-03.zip
unzip sub-04.zip
unzip sub-05.zip
unzip sub-06.zip
unzip sub-07.zip
unzip sub-08.zip
unzip sub-09.zip
unzip sub-10.zip
cd ../../
python thingseeg2_data_preparation_scripts/prepare_thingseeg2_data.py 
  3. Download the ground truth images and unzip "training_images" and "test_images" under data/thingseeg2_metadata.
cd data/
wget https://files.de-1.osf.io/v1/resources/y63gw/providers/osfstorage/?zip=
mv index.html?zip= thingseeg2_metadata.zip
unzip thingseeg2_metadata.zip -d thingseeg2_metadata
cd thingseeg2_metadata/
unzip training_images.zip
unzip test_images.zip
cd ../../
python thingseeg2_data_preparation_scripts/save_thingseeg2_images.py
python thingseeg2_data_preparation_scripts/save_thingseeg2_concepts.py
  4. Download VDVAE and Versatile Diffusion weights
cd vdvae/model/
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-log.jsonl
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-model.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-model-ema.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-opt.th
cd ../../versatile_diffusion/pretrained/
wget https://huggingface.co/shi-labs/versatile-diffusion/resolve/main/pretrained_pth/vd-four-flow-v1-0-fp16-deprecated.pth
wget https://huggingface.co/shi-labs/versatile-diffusion/resolve/main/pretrained_pth/kl-f8.pth
wget https://huggingface.co/shi-labs/versatile-diffusion/resolve/main/pretrained_pth/optimus-vae.pth
cd ../../
  5. Extract train and test latent embeddings from images and text labels
python thingseeg2_data_preparation_scripts/vdvae_extract_features.py 
python thingseeg2_data_preparation_scripts/clipvision_extract_features.py 
python thingseeg2_data_preparation_scripts/cliptext_extract_features.py 
python thingseeg2_data_preparation_scripts/evaluation_extract_features_from_test_images.py 

For Windows:

  1. Follow the instructions from brain-diffusor to create the Python environment
    Note: make sure tokenizers==0.12.1 and transformers==4.19.2. For the diffusion environment, you may use requirements-win.txt.
virtualenv pyenv --python=3.10.12
pyenv\Scripts\activate
pip install -r requirements-win.txt
  2. Download the preprocessed EEG data and unzip "sub-01", "sub-02", etc. under data/thingseeg2_preproc.
  • Create a folder called thingseeg2_preproc under data/
  • Copy and paste the contents of osfstorage-archive.zip into thingseeg2_preproc
  • Navigate to thingseeg2_preproc and unzip each zip file one by one
  • Open a terminal, navigate to the project root directory, and run this command:
python thingseeg2_data_preparation_scripts/prepare_thingseeg2_data.py 
  3. Download the ground truth images and unzip "training_images" and "test_images" under data/thingseeg2_metadata.
  • Create a folder called thingseeg2_metadata under data/
  • Copy and paste the contents of osfstorage-archive (1).zip into thingseeg2_metadata
  • Navigate to thingseeg2_metadata and unzip training_images.zip and test_images.zip
  • Open a terminal, navigate to the project root directory, and run these commands:
python thingseeg2_data_preparation_scripts/save_thingseeg2_images.py
python thingseeg2_data_preparation_scripts/save_thingseeg2_concepts.py
  4. Download VDVAE and Versatile Diffusion weights (the download URLs are listed in the macOS/Linux section above)
  • Navigate into vdvae/model/ and move the four downloaded VDVAE files there
  • Navigate into versatile_diffusion/pretrained/ and move vd-four-flow-v1-0-fp16-deprecated.pth, kl-f8.pth, and optimus-vae.pth there

  5. Extract train and test latent embeddings from images and text labels. Run these commands from the project root directory (a generic sketch of the CLIP-vision extraction step follows below):
python thingseeg2_data_preparation_scripts/vdvae_extract_features.py 
python thingseeg2_data_preparation_scripts/clipvision_extract_features.py 
python thingseeg2_data_preparation_scripts/cliptext_extract_features.py 
python thingseeg2_data_preparation_scripts/evaluation_extract_features_from_test_images.py 
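
The extraction scripts above compute the latent targets (VDVAE latents, CLIP-vision, and CLIP-text embeddings) that the linear regressions are later trained to predict. As a generic illustration of the CLIP-vision step, here is a sketch using Hugging Face transformers' CLIP model; the repo's clipvision_extract_features.py may load CLIP through Versatile Diffusion instead, so treat the model name, preprocessing, and paths here as assumptions:

import glob
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

features = []
with torch.no_grad():
    for path in sorted(glob.glob("data/thingseeg2_metadata/test_images/*/*.jpg")):  # assumed layout
        inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
        inputs = {k: v.to(device) for k, v in inputs.items()}
        features.append(model.get_image_features(**inputs).cpu().numpy())

features = np.concatenate(features, axis=0)  # (n_images, projection_dim) pooled CLIP-vision embeddings
np.save("clipvision_test_features_demo.npy", features)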

Training and reconstruction

python thingseeg2_scripts/train_regression.py 
python thingseeg2_scripts/reconstruct_from_embeddings.py 
python thingseeg2_scripts/evaluate_reconstruction.py 
python thingseeg2_scripts/plot_reconstructions.py -ordered True
python thingseeg2_scripts/plot_umap_CLIP.py
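
As the project title says, the mapping from EEG to the diffusion model's latent embeddings is linear. A minimal sketch of the kind of model train_regression.py fits, assuming scikit-learn ridge regression on flattened, trial-averaged EEG; the actual script's regularization, scaling, and file layout may differ:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Hypothetical arrays (paths and shapes are illustrative): EEG per training/test stimulus
# and the target latent embeddings extracted in the previous step.
eeg_train = np.load("cache/train_eeg_avg.npy")       # (n_train, channels, samples)
eeg_test = np.load("cache/test_eeg_avg.npy")         # (n_test, channels, samples)
clip_train = np.load("cache/train_clipvision.npy")   # (n_train, d), flattened CLIP-vision targets

X_train = eeg_train.reshape(len(eeg_train), -1)
X_test = eeg_test.reshape(len(eeg_test), -1)

scaler = StandardScaler().fit(X_train)
reg = Ridge(alpha=1e4)  # placeholder regularization strength; tune per embedding type
reg.fit(scaler.transform(X_train), clip_train)

# Predicted embeddings for the test trials, later fed to VDVAE / Versatile Diffusion.
clip_pred = reg.predict(scaler.transform(X_test))
np.save("cache/test_clipvision_pred_demo.npy", clip_pred)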

Reproducing figures

The reconstruction scripts assume you have 7 GPUs; if you only have one GPU, remove the parallelism and set all GPU indices to 0.

  1. Reproducing results/thingseeg2_preproc/fig_performance.png:
thingseeg2_figure_scripts/train_all_subjects.sh
thingseeg2_figure_scripts/reconstruct_all_subjects.sh
thingseeg2_figure_scripts/evaluate_all_subjects.sh
python thingseeg2_figure_scripts/fig_performance.py
  2. Reproducing results/thingseeg2_preproc/fig_across_duration.png:
thingseeg2_figure_scripts/train_across_duration.sh
thingseeg2_figure_scripts/reconstruct_across_duration.sh
thingseeg2_figure_scripts/evaluate_across_duration.sh
python thingseeg2_figure_scripts/fig_across_durations.py
  3. Reproducing results/thingseeg2_preproc/fig_ablations.png (assuming you have completed fig_performance.png):
thingseeg2_figure_scripts/reconstruct_ablation.sh
thingseeg2_figure_scripts/evaluate_ablation.sh
python thingseeg2_figure_scripts/fig_ablations.py
  4. Reproducing results/thingseeg2_preproc/fig_CLIP_across_size_num_avg.png:
thingseeg2_figure_scripts/train_across_size_num_avg.sh
thingseeg2_figure_scripts/reconstruct_across_size_num_avg.sh
thingseeg2_figure_scripts/evaluate_across_size_num_avg.sh
python thingseeg2_figure_scripts/fig_across_size_num_avg.py

MEG visual reconstruction

This section covers visual reconstruction using the THINGS-MEG dataset.

Getting started

  1. Follow the instructions from brainmagick and brain-diffusor to create the Python environments for both
    Note: make sure tokenizers==0.12.1 and transformers==4.19.2
  2. Download the THINGS-Images, then save the images and categories as numpy files:
source diffusion/bin/activate
python save_things_images.py
python save_things_categories.py
  3. Preprocess the MEG data and prepare the stimuli:
conda activate bm
python preprocess_meg.py
python preprocess_meg_epoching.py
python get_stims1b.py

(optional) Get the captions for the images:

conda activate lavis
python generate_captions1b.py
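
A sketch of how captions can be generated with the LAVIS library's BLIP captioning model; whether generate_captions1b.py uses this exact model and model_type is an assumption:

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
# BLIP captioning model from LAVIS; model_type "base_coco" is an assumption.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

raw_image = Image.open("path/to/stimulus.jpg").convert("RGB")  # illustrative path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image})[0])  # a short caption describing the stimulus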

Create the training embeddings from the stimuli:

source diffusion/bin/activate
python vdvae_extract_features1b.py
python cliptext_extract_features.py
python clipvision_extract_features.py

First Stage Reconstruction with VDVAE

  1. Download the pretrained VDVAE model files and put them in the vdvae/model/ folder:
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-log.jsonl
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-model.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-model-ema.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-opt.th
  2. Extract VDVAE latent features of the stimulus images, train regression models from MEG to the VDVAE latent features, and save test predictions for individual test trials as well as averaged test trials:
source diffusion/bin/activate
python vdvae_regression1b.py
python vdvae_reconstruct_images1b.py

Second Stage Reconstruction with Versatile Diffusion

  1. Download the pretrained Versatile Diffusion models "vd-four-flow-v1-0-fp16-deprecated.pth", "kl-f8.pth", and "optimus-vae.pth" from HuggingFace and put them in the versatile_diffusion/pretrained/ folder
  2. Train regression models from MEG to CLIP-Text features and save test predictions by running python cliptext1b_regression_alltokens.py
    TODO: make regression for image captions
  3. Train regression models from MEG to CLIP-Vision features and save test predictions by running python clipvision1b_regression.py
  4. Reconstruct images from the predicted test features using python versatilediffusion_reconstruct_images1b.py

Averaged Test Trials Reconstruction

  1. Save the averaged test predictions (a minimal averaging sketch follows this list):
python avg1b_regression_prediction.py
  2. First Stage Reconstruction with VDVAE:
python avg1b_vdvae_reconstruct_images1b.py
  3. Second Stage Reconstruction with Versatile Diffusion:
python avg1b_versatilediffusion_reconstruct_images1b.py
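
In THINGS-MEG each test image is shown multiple times, so averaging the repeated test trials before predicting the latents reduces noise. A minimal numpy sketch, assuming the epoched test data is stored as (n_images, n_repetitions, channels, samples); the repo's actual array layout may differ:

import numpy as np

# Hypothetical epoched test MEG; path and shape are illustrative.
test_meg = np.load("cache/test_meg_epochs.npy")       # (n_images, n_repetitions, channels, samples)

test_meg_avg = test_meg.mean(axis=1)                  # average over repetitions
X_test_avg = test_meg_avg.reshape(len(test_meg_avg), -1)
# X_test_avg is then passed to the trained regressions to produce averaged-trial predictions.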

Citations

Ozcelik, F., & VanRullen, R. (2023). Natural scene reconstruction from fMRI signals using generative latent diffusion. Scientific Reports, 13(1), 15666. https://doi.org/10.1038/s41598-023-42891-8

Gifford, A. T., Dwivedi, K., Roig, G., & Cichy, R. M. (2022). A large and rich EEG dataset for modeling human visual object recognition. NeuroImage, 264, 119754. https://doi.org/10.1016/j.neuroimage.2022.119754

Benchetrit, Y., Banville, H., & King, J.-R. (n.d.). Brain decoding: Toward real-time reconstruction of visual perception.

Hebart, M. N., Contier, O., Teichmann, L., Rockter, A. H., Zheng, C. Y., Kidder, A., Corriveau, A., Vaziri-Pashkam, M., & Baker, C. I. (2023). THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife, 12, e82580. https://doi.org/10.7554/eLife.82580
