The rationale of this project is to leverage existing Large Multi-Modal Models (LMMs) to engage meaningfully with astronomical images. The overarching goal is to fine-tune a language-and-vision model such as LLaVA on a curated dataset from the Galaxy Zoo project.
You can see examples of the Galaxy Zoo Talk discussions here:
https://www.zooniverse.org/projects/zookeeper/galaxy-zoo/talk/1270
The steps of the project are as follows:
- Explore the Galaxy Zoo Talk dataset.
- Read and understand the high-level details of the LLaVA and LLaVA-Med papers.
- Summarise the discussion text with an LLM, using either open-source or proprietary models.
- Curate the image-summary pairs for instruction tuning (see the formatting sketch after this list).
- Fine-tune the model.
- Evaluate the model.
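To make the curation step concrete, here is a minimal sketch of how image-summary pairs might be written out as conversation-style instruction-tuning records, loosely following the JSON layout used by the LLaVA repository. The field names, prompt wording, and file names here are illustrative assumptions and should be checked against whichever fine-tuning script you end up using.

```python
# Sketch: convert (subject id, image file, summary) pairs into
# conversation-style records for instruction tuning. The schema below is
# an assumption modelled on the LLaVA-style layout, not a fixed standard.
import json

def make_record(subject_id, image_filename, summary):
    return {
        "id": str(subject_id),
        "image": image_filename,
        "conversations": [
            {"from": "human",
             "value": "<image>\nDescribe the notable features of this galaxy."},
            {"from": "gpt", "value": summary},
        ],
    }

# Toy example pair; in practice these come from Galaxy Zoo subjects and
# the LLM-generated summaries of their Talk threads.
pairs = [
    (1001, "subject_1001.jpg",
     "A barred spiral galaxy with a prominent bulge and a possible tidal tail."),
]

records = [make_record(sid, img, summ) for sid, img, summ in pairs]
with open("galaxy_zoo_instruct.json", "w") as f:
    json.dump(records, f, indent=2)
```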
Figure: the architecture of the LLaVA model, where the pre-trained CLIP ViT-L/14 visual encoder is connected to the LLaMA decoder through a trainable projection layer.
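To illustrate that connection, below is a minimal sketch in plain PyTorch with toy tensors (no pretrained weights are loaded): a projection maps CLIP ViT-L/14 patch features into the LLaMA embedding space so the projected image tokens can be prepended to the text tokens fed to the decoder. The dimensions shown (1024 for ViT-L/14, 4096 for LLaMA-7B) are the standard ones; the original LLaVA paper uses a single linear projection, while later versions replace it with a small MLP.

```python
import torch
import torch.nn as nn

class LlavaStyleConnector(nn.Module):
    """Minimal sketch of the LLaVA-style bridge: a linear projection maps
    frozen CLIP ViT-L/14 patch features (1024-d) into the LLaMA embedding
    space (4096-d for the 7B model), so image tokens can sit in the same
    sequence as text tokens."""

    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_features, text_embeddings):
        # patch_features: (batch, num_patches, vision_dim) from the CLIP encoder
        # text_embeddings: (batch, seq_len, llm_dim) from the LLaMA embedding layer
        image_tokens = self.proj(patch_features)
        # Prepend the projected image tokens to the text tokens; the combined
        # sequence is what the LLaMA decoder attends over.
        return torch.cat([image_tokens, text_embeddings], dim=1)

# Toy shapes: batch of 2 images with 256 patches each, 16 text tokens.
vision_feats = torch.randn(2, 256, 1024)
text_embeds = torch.randn(2, 16, 4096)
connector = LlavaStyleConnector()
print(connector(vision_feats, text_embeds).shape)  # torch.Size([2, 272, 4096])
```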
You can watch the hack presentation by Jo during the telecon.
There is also a good video describing LMMs here: https://www.youtube.com/watch?v=mkI7EPD1vp8
Here is a list of references to get started on the subject:
LLM-specific resources:
- HuggingFace NLP course: https://huggingface.co/learn/nlp-course/chapter0/1?fw=pt (a good reference for understanding the main parts of an NLP pipeline: tokenizers, embeddings, downstream tasks)
- HuggingFace Transformers (https://huggingface.co/docs/transformers/index)
- LangChain tutorials, e.g. how to summarise text: https://python.langchain.com/docs/modules/chains/popular/summarize.html (a minimal summarisation sketch is given after this list)
- OpenAI cookbook: https://github.com/openai/openai-cookbook. This example shows how you can summarise a paper: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_call_functions_for_knowledge_retrieval.ipynb
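As a starting point for the summarisation step, here is a minimal sketch using the Hugging Face transformers summarisation pipeline on a made-up Talk thread. The checkpoint "facebook/bart-large-cnn" is just one convenient open-source option; a LangChain summarisation chain or a proprietary API could be swapped in, and the example thread text is invented for illustration.

```python
# Sketch: summarise a (made-up) Galaxy Zoo Talk thread with an open-source
# model via the Hugging Face `transformers` summarisation pipeline.
from transformers import pipeline

summariser = pipeline("summarization", model="facebook/bart-large-cnn")

thread = (
    "Volunteer A: This looks like a barred spiral with a faint companion. "
    "Volunteer B: Agreed, and there may be a tidal tail to the lower left. "
    "Moderator: The SDSS spectrum suggests an ongoing merger."
)

summary = summariser(thread, max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])
```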