
Vision-Informed Semantic Tagging and Annotator (ViSTA)

This repository is primarily aimed at processing undigitized images from the Digital Repository Service (DRS) at Northeastern University. The project leverages multiple vision-language models (VLMs), including Gemini 1.5 Pro and Claude Sonnet/Opus, to tag images with key metadata (titles, abstracts, and subjects), contributing to Northeastern's digital repository archives.

This repository is open to anyone interested in metadata tagging and annotation, including but not limited to libraries, archives, and other literary organizations.

The results of all the research done to implement this system are linked below:

LINK TO LLM TESTING SPREADSHEET

LINK TO PROJECT REPORT

LINK TO SYSTEM DESIGN PROTOTYPE V1

How It Works

  1. Image Pre-Processing: The system pre-processes images by converting them to .jpeg format for VLM use, adjusting quality to fit API image-upload constraints (see the pre-processing sketch after this list).
  2. Transcription: Some collections within the DRS (e.g., the Boston Globe) store an additional image of each photograph's reverse side, which carries extra textual context about the photograph. ViSTA transcribes the text from this additional image to extract context for further metadata generation (see the VLM sketch below).
  3. Title and Abstract Generation: The script generates a descriptive title and abstract for each image based on its visual content and the transcribed context (also shown in the VLM sketch below).
  4. Tagging: The generated metadata is attached to each image and exported to a given CSV file, or alternatively to JSON format (see the export sketch below).
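
For illustration, here is a minimal pre-processing sketch using Pillow. The 20 MB limit, quality schedule, and function name are assumptions for the example, not the repository's actual code.

```python
from io import BytesIO
from PIL import Image

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumed API upload constraint

def to_jpeg_under_limit(src_path: str, out_path: str) -> None:
    """Convert an image to JPEG, stepping quality down until it fits the limit."""
    img = Image.open(src_path).convert("RGB")  # JPEG cannot store an alpha channel
    for quality in range(95, 25, -10):
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        if buf.tell() <= MAX_UPLOAD_BYTES:
            with open(out_path, "wb") as f:
                f.write(buf.getvalue())
            return
    raise ValueError(f"{src_path} cannot be compressed under the upload limit")
```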
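
Steps 2 and 3 might look like the following sketch, using the google-generativeai package. The prompts, model choice, and function names are illustrative assumptions, not ViSTA's actual implementation.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder; supply your own key
model = genai.GenerativeModel("gemini-1.5-pro")

def transcribe_back(back_image_path: str) -> str:
    """Step 2: transcribe the text on the reverse side of a photograph."""
    back = Image.open(back_image_path)
    response = model.generate_content(
        ["Transcribe all text visible in this image, verbatim.", back]
    )
    return response.text

def generate_title_and_abstract(front_image_path: str, context: str) -> str:
    """Step 3: generate a title and abstract from the image plus transcribed context."""
    front = Image.open(front_image_path)
    prompt = (
        "Write a descriptive title and a one-paragraph abstract for this "
        f"photograph. Context transcribed from its reverse side: {context}"
    )
    response = model.generate_content([prompt, front])
    return response.text
```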
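
Finally, a minimal export sketch for step 4 using only the standard library; the field names are illustrative, not ViSTA's actual schema.

```python
import csv
import json

def export_metadata(rows: list[dict], csv_path: str, json_path: str) -> None:
    """Write one metadata record per image to CSV, plus a JSON dump of the same records."""
    fieldnames = ["filename", "title", "abstract", "subjects"]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)
```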
