🐮 MMORE 🤖

Massive Multimodal Open RAG & Extraction

A scalable multimodal pipeline for processing, indexing, and querying multimodal documents

Ever needed to take 8000 PDFs, 2000 videos, and 500 spreadsheets and feed them to an LLM as a knowledge base? Well, MMORE is here to help you!

Quick Start

Installation

We currently support installation through rye. The scripts/setup.sh script installs rye and all the dependencies for you. Refer to the documentation for detailed installation instructions.

We also provide a Docker image for easy deployment.

Usage

To launch the MMORE pipeline, follow the specialised instructions in the docs.

📄 Input Documents
Upload your multimodal documents (PDFs, videos, spreadsheets, and more) into the pipeline.
🔍 Process
Extracts and standardizes text, metadata, and multimedia content from diverse file formats. Easily extensible: add your own processors to handle new file types. Supports a fast processing mode for specific file types.
📁 Index
Organizes extracted data into a hybrid, retrieval-ready vector store, combining dense and sparse indexing through Milvus. Your vector DB can also be remotely hosted; it only needs to provide a standard API.
🤖 RAG
Use the indexed documents inside a Retrieval-Augmented Generation (RAG) system that provides a LangChain interface. Plug in any LLM with a compatible interface, or add new ones through an easy-to-use interface. Supports API hosting or local inference.
🎉 Evaluation
Coming soon: an easy way to evaluate the performance of your RAG system using Ragas.
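
To give a feel for what combining dense and sparse results can look like at query time, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge two ranked lists. This is an illustration only, not MMORE's or Milvus's code: the function name, the document IDs, and the constant k=60 are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher total score means a better fused rank.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (embedding) and a sparse (keyword) search.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists (here doc_b and doc_a) rise to the top, while documents found by only one search still appear further down. Rank-based fusion avoids having to put dense and sparse scores on a common scale.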

See the /docs directory for additional details on each module and hands-on tutorials on parts of the pipeline.

🚧 Supported File Types

Category	File Types	Supported Device	Fast Mode
Text Documents	DOCX, MD, PPTX, XLSX, TXT	CPU	❌
PDFs	PDF	GPU/CPU	✅
Media Files	MP4, MOV, AVI, MKV, MP3, WAV, AAC	GPU/CPU	✅
Web Content (TBD)	Webpages	GPU/CPU	✅

Contributing

We welcome contributions to improve the current state of the pipeline. Feel free to:

Open an issue to report a bug or ask for a new feature
Open a pull request to fix a bug or add a new feature
You can find ongoing feature work and known bugs in the Issues.

Don't hesitate to star the project ⭐ if you find it interesting! (you would be our star)

License

This project is licensed under the Apache 2.0 License; see the LICENSE 🎓 file for details.

Acknowledgements

This project is part of the OpenMeditron initiative, developed in the LiGHT lab at EPFL/Yale/CMU Africa in collaboration with the SwissAI initiative. Thank you Scott Mahoney and Mary-Anne Hartley.

