Show Lab

74 repositories

Show-o
Public
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
multimodal diffusion-models large-language-models
Python
•
Apache License 2.0
•45•1k•33•1•Updated Dec 2, 2024
Awesome-Video-Diffusion
Public
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
awesome video-editing video-understanding video-generation diffusion-models text-to-video video-restoration text-to-motion
205•3.5k•4•1•Updated Dec 2, 2024
computer_use_ootb
Public
An out-of-the-box (OOTB) version of Anthropic Claude Computer Use for Windows and macOS
Python
•
MIT License
•76•778•7•4•Updated Dec 2, 2024
ShowUI
Public
Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent
vision-language-action gui-agents computer-use
Python
•
MIT License
•10•302•1•0•Updated Dec 2, 2024
Awesome-Unified-Multimodal-Models
Public
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
9•227•0•0•Updated Nov 30, 2024
Awesome-MLLM-Hallucination
Public
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
14•477•2•0•Updated Nov 30, 2024
FQGAN
Public
FQGAN: Factorized Visual Tokenization and Generation
0•30•0•0•Updated Nov 28, 2024
ROICtrl
Public
Code for ROICtrl: Boosting Instance Control for Visual Generation
0•84•0•0•Updated Nov 28, 2024
Awesome-GUI-Agent
Public
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
awesome graphical-user-interface ai-assistant llm-agent gui-agents
12•298•0•0•Updated Nov 27, 2024
VideoLISA
Public
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Python
•
Apache License 2.0
•2•71•3•0•Updated Nov 27, 2024
MovieBench
Public
0•22•0•0•Updated Nov 26, 2024
Show-1
Public
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Python
•
Other
•62•1.1k•8•7•Updated Nov 15, 2024
BoxDiff
Public
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
text-to-image-synthesis diffusion-models
Python
•17•253•7•0•Updated Nov 12, 2024
sparseformer
Public
(ICLR 2024, CVPR 2024) SparseFormer
computer-vision transformer efficient-neural-networks vision-transformer sparseformer
Python
•
MIT License
•2•64•1•0•Updated Nov 10, 2024
LOVA3
Public
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
benchmark visual-question-answering multimodal-deep-learning visual-question-generation multimodal-large-language-models data-asse
Python
•1•65•0•0•Updated Nov 7, 2024
VisInContext
Public
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
efficient in-context-learning llm mllm
Python
•2•13•1•0•Updated Oct 30, 2024
Exo2Ego-V
Public
0•7•1•0•Updated Oct 29, 2024
watermark-steganalysis
Public
Python
•0•2•0•0•Updated Oct 24, 2024
videogui
Public
[NeurIPS 2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
gui video-language llm-agent
JavaScript
•0•22•0•0•Updated Oct 22, 2024
EvolveDirector
Public
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
Python
•0•40•0•0•Updated Oct 14, 2024
MovieSeq
Public
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
Jupyter Notebook
•1•30•1•0•Updated Oct 1, 2024
GUI-Narrator
Public
Repository of GUI Action Narrator
JavaScript
•0•5•0•0•Updated Sep 22, 2024
RingID
Public
Python
•0•19•1•0•Updated Aug 30, 2024
MotionDirector
Public
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
video-generation diffusion-models text-to-video text-to-motion text-to-video-generation motion-customization
Python
•
Apache License 2.0
•54•852•22•0•Updated Aug 21, 2024
videollm-online
Public
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Python
•
Apache License 2.0
•28•241•18•0•Updated Aug 15, 2024
X-Adapter
Public
[CVPR 2024] X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Python
•
Apache License 2.0
•44•743•17•4•Updated Aug 14, 2024
afformer
Public
Affordance Grounding from Demonstration Video to Target Image (CVPR 2023)
deep-learning pytorch
Python
•2•40•6•0•Updated Jul 26, 2024
cvpr2024-tutorial-video-diffusion-models
Public
HTML
•
MIT License
•0•1•0•0•Updated Jul 16, 2024
DragAnything
Public
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
Python
•15•438•20•0•Updated Jul 2, 2024
AssistGaze
Public
Python
•0•1•0•0•Updated Jun 25, 2024