Summary
DmitryRyumin committed Mar 21, 2024
1 parent dc14a30 commit 020eb51
Showing 3 changed files with 52 additions and 7 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -256,7 +256,7 @@ ICASSP 2024 Papers: A complete collection of influential and exciting research p
</tr>
<tr>
<td>
- Vision and language
+ Vision and Language
</td>
<td colspan="4" rowspan="201" align="center"><i>Will soon be added</i></td>
</tr>
45 changes: 45 additions & 0 deletions sections/2024/main/IVMSP.md
@@ -0,0 +1,45 @@
# ICASSP-2024-Papers

<table>
<tr>
<td><strong>Application</strong></td>
<td>
<a href="https://huggingface.co/spaces/DmitryRyumin/NewEraAI-Papers" style="float:left;">
<img src="https://img.shields.io/badge/🤗-NewEraAI--Papers-FFD21F.svg" alt="App" />
</a>
</td>
</tr>
<tr>
<td><strong>Previous Collections</strong></td>
<td>
<a href="https://github.com/DmitryRyumin/ICASSP-2023-24-Papers/blob/main/README_2023.md">
<img src="http://img.shields.io/badge/ICASSP-2023-0073AE.svg" alt="Conference">
</a>
</td>
</tr>
</table>

<div align="center">
<a href="https://github.com/DmitryRyumin/ICASSP-2023-24/blob/main/sections/2024/main/MMSP.md">
<img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/left.svg" width="40" alt="" />
</a>
<a href="https://github.com/DmitryRyumin/ICASSP-2023-24-Papers/">
<img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/home.svg" width="40" alt="" />
</a>
<a href="https://github.com/DmitryRyumin/ICASSP-2023-24-Papers/blob/main/sections/2024/main/AASP.md">
<img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/right.svg" width="40" alt="" />
</a>
</div>

## Vision and Language

![Section Papers](https://img.shields.io/badge/Section%20Papers-soon-42BA16) ![Preprint Papers](https://img.shields.io/badge/Preprint%20Papers-soon-b31b1b) ![Papers with Open Code](https://img.shields.io/badge/Papers%20with%20Open%20Code-soon-1D7FBF) ![Papers with Video](https://img.shields.io/badge/Papers%20with%20Video-soon-FF0000)

| **Title** | **Repo** | **Paper** | **Video** |
|-----------|:--------:|:---------:|:---------:|
| JM-CLIP: A Joint Modal Similarity Contrastive Learning Model for Video-Text Retrieval | [![GitHub](https://img.shields.io/github/stars/DannielGe/JM-CLIP?style=flat)](https://github.com/DannielGe/JM-CLIP) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10446490-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446490) | :heavy_minus_sign: |
| Language-Free Compositional Action Generation via Decoupling Refinement | [![GitHub](https://img.shields.io/github/stars/XLiu443/Language-free-Compositional-Action-Generation-via-Decoupling-Refinement?style=flat)](https://github.com/XLiu443/Language-free-Compositional-Action-Generation-via-Decoupling-Refinement) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10448207-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448207) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2307.03538-b31b1b.svg)](https://arxiv.org/abs/2307.03538) | :heavy_minus_sign: |
| DAP: Domain-Aware Prompt Learning for Vision-and-Language Navigation | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10446504-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446504) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2311.17812-b31b1b.svg)](https://arxiv.org/abs/2311.17812) | :heavy_minus_sign: |
| M3sum: A Novel Unsupervised Language-Guided Video Summarization | [![GitHub](https://img.shields.io/github/stars/ZovanZhou/M3Sum?style=flat)](https://github.com/ZovanZhou/M3Sum) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10447504-E4A42C.svg)](https://ieeexplore.ieee.org/document/10447504) | :heavy_minus_sign: |
| WAVER: Writing-Style Agnostic Text-Video Retrieval via Distilling Vision-Language Models through Open-Vocabulary Knowledge | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10446193-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446193) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2312.09507-b31b1b.svg)](https://arxiv.org/abs/2312.09507) | :heavy_minus_sign: |
| MTIDNet: A Multimodal Temporal Interest Detection Network for Video Summarization | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10448236-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448236) | :heavy_minus_sign: |
12 changes: 6 additions & 6 deletions sections/2024/main/MMSP.md
@@ -34,9 +34,9 @@

| **Title** | **Repo** | **Paper** | **Video** |
|-----------|:--------:|:---------:|:---------:|
-| Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://smil-spcras.github.io/DAVIS/) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448048) | :heavy_minus_sign: |
-| The Multimodal Information based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mispchallenge.github.io/mispchallenge2023/) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10447462) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2309.08348-b31b1b.svg)](https://arxiv.org/abs/2309.08348) | :heavy_minus_sign: |
-| Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10447487) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2312.08850-b31b1b.svg)](https://arxiv.org/abs/2312.08850) | :heavy_minus_sign: |
-| TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448124) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2309.12306-b31b1b.svg)](https://arxiv.org/abs/2309.12306) | :heavy_minus_sign: |
-| MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446769) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2401.03424-b31b1b.svg)](https://arxiv.org/abs/2401.03424) | :heavy_minus_sign: |
-| GLMB 3D Speaker Tracking with Video-Assisted Multi-Channel Audio Optimization Functions | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446460) | :heavy_minus_sign: |
+| Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://smil-spcras.github.io/DAVIS/) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10448048-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448048) | :heavy_minus_sign: |
+| The Multimodal Information based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mispchallenge.github.io/mispchallenge2023/) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10447462-E4A42C.svg)](https://ieeexplore.ieee.org/document/10447462) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2309.08348-b31b1b.svg)](https://arxiv.org/abs/2309.08348) | :heavy_minus_sign: |
+| Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10447487-E4A42C.svg)](https://ieeexplore.ieee.org/document/10447487) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2312.08850-b31b1b.svg)](https://arxiv.org/abs/2312.08850) | :heavy_minus_sign: |
+| TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10448124-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448124) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2309.12306-b31b1b.svg)](https://arxiv.org/abs/2309.12306) | :heavy_minus_sign: |
+| MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10446769-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446769) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2401.03424-b31b1b.svg)](https://arxiv.org/abs/2401.03424) | :heavy_minus_sign: |
+| GLMB 3D Speaker Tracking with Video-Assisted Multi-Channel Audio Optimization Functions | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10446460-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446460) | :heavy_minus_sign: |
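
The MMSP.md hunk replaces a badge label ID (`10095313`) that did not match the IEEE Xplore document ID in the link next to it. A check like the one below could catch that kind of mismatch before it is committed. This is only a sketch: the function name `find_badge_mismatches` and the regular expression are illustrative assumptions and are not part of the repository.

```python
import re

# Assumed pattern (not from the repository): an IEEE shields.io badge image
# immediately followed by its ieeexplore.ieee.org document link.
BADGE_RE = re.compile(
    r"img\.shields\.io/badge/IEEE-(?P<badge>\d+)-[0-9A-Fa-f]+\.svg\)\]"
    r"\(https://ieeexplore\.ieee\.org/document/(?P<doc>\d+)\)"
)


def find_badge_mismatches(markdown: str) -> list[tuple[str, str]]:
    """Return (badge_id, document_id) pairs where the two IDs differ."""
    return [
        (m.group("badge"), m.group("doc"))
        for m in BADGE_RE.finditer(markdown)
        if m.group("badge") != m.group("doc")
    ]


# One Paper cell as it appeared before and after this commit.
old_cell = (
    "[![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)]"
    "(https://ieeexplore.ieee.org/document/10448048)"
)
new_cell = (
    "[![IEEE Xplore](https://img.shields.io/badge/IEEE-10448048-E4A42C.svg)]"
    "(https://ieeexplore.ieee.org/document/10448048)"
)

print(find_badge_mismatches(old_cell))  # [('10095313', '10448048')]
print(find_badge_mismatches(new_cell))  # []
```

Running the function over a whole section file would list every row whose badge label and link disagree. It only inspects the text and does not confirm that the document ID itself is the right paper.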
