Summary

DmitryRyumin · Mar 21, 2024 · 020eb51
1 parent dc14a30
commit 020eb51

Showing 3 changed files with 52 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -256,7 +256,7 @@ ICASSP 2024 Papers: A complete collection of influential and exciting research p
         </tr>
         <tr>
             <td>
-                Vision and language
+                Vision and Language
             </td>
             <td colspan="4" rowspan="201" align="center"><i>Will soon be added</i></td>
         </tr>

diff --git a/sections/2024/main/IVMSP.md b/sections/2024/main/IVMSP.md
@@ -0,0 +1,45 @@
+# ICASSP-2024-Papers
+
+<table>
+    <tr>
+        <td><strong>Application</strong></td>
+        <td>
+            <a href="https://huggingface.co/spaces/DmitryRyumin/NewEraAI-Papers" style="float:left;">
+                <img src="https://img.shields.io/badge/🤗-NewEraAI--Papers-FFD21F.svg" alt="App" />
+            </a>
+        </td>
+    </tr>
+    <tr>
+        <td><strong>Previous Collections</strong></td>
+        <td>
+            <a href="https://github.com/DmitryRyumin/ICASSP-2023-24-Papers/blob/main/README_2023.md">
+                <img src="http://img.shields.io/badge/ICASSP-2023-0073AE.svg" alt="Conference">
+            </a>
+        </td>
+    </tr>
+</table>
+
+<div align="center">
+    <a href="https://github.com/DmitryRyumin/ICASSP-2023-24/blob/main/sections/2024/main/MMSP.md">
+        <img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/left.svg" width="40" alt="" />
+    </a>
+    <a href="https://github.com/DmitryRyumin/ICASSP-2023-24-Papers/">
+        <img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/home.svg" width="40" alt="" />
+    </a>
+    <a href="https://github.com/DmitryRyumin/ICASSP-2023-24-Papers/blob/main/sections/2024/main/AASP.md">
+        <img src="https://cdn.jsdelivr.net/gh/DmitryRyumin/NewEraAI-Papers@main/images/right.svg" width="40" alt="" />
+    </a>
+</div>
+
+## Vision and Language
+
+![Section Papers](https://img.shields.io/badge/Section%20Papers-soon-42BA16) ![Preprint Papers](https://img.shields.io/badge/Preprint%20Papers-soon-b31b1b) ![Papers with Open Code](https://img.shields.io/badge/Papers%20with%20Open%20Code-soon-1D7FBF) ![Papers with Video](https://img.shields.io/badge/Papers%20with%20Video-soon-FF0000)
+
+| **Title** | **Repo** | **Paper** | **Video** |
+|-----------|:--------:|:---------:|:---------:|
+| JM-CLIP: A Joint Modal Similarity Contrastive Learning Model for Video-Text Retrieval | [![GitHub](https://img.shields.io/github/stars/DannielGe/JM-CLIP?style=flat)](https://github.com/DannielGe/JM-CLIP) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10446490-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446490) | :heavy_minus_sign: |
+| Language-Free Compositional Action Generation via Decoupling Refinement | [![GitHub](https://img.shields.io/github/stars/XLiu443/Language-free-Compositional-Action-Generation-via-Decoupling-Refinement?style=flat)](https://github.com/XLiu443/Language-free-Compositional-Action-Generation-via-Decoupling-Refinement) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10448207-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448207) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2307.03538-b31b1b.svg)](https://arxiv.org/abs/2307.03538) | :heavy_minus_sign: |
+| DAP: Domain-Aware Prompt Learning for Vision-and-Language Navigation | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10446504-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446504) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2311.17812-b31b1b.svg)](https://arxiv.org/abs/2311.17812) | :heavy_minus_sign: |
+| M3sum: A Novel Unsupervised Language-Guided Video Summarization | [![GitHub](https://img.shields.io/github/stars/ZovanZhou/M3Sum?style=flat)](https://github.com/ZovanZhou/M3Sum) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10447504-E4A42C.svg)](https://ieeexplore.ieee.org/document/10447504) | :heavy_minus_sign: |
+| WAVER: Writing-Style Agnostic Text-Video Retrieval via Distilling Vision-Language Models through Open-Vocabulary Knowledge | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10446193-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446193) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2312.09507-b31b1b.svg)](https://arxiv.org/abs/2312.09507) | :heavy_minus_sign: |
+| MTIDNet: A Multimodal Temporal Interest Detection Network for Video Summarization | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10448236-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448236) | :heavy_minus_sign: |
diff --git a/sections/2024/main/MMSP.md b/sections/2024/main/MMSP.md
@@ -34,9 +34,9 @@
 
 | **Title** | **Repo** | **Paper** | **Video** |
 |-----------|:--------:|:---------:|:---------:|
-| Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://smil-spcras.github.io/DAVIS/) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448048) | :heavy_minus_sign: |
-| The Multimodal Information based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mispchallenge.github.io/mispchallenge2023/) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10447462) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2309.08348-b31b1b.svg)](https://arxiv.org/abs/2309.08348) | :heavy_minus_sign: |
-| Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10447487) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2312.08850-b31b1b.svg)](https://arxiv.org/abs/2312.08850) | :heavy_minus_sign: |
-| TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448124) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2309.12306-b31b1b.svg)](https://arxiv.org/abs/2309.12306) | :heavy_minus_sign: |
-| MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446769) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2401.03424-b31b1b.svg)](https://arxiv.org/abs/2401.03424) | :heavy_minus_sign: |
-| GLMB 3D Speaker Tracking with Video-Assisted Multi-Channel Audio Optimization Functions | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446460) | :heavy_minus_sign: |
+| Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://smil-spcras.github.io/DAVIS/) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10448048-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448048) | :heavy_minus_sign: |
+| The Multimodal Information based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | [![GitHub Page](https://img.shields.io/badge/GitHub-Page-159957.svg)](https://mispchallenge.github.io/mispchallenge2023/) | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10447462-E4A42C.svg)](https://ieeexplore.ieee.org/document/10447462) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2309.08348-b31b1b.svg)](https://arxiv.org/abs/2309.08348) | :heavy_minus_sign: |
+| Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10447487-E4A42C.svg)](https://ieeexplore.ieee.org/document/10447487) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2312.08850-b31b1b.svg)](https://arxiv.org/abs/2312.08850) | :heavy_minus_sign: |
+| TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10448124-E4A42C.svg)](https://ieeexplore.ieee.org/document/10448124) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2309.12306-b31b1b.svg)](https://arxiv.org/abs/2309.12306) | :heavy_minus_sign: |
+| MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10446769-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446769) <br /> [![arXiv](https://img.shields.io/badge/arXiv-2401.03424-b31b1b.svg)](https://arxiv.org/abs/2401.03424) | :heavy_minus_sign: |
+| GLMB 3D Speaker Tracking with Video-Assisted Multi-Channel Audio Optimization Functions | :heavy_minus_sign: | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10446460-E4A42C.svg)](https://ieeexplore.ieee.org/document/10446460) | :heavy_minus_sign: |
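
The MMSP.md hunk above replaces a badge label (`10095313`) that had been repeated across six rows with the document ID each row actually links to. A check along the following lines could catch that kind of copy-paste mismatch before it is committed. This is a minimal sketch: `find_badge_mismatches` and the regular expression are illustrative assumptions based on the row format shown in this diff, not part of the repository.

```python
import re

# Matches an IEEE Xplore badge image and the document link that wraps it, e.g.
# [![IEEE Xplore](https://img.shields.io/badge/IEEE-<id>-E4A42C.svg)](https://ieeexplore.ieee.org/document/<id>)
# Group 1 is the ID shown on the badge, group 2 is the ID in the link target.
BADGE_LINK = re.compile(
    r"img\.shields\.io/badge/IEEE-(\d+)-[0-9A-Fa-f]+\.svg\)\]"
    r"\(https://ieeexplore\.ieee\.org/document/(\d+)\)"
)


def find_badge_mismatches(markdown):
    """Return (line_number, badge_id, document_id) for every badge whose
    label differs from the document ID it links to. Hypothetical helper."""
    mismatches = []
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        for badge_id, document_id in BADGE_LINK.findall(line):
            if badge_id != document_id:
                mismatches.append((line_no, badge_id, document_id))
    return mismatches


# Shortened versions of one removed row and its replacement from the hunk above.
old_row = (
    "| TalkNCE | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10095313-E4A42C.svg)]"
    "(https://ieeexplore.ieee.org/document/10448124) |"
)
new_row = (
    "| TalkNCE | [![IEEE Xplore](https://img.shields.io/badge/IEEE-10448124-E4A42C.svg)]"
    "(https://ieeexplore.ieee.org/document/10448124) |"
)

print(find_badge_mismatches(old_row))  # [(1, '10095313', '10448124')]
print(find_badge_mismatches(new_row))  # []
```

Running such a check over each section file would flag the six removed rows and pass the six added ones, assuming the rows keep the badge-then-link layout used here.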