Moment Localization with two different approaches using Residual Connection

As unstructured data such as life logs grows with the spread of networks and smart devices, multimodal learning across vision and language is drawing more attention. We focus on moment localization, an actively studied task that determines the temporal moment in a video corresponding to a natural language query. We analyze the 2D-TAN model, which introduced the 2D temporal map, and propose two improvements. The first applies residual blocks; the second borrows the idea of DenseNet and feeds the hidden layers of all previous blocks into each block as input. On the Charades-STA dataset, we confirmed a performance improvement of up to 5.62 points over the previous model (80.32 to 85.94 with MoL + D).

Method

1) MoL + R : Moment localization with Residual block

We built the Temporal Adjacent Network out of Residual blocks. Each block adds its input back to the function it has learned before passing the result on to the next block, so a block only has to learn the residual rather than the whole mapping, which makes learning easier. In the previous method, all weight layers are separate and the whole mapping has to be learned layer by layer, which makes convergence harder; with Residual blocks, convergence became easier. Because the previous input x is fed directly into each block, information about the original video and the natural language query is maintained consistently through the network.
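The skip connection described above can be illustrated with a minimal pure-Python sketch. It is not the repository's implementation: `residual_block` and `scale_by` are hypothetical names, and `scale_by` is only a stand-in for the learned function F (the convolutions inside a block). The sketch shows only the rule H(x) = F(x) + x.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Return H(x) = F(x) + x, where f plays the role of the learned function F."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same shape as x for the skip connection")
    return [a + b for a, b in zip(fx, x)]


def scale_by(weight: float) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for the learned layers inside one block."""
    return lambda x: [weight * v for v in x]


# If F outputs zero, the block reduces to the identity: the input passes through intact.
identity_out = residual_block([1.0, 2.0, 3.0], scale_by(0.0))

# Stacking blocks: each block adds a correction on top of the input it receives.
stacked = [1.0, 2.0, 3.0]
for w in (0.5, 0.5):
    stacked = residual_block(stacked, scale_by(w))
```

Because the identity path is always present, a block whose learned function contributes nothing still forwards its input, which is one way to read the claim that the original video and query information is maintained.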

2) MoL + D : Moment localization with Dense layer

The Dense layer goes one step further than the network of Moment localization with Residual block. As in a Residual block, one Dense block consists of two convolution layers. Unlike the Residual block, which adds only F(x) and the input x of the current block, the Dense layer adds the inputs of all preceding blocks. This is expressed as

$H(x)=F(x_{n-1})+\sum_{i=1}^{n-1}x_{i}$

The Temporal Adjacent Network is configured by replacing the convolution layers of the previous model with 4 Dense layers. A Dense layer concatenates the feature map of each layer to the feature maps of all layers that follow it. This configuration limits the loss of information, which alleviates the vanishing gradient problem, and a regularization effect can also be seen because the network learns from connected feature maps of various layers.
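As a rough illustration of the equation above, the following pure-Python sketch computes F(x_{n-1}) plus the sum of all earlier inputs. It rests on assumptions: the inputs are combined by element-wise addition exactly as the formula is written, the output of each block is taken as the next input x_n, and `halve` is a hypothetical stand-in for the learned function F. The names `dense_block_output` and `run_dense_stack` are illustrative and do not come from the repository.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dense_block_output(inputs: Sequence[Vector], f: Callable[[Vector], Vector]) -> Vector:
    """Compute F(x_{n-1}) + sum_{i=1}^{n-1} x_i for inputs = [x_1, ..., x_{n-1}]."""
    if not inputs:
        raise ValueError("at least one previous input is required")
    out = list(f(inputs[-1]))
    for x in inputs:  # add every earlier input, not only the most recent one
        out = [o + v for o, v in zip(out, x)]
    return out


def run_dense_stack(x1: Vector, fs: Sequence[Callable[[Vector], Vector]]) -> List[Vector]:
    """Chain blocks, assuming the output of block n becomes the input x_n of the next."""
    inputs = [x1]
    for f in fs:
        inputs.append(dense_block_output(inputs, f))
    return inputs


def halve(x: Vector) -> Vector:
    """Hypothetical stand-in for the learned function F of one block."""
    return [0.5 * v for v in x]


# Four blocks, mirroring the 4 Dense layers mentioned in the text.
states = run_dense_stack([1.0], [halve] * 4)
```

With a single earlier input the rule reduces to the residual case F(x) + x; the difference only appears from the second block onward, where x_1 is added again alongside x_2.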

Main Results

Main results on Charades-STA

Method	Rank1@0.5	Rank1@0.7	Rank5@0.5	Rank5@0.7
2D-TAN	39.70	23.31	80.32	51.26
MoL + R	41.02	23.33	84.33	50.03
MoL + D	42.04	24.46	85.94	52.18

Main results on ActivityNet Captions

Method	Rank1@0.3	Rank1@0.5	Rank1@0.7	Rank5@0.3	Rank5@0.5	Rank5@0.7
2D-TAN	59.45	44.51	26.54	85.53	77.13	61.96
MoL + R	60.71	44.08	26.05	85.20	76.50	60.61
MoL + D	60.63	44.33	26.43	85.23	76.40	60.83

Main results on TACoS

Method	Rank1@0.1	Rank1@0.3	Rank1@0.5	Rank5@0.1	Rank5@0.3	Rank5@0.5
2D-TAN	47.59	37.29	25.32	70.31	57.81	45.04
MoL + R	47.86	38.09	26.27	72.48	60.93	47.54
MoL + D	48.09	36.49	25.12	73.11	57.79	45.51
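Results for this task are usually reported as Rank n@m: the percentage of queries for which at least one of the top-n predicted moments has a temporal IoU with the ground truth larger than m. The sketch below shows how such a number could be computed. It is not the repository's evaluation code; the function names, the (start, end) representation in seconds, and the strict `>` comparison are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Moment = Tuple[float, float]  # (start, end) in seconds; an assumed representation


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def rank_n_at_m(
    ranked_preds: Sequence[Sequence[Moment]],
    gts: Sequence[Moment],
    n: int,
    m: float,
) -> float:
    """Percentage of queries whose top-n predictions include one with IoU > m."""
    if len(ranked_preds) != len(gts) or not gts:
        raise ValueError("need one non-empty, ranked prediction list per ground truth")
    hits = sum(
        any(temporal_iou(p, gt) > m for p in preds[:n])
        for preds, gt in zip(ranked_preds, gts)
    )
    return 100.0 * hits / len(gts)


# Two toy queries, each with predictions sorted from most to least confident.
toy_preds = [[(0.0, 8.0), (2.0, 10.0)], [(0.0, 5.0), (6.0, 14.0)]]
toy_gts = [(0.0, 10.0), (5.0, 15.0)]
rank1 = rank_n_at_m(toy_preds, toy_gts, n=1, m=0.5)
rank2 = rank_n_at_m(toy_preds, toy_gts, n=2, m=0.5)
```

In the toy data, the first query is matched by its top prediction, while the second is matched only by its second prediction, so the score rises when n goes from 1 to 2.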

Prerequisites

pytorch 1.1.0
python 3.7
torchtext
easydict
terminaltables

Training

Use the following commands for training:

# Train "Pool" in Table 1
python moment_localization/train.py --cfg experiments/charades/2D-TAN-16x16-K5L8-pool.yaml --verbose

# Train "Pool" in Table 2
python moment_localization/train.py --cfg experiments/activitynet/2D-TAN-64x64-K9L4-pool.yaml --verbose

# Train "Pool" in Table 3
python moment_localization/train.py --cfg experiments/tacos/2D-TAN-128x128-K5L8-pool.yaml --verbose

Testing

Run the following commands for evaluation:

# Evaluate "Pool" in Table 1
python moment_localization/test.py --cfg experiments/charades/2D-TAN-16x16-K5L8-pool.yaml --verbose --split test

# Evaluate "Pool" in Table 2
python moment_localization/test.py --cfg experiments/activitynet/2D-TAN-64x64-K9L4-pool.yaml --verbose --split test

# Evaluate "Pool" in Table 3
python moment_localization/test.py --cfg experiments/tacos/2D-TAN-128x128-K5L8-pool.yaml --verbose --split test

Citation

@InProceedings{2DTAN_2020_AAAI,
author = {Zhang, Songyang and Peng, Houwen and Fu, Jianlong and Luo, Jiebo},
title = {Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language},
booktitle = {AAAI},
year = {2020}
}
