BalSum: Balancing Lexical and Semantic Quality in Abstractive Summarization

This repository contains the code for our paper "Balancing Lexical and Semantic Quality in Abstractive Summarization" (ACL 2023, short paper).

Overview

We propose a novel training method in which a re-ranker balances lexical and semantic quality. Built on a two-stage framework, our model, named BalSum, is trained with multi-task learning. We directly reflect the ROUGE score difference in a ranking loss to preserve lexical quality as much as possible. We then use a contrastive loss with instance weighting to identify summaries whose meanings are close to the document. Specifically, we define novel false positives (semantic mistakes) and present a strategy to reduce their influence in ranking.
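The exact objectives are implemented in model.py; the snippet below is only a minimal sketch of the general shape of the two terms under simplifying assumptions (the pairwise margin is taken directly as the ROUGE gap, and the instance weights are assumed to be precomputed values in [0, 1] that shrink for likely false positives). It is not the official implementation.

import torch
import torch.nn.functional as F

def ranking_loss(scores, rouges, scale=1.0):
    # Margin ranking loss whose margin reflects the ROUGE gap between candidate
    # pairs. `scores` and `rouges` are (num_cand,) tensors sorted by descending ROUGE.
    loss = scores.new_zeros(())
    n = scores.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            margin = scale * (rouges[i] - rouges[j])  # larger ROUGE gap -> larger margin
            loss = loss + F.relu(scores[j] - scores[i] + margin)
    return loss

def weighted_contrastive_loss(doc_emb, ref_emb, cand_embs, weights, tau=0.05):
    # InfoNCE-style term: the document embedding should be closer to the reference
    # than to the candidate embeddings; each candidate's contribution is scaled by
    # its instance weight, so suspected false positives (candidates that are
    # semantically close to the document) are pushed away less strongly.
    pos = F.cosine_similarity(doc_emb, ref_emb, dim=0) / tau
    negs = F.cosine_similarity(doc_emb.unsqueeze(0), cand_embs, dim=1) / tau
    denom = pos.exp() + (weights * negs.exp()).sum()
    return denom.log() - pos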

How to Install

Requirements

  • Python 3.8

Run the following command to install the required libraries:

pip install -r requirements.txt

Description of Codes

We implement our model based on the Hugging Face Transformers library.

  • cal_rouge.py, cal_bertscore.py : ROUGE / BERTScore calculation
  • config.py : model configuration
  • data_utils.py : dataloader
  • model.py : model architecture
  • main.py : training and evaluation procedure
  • utils.py : utility functions

Workspace

You should create the following directories for our experiments:

  • ./cache_cnndm, ./cache_xsum : save model checkpoints for each dataset
  • ./result : save evaluation results
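For example, they can be created from the repository root on Linux/macOS with:

mkdir -p ./cache_cnndm ./cache_xsum ./result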

Dataset

We experiment on two datasets: CNN/DailyMail and XSum.

Prepare Candidate Summaries

We referred to the BRIO code when generating and preprocessing candidate summaries. Additionally, we measured the cosine similarity between the reference and each candidate summary using a SimCSE model for the instance weighting strategy. We then mark the candidates that exceed each threshold (up to ~0.9) and save them in the dataset file for each threshold.
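A minimal sketch of that similarity step is shown below, assuming a Hugging Face SimCSE checkpoint; the exact checkpoint and the thresholding logic used to build the released dataset files may differ.

import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint for illustration; swap in the SimCSE model you actually use.
tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/sup-simcse-roberta-base")
model = AutoModel.from_pretrained("princeton-nlp/sup-simcse-roberta-base").eval()

def simcse_similarity(reference, candidates):
    # Cosine similarity between the reference summary and each candidate summary.
    texts = [reference] + candidates
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        emb = model(**batch).pooler_output          # (1 + num_cand, hidden)
    ref, cand = emb[0:1], emb[1:]
    return torch.nn.functional.cosine_similarity(ref, cand, dim=1)   # (num_cand,)

# e.g. flag candidates above a similarity threshold such as 0.9
sims = simcse_similarity("the reference summary", ["candidate 1", "candidate 2"])
above_threshold = sims >= 0.9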

How to Run

Training

You can change the specific settings in config.py.

python main.py --cuda --gpuid [list of gpuids] -l --config [(cnndm/xsum)] --wandb [Project Name of Wandb]

Example: training on CNN/DM

python main.py --cuda --gpuid 0 -l --config cnndm --wandb CNNDM_train

Evaluation

For ROUGE calculation, we use the standard ROUGE Perl package. We lowercase and tokenize texts (using the PTB Tokenizer) before calculating ROUGE scores. To evaluate BalSum, please set MODEL_PATH in run_evaluate.sh and run the script below:

bash run_evaluate.sh

MODEL_PATH should be a subdirectory of ./cache_cnndm or ./cache_xsum.
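A minimal sketch of the text normalization applied before ROUGE scoring is given below, using NLTK's Treebank (PTB-style) word tokenizer as a stand-in for whichever PTB Tokenizer implementation the evaluation script actually invokes.

# Lowercase and PTB-tokenize a text before passing it to the ROUGE Perl package.
from nltk.tokenize import TreebankWordTokenizer

_tok = TreebankWordTokenizer()

def normalize_for_rouge(text: str) -> str:
    # Lowercase, tokenize with PTB-style rules, and rejoin with single spaces.
    return " ".join(_tok.tokenize(text.lower()))

print(normalize_for_rouge("BalSum achieves an 89.67 BERTScore on CNN/DailyMail."))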

Citation

Please cite our paper if you use BalSum in your work:

@inproceedings{sul-choi-2023-balancing,
    title = "Balancing Lexical and Semantic Quality in Abstractive Summarization",
    author = "Sul, Jeewoo  and  Choi, Yong Suk",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-short.56",
    pages = "637--647",
    abstract = "An important problem of the sequence-to-sequence neural models widely used in abstractive summarization is exposure bias. To alleviate this problem, re-ranking systems have been applied in recent years. Despite some performance improvements, this approach remains underexplored. Previous works have mostly specified the rank through the ROUGE score and aligned candidate summaries, but there can be quite a large gap between the lexical overlap metric and semantic similarity. In this paper, we propose a novel training method in which a re-ranker balances the lexical and semantic quality. We further newly define false positives in ranking and present a strategy to reduce their influence. Experiments on the CNN/DailyMail and XSum datasets show that our method can estimate the meaning of summaries without seriously degrading the lexical aspect. More specifically, it achieves an 89.67 BERTScore on the CNN/DailyMail dataset, reaching new state-of-the-art performance. Our code is publicly available at https://github.com/jeewoo1025/BalSum.",
}

If you have any questions, please open a GitHub issue. Thank you for your interest.
