Song Lyric Generation with Deep Models

Welcome to the Song Lyric Generation project! This repository contains the code and resources for an NLP research project focused on generating song lyrics with deep learning models. Our goal is to explore different models and techniques for generating both generic lyrics and lyrics that mimic a specific singer's style.

Introduction

This research project aims to create song lyric generators using various deep learning models. We implemented four generators in total: two for generic lyric generation and two for artist-specific lyric generation. The models were trained on a song lyric dataset built from two different data sources and evaluated through human assessments to check that the generated lyrics are meaningful and stylistically accurate.

Project Structure

.
├── data
│ ├── raw # Raw dataset files
│ │ ├── csv # Raw csv song file
│ │ └── txt # Raw artist lyrics as txt files
│ ├── interim # Interim dataset files after initial preprocessing
│ └── processed # Final processed dataset files ready for model training
├── models # Saved model checkpoints
├── src
│ ├── prep # Scripts for data preprocessing
│ ├── train # Scripts for training the models
│ └── eval # Scripts for evaluating the models
├── docs
│ └── project documentation.pdf # Research report in Croatian
├── README.md # Project README file
└── requirements.txt # Python dependencies

Models Used

Generic Lyric Generation

BiLSTM: A Bidirectional Long Short-Term Memory model used for sequential data processing.
Variational Autoencoder (VAE): A generative model that learns to represent data in a latent space.
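
To make the "latent space" idea more concrete, here is a minimal sketch of the two ingredients of the textbook VAE formulation: sampling a latent vector with the reparameterization trick, and the KL term that pulls the latent distribution toward a standard normal. It is written in plain Python for illustration only; the function names are hypothetical and the repository's actual VAE may be implemented differently.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


# Illustrative encoder outputs for a 2-dimensional latent space (made-up values).
mu = [0.3, -1.2]
log_var = [0.0, -0.5]

z = reparameterize(mu, log_var, random.Random(0))
kl = kl_to_standard_normal(mu, log_var)
print(z, kl)
```

In a full model, a decoder would turn `z` back into a lyric sequence, and the training loss would combine the reconstruction error with the KL term.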

Specific Singer Style Generation

GPT-2: A powerful transformer-based model pre-trained on a large corpus of text, fine-tuned to mimic a specific singer's style.
TinyLlama: A compact 1.1B-parameter model built on the Llama architecture. Like GPT-2, it is a decoder-only transformer, and it is used here for stylistic lyric generation.

Data Preprocessing

Data preprocessing is crucial for training effective models. The preprocessing pipeline involves:

Making an Interim Dataset: Initial cleaning and formatting of raw data.
Making a Processed Dataset: Further processing to prepare data for model training, such as tokenization and sequence generation.

Scripts for data preprocessing are located in the src/prep directory.
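
As a rough illustration of what "tokenization and sequence generation" can look like for next-word prediction, the sketch below tokenizes lyric lines, builds a vocabulary, and expands each line into left-padded prefix sequences. All function names and the sample lines are hypothetical; this is not the code in src/prep, which may use different tokenization and sequence rules.

```python
import re


def tokenize(line):
    """Lowercase a lyric line and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", line.lower())


def build_vocab(token_lines):
    """Map each word to an integer id; id 0 is reserved for padding."""
    vocab = {}
    for tokens in token_lines:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def make_sequences(token_lines, vocab):
    """Expand every line into growing prefixes; the last id of each prefix is the target word."""
    sequences = []
    for tokens in token_lines:
        ids = [vocab[t] for t in tokens]
        for end in range(2, len(ids) + 1):
            sequences.append(ids[:end])
    return sequences


def pad_left(sequences, pad_id=0):
    """Left-pad all sequences to the length of the longest one."""
    width = max(len(s) for s in sequences)
    return [[pad_id] * (width - len(s)) + s for s in sequences]


# Made-up sample lines, used only to exercise the functions.
lyrics = ["Hello darkness my old friend", "Hello my friend"]
token_lines = [tokenize(line) for line in lyrics]
vocab = build_vocab(token_lines)
padded = pad_left(make_sequences(token_lines, vocab))

for row in padded:
    print(row[:-1], "->", row[-1])  # model input -> next-word target
```

Left padding keeps the target word in the final position of every row, which makes it easy to split inputs from labels.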

Evaluation

The models were evaluated by human raters, who judged how meaningful and stylistically accurate the generated lyrics were. The evaluation criteria included:

Coherence and fluency of the lyrics
Stylistic resemblance to the target singer (for specific singer models)
Overall creativity and originality

Our report also includes a GPT-4 evaluation, which lets us compare how a state-of-the-art large language model scores lyrics against the judgments of human raters, who remain the most reliable evaluators for this task.
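
One simple way to summarize ratings of this kind is to average the scores per model and criterion. The sketch below assumes a 1-5 rating scale and uses placeholder model names and made-up scores purely to show the aggregation; none of these values come from the project's report.

```python
from collections import defaultdict
from statistics import mean

# Placeholder ratings: (model, criterion, score). The 1-5 scale, the model
# names, and the scores are assumptions for illustration, not project results.
ratings = [
    ("model_a", "coherence", 4),
    ("model_a", "coherence", 5),
    ("model_a", "style", 3),
    ("model_b", "coherence", 2),
    ("model_b", "creativity", 4),
    ("model_b", "creativity", 3),
]


def mean_scores(rows):
    """Average the scores for every (model, criterion) pair."""
    grouped = defaultdict(list)
    for model, criterion, score in rows:
        grouped[(model, criterion)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


summary = mean_scores(ratings)
for (model, criterion), avg in sorted(summary.items()):
    print(f"{model:8s} {criterion:11s} {avg:.2f}")
```

The same function could be applied to GPT-4 scores collected on the same scale, so that human and model averages are directly comparable.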

How to Use

Prerequisites

Ensure you have Python installed. Install the required dependencies using:

pip install -r requirements.txt


Create the dataset:

```python src/prep/make_interim_dataset.py -csv_file_path data/raw/csv/lyrics-data.csv -txt_dir_path data/raw/txt```

```python src/prep/make_processed_dataset.py --interim_path data/interim/merged_data.csv```


Run the training and inference scripts, replacing [model] and [task] with the model and task you want:

```python src/train/train_[model]_[task].py -csv_file_path data/raw/csv/lyrics-data.csv -txt_dir_path data/raw/txt```

```python src/eval/inference_[model]_[task].py --seed_text [text] <--next_words [number of words to generate]>```
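
For orientation, an inference script with this interface might be organized roughly as in the sketch below. Only the `--seed_text` and `--next_words` flags come from the command above; the default value, the function names, and the random-word predictor standing in for a trained model are all assumptions made for illustration.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse the two flags shown in the inference command."""
    parser = argparse.ArgumentParser(description="Generate lyrics from a seed text.")
    parser.add_argument("--seed_text", required=True, help="Text the generation starts from.")
    # The default of 20 is an assumption; the real scripts may use another value.
    parser.add_argument("--next_words", type=int, default=20, help="Number of words to generate.")
    return parser.parse_args(argv)


def predict_next_word(context, rng):
    """Stand-in for a trained model: ignores the context and picks a random toy word."""
    toy_vocab = ["love", "night", "heart", "fire", "rain"]
    return rng.choice(toy_vocab)


def generate(seed_text, next_words, rng):
    """Append one predicted word at a time to the seed text."""
    words = seed_text.split()
    for _ in range(next_words):
        words.append(predict_next_word(words, rng))
    return " ".join(words)


# An explicit argument list is passed here so the sketch runs as-is;
# a real script would call parse_args() to read the command line.
args = parse_args(["--seed_text", "hello darkness", "--next_words", "5"])
lyrics = generate(args.seed_text, args.next_words, random.Random(0))
print(lyrics)
```

A real script would load a saved checkpoint from the models directory and replace `predict_next_word` with the model's own prediction step.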

License

This project is licensed under the MIT License - see the LICENSE file for details.
