A simple Python script for extracting audio embeddings.
- PyTorch
- Transformers
- fairseq
Download the provided model and put it into the ./model folder.
pip install -r requirements.txt
Here's an example of how to run audio_embedding.py from the command line:
python audio_embedding.py -i demo/sample_audio.wav -o outputs/short.npy -b 1280 -f 16000
usage: audio_embedding.py [-h] [-i INPUT] [-o OUTPUT] [-b BLOCK] [-f FREQ]
Image caption CLI
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT Input directory path, such as ./sample.wav)
-o OUTPUT, --output OUTPUT Output directory, such as output.csv
-b BLOCK, --block BLOCK Block length
-f FREQ, --freq FREQ Audio file frequency
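The usage text above is consistent with an argparse parser along the following lines. This is only a sketch, not the script's actual source: the function name build_parser, the description string, the integer types for -b and -f, and the reworded help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags listed in the usage text.
    parser = argparse.ArgumentParser(
        prog="audio_embedding.py",
        description="Audio embedding CLI",  # assumed description
    )
    parser.add_argument("-i", "--input", help="Input audio file, such as ./sample.wav")
    parser.add_argument("-o", "--output", help="Output file, such as outputs/short.npy")
    # Assumption: block length and frequency are parsed as integers.
    parser.add_argument("-b", "--block", type=int, help="Block length")
    parser.add_argument("-f", "--freq", type=int, help="Audio file frequency")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    ["-i", "demo/sample_audio.wav", "-o", "outputs/short.npy", "-b", "1280", "-f", "16000"]
)
print(args.input, args.output, args.block, args.freq)
```

Passing an explicit argument list to parse_args keeps the sketch runnable outside a shell; the real script presumably reads sys.argv instead.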
from audio_embedding import extract_embeddings
from model_engine import get_model, get_processor
from utils import concat_and_rescale, save_embeddings
import pandas as pd
import numpy as np
import uuid
AUDIO_PATH = r"./demo/sample_audio.wav"
OUTPUT_PATH = f"./outputs/embedding_{uuid.uuid4()}"
BLOCK_LENGTH = 1280
TARGET_SR = 16000
model = get_model()
processor = get_processor()
# Extract Embeddings
raw_embeddings = extract_embeddings(
    audio_path=AUDIO_PATH,
    model=model,
    processor=processor,
    block_length=BLOCK_LENGTH,
    target_sr=TARGET_SR,
)
# Embedding post-processing
embeddings = concat_and_rescale(raw_embeddings)
print(embeddings.shape)
# Save Embeddings
save_embeddings(OUTPUT_PATH, embeddings)
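To check a saved file afterwards, something like the sketch below may help. It assumes the embeddings are stored as a standard NumPy .npy file, as the outputs/short.npy name in the command-line example suggests; the exact format written by save_embeddings is not documented here. The dummy array and temporary path are stand-ins so the snippet runs without the model.

```python
import os
import tempfile

import numpy as np

# Stand-in for real embeddings; the (4, 8) shape is arbitrary.
dummy = np.random.rand(4, 8).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "short.npy")
    np.save(path, dummy)    # assumption: save_embeddings writes an equivalent .npy file
    loaded = np.load(path)  # read the array back fully into memory

print(loaded.shape, loaded.dtype)
```

With a real output file, only the np.load call is needed, pointed at the path passed to -o or to save_embeddings.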
- Clipping silence
- Model downloader
- Drop duplicate rows and columns
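As a rough idea of what the last item could look like, here is a minimal NumPy sketch that removes exact duplicate rows and columns from a 2-D array while keeping the first occurrence of each. The function names are hypothetical and nothing like this exists in the project yet.

```python
import numpy as np


def drop_duplicate_rows(arr):
    # np.unique sorts its result, so recover the first-occurrence indices
    # and re-sort them to preserve the original row order.
    _, first_idx = np.unique(arr, axis=0, return_index=True)
    return arr[np.sort(first_idx)]


def drop_duplicate_rows_and_columns(arr):
    # Deduplicate rows, then apply the same step to the transpose for columns.
    deduped = drop_duplicate_rows(arr)
    return drop_duplicate_rows(deduped.T).T


example = np.array([[1, 2, 1],
                    [1, 2, 1],
                    [3, 4, 3]])
result = drop_duplicate_rows_and_columns(example)
print(result)
```

This only catches exact matches; near-duplicate embeddings would need a tolerance-based comparison instead.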
This project is licensed under the Apache License 2.0.