SemantiCodec

Ultra-low-bitrate neural audio codec with better semantics in the latent space.

Highlights

Bitrate: 0.31 kbps to 1.40 kbps
Token rate: 25, 50, or 100 tokens per second
Devices: CPU, CUDA, and MPS are supported
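As a quick sanity check on these numbers, bitrate multiplied by duration gives the size of the token stream for a clip, and token rate multiplied by duration gives the token count. The sketch below is plain arithmetic; `stream_size_bits` and `num_tokens` are hypothetical helpers, not part of the package, and the token count ignores any padding the codec may apply internally.

```python
def stream_size_bits(bitrate_kbps: float, duration_s: float) -> float:
    """Total bits for `duration_s` seconds at `bitrate_kbps` (1 kbps = 1000 bit/s)."""
    return bitrate_kbps * 1000 * duration_s


def num_tokens(token_rate: int, duration_s: float) -> float:
    """Tokens emitted for `duration_s` seconds at `token_rate` tokens per second (no padding assumed)."""
    return token_rate * duration_s


# Size of a 10-second clip at both ends of the highlighted bitrate range.
for kbps in (0.31, 1.40):
    bits = stream_size_bits(kbps, 10.0)
    print(f"{kbps:.2f} kbps, 10 s clip: {bits:.0f} bits = {bits / 8:.1f} bytes")

# Token counts for the same clip at each supported token rate.
for rate in (25, 50, 100):
    print(f"{rate} tokens/s, 10 s clip: {num_tokens(rate, 10.0):.0f} tokens")
```

For example, even at the top of the range a 10-second clip fits in under 2 kB.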

Usage

Installation

pip install git+https://github.com/haoheliu/SemantiCodec-inference.git

Encoding and decoding

Checkpoints are downloaded automatically when you initialize SemantiCodec, as in the following code.

from semanticodec import SemantiCodec

semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=16384)

filepath = "test/test.wav" # audio of arbitrary length

tokens = semanticodec.encode(filepath)
waveform = semanticodec.decode(tokens)

# Save the reconstruction as a 16 kHz WAV file
import soundfile as sf
sf.write("output.wav", waveform[0, 0], 16000) # first batch item, first channel
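The same two calls can be looped over several files. The sketch below is hypothetical: `reconstruct_all` and its `write_fn` parameter are illustrative names, not part of the package. It assumes only the `encode(filepath)` / `decode(tokens)` methods and the `waveform[0, 0]` indexing shown above, and it keeps the codec and the writer injectable so the loop can be tried without loading a model.

```python
from pathlib import Path
from typing import Callable, Iterable, List

SAMPLE_RATE = 16000  # the README example writes reconstructions at 16 kHz


def default_write(path: str, waveform, sample_rate: int) -> None:
    """Write the first batch item / first channel, as in the README example."""
    import soundfile as sf  # imported lazily so the loop can be tried without soundfile
    sf.write(path, waveform[0, 0], sample_rate)


def reconstruct_all(codec, in_paths: Iterable[str], out_dir: str,
                    write_fn: Callable = default_write) -> List[str]:
    """Hypothetical helper: encode and decode each file, writing <stem>_recon.wav into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for in_path in in_paths:
        tokens = codec.encode(in_path)   # discrete tokens for the whole file
        waveform = codec.decode(tokens)  # reconstructed audio
        out_path = str(out / f"{Path(in_path).stem}_recon.wav")
        write_fn(out_path, waveform, SAMPLE_RATE)
        written.append(out_path)
    return written
```

With a real model this would be called as `reconstruct_all(semanticodec, ["test/test.wav"], "recon")`. Passing the writer as a parameter keeps the file-format choice separate from the encode/decode loop.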

Other Settings

from semanticodec import SemantiCodec

############### Choose ONE of the following (comment out the rest) ###############
semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=32768) # 1.40 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=32768) # 0.70 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=32768) # 0.35 kbps

semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=16384) # 1.35 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=16384) # 0.68 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=16384) # 0.34 kbps

semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=8192) # 1.30 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=8192) # 0.65 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=8192) # 0.33 kbps

semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=4096) # 1.25 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=4096) # 0.63 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=4096) # 0.31 kbps
#####################################

filepath = "test/test.wav"

tokens = semanticodec.encode(filepath)
waveform = semanticodec.decode(tokens)

import soundfile as sf
sf.write("output.wav", waveform[0,0], 16000)
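The bitrates in the comments above are consistent with a simple rule: half of the tokens per second are semantic tokens carrying log2(semantic_vocab_size) bits each, and half are acoustic tokens from an 8192-entry codebook carrying 13 bits each. This split and the acoustic codebook size are assumptions inferred from the listed numbers, not read from the package. The sketch below recomputes the table under that assumption and picks the highest-bitrate setting that fits a budget; `estimated_bitrate_bps` and `best_setting` are hypothetical helpers.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

ACOUSTIC_VOCAB_SIZE = 8192  # assumption: inferred from the bitrates listed above

# (token_rate, semantic_vocab_size) pairs from the settings above
SETTINGS = [(rate, vocab) for vocab in (32768, 16384, 8192, 4096) for rate in (100, 50, 25)]


def estimated_bitrate_bps(token_rate: int, semantic_vocab_size: int,
                          acoustic_vocab_size: int = ACOUSTIC_VOCAB_SIZE) -> float:
    """Assumed rule: token_rate/2 semantic plus token_rate/2 acoustic tokens per second."""
    bits_per_pair = math.log2(semantic_vocab_size) + math.log2(acoustic_vocab_size)
    return token_rate / 2 * bits_per_pair


def as_kbps(bps: float) -> str:
    """Format bit/s as kbps with two decimals, rounding halves up like the table above."""
    return str(Decimal(str(bps / 1000)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def best_setting(budget_kbps: float):
    """Return the (token_rate, semantic_vocab_size) with the highest estimated bitrate <= budget, or None."""
    feasible = [s for s in SETTINGS if estimated_bitrate_bps(*s) <= budget_kbps * 1000]
    return max(feasible, key=lambda s: estimated_bitrate_bps(*s), default=None)


for rate, vocab in SETTINGS:
    kbps = as_kbps(estimated_bitrate_bps(rate, vocab))
    print(f"token_rate={rate:>3}, semantic_vocab_size={vocab:>5}: {kbps} kbps")
```

For instance, `best_setting(1.0)` selects `token_rate=50, semantic_vocab_size=32768` (0.70 kbps), which could then be passed to `SemantiCodec(...)`.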

If you want to reuse the evaluation pipeline and data from the paper, please refer to this Zenodo repository.

Citation

If you find this repo helpful, please consider citing in the following format:

@article{liu2024semanticodec,
  title={SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound},
  author={Liu, Haohe and Xu, Xuenan and Yuan, Yi and Wu, Mengyue and Wang, Wenwu and Plumbley, Mark D},
  journal={arXiv preprint arXiv:2405.00233},
  year={2024}
}
