This repository hosts the code and scripts for my Applied Data Science master's thesis.
The table below gives an overview of the files and what each one is for.
| File | Purpose |
|---|---|
| `data_exploration.ipynb` | Exploratory analysis of the dataset |
| `dev_sampler.sh` | Create a development subset |
| `hypothesis_cleaner.py` | Clean Whisper transcripts for use as JiWER hypotheses |
| `test_sampler.sh` | Create a test subset |
| `transcript_converter.py` | Clean ort transcripts and convert them to `.txt` files |
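Whisper output usually has to be normalised before it is scored against a reference, because JiWER (e.g. `jiwer.wer(reference, hypothesis)`) compares words as exact strings. The sketch below illustrates that idea. It is not the contents of `hypothesis_cleaner.py`: the normalisation rules in the hypothetical `clean_hypothesis` (lowercasing, replacing punctuation with spaces, collapsing whitespace) are assumptions. The hypothetical `word_error_rate` is a plain-Python word-level Levenshtein computation, shown only to make the metric concrete without requiring JiWER to be installed.

```python
import re


def clean_hypothesis(text: str) -> str:
    """Hypothetical normalisation: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # anything that is not a word char, space or apostrophe
    return " ".join(text.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    raw_whisper = "The cat sat on a mat."
    # 1 substitution ("the" -> "a") over 6 reference words, roughly 0.167
    print(word_error_rate(reference, clean_hypothesis(raw_whisper)))
```

Without the cleaning step, `"The"` and `"mat."` would also count as substitutions, which inflates the error rate for differences that are only formatting.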