Running credit to exciting research that I have referenced for algorithms:
- BERT-Based Idiom Detection
- Fuzzy dedup using Jaccard similarity

TODO:
A tiny CLI that lets you:
- run scraping jobs
- clean data
- train the model on some specified dataset
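
One possible shape for the CLI in the TODO, sketched with Python's standard `argparse` subcommands. Nothing here exists in the project yet: the program name `idiomcli`, the subcommand names, and every flag (`--source`, `--input`, `--output`, `--dataset`, `--epochs`) are assumptions for illustration, and the handlers only print placeholders where the real scraping, cleaning, and training code would be called.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per TODO item (all names hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="idiomcli",  # hypothetical program name
        description="Run scraping jobs, clean data, and train the model.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    # "run scraping jobs"
    scrape = sub.add_parser("scrape", help="run a scraping job")
    scrape.add_argument("--source", required=True, help="name of the source to scrape")

    # "clean data"
    clean = sub.add_parser("clean", help="clean previously scraped data")
    clean.add_argument("--input", required=True, help="path to the raw data")
    clean.add_argument("--output", required=True, help="path for the cleaned data")

    # "train the model on some specified dataset"
    train = sub.add_parser("train", help="train the model on a dataset")
    train.add_argument("--dataset", required=True, help="path to the training dataset")
    train.add_argument("--epochs", type=int, default=3, help="number of training epochs")

    return parser


def main(argv=None) -> int:
    """Parse arguments and dispatch; the branches are placeholders, not real jobs."""
    args = build_parser().parse_args(argv)
    if args.command == "scrape":
        print(f"[placeholder] would scrape source: {args.source}")
    elif args.command == "clean":
        print(f"[placeholder] would clean {args.input} into {args.output}")
    elif args.command == "train":
        print(f"[placeholder] would train on {args.dataset} for {args.epochs} epochs")
    return 0
```

Calling `main(["train", "--dataset", "data.csv"])` exercises the train branch without touching the real command line. Subcommands keep each job's flags separate, so a new step can be added later without changing the existing ones.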