This repository contains the hands-on materials as well as pointers to the datasets.
- Python: we use the Python dataset used by CodeTrans. Download it from this Dropbox folder; the file python-train_clean.tsv is in the CodeSearchNet_clean directory.
- English: the notebook has the code to download the NLTK dataset.
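Once downloaded, the Python TSV can be streamed with the standard `csv` module. The sketch below assumes one example per line with tab-separated fields. The column layout of python-train_clean.tsv is not described here, so `parse_tsv_lines` and `read_tsv_rows` are hypothetical helpers, and the default path is an assumption about where the file was saved.

```python
import csv
from typing import Iterable, Iterator, List

# Source-code fields can be long; raise csv's default per-field size limit.
csv.field_size_limit(2**31 - 1)


def parse_tsv_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Split tab-separated lines into lists of fields, skipping blank lines."""
    # QUOTE_NONE keeps quote characters inside code as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # csv yields [] for an empty line
            yield row


def read_tsv_rows(
    path: str = "CodeSearchNet_clean/python-train_clean.tsv",  # assumed location
) -> Iterator[List[str]]:
    """Stream rows from the TSV file one at a time instead of loading it all."""
    with open(path, encoding="utf-8", newline="") as f:
        yield from parse_tsv_lines(f)
```

For example, `next(read_tsv_rows())` returns the first row, which is a quick way to check how many columns the file has before choosing one to work with.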
The paper "On the Naturalness of Software" by Hindle et al. is available here.