A study of the recent paper "Mapping and Modeling the semantic space of Math concepts", combining cognitive neuroscience and ML.
- The folders Embeddings, AnalysisPipeline and Vocabulary are taken from the paper's official code and data.
- The folder Further-Analysis contains code I wrote to explore modern embeddings (GPT-2 large), in a commented notebook. It takes the subset of the math-English vocabulary introduced in the paper whose words the GPT-2 tokenizer represents as single tokens, and analyzes them.
- The PDF report contains an extensive analysis of the paper and presents my own extension implemented in Further-Analysis.
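
To illustrate the single-token selection used in Further-Analysis, here is a minimal sketch and not the notebook's actual code. The helper `single_token_words`, the toy tokenizer and the sample words are all hypothetical names introduced for this example. It also assumes that words are tokenized with a leading space, since GPT-2's tokenizer treats " word" and "word" differently; the notebook may make a different choice.

```python
from typing import Callable, Iterable, List


def single_token_words(
    words: Iterable[str],
    tokenize: Callable[[str], List[str]],
    prefix: str = " ",
) -> List[str]:
    """Keep only the words that the tokenizer maps to exactly one token.

    `prefix` is prepended before tokenizing (assumption: a leading space,
    which is how a word usually appears in the middle of a sentence).
    """
    return [w for w in words if len(tokenize(prefix + w)) == 1]


# Toy stand-in for a real tokenizer, so the example runs without downloads.
# It returns the whole string as one token if it is in its tiny vocabulary,
# and otherwise falls back to one token per character.
TOY_VOCAB = {" angle", " matrix", " sum"}


def toy_tokenize(text: str) -> List[str]:
    return [text] if text in TOY_VOCAB else list(text)


# Hypothetical sample words, not taken from the paper's vocabulary files.
words = ["angle", "matrix", "eigenvalue", "sum"]
kept = single_token_words(words, toy_tokenize)
print(kept)

# With Hugging Face transformers installed, the real tokenizer could be
# passed instead of the toy one (requires downloading the model files):
#   from transformers import GPT2Tokenizer
#   tok = GPT2Tokenizer.from_pretrained("gpt2-large")
#   kept = single_token_words(words, tok.tokenize)
```

Passing the tokenizer as a plain callable keeps the filter independent of any specific library, so the same function works with the toy tokenizer above or with a real one.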