Skip to content

Minimal Generalization Learner for morphology and phonology (Albright & Hayes 2002, 2003) with command-line interface

Notifications You must be signed in to change notification settings

colincwilson/MinimalGeneralizationLearner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This repository provides a command-line interface for the Minimal Generalization Learner (Albright & Hayes 2002, 2003, etc.) and slightly modified input files for English past-tense simulations. Send comments or questions to colin-dot-wilson-at-jhu-dot-edu.

version1/

The original version of the learner with a GUI interface, files downloaded [06/14/2021] from http://www.mit.edu/~albright/mgl/ and https://linguistics.ucla.edu/people/hayes/RulesVsAnalogy/index.html. MinGenLearner.jar seems to provide the most up-to-date compiled code. Run with: java -jar MinGenLearner.jar

version2/

The new version of the learner with a command-line interface. The (only) file added to the original code is src/LearnerCommandline.java. Arguments for the learner are specified with YAML files, see for example english.yaml. Run the English past-tense simulation with 00runme.sh.

This version comes already compiled (with Java 16). Recompile with: ant -buildfile MinimalGeneralizationLearner.xml. The one external dependency is SnakeYAML, included here as extern/snakeyaml-1.29.jar.

The new version also has the original GUI interace, run with: java -jar bin/mingenlearn.jar

english/

English past-tense data provided by Albright & Hayes, with various transcriptions.

  • English1 and English2 have the original small and large data sets, respectively. Some erroneous past-tense forms in the original English2 data are listed in English2_errors.txt.

  • English2_unicode has transcriptions that are closer to IPA but adhere to the one-symbol-per-phoneme format required by the learner; these are the input files for the example simulation in version2/00runme.sh. Also in this folder are the English data from the SIGMORPHON 2021 Shared Task on Generalization in Morphological Inflection Generation.

  • English2_IPA has the data in space-separated IPA transcription, which is incompatible with the learner.

  • English_phonemes.ods compares the transcription systems and provides a feature matrix.

feature file format

The format of feature (.fea) files is quite strict, as follows:

ASCII<tab>Seg.<tab><long feature names, tab-separated>
<tab><tab><short feature names, tab-separated>
<id_1><tab>segment_1<tab><feature specifications (+1, 0, -1), tab-separated>
<id_2><tab>segment_2<tab><feature specifications (+1, 0, -1), tab-separated>
...
<id_m><tab>ˈ<tab><feature specifications of stress symbol, tab-separated>
<id_n><tab>~<tab><feature specifications of empty symbol, tab-separated>

The 'ASCII' values must be unique integers, but are otherwise arbitrary.
The stress symbol can be omitted if it does not appear in the transcriptions.

About

Minimal Generalization Learner for morphology and phonology (Albright & Hayes 2002, 2003) with command-line interface

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages