MC Markov is a Python application for generating new text from a corpus of observed text (like the "ebooks" Twitter accounts).
The Markov chain trainer makes a number of improvements over other out-of-the-box Markov-model Python applications (at least the ones that I've seen):
- The Markov chain can train higher-order models that look for chains of 2, 3, or 4 words (n-grams) in the corpus
- The model is built using NumPy arrays, so it's very fast
- The probability matrix used for the model only contains observed n-grams, so it's feasible to train high-order models on large corpora.
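
To make the idea of a higher-order model that stores only observed n-grams concrete, here is a minimal sketch. It is not MC Markov's actual implementation (which is built on NumPy arrays); it uses plain dictionaries from the standard library, and the names `train` and `generate` are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def train(corpus, order=2):
    """Count which word follows each observed `order`-word context.

    `corpus` is a list of token lists (one per sentence or line).
    Only contexts that actually occur are stored, so memory grows with
    the number of observed n-grams rather than vocabulary_size ** order.
    """
    counts = defaultdict(Counter)
    for tokens in corpus:
        for i in range(len(tokens) - order):
            context = tuple(tokens[i:i + order])
            counts[context][tokens[i + order]] += 1
    return counts


def generate(counts, seed, length=10, rng=None):
    """Extend `seed` (a tuple of `order` words) by sampling next words."""
    rng = rng or random.Random()
    order = len(seed)
    words = list(seed)
    for _ in range(length):
        followers = counts.get(tuple(words[-order:]))
        if not followers:  # unseen context: nothing to sample from
            break
        choices, weights = zip(*followers.items())
        # Sample the next word in proportion to its observed count.
        words.append(rng.choices(choices, weights=weights)[0])
    return words


corpus = [
    "the cat sat on the mat".split(),
    "the cat sat on the sofa".split(),
    "the dog sat on the mat".split(),
]
model = train(corpus, order=2)
sample = generate(model, seed=("the", "cat"), length=4, rng=random.Random(0))
print(" ".join(sample))
```

With `order=2`, the context `("on", "the")` is followed by "mat" twice and "sofa" once in this toy corpus, so the generated line ends in "mat" about two times out of three.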
###Efficiency:
3. Parallelize the model-fitting/counting procedure
4. Use broadcasting for normalizing the numpy array
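
As a rough illustration of the broadcasting item, the sketch below normalizes a small count matrix row by row without an explicit loop. The matrix layout (rows as contexts, columns as next words) is an assumption for the example and may not match how MC Markov stores its counts.

```python
import numpy as np

# Hypothetical count matrix: rows are contexts, columns are next words.
counts = np.array([
    [2.0, 1.0, 1.0],
    [0.0, 3.0, 1.0],
    [0.0, 0.0, 0.0],  # a context with no observed followers
])

# keepdims=True gives shape (3, 1), which broadcasts across each row.
row_sums = counts.sum(axis=1, keepdims=True)

# Divide only where the row sum is non-zero; empty rows stay all-zero
# instead of producing NaN from 0 / 0.
probs = np.divide(
    counts, row_sums, out=np.zeros_like(counts), where=row_sums != 0
)
print(probs)
```

Each non-empty row of `probs` sums to 1, and the division happens in a single vectorized call.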
###UI:
- Improve handling of mis-formatted arguments (e.g., single list instead of list-of-lists for the corpus)
- Write documentation!
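
One possible way to handle the single-list case is to normalize the argument before training. This is only a sketch: the helper name `as_list_of_lists` is hypothetical and is not part of the current code.

```python
def as_list_of_lists(corpus):
    """Return `corpus` as a list of token lists.

    Accepts either a list of token lists or a single flat list of tokens,
    which is wrapped so that it is treated as one document.
    """
    if isinstance(corpus, str):
        raise TypeError(
            "corpus must be a list of tokens or a list of token lists, not a string"
        )
    corpus = list(corpus)
    if all(isinstance(item, str) for item in corpus):
        # Flat list of tokens (an empty list also lands here).
        return [corpus] if corpus else []
    if all(isinstance(item, (list, tuple)) for item in corpus):
        return [list(item) for item in corpus]
    raise TypeError("corpus mixes tokens and token lists")


print(as_list_of_lists(["just", "one", "line"]))
print(as_list_of_lists([["line", "one"], ["line", "two"]]))
```

Raising a `TypeError` for a bare string or a mixed list surfaces the mistake early, rather than silently training on individual characters.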
###New Features
5. Write song-writing module that includes the following components
- Ability to specify the ending word of a line (for rhyming)
- Ability to end the next line with a word that rhymes with the last word of the previous line
- Create dictionary of rhymes for last-words that are used as 'seeds'
- How to handle frequency? Should the list be unique, or should it reflect observed frequency?
- Write a method for choosing a word that rhymes with the last word of the previous line but is not the same word
- Write a method for building raps using couplets
- Ability to specify the syllable count of a line
- Option to 'clean' the corpus by removing certain punctuation
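
The rhyme-dictionary items above could be sketched roughly as follows. Everything here is hypothetical: the function names are invented, and the rhyme test (matching the last two letters) is a crude stand-in for a real pronunciation-based lookup. The `unique` flag shows the two answers to the frequency question: keeping repeats makes sampling reflect observed frequency, while dropping them makes every rhyme equally likely.

```python
import random
from collections import defaultdict


def rhyme_key(word):
    # Crude assumption: words rhyme if their last two letters match.
    return word.lower()[-2:]


def build_rhyme_dict(last_words, unique=False):
    """Group line-ending words by rhyme key.

    With unique=False, repeats are kept, so random.choice samples in
    proportion to observed frequency. With unique=True, each word
    appears once per group.
    """
    groups = defaultdict(list)
    for word in last_words:
        if unique and word in groups[rhyme_key(word)]:
            continue
        groups[rhyme_key(word)].append(word)
    return groups


def pick_rhyme(groups, previous_word, rng=None):
    """Return a word that rhymes with `previous_word` but differs from it, or None."""
    rng = rng or random.Random()
    candidates = [
        w for w in groups.get(rhyme_key(previous_word), []) if w != previous_word
    ]
    return rng.choice(candidates) if candidates else None


last_words = ["night", "light", "night", "fight", "day", "way"]
rhymes = build_rhyme_dict(last_words)
next_ending = pick_rhyme(rhymes, "night", rng=random.Random(1))
print(next_ending)
```

For a couplet, the first line would be generated freely, and `pick_rhyme` would then supply the required ending word for the second line.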
To run the unit tests, from the installed directory run `python -m unittest discover -s . -p 'test.py'`