This Python library parses an OpenITI mARkdown document and returns a Python class representation of the document's structures.
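import oimdp

# Read the mARkdown file; the context manager closes it automatically.
with open("mARkdownfile", "r", encoding="utf-8") as md_file:
    text = md_file.read()

# Parse the text into a document object.
parsed = oimdp.parse(text)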
Please see the docs, but here are some highlights:
content: a list of content structures
get_clean_text(): get the text stripped of markup
Content classes contain an original value from the document and some extracted content, such as a text string or a specific value.
Most other structures are listed in sequence (e.g. a Paragraph is followed by a Line).
Line objects and other line-level structures are divided into PhrasePart objects.
PhrasePart objects are phrase-level tags.
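Because the content structures are listed in sequence, a quick way to get an overview of a parsed document is to tally them by class. The following is a minimal sketch that relies only on the content list described above. The helper name count_structures and the stand-in classes are hypothetical and exist only so the example runs without the library installed.

```python
from collections import Counter


def count_structures(document):
    """Count the content structures of a parsed document by class name.

    Assumes only that `document.content` is a list of structure objects.
    """
    return Counter(type(item).__name__ for item in document.content)


# Hypothetical stand-ins, not the library's real classes. With oimdp
# installed, pass the result of oimdp.parse(text) instead.
class Paragraph:
    pass


class Line:
    pass


class FakeDocument:
    content = [Paragraph(), Line(), Line()]


counts = count_structures(FakeDocument())
print(counts)
```

With a real document, the same call would be count_structures(parsed), where parsed is the object returned by oimdp.parse(text).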
Set up a virtual environment with venv
python3 -m venv .env
Activate the virtual environment
source .env/bin/activate
Install the package
python setup.py install
With the environment activated, run the tests:
python tests/test.py