From e338f9771b27c837d433bd5dc608b1b9e37e3157 Mon Sep 17 00:00:00 2001 From: Eric Joanis Date: Thu, 20 Jun 2024 12:41:34 -0400 Subject: [PATCH 1/5] refactor(docs): automatically convert from Sphinx .rst to mkdocs .md Using rst2myst convert installed with pip install rst-to-myst --- docs/{README.md => Contributing.md} | 0 docs/{advanced-use.rst => advanced-use.md} | 81 ++- docs/cli-guide.md | 463 ++++++++++++++++ docs/cli-guide.rst | 507 ------------------ docs/cli-ref.md | 53 ++ docs/cli-ref.rst | 39 -- docs/index.md | 24 + docs/index.rst | 26 - docs/installation.md | 5 + docs/installation.rst | 6 - docs/outputs.md | 50 ++ docs/outputs.rst | 52 -- docs/start.md | 41 ++ docs/start.rst | 45 -- ...troubleshooting.rst => troubleshooting.md} | 87 +-- 15 files changed, 721 insertions(+), 758 deletions(-) rename docs/{README.md => Contributing.md} (100%) rename docs/{advanced-use.rst => advanced-use.md} (53%) create mode 100644 docs/cli-guide.md delete mode 100644 docs/cli-guide.rst create mode 100644 docs/cli-ref.md delete mode 100644 docs/cli-ref.rst create mode 100644 docs/index.md delete mode 100644 docs/index.rst create mode 100644 docs/installation.md delete mode 100644 docs/installation.rst create mode 100644 docs/outputs.md delete mode 100644 docs/outputs.rst create mode 100644 docs/start.md delete mode 100644 docs/start.rst rename docs/{troubleshooting.rst => troubleshooting.md} (58%) diff --git a/docs/README.md b/docs/Contributing.md similarity index 100% rename from docs/README.md rename to docs/Contributing.md diff --git a/docs/advanced-use.rst b/docs/advanced-use.md similarity index 53% rename from docs/advanced-use.rst rename to docs/advanced-use.md index 2f6e3e23..f010c094 100644 --- a/docs/advanced-use.rst +++ b/docs/advanced-use.md @@ -1,27 +1,24 @@ -.. _advanced-use: +(advanced-use)= -Advanced topics -=============== +# Advanced topics -.. 
_adding-a-lang: +(adding-a-lang)= -Adding a new language to g2p ----------------------------- +## Adding a new language to g2p If you want to align an audio book in a language that is not yet supported by the g2p library, you will have to write your own g2p mapping for that language. References: - - The `g2p library `__ and its - `documentation `__. - - The `7-part blog post on creating g2p mappings `__ on the `Mother Tongues Blog `__. +- The [g2p library](https://github.com/roedoejet/g2p) and its + [documentation](https://g2p.readthedocs.io/). +- The [7-part blog post on creating g2p mappings](https://blog.mothertongues.org/g2p-background/) on the [Mother Tongues Blog](https://blog.mothertongues.org/). Once you have created a g2p mapping for your language, please consider -`contributing it to the project `__ +[contributing it to the project](https://blog.mothertongues.org/g2p-contributing/) so others can also benefit from your work! -Pre-processing your data ------------------------- +## Pre-processing your data Manipulating the text and/or audio data that you are trying to align can sometimes produce longer, more accurate ReadAlongs, that throw less @@ -29,56 +26,52 @@ errors when aligning. While some of the most successful techniques we have tried are outlined here, you may also need to customize your pre-processing to suit your specific data. -Audio pre-processing -~~~~~~~~~~~~~~~~~~~~ +### Audio pre-processing -Adding silences -^^^^^^^^^^^^^^^ +#### Adding silences Adding 1 second segments of silence in between phrases or paragraphs sometimes improves the performance of the aligner. We do this using the -`Pydub `__ library which can be +[Pydub](https://github.com/jiaaro/pydub) library which can be pip-installed. Keep in mind that Pydub uses milliseconds. If your data is currently 1 audio file, you will need to split it into segments where you want to put the silences.
-:: - - ten_seconds = 10 * 1000 - first_10_seconds = soundtrack[:ten_seconds] - last_5_seconds = soundtrack[-5000:] +```python +from pydub import AudioSegment + +audio = AudioSegment.from_mp3("story.mp3")  # hypothetical input file name +ten_seconds = 10 * 1000  # Pydub works in milliseconds +first_10_seconds = audio[:ten_seconds] +last_5_seconds = audio[-5000:] +``` Once you have your segments, create an MP3 file containing only 1 second of silence. -:: - - from pydub import AudioSegment +```python +from pydub import AudioSegment - wfile = "appended_1000ms.mp3" - silence = AudioSegment.silent(duration=1000) - soundtrack = silence +wfile = "appended_1000ms.mp3" +silence = AudioSegment.silent(duration=1000)  # 1 second of silence +soundtrack = silence +``` Then you loop the audio files you want to append (segments and silence). -:: - - seg = AudioSegment.from_mp3(mp3file) - soundtrack = soundtrack + silence + seg +```python +segment_files = ["segment1.mp3", "segment2.mp3"]  # hypothetical segment file names +for mp3file in segment_files: +    seg = AudioSegment.from_mp3(mp3file) +    soundtrack = soundtrack + silence + seg +``` Write the soundtrack file as an MP3. This will then be the audio input for your Read-Along. -:: +```python +soundtrack.export(wfile, format="mp3") +``` - soundtrack.export(wfile, format="mp3") +### Text pre-processing -Text pre-processing -~~~~~~~~~~~~~~~~~~~ - -Numbers -^^^^^^^ +#### Numbers ReadAlong Studio cannot align numbers written as digits (ex. "123"). Instead, you will need to write them out (ex. "one two three" or "one @@ -87,10 +80,10 @@ file. If you have lots of data, and the numbers are spoken in English (or any of their supported languages), consider adding a library like -`num2words `__ to your +[num2words](https://github.com/savoirfairelinux/num2words) to your pre-processing.
-:: - - num2words 123456789 - one hundred and twenty-three million, four hundred and fifty-six thousand, seven hundred and eighty-nine +``` +num2words 123456789 +one hundred and twenty-three million, four hundred and fifty-six thousand, seven hundred and eighty-nine +``` diff --git a/docs/cli-guide.md b/docs/cli-guide.md new file mode 100644 index 00000000..a335fb91 --- /dev/null +++ b/docs/cli-guide.md @@ -0,0 +1,463 @@ +(cli-guide)= + +# Command line interface (CLI) user guide + +This page contains guidelines on using the ReadAlongs CLI. See also +{ref}`cli-ref` for the full CLI reference. + +The ReadAlongs CLI has two main commands: `readalongs make-xml` and +`readalongs align`. + +- If your data is a plain text file, you can run `make-xml` to turn + it into ReadAlongs XML, which you can then align with + `align`. Doing this in two steps allows you to modify the XML file + before aligning it (e.g., to mark that some text is in a different + language, to flag some do-not-align text, or to drop anchors in). +- Alternatively, if your plain text file does not need to be modified, you can + run `align` directly on it, since it also accepts plain text input. You'll + need the `-l ` option to indicate what language your text is in. + +Two additional commands are sometimes useful: `readalongs tokenize` and +`readalongs g2p`. + +- `tokenize` takes the output of `make-xml` and tokenizes it, wrapping each + word in the text in a `<w>` element. +- `g2p` takes the output of `tokenize` and maps each word to its + phonetic transcription using the g2p library. The phonetic transcription is + represented using ARPABET phonetic codes and is stored in the `ARPABET` + attribute of each `<w>` element. + +The result of `tokenize` or `g2p` can be fixed manually if necessary and +then used as input to `align`. + +## Getting from TXT to XML with readalongs make-xml + +Run {ref}`cli-make-xml` to make the ReadAlongs XML file for `align` from a TXT file.
+ +`readalongs make-xml [options] [story.txt] [story.readalong]` + +`[story.txt]`: path to the plain text input file (TXT) + +`[story.readalong]`: path to the XML output file + +The input file must be plain text encoded in UTF-8, with one +sentence per line. Paragraph breaks are marked by a blank line, and page +breaks are marked by two blank lines. + +| Key Options | Option descriptions | +| ------------------------------ | --------------------------------------------------------------------------------------------------------------------- | +| `-l, --language(s)` (required) | The language code for story.txt. Specifying multiple comma- or colon-separated languages triggers {ref}`g2p-cascade`. | +| `-f, --force-overwrite` | Force overwrite output files (handy if you're troubleshooting and will be aligning repeatedly) | +| `-h, --help` | Displays CLI guide for `make-xml` | + +The `-l, --language` option takes a language’s three-letter [ISO 639-3 +code](https://en.wikipedia.org/wiki/ISO_639-3) as its argument. + +The languages supported by ReadAlong Studio can be listed by running `readalongs make-xml -h` +and they can also be found in the {ref}`cli-make-xml` reference. + +So, a full command for a story in Algonquin, with an implicit g2p fallback to +Undetermined, would be something like: + +`readalongs make-xml -l alq Studio/story.txt Studio/story.readalong` + +The generated XML will be split into sentences. At this stage you can +edit the XML to make any modifications, such as adding `do-not-align` +as an attribute of any element in your XML. + +The format of the generated XML is based on [TEI +Lite](https://tei-c.org/guidelines/customization/lite/) but is +considerably simplified. The DTD (document type definition) can be +found in the ReadAlong Studio source code under +`readalongs/static/read-along-1.0.dtd`. + +(dna)= + +### Handling mismatches: do-not-align + +There are two types of "do-not-align" (DNA) content: DNA audio and DNA text.
+ +To use DNA text, add `do-not-align` as an attribute to any +element in the XML (word, sentence, paragraph, or page). + +```xml +<w do-not-align="true">dog</w> +``` + +If you have already run `readalongs make-xml`, there will be +documentation for DNA text in comments at the beginning of the generated +XML file. + +``` + +``` + +To use DNA audio, you can specify in the `config.json` file the time ranges, +in milliseconds, that you want the aligner to ignore. + +``` +"do-not-align": + { + "method": "remove", + "segments": + [ + { + "begin": 1, + "end": 17000 + } + ] + } +``` + +#### Use cases for DNA + +- Spoken introduction in the audio file that has no accompanying text + (DNA audio) +- Text that has no matching audio, such as credits/acknowledgments (DNA + text) + +## Aligning your text and audio with readalongs align + +Run {ref}`cli-align` to align a text file (XML or TXT) and an audio file to +create a time-aligned audiobook. + +`readalongs align [options] [story.txt/xml] [story.mp3/wav] [output_base]` + +`[story.txt/xml]`: path to the text file (TXT or XML) + +`[story.mp3/wav]`: path to the audio file (MP3, WAV or any format +supported by ffmpeg) + +`[output_base]`: path to the directory where the output files will be +created, as `output_base*` + +| Key Options | Option descriptions | +| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `-l, --language(s)` | The language code for story.txt. Specifying multiple comma- or colon-separated languages triggers {ref}`g2p-cascade`. (required if input is plain text) | +| `-c, --config PATH` | Use ReadAlong-Studio configuration file (in JSON format) | +| `--debug-g2p` | Display verbose g2p debugging messages | +| `-s, --save-temps` | Save intermediate stages of processing and temporary files (dictionary, FSG, tokenization, etc.)
| +| `-f, --force-overwrite` | Force overwrite output files (handy if you’re troubleshooting and will be aligning repeatedly) | +| `-h, --help` | Displays CLI guide for `align` | + +See above for more information on the `-l, --language` argument. + +A full command could be something like: + +`readalongs align -f -c config.json story.readalong story.mp3 story-aligned` + +**Is the text file plain text or XML?** + +`readalongs align` accepts its text input as a plain text file or a ReadAlongs XML file. + +- If the file name ends with `.txt`, it will be read as plain text. +- If the file name ends with `.xml` or `.readalong`, it will be read as ReadAlongs XML. +- With other extensions, the beginning of the file is examined to + automatically determine if it's XML or plain text. + +## Supported languages + +The `readalongs langs` command can be used to list all supported languages. + +Here is that list at the time of compiling this documentation: + +```{eval-rst} +.. command-output:: readalongs langs +``` + +See {ref}`adding-a-lang` for references on adding new languages to that list. + +## Adding titles, images and do-not-align segments via the config.json file + +Some additional parameters can be specified via a config file: create +a JSON file called `config.json`, possibly in the same folder as +your other ReadAlong input files for convenience. The config file +currently accepts a few components: adding titles and headers, adding +images to your ReadAlongs, and DNA audio (see {ref}`dna`). + +To add a title and headers to the output HTML, you can use the keys +`"title"`, `"header"`, and `"subheader"`, for example: + +``` +{ + "title": "My awesome read-along", + "header": "A story in my language", + "subheader": "Read by me" +} +``` + +To add images, indicate the page number as the key, and the name of the image +file as the value, as an entry in the `"images"` dictionary. 
+ +```json +{ "images": { "0": "p1.jpg", "1": "p2.jpg" } } +``` + +Both images and DNA audio can be specified in the same config file, such +as in the example below: + +```json +{ + "images": + { + "0": "image-for-page1.jpg", + "1": "image-for-page1.jpg", + "2": "image-for-page2.jpg", + "3": "image-for-page3.jpg" + }, + + "do-not-align": + { + "method": "remove", + "segments": + [ + { "begin": 1, "end": 17000 }, + { "begin": 57456, "end": 68000 } + ] + } +} +``` + +Warning: mind your commas! JSON is very strict: commas +separate elements in a list or dictionary, but if you accidentally have +a comma after the last element (e.g., by cutting and pasting whole +lines), you will get a syntax error. + +(g2p-cascade)= + +## The g2p cascade + +Sometimes the g2p conversion of the input text will not succeed, for +various reasons. A word might use characters not recognized by the g2p mapping +for the language, or it might be in a different language. Whatever the +reason, the output of the g2p conversion will not be valid ARPABET, and +so the aligner, SoundSwallower, will not be able to proceed with +alignment. + +If you know the language for that text, you can mark it as such in the +XML. E.g.: + +```xml +This sentence is in English. +``` + +The `xml:lang` attribute can be added to any element in the XML structure +and will apply to text at any depth within that element, unless the +attribute is specified again at a deeper level, e.g.: + +```xml +English mixed with français. +``` + +There is also a simpler option available: the g2p cascade. When the g2p +cascade is enabled, the g2p mapping will be done by first trying the +language specified by the `xml:lang` attribute in the XML file +(or the first language provided to the `-l` flag on the +command line, if the input is plain text).
For each word where the +result is not valid ARPABET, the g2p mapping will be attempted again +with each of the languages specified in the g2p cascade, in order, until +a valid ARPABET conversion is obtained. If no valid conversion is +possible, an error message is printed and alignment is not attempted. + +To enable the g2p cascade, provide multiple languages via the `-l` switch +(for plain text input) or add the `fallback-langs="l2,l3,..."` attribute to +any element in the XML file: + +```xml +English mixed with français. +``` + +These command line examples will set the language to `fra`, with the g2p cascade +falling back to `eng` and then `und` (see below) when needed. + +```bash +readalongs make-xml -l fra,eng myfile.txt myfile.readalong +readalongs align -l fra,eng myfile.txt myfile.wav output-dir +``` + +### The "Undetermined" language code: und + +Notice how the sample XML snippet above has `und` as the last language in the +cascade. `und`, for Undetermined, is a special language mapping that +uses the definitions of all the characters in the Unicode standard +and maps each character as if its Unicode name were its pronunciation. +While crude, this mapping works surprisingly well for the purposes of +forced alignment, and allows `readalongs align` to successfully align +most text containing a few foreign words without any manual intervention. + +Since we recommend systematically using `und` at the end of the cascade, it +is now added by default after the languages specified with the `-l` +switch to both `readalongs align` and `readalongs make-xml`. Note that +adding other languages after `und` will have no effect, since the +Undetermined mapping will map any string to valid ARPABET. + +In the unlikely event that you want to disable adding `und`, add the hidden +`--lang-no-append-und` switch, or delete `und` from the `fallback-langs` +attribute in your XML input.
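The cascade logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ReadAlongs implementation: `g2p_cascade`, `is_valid_arpabet` and the toy per-language mappings are hypothetical stand-ins for the real g2p mappings, and the validity check simply tests each output token against the 39 base ARPABET phones (ignoring stress digits).

```python
from typing import Callable, Dict, List, Optional

# The 39 base ARPABET phones, without stress digits. Assumption: used here
# only to illustrate the "is this valid ARPABET?" test.
ARPABET_PHONES = {
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
}


def is_valid_arpabet(transcription: str) -> bool:
    """Return True if every space-separated token is a known ARPABET phone."""
    tokens = transcription.split()
    return bool(tokens) and all(token in ARPABET_PHONES for token in tokens)


def g2p_cascade(
    word: str,
    langs: List[str],
    mappings: Dict[str, Callable[[str], str]],
) -> Optional[str]:
    """Try each language in order and return the first valid ARPABET result.

    Returns None when no language in the cascade yields valid ARPABET,
    i.e., the case where alignment would not be attempted.
    """
    for lang in langs:
        transcription = mappings[lang](word)
        if is_valid_arpabet(transcription):
            return transcription
    return None


# Hypothetical toy mappings: known words map to ARPABET, unknown words are
# returned unchanged and therefore fail the validity check.
toy_mappings: Dict[str, Callable[[str], str]] = {
    "fra": lambda w: {"chat": "SH AA"}.get(w, w),
    "eng": lambda w: {"dog": "D AO G"}.get(w, w),
}

print(g2p_cascade("chat", ["fra", "eng"], toy_mappings))  # first language succeeds
print(g2p_cascade("dog", ["fra", "eng"], toy_mappings))   # falls back to eng
print(g2p_cascade("xyz", ["fra", "eng"], toy_mappings))   # cascade exhausted
```

In this sketch, a mapping like `und` would correspond to a final entry that always returns valid ARPABET, which is why any language listed after it could never be reached.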
+ +### Debugging g2p mapping issues + +The warning messages issued by `readalongs g2p` and `readalongs align` +indicate which words are causing g2p problems and what fallbacks were tried. +It can be worth inspecting the input text to fix any encoding or spelling +errors highlighted by these warnings. For much more detailed messages about +g2p'ing each word in every language where g2p was unsuccessfully attempted, +add the `--debug-g2p` switch. + +## Breaking up the pipeline + +The CLI also provides commands that break processing up into separate +steps. + +The following series of commands: + +```bash +readalongs make-xml -l l1,l2 file.txt file.readalong +readalongs tokenize file.readalong file.tokenized.readalong +readalongs g2p file.tokenized.readalong file.g2p.readalong +readalongs align file.g2p.readalong file.wav output +``` + +is equivalent to the single command: + +```bash +readalongs align -l l1,l2 file.txt file.wav output +``` + +except that when running the pipeline as four separate commands, you can +edit the XML files between steps to make manual adjustments and +corrections if you want, like inserting anchors or silences, changing the +language of individual elements, or even manually editing the ARPABET encoding +for some words. + +## Anchors: marking known alignment points + +Long audio/text file pairs can sometimes be difficult to align +correctly, because the aligner might get lost partway through the +alignment process. Anchors can be used to tell the aligner about known +correspondence points between the text and the audio stream. + +### Anchor syntax + +Anchors are inserted in the XML file (the output of +`readalongs make-xml`, `readalongs tokenize` or `readalongs g2p`) +using the following syntax: `` or +``. The time can be specified in seconds (this +is the default) or milliseconds. + +Anchors can be placed anywhere in the XML file: between/before/after any +element or text. + +Example: + +```xml + + +
+

+ Hello. + + This is a test + weirdword +

+
+ +
+``` + +### Anchor semantics + +When anchors are used, the alignment task is divided at each anchor, +creating a series of segments that are aligned independently from one +another. When alignment is performed, the aligner sees only the audio +and the text from the segment being processed, and the results are +joined together afterwards. + +The beginning and end of the files are implicit anchors: *n* anchors define +*n+1* segments: from the beginning of the audio and text to the first +anchor, between pairs of anchors, and from the last anchor to the end of +the audio and text. + +Special cases equivalent to do-not-align audio: + +- If an anchor occurs before the first word in the text, the audio up to that + anchor’s timestamp is excluded from alignment. +- If an anchor occurs after the last word, the end of the audio is excluded + from alignment. +- If two anchors occur one after the other, the time span between them in the + audio is excluded from alignment. + +Using anchors to define do-not-align audio segments is effectively the same as +marking them as "do-not-align" in the `config.json` file, except that DNA +segments declared using anchors have a known alignment with respect to the +text, while the positions of DNA segments declared in the config file are +inferred by the aligner. + +### Anchor use cases + +1. Alignment fails because the stream is too long or too difficult to + align. + + When alignment fails, listen to the audio stream and try to identify + where some words you can recognize start or end. Even if you don’t + understand the language, there might be some words you’re able to + recognize and use as anchors to help the aligner. + +2. You already know where some words/sentences/paragraphs start or end, + because the data came with some partial alignment information. For + example, the data might come from an ELAN file with sentence + alignments. + + These known timestamps can be converted to anchors.
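The segmentation rule from the anchor semantics above (*n* anchors define *n+1* segments, and a segment that has audio but no words behaves like do-not-align audio) can be illustrated with a short Python sketch. The `Anchor` and `Segment` classes and the `split_at_anchors` function are hypothetical: they only model the rule, are not part of the ReadAlongs API, and assume times in seconds.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Anchor:
    """A known alignment point, in seconds (hypothetical representation)."""
    time: float


@dataclass
class Segment:
    """A span of audio plus the words that must be aligned inside it."""
    start: float
    end: float
    words: List[str] = field(default_factory=list)


def split_at_anchors(items: List[Union[str, Anchor]], audio_duration: float) -> List[Segment]:
    """Split a stream of words and anchors into independently aligned segments.

    The start and end of the audio act as implicit anchors, so n anchors
    produce n + 1 segments.
    """
    segments = [Segment(start=0.0, end=audio_duration)]
    for item in items:
        if isinstance(item, Anchor):
            segments[-1].end = item.time  # close the current segment at the anchor
            segments.append(Segment(start=item.time, end=audio_duration))
        else:
            segments[-1].words.append(item)
    return segments


stream = [Anchor(1.0), "Hello.", Anchor(3.5), "This", "is", "a", "test", Anchor(6.0), Anchor(8.0)]
for seg in split_at_anchors(stream, audio_duration=10.0):
    status = "align" if seg.words else "do-not-align audio"
    print(f"{seg.start:>4}-{seg.end:>4}s {status}: {' '.join(seg.words)}")
```

With four anchors this yields five segments; the three without words match the three special cases listed above (an anchor before the first word, two consecutive anchors, and an anchor after the last word).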
+ +## Silences: inserting pause-like silences + +There are times when you might want a read-along to pause at a particular +place for a specific time and then resume. This can be accomplished by +inserting silences in your audio stream. You can do it manually by editing your +audio file ahead of time, but you can also have `readalongs align` insert the +silences for you. + +### Silence syntax + +Silences are inserted in the audio stream wherever a `silence` element is +found in the XML input. +**TODO say something about how the silence placement is determined.** +The syntax is like the anchor syntax: `` or +``. Like anchors, silence elements can be inserted +anywhere. + +Example: + +```xml + + +
+

+ Hello. + + After this pregnant pause, we'll pause + again before it's all over! +

+ +
+
+``` + +### Silence use cases + +1. Your read-along has a title page that is not read out in the audio stream: + insert a silence at the beginning so that it stays on the first page for + the specified time. + **TODO: test that a silence before the first word really keeps the RA on the + first page during that silence, even if all text on the first page is DNA.** +2. Your read-along has a credits page at the end that is not read out in the + audio stream: insert a silence at the end so that people see that credits + page for the specified time before the stream ends. + **TODO: also test that this use case works as described.** diff --git a/docs/cli-guide.rst b/docs/cli-guide.rst deleted file mode 100644 index b92d580b..00000000 --- a/docs/cli-guide.rst +++ /dev/null @@ -1,507 +0,0 @@ -.. _cli-guide: - -Command line interface (CLI) user guide -======================================= - -This page contains guidelines on using the ReadAlongs CLI. See also -:ref:`cli-ref` for the full CLI reference. - -The ReadAlongs CLI has two main commands: ``readalongs make-xml`` and -``readalongs align``. - -- If your data is a plain text file, you can run ``make-xml`` to turn - it into ReadAlongs XML, which you can then align with - ``align``. Doing this in two steps allows you to modify the XML file - before aligning it (e.g., to mark that some text is in a different - language, to flag some do-not-align text, or to drop anchors in). - -- Alternatively, if your plain text file does not need to be modified, you can - run ``align`` directly on it, since it also accepts plain text input. You'll - need the ``-l `` option to indicate what language your text is in. - -Two additional commands are sometimes useful: ``readalongs tokenize`` and -``readalongs g2p``. - -- ``tokenize`` takes the output of ``make-xml`` and tokenizes it, wrapping each - word in the text in a ```` element.
- -- ``g2p`` takes the output of ``tokenize`` and mapping each word to its - phonetic transcription using the g2p library. The phonetic transcription is - represented using the ARPABET phonetic codes and are added in the ``ARPABET`` - attribute to each ```` element. - -The result of ``tokenize`` or ``g2p`` can be fixed manually if necessary and -then used as input to ``align``. - -Getting from TXT to XML with readalongs make-xml -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Run :ref:`cli-make-xml` to make the ReadAlongs XML file for ``align`` from a TXT file. - -``readalongs make-xml [options] [story.txt] [story.readalong]`` - -``[story.txt]``: path to the plain text input file (TXT) - -``[story.readalong]``: Path to the XML output file - -The plain text file must be plain text encoded in ``UTF-8`` with one -sentence per line. Paragraph breaks are marked by a blank line, and page -breaks are marked by two blank lines. - -+-----------------------------------+-----------------------------------------------+ -| Key Options | Option descriptions | -+===================================+===============================================+ -| ``-l, --language(s)`` (required) | The language code for story.txt. | -| | Specifying multiple comma- or colon-separated | -| | languages triggers :ref:`g2p-cascade`. | -+-----------------------------------+-----------------------------------------------+ -| ``-f, --force-overwrite`` | Force overwrite output files | -| | (handy if you're troubleshooting | -| | and will be aligning repeatedly) | -+-----------------------------------+-----------------------------------------------+ -| ``-h, --help`` | Displays CLI guide for | -| | ``make-xml`` | -+-----------------------------------+-----------------------------------------------+ - -The ``-l, --language`` argument requires a language’s 3 character `ISO -code `__ as an argument. 
- -The languages supported by RAS can be listed by running ``readalongs make-xml -h`` -and they can also be found in the :ref:`cli-make-xml` reference. - -So, a full command for a story in Algonquin, with an implicit g2p fallback to -Undetermined, would be something like: - -``readalongs make-xml -l alq Studio/story.txt Studio/story.readalong`` - -The generated XML will be parsed in to sentences. At this stage you can -edit the XML to have any modifications, such as adding ``do-not-align`` -as an attribute of any element in your XML. - -The format of the generated XML is based on [TEI -Lite](https://tei-c.org/guidelines/customization/lite/) but is -considerably simplified. The DTD (document type definition) can be -found in the ReadAlong Studio source code under -`readalongs/static/read-along-1.0.dtd`. - -.. _dna: - -Handling mismatches: do-not-align -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -There are two types of "do-not-align" (DNA) content: DNA audio and DNA text. - -To use DNA text, add ``do-not-align`` as an attribute to any -element in the xml (word, sentence, paragraph, or page). - -:: - - dog - -If you have already run ``readalongs make-xml``, there will be -documentation for DNA text in comments at the beginning of the generated -xml file. - -:: - - - -To use DNA audio, you can specify a timeframe in milliseconds in the -``config.json`` file which you want the aligner to ignore. - -:: - - "do-not-align": - { - "method": "remove", - "segments": - [ - { - "begin": 1, - "end": 17000 - } - ] - } - -Use cases for DNA -''''''''''''''''' - -- Spoken introduction in the audio file that has no accompanying text - (DNA audio) -- Text that has no matching audio, such as credits/acknowledgments (DNA - text) - -Aligning your text and audio with readalongs align -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Run :ref:`cli-align` to align a text file (RAS or TXT) and an audio file to -create a time-aligned audiobook. 
- -``readalongs align [options] [story.txt/xml] [story.mp3/wav] [output_base]`` - -``[story.txt/ras]``: path to the text file (TXT or RAS) - -``[story.mp3/wav]``: path to the audio file (MP3, WAV or any format -supported by ffmpeg) - -``[output_base]``: path to the directory where the output files will be -created, as ``output_base*`` - -+-----------------------------------+-----------------------------------------------+ -| Key Options | Option descriptions | -+===================================+===============================================+ -| ``-l, --language(s)`` | The language code for story.txt. | -| | Specifying multiple comma- or colon-separated | -| | languages triggers :ref:`g2p-cascade`. | -| | (required if input is plain text) | -+-----------------------------------+-----------------------------------------------+ -| ``-c, --config PATH`` | Use ReadAlong-Studio | -| | configuration file (in JSON | -| | format) | -+-----------------------------------+-----------------------------------------------+ -| ``--debug-g2p`` | Display verbose g2p debugging messages | -+-----------------------------------+-----------------------------------------------+ -| ``-s, --save-temps`` | Save intermediate stages of | -| | processing and temporary files | -| | (dictionary, FSG, tokenization, | -| | etc.) | -+-----------------------------------+-----------------------------------------------+ -| ``-f, --force-overwrite`` | Force overwrite output files | -| | (handy if you’re troubleshooting | -| | and will be aligning repeatedly) | -+-----------------------------------+-----------------------------------------------+ -| ``-h, --help`` | Displays CLI guide for ``align`` | -+-----------------------------------+-----------------------------------------------+ - -See above for more information on the ``-l, --language`` argument. 
- -A full command could be something like: - -``readalongs align -f -c config.json story.readalong story.mp3 story-aligned`` - -**Is the text file plain text or XML?** - -``readalongs align`` accepts its text input as a plain text file or a ReadAlongs XML file. - -- If the file name ends with ``.txt``, it will be read as plain text. -- If the file name ends with ``.xml`` or ``.readalong``, it will be read as ReadAlongs XML. -- With other extensions, the beginning of the file is examined to - automatically determine if it's XML or plain text. - -Supported languages -~~~~~~~~~~~~~~~~~~~ - -The ``readalongs langs`` command can be used to list all supported languages. - -Here is that list at the time of compiling this documentation: - -.. command-output:: readalongs langs - -See :ref:`adding-a-lang` for references on adding new languages to that list. - - -Adding titles, images and do-not-align segments via the config.json file -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Some additional parameters can be specified via a config file: create -a JSON file called ``config.json``, possibly in the same folder as -your other ReadAlong input files for convenience. The config file -currently accepts a few components: adding titles and headers, adding -images to your ReadAlongs, and DNA audio (see :ref:`dna`). - -To add a title and headers to the output HTML, you can use the keys -`"title"`, `"header"`, and `"subheader"`, for example:: - - { - "title": "My awesome read-along", - "header": "A story in my language", - "subheader": "Read by me" - } - -To add images, indicate the page number as the key, and the name of the image -file as the value, as an entry in the ``"images"`` dictionary. 
- -:: - - { "images": { "0": "p1.jpg", "1": "p2.jpg" } } - -Both images and DNA audio can be specified in the same config file, such -as in the example below: - -:: - - { - "images": - { - "0": "image-for-page1.jpg", - "1": "image-for-page1.jpg", - "2": "image-for-page2.jpg", - "3": "image-for-page3.jpg" - }, - - "do-not-align": - { - "method": "remove", - "segments": - [ - { "begin": 1, "end": 17000 }, - { "begin": 57456, "end": 68000 } - ] - } - } - -Warning: mind your commas! The JSON format is very picky: commas -separate elements in a list or dictionnary, but if you accidentally have -a comma after the last element (e.g., by cutting and pasting whole -lines), you will get a syntax error. - -.. _g2p-cascade: - -The g2p cascade -~~~~~~~~~~~~~~~ - -Sometimes the g2p conversion of the input text will not succeed, for -various reasons. A word might use characters not recognized by the g2p mapping -for the language, or it might be in a different language. Whatever the -reason, the output for the g2p conversion will not be valid ARPABET, and -so the system will not be able to proceed to alignment by the -aligner, SoundSwallower. - -If you know the language for that text, you can mark it as such in the -XML. E.g.: - -.. code-block:: xml - - This sentence is in English. - -The ``xml:lang`` attribute can be added to any element in the XML structure -and will apply to text at any depth within that element, unless the -attribute is specified again at a deeper level, e.g.: - -.. code-block:: xml - - English mixed with français. - -There is also a simpler option available: the g2p cascade. When the g2p -cascade is enabled, the g2p mapping will be done by first trying the -language specified by the `xml:lang` attribute in the XML file -(or with the first language provided to the ``-l`` flag on the -command line, if the input is plain text). 
For each word where the -result is not valid ARPABET, the g2p mapping will be attempted again -with each of the languages specified in the g2p cascade, in order, until -a valid ARPABET conversion is obtained. If no valid conversion is -possible, are error message is printed and alignment is not attempted. - -To enable the g2p cascade, provide multiple languages via the ``-l`` switch -(for plain text input) or add the ``fallback-langs="l2,l3,...`` attribute to -any element in the XML file: - -.. code-block:: xml - - English mixed with français. - -These command line examples will set the language to ``fra``, with the g2p cascade -falling back to ``eng`` and then ``und`` (see below) when needed. - -.. code-block:: bash - - readalongs make-xml -l fra,eng myfile.txt myfile.readalong - readalongs align -l fra,eng myfile.txt myfile.wav output-dir - -The "Undetermined" language code: und -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Notice how the sample XML snippet above has ``und`` as the last language in the -cascade. ``und``, for Undetermined, is a special language mapping that -uses the definition of all characters in all alphabets that are part of the -Unicode standard, and -maps them as if the name of that character was how it is pronounced. -While crude, this mapping works surprisingly well for the purposes of -forced alignment, and allows ``readalongs align`` to successfully align -most text with a few foreign words without any manual intervention. - -Since we recommend systematically using ``und`` at the end of the cascade, it -is now added by default after the languages specified with the ``-l`` -switch to both ``readalongs align`` and ``readalongs make-xml``. Note that -adding other languages after ``und`` will have no effect, since the -Undetermined mapping will map any string to valid ARPABET. 
- -In the unlikely event that you want to disable adding ``und``, add the hidden -``--lang-no-append-und`` switch, or delete ``und`` from the ``fallback-langs`` -attribute in your XML input. - -Debugging g2p mapping issues -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The warning messages issued by ``readalongs g2p`` and ``readalongs align`` -indicate which words are causing g2p problems and what fallbacks were tried. -It can be worth inspecting to input text to fix any encoding or spelling -errors highlighted by these warnings. More detailed messages can be -produced by adding the ``--debug-g2p`` switch, to obtain a lot more -information about g2p'ing words in each language g2p was unsucessfully -attempted. - -Breaking up the pipeline -~~~~~~~~~~~~~~~~~~~~~~~~ - -Some commands were added to the CLI in the last year to break processing up step -by step. - -The following series of commands: - -:: - - readalongs make-xml -l l1,l2 file.txt file.readalong - readalongs tokenize file.readalong file.tokenized.readalong - readalongs g2p file.tokenized.readalong file.g2p.readalong - readalongs align file.g2p.readalong file.wav output - -is equivalent to the single command: - -:: - - readalongs align -l l1,l2 file.txt file.wav output - -except that when running the pipeline as four separate commands, you can -edit the XML files between each step to make manual adjustments and -corrections if you want, like inserting anchors, silences, changing the -language for indivual elements, or even manually editting the ARPABET encoding -for some words. - -Anchors: marking known alignment points -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Long audio/text file pairs can sometimes be difficult to align -correctly, because the aligner might get lost part way through the -alignment process. Anchors can be used to tell the aligner about known -correspondance points between the text and the audio stream. 
- -Anchor syntax -^^^^^^^^^^^^^ - -Anchors are inserted in the XML file (the output of -``readalongs make-xml``, ``readalongs tokenize`` or ``readalongs g2p``) -using the following syntax: ```` or -````. The time can be specified in seconds (this -is the default) or milliseconds. - -Anchors can be placed anywhere in the XML file: between/before/after any -element or text. - -Example: - -.. code-block:: xml - - - - -
-

- Hello. - - This is a test - weirdword -

-
- -
- -Anchor semantics -^^^^^^^^^^^^^^^^ - -When anchors are used, the alignment task is divided at each anchor, -creating a series of segments that are aligned independently from one -another. When alignment is performed, the aligner sees only the audio -and the text from the segment being processed, and the results are -joined together afterwards. - -The beginning and end of files are implicit anchors: *n* anchors define -*n+1* segments: from the beginning of the audio and text to the first -anchor, between pairs of anchors, and from the last anchor to the end of -the audio and text. - -Special cases equivalent to do-not-align audio: - -- If an anchor occurs before the first word in the text, the audio up to that - anchor’s timestamps is excluded from alignment. -- If an anchor occurs after the last word, the end of the audio is excluded - from alignment. -- If two anchors occur one after the other, the time span between them in the - audio is excluded from alignment. - -Using anchors to define do-not-align audio segments is effectively the same as -marking them as "do-not-align" in the ``config.json`` file, except that DNA -segments declared using anchors have a known alignment with respect to the -text, while the position of DNA segments declared in the config file are -inferred by the aligner. - -Anchor use cases -^^^^^^^^^^^^^^^^ - -1. Alignment fails because the stream is too long or too difficult to - align. - - When alignment fails, listen to the audio stream and try to identify - where some words you can pick up start or end. Even if you don’t - understand the language, there might be some words you’re able to - pick up and use as anchors to help the aligner. - -2. You already know where some words/sentences/paragraphs start or end, - because the data came with some partial alignment information. For - example, the data might come from an ELAN file with sentence - alignments. - - These known timestamps can be converted to anchors. 
- -Silences: inserting pause-like silences -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -There are times where you might want a read-along to pause at a particular -place for a specific time and resume again after. This can be accomplished by -inserting silences in your audio stream. You can do it manually by editing your -audio file ahead of time, but you can also have ``readalongs align`` insert the -silences for you. - -Silence syntax -^^^^^^^^^^^^^^ - -Silences are inserted in the audio stream wherever a ``silence`` element is -found in the XML input. -**TODO say something about how the silence placement determined.** -The syntax is like the anchor syntax: ```` or -````. Like anchors, silence elements can be inserted -anywhere. - -Example: - -.. code-block:: xml - - - - -
-

- Hello. - - After this pregnant pause, we'll pause - again before it's all over! -

- -
-
- -Silence use cases -^^^^^^^^^^^^^^^^^ - -1. Your read along has a title page that is not read out in the audio stream: - insert a silence at the beginning so that it stays on the first page for - the specified time. - **TODO: test that a silence before the first word really keeps the RA on the - first page during that silence, even if all text on the first page is DNA.** - -2. Your read along has a credits page at the end that is not read out in the - audio stream: insert a silence at the end so that people see that credits - page for the specified time before the streaming end. - **TODO: also test that this use case works as described.** diff --git a/docs/cli-ref.md b/docs/cli-ref.md new file mode 100644 index 00000000..2291a14c --- /dev/null +++ b/docs/cli-ref.md @@ -0,0 +1,53 @@ +(cli-ref)= + +# Command line interface (CLI) reference + +This page contains the full reference documentation for each command in the CLI. +See also {ref}`cli-guide` for guidelines on using the CLI. + +The ReadAlongs CLI has five key commands: + +- {ref}`cli-align`: full alignment pipeline, from plain text or XML to a + viewable readalong +- {ref}`cli-make-xml`: convert a plain text file into XML, for align +- {ref}`cli-tokenize`: tokenize an XML file +- {ref}`cli-g2p`: g2p a tokenized XML file +- {ref}`cli-langs`: list supported languages + +Each command can be run with `-h` or `--help` to display its usage manual, +e.g., `readalongs -h`, `readalongs align --help`. + +(cli-align)= + +```{eval-rst} +.. click:: readalongs.cli:align + :prog: readalongs align +``` + +(cli-make-xml)= + +```{eval-rst} +.. click:: readalongs.cli:make_xml + :prog: readalongs make-xml +``` + +(cli-tokenize)= + +```{eval-rst} +.. click:: readalongs.cli:tokenize + :prog: readalongs tokenize +``` + +(cli-g2p)= + +```{eval-rst} +.. click:: readalongs.cli:g2p + :prog: readalongs g2p +``` + +(cli-langs)= + +```{eval-rst} +.. 
click:: readalongs.cli:langs + :prog: readalongs langs +``` diff --git a/docs/cli-ref.rst b/docs/cli-ref.rst deleted file mode 100644 index 0b94f686..00000000 --- a/docs/cli-ref.rst +++ /dev/null @@ -1,39 +0,0 @@ -.. _cli-ref: - -Command line interface (CLI) reference -====================================== - -This page contains the full reference documentation for each command in the CLI. -See also :ref:`cli-guide` for guidelines on using the CLI. - -The ReadAlongs CLI has five key commands: - -- :ref:`cli-align`: full alignment pipeline, from plain text or XML to a - viewable readalong -- :ref:`cli-make-xml`: convert a plain text file into XML, for align -- :ref:`cli-tokenize`: tokenize an XML file -- :ref:`cli-g2p`: g2p a tokenized XML file -- :ref:`cli-langs`: list supported languages - -Each command can be run with ``-h`` or ``--help`` to display its usage manual, -e.g., ``readalongs -h``, ``readalongs align --help``. - -.. _cli-align: -.. click:: readalongs.cli:align - :prog: readalongs align - -.. _cli-make-xml: -.. click:: readalongs.cli:make_xml - :prog: readalongs make-xml - -.. _cli-tokenize: -.. click:: readalongs.cli:tokenize - :prog: readalongs tokenize - -.. _cli-g2p: -.. click:: readalongs.cli:g2p - :prog: readalongs g2p - -.. _cli-langs: -.. click:: readalongs.cli:langs - :prog: readalongs langs diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 00000000..16f2d459 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,24 @@ +# Welcome to ReadAlong-Studio's documentation + +Audiobook alignment for Indigenous languages + +This site provides the full user documentation for ReadAlongs-Studio. 
+ +```{toctree} +:caption: 'Contents:' +:maxdepth: 2 + +start +installation +cli-guide +cli-ref +outputs +advanced-use +troubleshooting +``` + +# Indices and tables + +- {ref}`genindex` +- {ref}`modindex` +- {ref}`search` diff --git a/docs/index.rst b/docs/index.rst deleted file mode 100644 index a99a1d92..00000000 --- a/docs/index.rst +++ /dev/null @@ -1,26 +0,0 @@ -Welcome to ReadAlong-Studio's documentation -=========================================== - -Audiobook alignment for Indigenous languages - -This site provides the full user documentation for ReadAlongs-Studio. - -.. toctree:: - :maxdepth: 2 - :caption: Contents: - - start - installation - cli-guide - cli-ref - outputs - advanced-use - troubleshooting - - -Indices and tables -================== - -* :ref:`genindex` -* :ref:`modindex` -* :ref:`search` diff --git a/docs/installation.md b/docs/installation.md new file mode 100644 index 00000000..84bed1d7 --- /dev/null +++ b/docs/installation.md @@ -0,0 +1,5 @@ +(installation)= + +# Installation + +See [ReadAlongs/Studio/README.md](https://github.com/ReadAlongs/Studio#install) diff --git a/docs/installation.rst b/docs/installation.rst deleted file mode 100644 index 3245e0d6..00000000 --- a/docs/installation.rst +++ /dev/null @@ -1,6 +0,0 @@ -.. _installation: - -Installation -============ - -See `ReadAlongs/Studio/README.md `__ diff --git a/docs/outputs.md b/docs/outputs.md new file mode 100644 index 00000000..73c196fc --- /dev/null +++ b/docs/outputs.md @@ -0,0 +1,50 @@ +% outputs: + +# Output Realizations + +One of the main motivations for ReadAlong-Studio was to provide a one-stop-shop for audio/text alignment. +With that in mind, there are a variety of different output formats that can be created. 
Here are a few: + +## Elan/Praat files + +## Web Component + +When you have standard output from ReadAlong-Studio, consisting of 1) a ReadALong file (XML) and 2) an audio file +you can mobilize these files to the web or hybrid mobile apps quickly and painlessly. + +This is done using the ReadAlong WebComponent. Web components are re-useable, custom-defined HTML elements that you can embed in any HTML, regardless of which +framework you used to build your site, whether React, Angular, Vue, or just Vanilla HTML/CSS/JS. + +Below is an example of a minimal implementation in a basic standalone html page. Please visit <https://stenciljs.com/docs/overview> for more information on framework integrations. + +```html + + + + + + + + + + + + + + + +``` + +The above assumes the following structure: + +web + +├── assets + +│ ├── sample.wav + +│ ├── sample.readalong + +├── index.html + +Then you can host your site anywhere, or run it locally (`cd web && python3 -m http.server` for example) diff --git a/docs/outputs.rst b/docs/outputs.rst deleted file mode 100644 index eaa338dc..00000000 --- a/docs/outputs.rst +++ /dev/null @@ -1,52 +0,0 @@ -.. outputs: - -Output Realizations -=================== - -One of the main motivations for ReadAlong-Studio was to provide a one-stop-shop for audio/text alignment. -With that in mind, there are a variety of different output formats that can be created. Here are a few: - -Elan/Praat files ----------------- - -Web Component -------------- - -When you have standard output from ReadAlong-Studio, consisting of 1) a ReadALong file (XML) and 2) an audio file -you can mobilize these files to the web or hybrid mobile apps quickly and painlessly. - -This is done using the ReadAlong WebComponent. Web components are re-useable, custom-defined HTML elements that you can embed in any HTML, regardless of which -framework you used to build your site, whether React, Angular, Vue, or just Vanilla HTML/CSS/JS. - -Below is an example of a minimal implementation in a basic standalone html page.
Please visit https://stenciljs.com/docs/overview for more information on framework integrations. - -.. code-block:: html - - - - - - - - - - - - - - - - - - -The above assumes the following structure: - -| web -| ├── assets -| │ ├── sample.wav -| │ ├── sample.readalong -| ├── index.html -| -| - -Then you can host your site anywhere, or run it locally (``cd web && python3 -m http.server`` for example) diff --git a/docs/start.md b/docs/start.md new file mode 100644 index 00000000..cbb7d765 --- /dev/null +++ b/docs/start.md @@ -0,0 +1,41 @@ +% start: + +# Getting Started + +This library is an end-to-end audio/text aligner. It is meant to be used +together with the ReadAlong-Web-Component to interactively visualize the +alignment. + +## Background + +The concept is a web application with a series of stages of processing, +which ultimately leads to a time-aligned audiobook, i.e., a package of: + +- ReadAlong XML file describing text +- Audio file (WAV or MP3) +- HTML file describing the web component + +Which can be loaded using the [read-along web +component](https://github.com/roedoejet/ReadAlong-Web-Component). + +A book is generated as a standalone HTML page by default, but can +optionally be generated as an ePub file. + +## Required knowledge + +- How to use a [Command-line interface (CLI)](https://en.wikipedia.org/wiki/Command-line_interface). +- How to edit and manipulate plain text, [XML](https://www.w3.org/standards/xml/core) and [SMIL](https://www.w3.org/TR/smil/) files using a text editor or a code editor. +- How to edit and examine an audio file with [Audacity](https://www.audacityteam.org/) or similar software. 
+- How to spin up a local web server (e.g., see [How do you set up a local testing server?](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/set_up_a_local_testing_server)) + +## What you need to make a ReadAlong + +In order to create a ReadAlong you will need two files: + +- A text file, either in plain text (`.txt`) or in ReadAlong XML (`.readalong`) +- Clear audio in any format supported by [ffmpeg](https://ffmpeg.org/ffmpeg-formats.html) + +The content of the text file should be a transcription of the audio +file. The audio can be spoken or sung, but if there is background music +or noise of any kind, the aligner is likely to fail. Clearly enunciated +audio is also likely to increase accuracy. diff --git a/docs/start.rst b/docs/start.rst deleted file mode 100644 index e3274d07..00000000 --- a/docs/start.rst +++ /dev/null @@ -1,45 +0,0 @@ -.. start: - -Getting Started -================ - -This library is an end-to-end audio/text aligner. It is meant to be used -together with the ReadAlong-Web-Component to interactively visualize the -alignment. - -Background ----------- - -The concept is a web application with a series of stages of processing, -which ultimately leads to a time-aligned audiobook, i.e., a package of: - -- ReadAlong XML file describing text -- Audio file (WAV or MP3) -- HTML file describing the web component - -Which can be loaded using the `read-along web -component `__. - -A book is generated as a standalone HTML page by default, but can -optionally be generated as an ePub file. - -Required knowledge ------------------- - -- How to use a `Command-line interface (CLI) `__. -- How to edit and manipulate plain text, `XML `__ and `SMIL `__ files using a text editor or a code editor. -- How to edit and examine an audio file with `Audacity `__ or similar software. -- How to spin up a local web server (e.g., see `How do you set up a local testing server? 
`__) - -What you need to make a ReadAlong ---------------------------------- - -In order to create a ReadAlong you will need two files: - -- A text file, either in plain text (``.txt``) or in ReadAlong XML (``.readalong``) -- Clear audio in any format supported by `ffmpeg `__ - -The content of the text file should be a transcription of the audio -file. The audio can be spoken or sung, but if there is background music -or noise of any kind, the aligner is likely to fail. Clearly enunciated -audio is also likely to increase accuracy. diff --git a/docs/troubleshooting.rst b/docs/troubleshooting.md similarity index 58% rename from docs/troubleshooting.rst rename to docs/troubleshooting.md index ae79310d..6df8c32f 100644 --- a/docs/troubleshooting.rst +++ b/docs/troubleshooting.md @@ -1,22 +1,31 @@ -.. _troubleshooting: +--- +substitutions: + image1: |- + ```{image} https://i.imgur.com/vKPhTud.png + ``` +--- -.. note:: This troubleshooting guide is under construction. +(troubleshooting)= -Troubleshooting -=============== +:::{note} +This troubleshooting guide is under construction. +::: + +# Troubleshooting Here are three types of common errors you may encounter when trying to run ReadAlongs, and ways to debug them. -Phones missing in the acoustic model ------------------------------------- +## Phones missing in the acoustic model -.. note:: Troubleshooting item under construction +:::{note} +Troubleshooting item under construction +::: -You may get an error that looks like this:|image1| +You may get an error that looks like this:{{ image1 }} The general structure of your error would look like -``Phone [character] is missing in the acoustic model; word [index] ignored`` +`Phone [character] is missing in the acoustic model; word [index] ignored` This error is most likely caused not by a bug in your ReadAlong input files, but by an error in one of your g2p mappings. 
The error message is saying that there is a character in your ReadAlong text that is not @@ -29,16 +38,16 @@ Follow these steps to debug the issue **in g2p**. 1. Identify which characters in each line of the error message are **not** being converted to eng-arpabet. These will either be: - a. characters that are not in caps (for example ``g`` in the string - ``gUW`` in the error message shown above.) - b. a character not traditionally used in English (for example é or Ŧ, - or ``ʰ`` in the error message shown above.) You can confirm you + 1. characters that are not in caps (for example `g` in the string + `gUW` in the error message shown above.) + 2. a character not traditionally used in English (for example é or Ŧ, + or `ʰ` in the error message shown above.) You can confirm you have isolated the right characters by ensuring every other character in your error message appears as an **output** in the - `eng-ipa-to-arpabet - mapping `__. + [eng-ipa-to-arpabet + mapping](https://github.com/roedoejet/g2p/blob/main/g2p/mappings/langs/eng/eng_ipa_to_arpabet.json). These are the problematic characters we need to debug in the error - message shown above: ``g`` and ``ʰ``. + message shown above: `g` and `ʰ`. 2. Once you have isolated the characters that are not being converted to eng-arpabet, you are ready to begin debugging the issue. Start at @@ -48,55 +57,55 @@ Follow these steps to debug the issue **in g2p**. problematic characters incorrectly. Most of the time, the issue will be in either the first or the second of the following mappings: - i. *xyz-ipa* (where xyz is the ISO language code for your mapping) - ii. *xyz-equiv* (if you have one) - iii. *xyz-ipa_to_eng-ipa* (this mapping must be generated - automatically in g2p. Refer //here_in_the_guide to see how to do - this.) - iv. `eng-ipa-to-arpabet - mapping `__ - (The issue is rarely found here, but it doesn’t hurt to check.) + 1. *xyz-ipa* (where xyz is the ISO language code for your mapping) + 2. 
*xyz-equiv* (if you have one) + 3. *xyz-ipa_to_eng-ipa* (this mapping must be generated + automatically in g2p. Refer //here_in_the_guide to see how to do + this.) + 4. [eng-ipa-to-arpabet + mapping](https://github.com/roedoejet/g2p/blob/main/g2p/mappings/langs/eng/eng_ipa_to_arpabet.json) + (The issue is rarely found here, but it doesn’t hurt to check.) 4. Find a word in your text that uses the problematic character. For the - sake of example, let us assume the character I am debugging is ``g``, + sake of example, let us assume the character I am debugging is `g`, that appears in the word "dog", in language "xyz". 5. Make sure you are in the g2p repository and run the word through - ``g2p convert`` to confirm you have isolated the correct characters - to debug: ``g2p convert dog xyz eng-arpabet``. Best practice is to + `g2p convert` to confirm you have isolated the correct characters + to debug: `g2p convert dog xyz eng-arpabet`. Best practice is to copy+paste the word directly from your text instead of retyping it. Make sure to use the ISO code for your language in place of "xyz". *If the word converts cleanly into eng-arpabet characters, your issue does not lie in your mapping. //Refer to other potential RA issues* 6. From the result of the command run in 5, note the characters that do - **not** appear as **inputs** in the `eng-ipa-to-arpabet - mapping `__. + **not** appear as **inputs** in the [eng-ipa-to-arpabet + mapping](https://github.com/roedoejet/g2p/blob/main/g2p/mappings/langs/eng/eng_ipa_to_arpabet.json). These are the characters that have not been converted into characters that eng-ipa-to-arpabet can read. These should be the same characters you identified in step 2. -7. Run ``g2p convert dog xyz xyz-ipa``. Ensure the result is what you +7. Run `g2p convert dog xyz xyz-ipa`. Ensure the result is what you expect. If not, your error may arise from a problem in this mapping. refer_to_g2p_troubleshooting. 
If the result is what you expect, continue to the next step. 8. Note the result from running the command in 7. Check that the - characters [TODO-fix this text] (appear/being mapped by generated -- + characters \[TODO-fix this text\] (appear/being mapped by generated -- use debugger or just look at mapping) -.. |image1| image:: https://i.imgur.com/vKPhTud.png - -Type 2 ------- +## Type 2 -.. note:: TODO +:::{note} +TODO +::: Common error type 2... -Type 3 ------- +## Type 3 -.. note:: TODO +:::{note} +TODO +::: Common error type 3... From 191e1fbe292c413c8586a6ee90920bc26b9e419f Mon Sep 17 00:00:00 2001 From: Eric Joanis Date: Thu, 20 Jun 2024 13:54:54 -0400 Subject: [PATCH 2/5] refactor(docs): configure mkdocs and fix the .md files for it Also remove the now obsolete .readthedocs.yaml --- .readthedocs.yml | 18 --------- docs/Contributing.md | 27 +++++--------- docs/advanced-use.md | 22 +++++------ docs/cli-guide.md | 22 +++++------ docs/cli-ref.md | 72 +++++++++++++++--------------------- docs/index.md | 21 +---------- docs/installation.md | 2 - docs/outputs.md | 27 +++++++------- docs/requirements.txt | 10 +++-- docs/start.md | 6 +-- docs/troubleshooting.md | 82 +++++++++++++++-------------------------- mkdocs.yml | 31 ++++++++++++++++ 12 files changed, 142 insertions(+), 198 deletions(-) delete mode 100644 .readthedocs.yml create mode 100644 mkdocs.yml diff --git a/.readthedocs.yml b/.readthedocs.yml deleted file mode 100644 index 4660f296..00000000 --- a/.readthedocs.yml +++ /dev/null @@ -1,18 +0,0 @@ -version: 2 - -build: - os: ubuntu-20.04 - tools: - python: "3.8" - jobs: - post_install: - - echo "Installing Studio itself in its current state" - - which pip python - - pip install -e . 
-sphinx: - configuration: docs/conf.py - -python: - install: - - requirements: docs/requirements.txt diff --git a/docs/Contributing.md b/docs/Contributing.md index 2b4cb723..a1da745e 100644 --- a/docs/Contributing.md +++ b/docs/Contributing.md @@ -2,38 +2,31 @@ ## Edit the files -To contribute to the ReadAlongs Studio documentation, edit the `.rst` files in +To contribute to the ReadAlongs Studio documentation, edit the `.md` files in this folder. +The configuration is found in `../mkdocs.yml`. + ## Build and view the documentation locally To build the documentation and review your own changes locally: -1. Install the required build software, Sphinx: +1. Install the required build software, mkdocs and friends: - pip install -r requirements.txt + pip install -r requirements.txt 2. Install Studio itself - (cd .. && pip install -e .) - -3. Run one of these commands, which will build the documentation in `./_build/html/` - or `./_build/singlehtml/`: - - make html # multi-page HTML site - make singlehtml # single-page HTML document + (cd .. && pip install -e .) -2. View the documentation by running an HTTP server in the directory where the - build is found, e.g., +3. Run this command to serve the documentation locally: - cd _build/html - python3 -m http.server + (cd .. && mkdocs serve) - and navigating to http://127.0.0.1:8000 (or whatever port your local web - server displays). +4. View the documentation by browsing to <http://127.0.0.1:8000>.
## Publish the changes Once your changes are pushed to GitHub and merged into `main` via a Pull Request, the documentation will automatically get built and published to -https://readalong-studio.readthedocs.io/en/latest/ + diff --git a/docs/advanced-use.md b/docs/advanced-use.md index f010c094..0d925878 100644 --- a/docs/advanced-use.md +++ b/docs/advanced-use.md @@ -1,18 +1,16 @@ -(advanced-use)= - # Advanced topics -(adding-a-lang)= - ## Adding a new language to g2p If you want to align an audio book in a language that is not yet supported by the g2p library, you will have to write your own g2p mapping for that language. References: -: - The [g2p library](https://github.com/roedoejet/g2p) and its - [documentation](https://g2p.readthedocs.io/). - - The [7-part blog post on creating g2p mappings](https://blog.mothertongues.org/g2p-background/) on the [Mother Tongues Blog](https://blog.mothertongues.org/). + + - The [g2p library](https://github.com/roedoejet/g2p) and its + [documentation](https://roedoejet.github.io/g2p). + - The [7-part blog post on creating g2p mappings](https://blog.mothertongues.org/g2p-background/) + on the [Mother Tongues Blog](https://blog.mothertongues.org/). Once you have created a g2p mapping for your language, please consider [contributing it to the project](https://blog.mothertongues.org/g2p-contributing/) @@ -38,7 +36,7 @@ pip-installed. Keep in mind that Pydub uses milliseconds. If your data is currently 1 audio file, you will need to split it into segments where you want to put the silences. -``` +```py ten_seconds = 10 * 1000 first_10_seconds = soundtrack[:ten_seconds] last_5_seconds = soundtrack[-5000:] @@ -47,7 +45,7 @@ last_5_seconds = soundtrack[-5000:] Once you have your segments, create an MP3 file containing only 1 second of silence. -``` +```py from pydub import AudioSegment wfile = "appended_1000ms.mp3" @@ -57,7 +55,7 @@ soundtrack = silence Then you loop the audio files you want to append (segments and silence). 
-``` +```py seg = AudioSegment.from_mp3(mp3file) soundtrack = soundtrack + silence + seg ``` @@ -65,7 +63,7 @@ soundtrack = soundtrack + silence + seg Write the soundtrack file as an MP3. This will then be the audio input for your Read-Along. -``` +```py soundtrack.export(wfile, format="mp3") ``` @@ -83,7 +81,7 @@ of their supported languages), consider adding a library like [num2words](https://github.com/savoirfairelinux/num2words) to your pre-processing. -``` +```txt num2words 123456789 one hundred and twenty-three million, four hundred and fifty-six thousand, seven hundred and eighty-nine ``` diff --git a/docs/cli-guide.md b/docs/cli-guide.md index a335fb91..5c0f1583 100644 --- a/docs/cli-guide.md +++ b/docs/cli-guide.md @@ -1,9 +1,7 @@ -(cli-guide)= - # Command line interface (CLI) user guide This page contains guidelines on using the ReadAlongs CLI. See also -{ref}`cli-ref` for the full CLI reference. +[Command line interface (CLI) reference ](cli-ref.md) for the full CLI reference. The ReadAlongs CLI has two main commands: `readalongs make-xml` and `readalongs align`. @@ -32,7 +30,7 @@ then used as input to `align`. ## Getting from TXT to XML with readalongs make-xml -Run {ref}`cli-make-xml` to make the ReadAlongs XML file for `align` from a TXT file. +Run [`readalongs make-xml`][readalongs-make-xml] to make the ReadAlongs XML file for `align` from a TXT file. `readalongs make-xml [options] [story.txt] [story.readalong]` @@ -46,7 +44,7 @@ breaks are marked by two blank lines. | Key Options | Option descriptions | | ------------------------------ | --------------------------------------------------------------------------------------------------------------------- | -| `-l, --language(s)` (required) | The language code for story.txt. Specifying multiple comma- or colon-separated languages triggers {ref}`g2p-cascade`. | +| `-l, --language(s)` (required) | The language code for story.txt. 
Specifying multiple comma- or colon-separated languages triggers the [g2p cascade][the-g2p-cascade]. | | `-f, --force-overwrite` | Force overwrite output files (handy if you're troubleshooting and will be aligning repeatedly) | | `-h, --help` | Displays CLI guide for `make-xml` | @@ -54,7 +52,7 @@ The `-l, --language` argument requires a language’s 3 character [ISO code](https://en.wikipedia.org/wiki/ISO_639-3) as an argument. The languages supported by RAS can be listed by running `readalongs make-xml -h` -and they can also be found in the {ref}`cli-make-xml` reference. +and they can also be found in the [`readalongs make-xml`][readalongs-make-xml] reference. So, a full command for a story in Algonquin, with an implicit g2p fallback to Undetermined, would be something like: @@ -65,8 +63,8 @@ The generated XML will be parsed in to sentences. At this stage you can edit the XML to have any modifications, such as adding `do-not-align` as an attribute of any element in your XML. -The format of the generated XML is based on \[TEI -Lite\]() but is +The format of the generated XML is based on [TEI +Lite](https://tei-c.org/guidelines/customization/lite/) but is considerably simplified. The DTD (document type definition) can be found in the ReadAlong Studio source code under `readalongs/static/read-along-1.0.dtd`. @@ -120,7 +118,7 @@ To use DNA audio, you can specify a timeframe in milliseconds in the ## Aligning your text and audio with readalongs align -Run {ref}`cli-align` to align a text file (RAS or TXT) and an audio file to +Run [`readalongs align`][readalongs-align] to align a text file (RAS or TXT) and an audio file to create a time-aligned audiobook. 
`readalongs align [options] [story.txt/xml] [story.mp3/wav] [output_base]` @@ -135,7 +133,7 @@ created, as `output_base*` | Key Options | Option descriptions | | ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `-l, --language(s)` | The language code for story.txt. Specifying multiple comma- or colon-separated languages triggers {ref}`g2p-cascade`. (required if input is plain text) | +| `-l, --language(s)` | The language code for story.txt. Specifying multiple comma- or colon-separated languages triggers the [g2p cascade][the-g2p-cascade]. (required if input is plain text) | | `-c, --config PATH` | Use ReadAlong-Studio configuration file (in JSON format) | | `--debug-g2p` | Display verbose g2p debugging messages | | `-s, --save-temps` | Save intermediate stages of processing and temporary files (dictionary, FSG, tokenization, etc.) | @@ -167,7 +165,7 @@ Here is that list at the time of compiling this documentation: .. command-output:: readalongs langs ``` -See {ref}`adding-a-lang` for references on adding new languages to that list. +See [Adding a new language to g2p][adding-a-new-language-to-g2p] for references on adding new languages to that list. ## Adding titles, images and do-not-align segments via the config.json file @@ -225,8 +223,6 @@ separate elements in a list or dictionnary, but if you accidentally have a comma after the last element (e.g., by cutting and pasting whole lines), you will get a syntax error. -(g2p-cascade)= - ## The g2p cascade Sometimes the g2p conversion of the input text will not succeed, for diff --git a/docs/cli-ref.md b/docs/cli-ref.md index 2291a14c..7c2f3011 100644 --- a/docs/cli-ref.md +++ b/docs/cli-ref.md @@ -1,53 +1,41 @@ -(cli-ref)= - # Command line interface (CLI) reference This page contains the full reference documentation for each command in the CLI. 
-See also {ref}`cli-guide` for guidelines on using the CLI. +See also [Command line interface (CLI) user guide](cli-guide.md) for guidelines on using the CLI. The ReadAlongs CLI has five key commands: -- {ref}`cli-align`: full alignment pipeline, from plain text or XML to a +- [`readalongs align`][readalongs-align]: full alignment pipeline, from plain text or XML to a viewable readalong -- {ref}`cli-make-xml`: convert a plain text file into XML, for align -- {ref}`cli-tokenize`: tokenize an XML file -- {ref}`cli-g2p`: g2p a tokenized XML file -- {ref}`cli-langs`: list supported languages +- [`readalongs make-xml`][readalongs-make-xml]: convert a plain text file into XML, for align +- [`readalongs tokenize`][readalongs-tokenize]: tokenize an XML file +- [`readalongs g2p`][readalongs-g2p]: g2p a tokenized XML file +- [`readalongs langs`][readalongs-langs]: list supported languages Each command can be run with `-h` or `--help` to display its usage manual, e.g., `readalongs -h`, `readalongs align --help`. -(cli-align)= - -```{eval-rst} -.. click:: readalongs.cli:align - :prog: readalongs align -``` - -(cli-make-xml)= - -```{eval-rst} -.. click:: readalongs.cli:make_xml - :prog: readalongs make-xml -``` - -(cli-tokenize)= - -```{eval-rst} -.. click:: readalongs.cli:tokenize - :prog: readalongs tokenize -``` - -(cli-g2p)= - -```{eval-rst} -.. click:: readalongs.cli:g2p - :prog: readalongs g2p -``` - -(cli-langs)= - -```{eval-rst} -.. 
click:: readalongs.cli:langs - :prog: readalongs langs -``` +::: mkdocs-click + :module: readalongs.cli + :command: align + :prog_name: readalongs align + +::: mkdocs-click + :module: readalongs.cli + :command: make_xml + :prog_name: readalongs make-xml + +::: mkdocs-click + :module: readalongs.cli + :command: tokenize + :prog_name: readalongs tokenize + +::: mkdocs-click + :module: readalongs.cli + :command: g2p + :prog_name: readalongs g2p + +::: mkdocs-click + :module: readalongs.cli + :command: langs + :prog_name: readalongs langs diff --git a/docs/index.md b/docs/index.md index 16f2d459..749f2478 100644 --- a/docs/index.md +++ b/docs/index.md @@ -2,23 +2,4 @@ Audiobook alignment for Indigenous languages -This site provides the full user documentation for ReadAlongs-Studio. - -```{toctree} -:caption: 'Contents:' -:maxdepth: 2 - -start -installation -cli-guide -cli-ref -outputs -advanced-use -troubleshooting -``` - -# Indices and tables - -- {ref}`genindex` -- {ref}`modindex` -- {ref}`search` +This site provides the user documentation for ReadAlongs-Studio. diff --git a/docs/installation.md b/docs/installation.md index 84bed1d7..b530c237 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -1,5 +1,3 @@ -(installation)= - # Installation See [ReadAlongs/Studio/README.md](https://github.com/ReadAlongs/Studio#install) diff --git a/docs/outputs.md b/docs/outputs.md index 73c196fc..86093136 100644 --- a/docs/outputs.md +++ b/docs/outputs.md @@ -1,16 +1,12 @@ -% outputs: - # Output Realizations One of the main motivations for ReadAlong-Studio was to provide a one-stop-shop for audio/text alignment. With that in mind, there are a variety of different output formats that can be created. Here are a few: -## Elan/Praat files - ## Web Component -When you have standard output from ReadAlong-Studio, consisting of 1) a ReadALong file (XML) and 2) an audio file -you can mobilize these files to the web or hybrid mobile apps quickly and painlessly. 
+The standard output from ReadAlong-Studio consists of 1) a ReadAlong file (XML) and 2) an audio file, +which you can mobilize to the web or hybrid mobile apps quickly and painlessly. This is done using the ReadAlong WebComponent. Web components are re-useable, custom-defined HTML elements that you can embed in any HTML, regardless of which framework you used to build your site, whether React, Angular, Vue, or just Vanilla HTML/CSS/JS. @@ -20,7 +16,6 @@ Below is an example of a minimal implementation in a basic standalone html page. ```html - @@ -31,20 +26,26 @@ Below is an example of a minimal implementation in a basic standalone html page. - +