Merge pull request #223 from ReadAlongs/dev.ej/mkdocs
Convert the documentation from sphinx .rst to mkdocs .md

Fixes #213
joanise committed Jun 21, 2024
2 parents a5f5368 + f3d2f05 commit 2e44e50
Showing 22 changed files with 825 additions and 887 deletions.
30 changes: 30 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,30 @@
name: Deploy docs
on:
  push:
    branches:
      - main
jobs:
  docs:
    # Create latest docs
    runs-on: ubuntu-latest
    permissions:
      contents: write # to push to the gh-pages branch
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # needed to get the gh-pages branch
      - uses: actions/setup-python@v5
        with:
          python-version: "3.8"
      - name: Install dependencies and Studio
        run: |
          python -m pip install --upgrade pip
          pip install wheel
          pip install -r docs/requirements.txt -e .
      - name: Setup doc deploy
        run: |
          git config user.name 'github-actions[bot]'
          git config user.email 'github-actions[bot]@users.noreply.github.com'
      - name: Deploy docs with mike 🚀
        run: |
          mike deploy --push --update-aliases dev latest
18 changes: 0 additions & 18 deletions .readthedocs.yml

This file was deleted.

2 changes: 1 addition & 1 deletion README.md
@@ -6,7 +6,7 @@
[![Deploy web-api](https://img.shields.io/badge/%E2%86%91_Deploy_to-Heroku-7056bf.svg)](https://readalong-studio.herokuapp.com/api/v1/docs)
[![GitHub license](https://img.shields.io/github/license/ReadAlongs/Studio)](https://github.com/ReadAlongs/Studio/blob/main/LICENSE)
[![standard-readme compliant](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg)](https://github.com/ReadAlongs/Studio)
[![Documentation Status](https://readthedocs.org/projects/readalong-studio/badge/)](https://readalong-studio.readthedocs.io)
[![Documentation](https://github.com/ReadAlongs/studio/actions/workflows/docs.yml/badge.svg)](https://readalongs.github.io/Studio/)

> Audiobook alignment for Indigenous languages!
32 changes: 32 additions & 0 deletions docs/Contributing.md
@@ -0,0 +1,32 @@
# Contributing to the documentation

## Edit the files

To contribute to the ReadAlongs Studio documentation, edit the `.md` files in
this folder.

The configuration is found in `../mkdocs.yml`.

## Build and view the documentation locally

To build the documentation and review your own changes locally:

1. Install the required build software, mkdocs and friends:

        pip install -r requirements.txt

2. Install Studio itself:

        (cd .. && pip install -e .)

3. Run this command to serve the documentation locally:

        (cd .. && mkdocs serve)

4. View the documentation by browsing to <http://localhost:8000>.

## Publish the changes

Once your changes are pushed to GitHub and merged into `main` via a Pull
Request, the documentation will automatically get built and published to
<https://readalongs.github.io/Studio/>.
39 changes: 0 additions & 39 deletions docs/README.md

This file was deleted.

83 changes: 37 additions & 46 deletions docs/advanced-use.rst → docs/advanced-use.md
@@ -1,84 +1,75 @@
.. _advanced-use:
# Advanced topics

Advanced topics
===============

.. _adding-a-lang:

Adding a new language to g2p
----------------------------
## Adding a new language to g2p

If you want to align an audio book in a language that is not yet supported by
the g2p library, you will have to write your own g2p mapping for that language.

References:
- The `g2p library <https://github.com/roedoejet/g2p>`__ and its
`documentation <https://g2p.readthedocs.io/>`__.
- The `7-part blog post on creating g2p mappings <https://blog.mothertongues.org/g2p-background/>`__ on the `Mother Tongues Blog <https://blog.mothertongues.org/>`__.

- The [g2p library](https://github.com/roedoejet/g2p) and its
[documentation](https://roedoejet.github.io/g2p).
- The [7-part blog post on creating g2p mappings](https://blog.mothertongues.org/g2p-background/)
on the [Mother Tongues Blog](https://blog.mothertongues.org/).

Once you have created a g2p mapping for your language, please consider
`contributing it to the project <https://blog.mothertongues.org/g2p-contributing/>`__
[contributing it to the project](https://blog.mothertongues.org/g2p-contributing/)
so others can also benefit from your work!

Pre-processing your data
------------------------
## Pre-processing your data

Manipulating the text and/or audio data that you are trying to align can
sometimes produce longer, more accurate ReadAlongs that throw fewer
errors when aligning. While some of the most successful techniques we
have tried are outlined here, you may also need to customize your
pre-processing to suit your specific data.

Audio pre-processing
~~~~~~~~~~~~~~~~~~~~
### Audio pre-processing

Adding silences
^^^^^^^^^^^^^^^
#### Adding silences

Adding 1 second segments of silence in between phrases or paragraphs
sometimes improves the performance of the aligner. We do this using the
`Pydub <https://github.com/jiaaro/pydub>`__ library which can be
[Pydub](https://github.com/jiaaro/pydub) library which can be
pip-installed. Keep in mind that Pydub uses milliseconds.

If your data is currently a single audio file, you will need to split it into
segments at the points where you want to insert the silences.

::

ten_seconds = 10 * 1000
first_10_seconds = soundtrack[:ten_seconds]
last_5_seconds = soundtrack[-5000:]
```py
ten_seconds = 10 * 1000
first_10_seconds = soundtrack[:ten_seconds]
last_5_seconds = soundtrack[-5000:]
```
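
The slicing shown above can be generalized to several cut points. The following is only a sketch: the helper name `split_at` and the cut-point values in the usage comment are hypothetical, not part of Studio or Pydub. It relies only on slicing and `len()`, both of which Pydub's `AudioSegment` supports in milliseconds.

```python
def split_at(audio, cut_points_ms):
    """Split `audio` into consecutive pieces at the given millisecond offsets.

    `audio` can be any sliceable object with a length; for a Pydub
    AudioSegment, both slicing and len() work in milliseconds.
    """
    # Piece boundaries: start of the audio, each cut point, end of the audio.
    bounds = [0, *sorted(cut_points_ms), len(audio)]
    # Pair each boundary with the next one to get (start, end) for every piece.
    return [audio[start:end] for start, end in zip(bounds, bounds[1:])]


# Usage with Pydub (assumes pydub and ffmpeg are installed;
# "story.mp3" and the cut points are placeholders):
#   from pydub import AudioSegment
#   soundtrack = AudioSegment.from_mp3("story.mp3")
#   segments = split_at(soundtrack, [10_000, 25_000])
```

The cut points themselves still have to come from you, for example by listening for the pauses between paragraphs.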

Once you have your segments, create an audio segment containing only 1 second
of silence, and choose the name of the MP3 file that will be written at the end.

::
```py
from pydub import AudioSegment

from pydub import AudioSegment

wfile = "appended_1000ms.mp3"
silence = AudioSegment.silent(duration=1000)
soundtrack = silence
wfile = "appended_1000ms.mp3"
silence = AudioSegment.silent(duration=1000)
soundtrack = silence
```

Then loop over the audio files you want to append, adding a silence before each segment.

::

seg = AudioSegment.from_mp3(mp3file)
soundtrack = soundtrack + silence + seg
```py
seg = AudioSegment.from_mp3(mp3file)
soundtrack = soundtrack + silence + seg
```

Write the soundtrack file as an MP3. This will then be the audio input
for your Read-Along.

::
```py
soundtrack.export(wfile, format="mp3")
```

soundtrack.export(wfile, format="mp3")
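
Putting the steps above together, a minimal end-to-end sketch might look like this. The function names `join_with_silence` and `build_soundtrack`, and the file names in the usage comment, are hypothetical; only `AudioSegment.silent`, `AudioSegment.from_mp3`, `+` concatenation and `export` come from Pydub. Reading and writing MP3 files assumes ffmpeg is installed.

```python
def join_with_silence(segments, silence):
    """Concatenate segments with `silence` before each one.

    Mirrors the snippets above: the soundtrack starts as one silence,
    then each iteration appends another silence followed by a segment.
    """
    soundtrack = silence
    for seg in segments:
        soundtrack = soundtrack + silence + seg
    return soundtrack


def build_soundtrack(mp3_paths, out_path="appended_1000ms.mp3", silence_ms=1000):
    """Load MP3 segments, join them with silences, and export the result."""
    # Imported here so join_with_silence can be used without Pydub installed.
    from pydub import AudioSegment

    silence = AudioSegment.silent(duration=silence_ms)  # duration in milliseconds
    segments = [AudioSegment.from_mp3(path) for path in mp3_paths]
    soundtrack = join_with_silence(segments, silence)
    soundtrack.export(out_path, format="mp3")
    return soundtrack


# Usage (placeholder file names):
#   build_soundtrack(["page1.mp3", "page2.mp3", "page3.mp3"])
```

Keeping the joining logic separate from the file handling makes it easy to try a different silence length or ordering without touching the loading and export code.
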
### Text pre-processing

Text pre-processing
~~~~~~~~~~~~~~~~~~~

Numbers
^^^^^^^
#### Numbers

ReadAlong Studio cannot align numbers written as digits (e.g. "123").
Instead, you will need to write them out (e.g. "one two three" or "one
@@ -87,10 +78,10 @@ file.

If you have lots of data, and the numbers are spoken in English (or another
language that the library supports), consider adding a library like
`num2words <https://github.com/savoirfairelinux/num2words>`__ to your
[num2words](https://github.com/savoirfairelinux/num2words) to your
pre-processing.

::

num2words 123456789
one hundred and twenty-three million, four hundred and fifty-six thousand, seven hundred and eighty-nine
```txt
num2words 123456789
one hundred and twenty-three million, four hundred and fifty-six thousand, seven hundred and eighty-nine
```
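
From Python, the same conversion could be applied to whole sentences before alignment. This is a minimal sketch, assuming the `num2words` package is installed. The helper `spell_out_numbers` is hypothetical, and it only handles plain runs of digits (no decimals, thousands separators or ordinals).

```python
import re


def spell_out_numbers(text, to_words=None):
    """Replace each run of digits in `text` with its spelled-out form.

    `to_words` is any function mapping an int to a string; when omitted,
    the `num2words` function from the num2words package is used.
    """
    if to_words is None:
        # Imported lazily so a custom converter can be used without num2words.
        from num2words import num2words
        to_words = num2words
    # Convert every maximal run of digits, leaving the rest of the text as is.
    return re.sub(r"\d+", lambda match: to_words(int(match.group())), text)


# Usage (requires num2words):
#   spell_out_numbers("She counted 123 birds.")
```

For another language that num2words supports, a wrapper such as `lambda n: num2words(n, lang="fr")` can be passed as `to_words`.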