Our project is to create an ML-augmented approach that course creators can use to streamline the generation of lecture summaries and chapter markers from lesson videos.
Tip
Check out our video demo showcasing Course Co-Pilot.
The basic workflow is:
- User opens a link to a YouTube video lecture in our application and asks Course Co-Pilot to process it.
- User can view the status of their requests via the “Get Predictions” button.
- User can view predicted topic boundaries, headlines, and content summaries for processed videos.
- User can correct and save generated content (planned: this will later feed a data flywheel).
- User will be able to export results as YouTube chapter markers (planned; see the sketch after this list).
- User will be able to export results in a Quarto-friendly format for posting to a web page or blog (planned).
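For reference, YouTube chapter markers are just timestamped lines placed in a video’s description, starting at 0:00. Below is a minimal sketch of what the planned export could produce; the function name and segment shape are hypothetical, not our actual data model.

```python
# Hypothetical sketch of the planned chapter-marker export. The
# (start_seconds, headline) segment shape is illustrative only.
def to_youtube_chapters(segments: list[tuple[int, str]]) -> str:
    """Render (start_seconds, headline) pairs as YouTube chapter lines."""
    lines = []
    for start, headline in segments:
        minutes, seconds = divmod(start, 60)  # videos over an hour would need H:MM:SS
        lines.append(f"{minutes}:{seconds:02d} {headline}")
    return "\n".join(lines)

print(to_youtube_chapters([(0, "Introduction"), (332, "Topic segmentation")]))
# 0:00 Introduction
# 5:32 Topic segmentation
```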
In our own experience, such content either doesn’t get created, is time-consuming to produce, or requires work from outside parties. In particular, we’ve seen this in courses we’ve been a part of:
- Fast.ai course - During the course, students manually create YouTube chapter markers, lesson transcripts, and summaries on the forums.
- FSDL course - Chapter markers and lesson notes are created manually and shared on the FSDL website, usually a week after each lesson.
Let’s look at the dataset, ML library, API, and web application we created for our prototype system.
Since we had to train both summarization and topic segmentation models, we manually created our dataset from a variety of YouTube videos, ranging from fast.ai and FSDL lessons to other instructional videos.
We leveraged the nbdev framework to create a Python package that serves as our framework for both model training and model serving. We integrated Weights & Biases (wandb) for experiment tracking and for fine-tuning models with sweeps, and we created model trainers for the topic segmentation and summarization tasks. The timing of our project coincided with the release of Whisper, which we use to transcribe the YouTube video URL you pass in; the transcript provides the data needed to create topic segments and summaries.
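As a rough illustration of the transcription step, here is a minimal sketch that fetches a YouTube video’s audio and transcribes it with Whisper. It assumes the openai-whisper and yt-dlp packages; the model size and output template are illustrative, not our actual settings.

```python
# Minimal sketch: fetch a YouTube video's audio and transcribe it with
# Whisper. Settings here are illustrative, not the project's actual
# configuration. (Whisper also requires ffmpeg to be installed.)
import whisper
import yt_dlp

def transcribe_youtube(url: str) -> str:
    # Download the best available audio stream to a local file.
    opts = {"format": "bestaudio/best", "outtmpl": "lecture.%(ext)s"}
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
        audio_path = ydl.prepare_filename(info)

    # Transcribe with a small Whisper model; larger checkpoints trade
    # speed for accuracy.
    model = whisper.load_model("base")
    result = model.transcribe(audio_path)
    return result["text"]
```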
For the backend, we used FastAPI to create our APIs. The API leverages Dagster as a workflow engine to orchestrate the inference jobs: creating a transcript of the video with Whisper, running topic segmentation, and running the summarization models.
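To make the orchestration concrete, here is a minimal sketch of how a FastAPI endpoint could hand work to a Dagster job. The op names, the endpoint, and all function bodies are hypothetical placeholders rather than our actual code.

```python
# Minimal sketch of the inference workflow wiring; every op body is a
# hypothetical placeholder for the real model calls.
from typing import List
from dagster import job, op
from fastapi import FastAPI

@op
def transcribe_video() -> str:
    # Placeholder: run Whisper over the lecture audio.
    return "transcript text ..."

@op
def segment_topics(transcript: str) -> List[str]:
    # Placeholder: the topic segmentation model would split the
    # transcript into topic spans here.
    return [transcript]

@op
def summarize_segments(segments: List[str]) -> List[str]:
    # Placeholder: the summarization model would produce a summary
    # per topic span here.
    return [s[:80] for s in segments]

@job
def process_lecture():
    summarize_segments(segment_topics(transcribe_video()))

app = FastAPI()

@app.post("/process")
def process(url: str):
    # A real service would launch the run asynchronously (e.g. via the
    # Dagster daemon) and pass `url` through job config; both are
    # omitted here for brevity.
    result = process_lecture.execute_in_process()
    return {"url": url, "success": result.success}
```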
We created our front-end web application using Vue 3 and Quasar. It is deployed to GitHub Pages from our repo.
Planned next steps:
- Improve the quality of our training data
- Allow users to save their corrected headlines and summaries
- Add the ability for users to update topic spans
- Implement the data flywheel
- Implement the chapter marker and Quarto export features
- Add authentication/authorization
Install the package with:
pip install course_copilot
Please take some time to read up on nbdev (how it works, directives, etc.) by checking out the walk-thrus and tutorials on the nbdev website.
After cloning the repo, create a conda environment. This will install nbdev alongside the other libraries required for this project.
mamba env create -f environment.yml
Then install Quarto and nbdev’s git hooks:
nbdev_install_quarto
nbdev_install_hooks
If using VSCode, you can also install pre-commit hooks “to catch and fix uncleaned and unexported notebooks” before pushing to git. See the instructions in the nbdev documentation if you want to use this feature: https://nbdev.fast.ai/tutorials/pre_commit.html
Finally, install the package in editable mode with the dev dependencies:
pip install -e '.[dev]'