Our project is to create an ML-augmented approach that course creators can use to streamline the generation of lecture summaries and chapter markers from lesson videos.
Tip
Check out our video demo showcasing Course Co-Pilot.
The basic workflow is:
- User opens a link to a YouTube video lecture in our application and asks Course Co-Pilot to process it.
- User can view the status of their requests via the “Get Predictions” button.
- User can view predicted topic boundaries, headlines, and content summaries for processed videos.
- User can correct and save generated content (planned: this will later feed a data flywheel).
- User will be able to export results as YouTube chapter markers (planned; see the sketch after this list).
- User will be able to export results in a Quarto-friendly format for posting to a web page or blog (planned).
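For reference, YouTube chapter markers are just timestamped lines placed in a video’s description, starting at 0:00. Below is a minimal sketch of what the planned export could produce; the function name and segment shape are hypothetical, not our actual data model.

```python
# Hypothetical sketch of the planned chapter-marker export. The
# (start_seconds, headline) segment shape is illustrative only.
def to_youtube_chapters(segments: list[tuple[int, str]]) -> str:
    """Render (start_seconds, headline) pairs as YouTube chapter lines."""
    lines = []
    for start, headline in segments:
        minutes, seconds = divmod(start, 60)  # videos over an hour would need H:MM:SS
        lines.append(f"{minutes}:{seconds:02d} {headline}")
    return "\n".join(lines)

print(to_youtube_chapters([(0, "Introduction"), (332, "Topic segmentation")]))
# 0:00 Introduction
# 5:32 Topic segmentation
```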
In our own experience, such content either doesn’t get created, is time-consuming to produce, or requires work from outside parties. In particular, we’ve seen this in courses we’ve been a part of:
- Fast.ai course - During the course, students manually create YouTube chapter markers, lesson transcripts, and summaries on the forums.
- FSDL course - Chapter markers and lesson notes are created manually and shared on the FSDL website, usually a week after each lesson.
Let’s look at the dataset, ML library, API, and web application we created for our prototype system.
Since we had to train both summarization and topic segmentation models, we manually created our dataset from a variety of YouTube videos, ranging from fast.ai and FSDL lessons to other instructional videos.
We leveraged the nbdev framework to create a Python package that serves as our framework for both model training and model serving. We integrated Weights & Biases (wandb) for experiment tracking and for fine-tuning models with sweeps, and we created model trainers for the topic segmentation and summarization tasks. The timing of our project coincided with the release of Whisper, which we use to transcribe the YouTube video URL you pass in; the transcript provides the data needed to create topic segments and summaries.
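As a rough illustration of the transcription step, here is a minimal sketch that fetches a YouTube video’s audio and transcribes it with Whisper. It assumes the openai-whisper and yt-dlp packages; the model size and output template are illustrative, not our actual settings.

```python
# Minimal sketch: fetch a YouTube video's audio and transcribe it with
# Whisper. Settings here are illustrative, not the project's actual
# configuration. (Whisper also requires ffmpeg to be installed.)
import whisper
import yt_dlp

def transcribe_youtube(url: str) -> str:
    # Download the best available audio stream to a local file.
    opts = {"format": "bestaudio/best", "outtmpl": "lecture.%(ext)s"}
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
        audio_path = ydl.prepare_filename(info)

    # Transcribe with a small Whisper model; larger checkpoints trade
    # speed for accuracy.
    model = whisper.load_model("base")
    result = model.transcribe(audio_path)
    return result["text"]
```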
For the backend, we used FastAPI to create our APIs. The API leverages Dagster as a workflow engine to orchestrate the inference jobs: creating a transcript of the video with Whisper, running topic segmentation, and running the summarization models.
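To make the orchestration concrete, here is a minimal sketch of how a FastAPI endpoint could hand work to a Dagster job. The op names, the endpoint, and all function bodies are hypothetical placeholders rather than our actual code.

```python
# Minimal sketch of the inference workflow wiring; every op body is a
# hypothetical placeholder for the real model calls.
from typing import List
from dagster import job, op
from fastapi import FastAPI

@op
def transcribe_video() -> str:
    # Placeholder: run Whisper over the lecture audio.
    return "transcript text ..."

@op
def segment_topics(transcript: str) -> List[str]:
    # Placeholder: the topic segmentation model would split the
    # transcript into topic spans here.
    return [transcript]

@op
def summarize_segments(segments: List[str]) -> List[str]:
    # Placeholder: the summarization model would produce a summary
    # per topic span here.
    return [s[:80] for s in segments]

@job
def process_lecture():
    summarize_segments(segment_topics(transcribe_video()))

app = FastAPI()

@app.post("/process")
def process(url: str):
    # A real service would launch the run asynchronously (e.g. via the
    # Dagster daemon) and pass `url` through job config; both are
    # omitted here for brevity.
    result = process_lecture.execute_in_process()
    return {"url": url, "success": result.success}
```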
We created our front-end web application using Vue 3 and Quasar. It is deployed to GitHub Pages from our repo.
Planned next steps:
- Improve the quality of our training data
- Allow users to save their corrected headlines and summaries
- Add the ability for users to update topic spans
- Implement the data flywheel
- Implement the chapter marker and Quarto export features
- Add authentication/authorization
Install the package with:
pip install course_copilot
Please take some time to read up on nbdev (how it works, directives, etc.) by checking out the walk-thrus and tutorials on the nbdev website.
After cloning the repo, create a conda environment. This will install nbdev alongside the other libraries required for this project.
mamba env create -f environment.yml
Then install Quarto and nbdev’s git hooks:
nbdev_install_quarto
nbdev_install_hooks
If using VSCode, you can also install pre-commit hooks “to catch and fix uncleaned and unexported notebooks” before pushing to git. See the instructions in the nbdev documentation if you want to use this feature: https://nbdev.fast.ai/tutorials/pre_commit.html
Finally, install the package in editable mode with the dev dependencies:
pip install -e '.[dev]'