This repo was created as part of an NLP class during my master's program. The goal was to classify "possible delusion" in a corpus of YouTube comments in which knowledgeable informants (family, friends, caregivers) described an individual with dementia.
Read the paper, included in this repo as `BMIN521-Final-Report.pdf`.
- Initialize the conda environment using `environment.yml`: `conda env create -f environment.yml`.
- Install the spacy_experimental coreference model. Note, some of the steps below are already handled in step 1, but it won't hurt to rerun them:

  ```
  pip install spacy_experimental
  pip install chardet
  pip install "thinc[torch]"
  pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.1/en_coreference_web_trf-3.4.0a2-py3-none-any.whl
  ```
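If the wheel installed correctly, a quick check like the following should load the model and print coreference clusters. The sample sentence is invented (in the spirit of the corpus); the `coref_clusters_*` keys follow the experimental model's naming:

```python
import spacy

# Load the experimental coreference pipeline installed above.
nlp = spacy.load("en_coreference_web_trf")

# Invented example of an informant describing a parent.
doc = nlp("My father repeats himself constantly. He thinks someone is stealing his mail.")

# The model writes clusters into doc.spans under keys like "coref_clusters_1".
for key, spans in doc.spans.items():
    if key.startswith("coref_clusters"):
        print(key, [span.text for span in spans])
```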
Note, this was an iterative research project; it's not possible to run it start to finish via the CLI. That said, an overview of the steps is provided below:
- Copy `.env_template` to `.env` and populate it with your own API key / parameters.
- Create a list of YouTube channels and videos to extract comments from and store them in `data`. The files were named `channels_en.txt` and `vids_to_search.txt`. Note, channel IDs have to be acquired via the YouTube Data API; you can't just use the @username (see the channel-ID sketch after this list).
- Run `youtube_extract.py`. This will create a SQLite database if it doesn't already exist (a sketch of the comment-paging loop also follows the list).
- Run SQL scripts 1-2 using your favorite SQLite interface. Note, anything annotation-related will fail, as you haven't created that table yet.
- Run through `notebooks/knowledgeable_informant_narratives.ipynb`.
- Run SQL script 3.
- Run SQL scripts 4-5, get some data, and start annotating! I kept it simple and did this in Excel, then loaded the data back into SQLite (see the load-back sketch after this list). The first batch of annotation informed the OpenAI prompting strategy and produced a rubric for annotating a test set.
- SQL script 6 controls which data to send to the OpenAI model.
- Run `classify.py` (a rough sketch of the kind of completion call involved also appears after this list).
- Run through `notebooks/compute_results.ipynb`.
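As noted in the steps above, a channel's ID can't be looked up directly from its @handle. A minimal sketch of the search-based workaround that worked at the time, using `google-api-python-client` (the handle and key here are placeholders):

```python
from googleapiclient.discovery import build

API_KEY = "..."  # your YouTube Data API key, i.e. the one stored in .env
youtube = build("youtube", "v3", developerKey=API_KEY)

# channels().list had no handle lookup at the time of writing, so one
# workaround is a channel-type search for the @handle.
resp = (
    youtube.search()
    .list(part="snippet", q="@SomeDementiaChannel", type="channel", maxResults=1)
    .execute()
)
print(resp["items"][0]["snippet"]["channelId"])
```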
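`youtube_extract.py` itself isn't reproduced here; as a rough sketch of the kind of paging loop comment extraction requires (reusing the `youtube` client built above; the video ID is one from the table further down):

```python
# Top-level comments arrive at most 100 per page; follow nextPageToken to the end.
comments, page_token = [], None
while True:
    resp = (
        youtube.commentThreads()
        .list(part="snippet", videoId="zpth3xzvbjU", maxResults=100, pageToken=page_token)
        .execute()
    )
    for item in resp["items"]:
        comments.append(item["snippet"]["topLevelComment"]["snippet"]["textOriginal"])
    page_token = resp.get("nextPageToken")
    if not page_token:
        break

print(len(comments), "comments retrieved")
```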
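Loading the Excel annotations back into SQLite can be as simple as the following. The file, database, and table names here are hypothetical; match them to the repo's actual schema:

```python
import sqlite3

import pandas as pd

# Hypothetical names: annotations_batch1.xlsx, youtube_comments.db, annotation.
df = pd.read_excel("annotations_batch1.xlsx")
with sqlite3.connect("youtube_comments.db") as conn:
    df.to_sql("annotation", conn, if_exists="append", index=False)
```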
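`classify.py` sends data to the OpenAI API; as a hedged sketch of what such a call looked like with the pre-1.0 `openai` library current in early 2023 (the prompt and model here are placeholders, not the repo's actual rubric):

```python
import openai  # pre-1.0 interface (openai<1.0)

openai.api_key = "..."  # from .env

# Placeholder prompt; the real prompting strategy came out of the annotation rubric.
prompt = (
    "Does this comment describe a possible delusion? Answer Yes or No.\n\n"
    "Comment: He keeps insisting the neighbors are stealing his tools.\nAnswer:"
)
resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=1,
    temperature=0,
)
print(resp["choices"][0]["text"].strip())
```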
See `BMIN521-Final-Report.pdf` for a detailed overview of methods.
- 500 response error from the API, 2/16/2023.
- No text in the response. I suspect this is due to the model predicting a stop sequence as the first token. In the playground I received: "The model predicted a completion that begins with a stop sequence, resulting in no output. Consider adjusting your prompt or stop sequences." (A hypothetical guard sketch follows the table below.)
- I relied on YouTube search to derive a corpus of dementia-related videos. People may have searched for the content, found a useful video, and begun internalizing its vocabulary, so the corpus may not simulate an individual who has yet to find helpful content.
- It's not clear why some data present via the browser is absent from the API; this may be the result of anti-spam filters. In other cases, channel owners moderate comment content and a comment stays under review, but the comment count has already been incremented in YouTube's database. Given the complexity of managing a system with billions of videos, you can't expect eventual consistency of aggregate stats; there's a Quora post that explains this. (A minimal probe sketch also follows the table below.)
| Video ID | Browser Count | API Count | API Collection Date |
|---|---|---|---|
| zx7gLoPMO-s | 105 | 64 | 2023-02-08 |
| zpth3xzvbjU | 40 | 40 | 2023-02-08 |
| -3V9eSYR6Cs | 51 | 49 | 2023-02-11 |
| 1EGhhZdQ_ts | 49 | 49 | 2023-02-11 |
| mJk02XI_sRA | 5245 | 3888 | 2023-02-11 |
| CWnILUjkgXg | 1824 | 1565 | 2023-02-08 |
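For context on the table above, `statistics.commentCount` is the aggregate counter the browser displays, and it can disagree with what a `commentThreads` crawl actually returns. A minimal probe of the aggregate side (reading the two count columns as aggregate vs. retrieved is my interpretation):

```python
from googleapiclient.discovery import build

youtube = build("youtube", "v3", developerKey="...")  # key from .env

# Aggregate counter for one of the videos in the table; compare this against
# the number of comments a full commentThreads crawl returns.
resp = youtube.videos().list(part="statistics", id="mJk02XI_sRA").execute()
print(resp["items"][0]["statistics"]["commentCount"])
```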
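For the empty-output failure noted above, one hypothetical guard is to detect the empty completion and retry once without stop sequences (a sketch against the pre-1.0 `openai` library, not what `classify.py` actually does):

```python
import openai  # pre-1.0 interface (openai<1.0)


def complete_with_guard(prompt, **kwargs):
    """Retry once if the model predicts a stop sequence as its first token."""
    resp = openai.Completion.create(prompt=prompt, **kwargs)
    text = resp["choices"][0]["text"].strip()
    if not text:
        # Likely hit the "completion begins with a stop sequence" case;
        # drop the stop sequences and trim the output manually instead.
        kwargs.pop("stop", None)
        resp = openai.Completion.create(prompt=prompt, **kwargs)
        text = resp["choices"][0]["text"].strip()
    return text
```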
- https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api
- https://platform.openai.com/docs/api-reference/parameter-details
- https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset
- https://medium.com/@AlyssaSha/fine-tuning-gpt-3-using-python-for-keywords-classification-6c4970526c68