KCQRL: Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing

Figure: Overview of our framework.

This is the repository of KCQRL: Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing. [link to our paper]

Note: If you find our work valuable or use the English translation and/or annotations of the XES3G5M dataset, we kindly ask you to consider citing our work.

@article{ozyurt2024automated,
  title={Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing},
  author={Ozyurt, Yilmazcan and Feuerriegel, Stefan and Sachan, Mrinmaya},
  journal={arXiv preprint arXiv:2410.01727},
  year={2024}
}

Our KCQRL framework consistently improves the performance of state-of-the-art knowledge tracing (KT) models by a clear margin. To achieve this, our framework consists of three modules:

  1. KC Annotation: We develop a novel, automated KC annotation approach using large language models (LLMs) that both generates solutions to the questions and labels KCs for each solution step. Thereby, we effectively circumvent the need for manual annotation from domain experts.
  2. Representation Learning of Questions: We propose a novel contrastive learning paradigm to jointly learn representations of question content, solution steps, and KCs. As a result, our KCQRL effectively leverages the semantics of question content and KCs, a clear improvement over existing KT models.
  3. Improving KT Models: We integrate the learned representations into KT models to improve their performance. Our framework is flexible and can be combined with any state-of-the-art KT model for improved results.

You can find our main results below.

Improvement in the performance of KT models from our framework. Shown: AUC with std. dev. across 5 folds. Improvements are shown as both absolute and relative (%) values.

| Model | XES3G5M: Default | XES3G5M: w/ KCQRL (ours) | XES3G5M: Imp. (abs.) | XES3G5M: Imp. (%) | Eedi: Default | Eedi: w/ KCQRL (ours) | Eedi: Imp. (abs.) | Eedi: Imp. (%) |
|---|---|---|---|---|---|---|---|---|
| DKT | 78.33 ± 0.06 | 82.13 ± 0.02 | +3.80 | +4.85% | 73.59 ± 0.01 | 74.97 ± 0.03 | +1.38 | +1.88% |
| DKT+ | 78.57 ± 0.05 | 82.34 ± 0.04 | +3.77 | +4.80% | 73.79 ± 0.03 | 75.32 ± 0.04 | +1.53 | +2.07% |
| KQN | 77.81 ± 0.03 | 82.10 ± 0.06 | +4.29 | +5.51% | 73.13 ± 0.01 | 75.16 ± 0.04 | +2.03 | +2.78% |
| qDKT | 81.94 ± 0.05 | 82.13 ± 0.05 | +0.19 | +0.23% | 74.09 ± 0.03 | 74.97 ± 0.04 | +0.88 | +1.19% |
| IEKT | 82.24 ± 0.07 | 82.82 ± 0.06 | +0.58 | +0.71% | 75.12 ± 0.02 | 75.56 ± 0.02 | +0.44 | +0.59% |
| AT-DKT | 78.36 ± 0.06 | 82.36 ± 0.07 | +4.00 | +5.10% | 73.72 ± 0.04 | 75.25 ± 0.02 | +1.53 | +2.08% |
| QIKT | 82.07 ± 0.04 | 82.62 ± 0.05 | +0.55 | +0.67% | 75.15 ± 0.04 | 75.74 ± 0.02 | +0.59 | +0.79% |
| DKVMN | 77.88 ± 0.04 | 82.64 ± 0.02 | +4.76 | +6.11% | 72.74 ± 0.05 | 75.51 ± 0.02 | +2.77 | +3.81% |
| DeepIRT | 77.81 ± 0.06 | 82.56 ± 0.02 | +4.75 | +6.10% | 72.61 ± 0.02 | 75.18 ± 0.05 | +2.57 | +3.54% |
| ATKT | 79.78 ± 0.07 | 82.37 ± 0.04 | +2.59 | +3.25% | 72.17 ± 0.03 | 75.28 ± 0.04 | +3.11 | +4.31% |
| SAKT | 75.90 ± 0.05 | 81.64 ± 0.03 | +5.74 | +7.56% | 71.60 ± 0.03 | 74.77 ± 0.02 | +3.17 | +4.43% |
| SAINT | 79.65 ± 0.02 | 81.50 ± 0.07 | +1.85 | +2.32% | 73.96 ± 0.02 | 75.20 ± 0.04 | +1.24 | +1.68% |
| AKT | 81.67 ± 0.03 | **83.04 ± 0.05** | +1.37 | +1.68% | 74.27 ± 0.03 | 75.49 ± 0.03 | +1.22 | +1.64% |
| simpleKT | 81.05 ± 0.06 | 82.92 ± 0.04 | +1.87 | +2.31% | 73.90 ± 0.04 | 75.46 ± 0.02 | +1.56 | +2.11% |
| sparseKT | 79.65 ± 0.11 | 82.95 ± 0.09 | +3.30 | +4.14% | 74.98 ± 0.09 | **78.96 ± 0.08** | +3.98 | +5.31% |

Best values are in bold.

Setup

Dataset details: We used the XES3G5M (which we translated from Chinese to English) and Eedi datasets for our work.

  • The details of XES3G5M can be found here. You can download the dataset by following the instructions there. After the download, you can add the files from data/XES3G5M/metadata to run our framework.
  • The Eedi dataset can be acquired upon request. Once acquired, create a new folder data/Eedi/ and move your files there. Then, run the preprocessing code we provide, python data_preprocess.py --dataset_name=eedi, inside the pykt-toolkit directory.

Important note: For XES3G5M, we already provide its English translation, the entire output of our KC annotation, and the clustering of KCs here.

  • Therefore, after downloading the XES3G5M dataset from its source (for the exercise histories), you can directly start from our Representation Learning of Questions and quickly improve your existing KT model! A sketch of the expected folder layout follows below.
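
For orientation, here is a minimal sketch of the data layout and the Eedi preprocessing step, assuming the commands are run from the repository root (the Eedi file placeholder below is hypothetical; use the files you actually received):

```bash
# XES3G5M: combine the downloaded exercise-history files with the
# metadata we already provide under data/XES3G5M/metadata
ls data/XES3G5M/metadata
# e.g. questions_translated.json, kc_clusters_hdbscan.json, kc_questions_map.json, ...

# Eedi: after your access request is granted
mkdir -p data/Eedi
mv <your_downloaded_eedi_files> data/Eedi/

# Preprocess Eedi with the script we provide in pykt-toolkit
cd pykt-toolkit
python data_preprocess.py --dataset_name=eedi
cd ..
```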

Python environment: We used Python 3.11.6 in our implementation. Our framework uses two separate virtual environments.

  • Install the libraries via pip install -r requirements_env_rl.txt for KC Annotation and Representation Learning

  • Install the libraries via pip install -r requirements_env_pykt.txt for Improving KT Models. After installing the libraries, go to the pykt-toolkit directory and run pip install -e . to install our custom version of pykt with the improved KT implementations. A full setup sketch follows below.
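
A minimal setup sketch for the two environments (the environment names env_rl and env_pykt are our own placeholders, not prescribed by the repository):

```bash
# Environment 1: KC annotation and representation learning
python3.11 -m venv env_rl
source env_rl/bin/activate
pip install -r requirements_env_rl.txt
deactivate

# Environment 2: improving the KT models
python3.11 -m venv env_pykt
source env_pykt/bin/activate
pip install -r requirements_env_pykt.txt
cd pykt-toolkit && pip install -e . && cd ..   # our custom pykt fork, installed in editable mode
```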

1) KC Annotation via LLMs

This part shows an example usage of the full KC annotation pipeline. To run the scripts, first go to the kc_annotation folder.

We use the English translation of the XES3G5M questions, questions_translated.json, as our running example.

a) Solution step generation

You can run the command below:

python get_step_by_step_solutions.py --original_question_file ../data/XES3G5M/metadata/questions_translated.json --annotated_question_file ../data/XES3G5M/metadata/questions_translated_kc_annotated.json

b) KC annotation

You can run the command below:

python get_kc_annotation.py --original_question_file ../data/XES3G5M/metadata/questions_translated_kc_annotated.json --annotated_question_file ../data/XES3G5M/metadata/questions_translated_kc_sol_annotated.json

c) Solution Step - KC mapping

You can run the command below:

python get_mapping_kc_solsteps.py --original_question_file ../data/XES3G5M/metadata/questions_translated_kc_sol_annotated.json --mapped_question_file ../data/XES3G5M/metadata/questions_translated_kc_sol_annotated_mapped.json

Note: For convenience, we provide the final output of this pipeline, questions_translated_kc_sol_annotated_mapped.json.
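
As a quick sanity check, you can count the annotated questions and peek at one record. This is a sketch that assumes, like the commands above, that it is run from the kc_annotation folder; the structure of each record is not guaranteed, so inspect the output:

```bash
python -c "
import json
path = '../data/XES3G5M/metadata/questions_translated_kc_sol_annotated_mapped.json'
with open(path) as f:
    questions = json.load(f)
print('annotated questions:', len(questions))
# Peek at one record to see which fields the pipeline added
record = next(iter(questions.values())) if isinstance(questions, dict) else questions[0]
print('example record keys:', list(record.keys()))
"
```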

2) Representation Learning of Questions

For this part, please go to the representation_learning folder.

For training, you can run the command below:

python train.py --json_file_dataset ../data/XES3G5M/metadata/questions_translated_kc_sol_annotated_mapped.json --json_file_cluster_kc ../data/XES3G5M/metadata/kc_clusters_hdbscan.json --json_file_kc_questions ../data/XES3G5M/metadata/kc_questions_map.json --wandb_project_name <your_wandb_project_name>

Note that the above command requires you to set up your wandb account first.
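
If you have not configured Weights & Biases on your machine yet, the standard CLI login works:

```bash
pip install wandb   # skip if it is already installed via the requirements file
wandb login         # paste the API key from https://wandb.ai/authorize when prompted
```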

After training, you can save the embeddings by following save_embeddings.ipynb.

3) Improving KT Models

We implemented the improved versions of the KT models via the pykt library. We forked the library into pykt-toolkit and developed the models there. Specifically, our implemented KT models can be found in the models folder.

As a naming convention, we append the Que suffix to the existing model names, where "Que" refers to our learned question representations. For instance, the improved version of simpleKT is implemented as SimpleKTQue and can be found in simplekt_que.py.

To train these models, go to the train_test folder. For example, you can train sparseKTQue with the command below:

python sparsekt_que_train.py --emb_path <embeddings_from_representation_learning>

Note that the above command also requires you to set up your wandb account first.

We use wandb_eval.py and wandb_predict.py from the pykt library for evaluation. The details of the library can be found in its documentation.
