Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization
This repository is the official implementation of Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization.
We investigate how large language models utilize knowledge to reason over and solve complex questions, based on a method that deconstructs complex questions into a hierarchical graph.
Each depth of knowledge required to answer a question represents a different level of complexity: additional reasoning is required to answer a more complex question compared to a simpler one.
Create a virtual environment with python>=3.9 and install the appropriate PyTorch version for your machine.
In our project, we use a node with 4 NVIDIA A6000 40GB GPUs and CUDA version 12.3.
conda create -n myenv python=3.10
conda activate myenv
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
To install requirements:
pip install -r requirements.txt
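Optionally, you can run a quick sanity check (this snippet is a convenience assumption, not part of the repo) to confirm the installed PyTorch build can see your GPUs before launching inference:

```python
# Quick GPU sanity check; not repo code, just an illustrative convenience.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
```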
You can experiment with multiple inference modes on our dataset, DepthQA:
- Single-turn:
  - zero-shot: Only the target question is in the input.
  - prompt-gold: Before the target question, shallower questions (i.e., predecessors of the target question) and their gold answers are provided as context.
  - prompt-pred: Before the target question, shallower questions and the model's own predicted answers are provided as context.
- Multi-turn:
  - multi-turn: Shallower questions are provided as inputs in a multi-turn conversation, i.e., the model answers each shallower question one by one and is then presented with the target question. (A prompt-construction sketch for these modes is shown right after this list.)
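For intuition, here is a minimal sketch of how the input for each mode could be assembled; the helper names and prompt layout are assumptions for illustration, not the repo's actual implementation in src/inference/.

```python
# Illustrative sketch only (helper names and prompt layout are assumptions,
# not the repo's actual code) of how each mode shapes the model input.
from typing import Dict, List, Tuple

def build_single_turn_input(
    target_question: str,
    predecessors: List[Tuple[str, str]],  # (shallower question, answer) pairs
    mode: str,
) -> str:
    """zero-shot: target question only; prompt-gold / prompt-pred: prepend
    shallower question-answer pairs (gold vs. predicted answers) as context."""
    if mode == "zero-shot":
        return target_question
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in predecessors)
    return f"{context}\n\nQ: {target_question}\nA:"

def build_multi_turn_messages(
    target_question: str,
    shallower_turns: List[Tuple[str, str]],  # (question, model's earlier answer)
) -> List[Dict[str, str]]:
    """multi-turn: shallower questions are asked one by one, the model's own
    answers stay in the conversation, and the target question comes last."""
    messages: List[Dict[str, str]] = []
    for question, answer in shallower_turns:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": target_question})
    return messages
```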
Most HuggingFace AutoModelForCausalLM models can be run with src/inference/single_turn.py and src/inference/multi_turn.py; vLLM is integrated, and inference runs in mixed precision.
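As a rough sketch of what the vLLM-backed path looks like (the model name, dtype, and sampling settings below are illustrative assumptions, not the scripts' defaults):

```python
# Minimal vLLM usage sketch; values here are assumptions, not the repo's defaults.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # any HF AutoModelForCausalLM checkpoint
    dtype="bfloat16",            # mixed precision
    tensor_parallel_size=4,      # e.g., shard across 4 GPUs
)
params = SamplingParams(temperature=0.0, max_tokens=512)
outputs = llm.generate(["Q: What is the capital of France?\nA:"], params)
print(outputs[0].outputs[0].text)
```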
For OpenAI models, use src/inference/single_turn_openai.py and src/inference/multi_turn_openai.py.
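For intuition, a minimal sketch of the multi-turn flow with the OpenAI Python SDK (>= 1.0); the model name and questions are placeholders, not values taken from the repo's scripts:

```python
# Sketch of the multi-turn conversation flow; placeholders throughout.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "<shallower question 1>",
    "<shallower question 2>",
    "<target question>",
]

messages = []
for question in questions:
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})

print(answer)  # prediction for the target (deepest) question
```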
To run inference with LLaMA 3 8B Instruct in all modes:
bash scripts/inference/llama3_8b.sh
To run inference with GPT-3.5 Turbo in all modes:
bash scripts/inference/gpt-3.5-turbo.sh
Following the LLM-as-a-Judge approach, we use gpt-4-0125-preview to score the correctness of model predictions. Specifically, we use the OpenAI Batch API for faster and cheaper evaluation. Our implementation of the evaluation pipeline consists of four steps:
- Create a batch request
- Check the status of the batch request
- Retrieve the results of the batch request
- Calculate evaluation metrics:
  - Average accuracy
  - Forward discrepancy
  - Backward discrepancy
The first three steps are performed in src/evaluation/batch_eval_openai.py, and the last step in src/evaluation/metric_calculator.py.
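For orientation, here is a hedged sketch of the first three steps using the OpenAI Batch API; the file names and handling below are placeholders, and src/evaluation/batch_eval_openai.py may structure this differently.

```python
# Hedged sketch of the Batch API flow; file names and handling are placeholders.
from openai import OpenAI

client = OpenAI()

# 1. Create a batch request from a JSONL file of judge prompts.
batch_file = client.files.create(file=open("judge_requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 2. Check the status of the batch request.
batch = client.batches.retrieve(batch.id)
print(batch.status)  # e.g. "validating", "in_progress", "completed"

# 3. Retrieve the results once the batch has completed.
if batch.status == "completed":
    results = client.files.content(batch.output_file_id)
    results.write_to_file("judge_results.jsonl")
```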
To step through the evaluation pipeline on LLaMA 3 8B Instruct zero-shot predictions, refer to the example commands and printed outputs in scripts/evaluation/llama3_8b_zero-shot.sh.
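For intuition about the final step, here is a toy sketch of the three metrics. The discrepancy definitions below are only my reading (shallower question solved but deeper question missed, and vice versa); defer to the paper and src/evaluation/metric_calculator.py for the exact formulation.

```python
# Toy metric sketch; definitions are assumptions, not the repo's exact formulas.
from typing import Dict, List, Tuple

def compute_metrics(correct: Dict[str, bool], edges: List[Tuple[str, str]]) -> Dict[str, float]:
    # `correct` maps a question id to whether the judge marked its prediction correct;
    # each edge links a shallower (predecessor) question to a deeper target question.
    avg_accuracy = sum(correct.values()) / len(correct)
    # Forward discrepancy (assumed): shallower question solved, deeper one missed.
    forward = sum(correct[s] and not correct[d] for s, d in edges) / len(edges)
    # Backward discrepancy (assumed): deeper question solved, shallower one missed.
    backward = sum(correct[d] and not correct[s] for s, d in edges) / len(edges)
    return {
        "average_accuracy": avg_accuracy,
        "forward_discrepancy": forward,
        "backward_discrepancy": backward,
    }

# Example: one edge where the shallow question is right but the deep one is wrong.
print(compute_metrics({"q_shallow": True, "q_deep": False}, [("q_shallow", "q_deep")]))
```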
To run the entire pipeline on LLaMA 3 8B Instruct prompt-gold predictions automatically:
bash scripts/evaluation/llama3_8b_prompt-gold_auto.sh
@misc{ko2024hierarchicaldeconstructionllmreasoning,
title={Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization},
author={Miyoung Ko and Sue Hyun Park and Joonsuk Park and Minjoon Seo},
year={2024},
eprint={2406.19502},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.19502},
}