[EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization"

License

Apache-2.0 (see LICENSE) and CC-BY-4.0 (see DATA_LICENSE)

kaistAI/knowledge-reasoning

Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization

This repository is the official implementation of Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization.

TL;DR

We investigate how large language models utilize knowledge to reason about and solve complex questions, using a method that deconstructs a complex question into a hierarchical graph of simpler sub-questions.

Each depth of the graph represents a different level of complexity of the knowledge required to answer the question; answering a more complex question requires additional reasoning compared to a simpler one.

Requirements

Create a virtual environment with python>=3.9 and install the PyTorch version appropriate for your machine.

In our experiments, we use a single node with 4 NVIDIA A6000 40GB GPUs and CUDA version 12.3.

conda create -n myenv python=3.10
conda activate myenv
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia

To install requirements:

pip install -r requirements.txt

Inference

You can experiment with multiple inference modes on our dataset, DepthQA:

  • Single-turn:
    • zero-shot: Only the target question is given as input.
    • prompt-gold: Pairs of shallower questions (i.e., predecessors of the target question) and their gold answers are provided as context before the target question.
    • prompt-pred: Pairs of shallower questions and the model's own predicted answers are provided as context before the target question.
  • Multi-turn: Shallower questions are presented as inputs in a multi-turn conversation, i.e., the model answers each shallower question one by one before being presented with the target question.
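As a rough illustration of the single-turn modes (this is not the repository's actual code; the field names and prompt layout are hypothetical), the three input variants could be assembled like this:

```python
# Hypothetical sketch of the three single-turn input modes.
# Field names ("question", "gold_answer") are illustrative only.

def build_prompt(target, predecessors, mode, predictions=None):
    """Assemble the input text for one target question.

    target: dict with a "question" key
    predecessors: list of dicts with "question" and "gold_answer" keys
    predictions: optional dict mapping a predecessor question to the
        model's own earlier prediction (used by prompt-pred)
    """
    if mode == "zero-shot":
        # Only the target question is in the input.
        return target["question"]
    lines = []
    for p in predecessors:
        answer = (
            p["gold_answer"] if mode == "prompt-gold"
            else predictions[p["question"]]  # model's own prediction
        )
        lines.append(f"Q: {p['question']}\nA: {answer}")
    # The target question comes last, after the shallower QA context.
    lines.append(f"Q: {target['question']}\nA:")
    return "\n\n".join(lines)
```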

Most HuggingFace AutoModelForCausalLM models can be run with src/inference/single_turn.py and src/inference/multi_turn.py, which integrate vLLM and use mixed precision.

For OpenAI models, use src/inference/single_turn_openai.py and src/inference/multi_turn_openai.py.
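For the multi-turn mode, the conversation can be pictured in the standard chat-messages format: prior question-answer turns accumulate before the target question. A minimal sketch (not the repository's implementation; the helper name is hypothetical):

```python
# Illustrative sketch of assembling a multi-turn conversation in the
# OpenAI-style chat format: the model answers each shallower question
# in turn before being presented with the target question.

def multi_turn_messages(shallower_qas, target_question):
    """shallower_qas: list of (question, model_answer) pairs from earlier turns."""
    messages = []
    for question, answer in shallower_qas:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    # Final turn: the target question, with all prior turns as context.
    messages.append({"role": "user", "content": target_question})
    return messages
```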

Example usage

To run inference with LLaMA 3 8B Instruct in all modes:

bash scripts/inference/llama3_8b.sh

To run inference with GPT-3.5 Turbo in all modes:

bash scripts/inference/gpt-3.5-turbo.sh

Evaluation

Following the LLM-as-a-Judge approach, we use gpt-4-0125-preview to score the correctness of model predictions. Specifically, we use the OpenAI Batch API for faster and cheaper evaluation. Our evaluation pipeline consists of four steps:

  1. Create a batch request
  2. Check the status of the batch request
  3. Retrieve the results of the batch request
  4. Calculate evaluation metrics
    • Average accuracy
    • Forward discrepancy
    • Backward discrepancy

The first three steps are implemented in src/evaluation/batch_eval_openai.py and the last step in src/evaluation/metric_calculator.py.
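For reference, the Batch API consumes a JSONL file in which each line is one self-contained chat-completions request. A minimal sketch of building such a line (the custom_id scheme and judge prompt here are illustrative, not the repository's exact schema):

```python
import json

# Illustrative sketch of one line of a Batch API input file: each line is a
# complete /v1/chat/completions request identified by a custom_id.

def batch_request_line(custom_id, judge_prompt, model="gpt-4-0125-preview"):
    request = {
        "custom_id": custom_id,  # used to match results back to questions
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": judge_prompt}],
        },
    }
    return json.dumps(request)
```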

Example usage

To inspect each step of the evaluation pipeline for LLaMA 3 8B Instruct zero-shot predictions, refer to the example commands and printed outputs in scripts/evaluation/llama3_8b_zero-shot.sh.

To run the entire pipeline for LLaMA 3 8B Instruct prompt-gold predictions automatically:

bash scripts/evaluation/llama3_8b_prompt-gold_auto.sh
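As a hedged sketch of the metric step, the forward and backward discrepancies can be computed over (shallower, deeper) question pairs: a forward discrepancy is a pair where the shallower question is answered correctly but the deeper one is not, and a backward discrepancy is the reverse. The edge representation below is illustrative, not the repository's data format:

```python
# Sketch of the discrepancy metrics over edges of the question graph.
# Each edge is a (shallow_correct, deep_correct) pair of booleans.

def discrepancies(edges):
    """Return (forward, backward) discrepancy rates over the given edges."""
    # Forward: correct on the shallower question, wrong on the deeper one.
    forward = sum(1 for s, d in edges if s and not d)
    # Backward: correct on the deeper question, wrong on the shallower one.
    backward = sum(1 for s, d in edges if d and not s)
    n = len(edges)
    return forward / n, backward / n
```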

Citation

@misc{ko2024hierarchicaldeconstructionllmreasoning,
      title={Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization}, 
      author={Miyoung Ko and Sue Hyun Park and Joonsuk Park and Minjoon Seo},
      year={2024},
      eprint={2406.19502},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.19502}, 
}
