| Date | Paper & Summary | Tags | Links |
| --- | --- | --- | --- |
| 2024-08-01 | **Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach**<br>• Introduces a cost-effective model editing approach focusing on attention heads to enhance semantic consistency in LLMs without extensive parameter changes.<br>• Analyzed attention heads, injected biases, and tested on NLU and NLG datasets.<br>• Achieved notable improvements in semantic consistency and task performance, with strong generalization across additional tasks. |  |  |
| 2024-07-31 | **Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment**<br>• Introduced Negative Attention Score (NAS) to quantify and correct negative bias in language models.<br>• Identified negatively biased attention heads and proposed Negative Attention Score Alignment (NASA) for fine-tuning.<br>• NASA effectively reduced the precision-recall gap while preserving generalization in binary decision tasks. |  |  |
| 2024-07-29 | **Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability**<br>• Introduces a method using Mechanistic Interpretability (MI) to detect and understand vulnerabilities in LLMs, particularly adversarial attacks.<br>• Analyzes GPT-2 Small for vulnerabilities in predicting 3-letter acronyms.<br>• Successfully identifies and explains specific vulnerabilities in the model related to the task. |  |  |
| 2024-07-22 | **RazorAttention: Efficient KV Cache Compression Through Retrieval Heads**<br>• Introduced RazorAttention, a training-free KV cache compression technique using retrieval heads and compensation tokens to preserve critical token information.<br>• Evaluated RazorAttention on large language models (LLMs) for efficiency.<br>• Achieved over 70% KV cache size reduction with no noticeable performance impact. |  |  |
| 2024-07-21 | **Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions**<br>• Uses vocabulary projection and activation patching to localize the hidden states that predict the correct MCQA answer.<br>• Identified key attention heads and layers responsible for answer selection in transformers.<br>• Middle-layer attention heads are crucial for accurate answer prediction, with a sparse set of heads playing unique roles. |  |  |
| 2024-07-09 | **Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning**<br>• The article identifies induction heads as crucial for pattern matching in in-context learning (ICL).<br>• Evaluated Llama-3-8B and InternLM2-20B on abstract pattern recognition and NLP tasks.<br>• Ablating induction heads reduces ICL performance by up to ~32%, bringing it close to random for pattern recognition. |  |  |
| 2024-07-01 | **Steering Large Language Models for Cross-lingual Information Retrieval**<br>• Introduces Activation Steered Multilingual Retrieval (ASMR), using steering activations to guide LLMs for improved cross-lingual information retrieval.<br>• Identified attention heads in LLMs affecting accuracy and language coherence, and applied steering activations.<br>• ASMR achieved state-of-the-art performance on CLIR benchmarks like XOR-TyDi QA and MKQA. |  |  |
| 2024-06-21 | **MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression**<br>• The paper introduces Mixture of Attention (MoA), which tailors distinct sparse attention configurations for different heads and layers, optimizing memory, throughput, and accuracy-latency trade-offs.<br>• MoA profiles models, explores attention configurations, and improves LLM compression.<br>• MoA increases effective context length by 3.9×, while reducing GPU memory usage by 1.2-1.4×. |  |  |
| 2024-06-19 | **On the Difficulty of Faithful Chain-of-Thought Reasoning in Large Language Models**<br>• Introduced novel strategies for in-context learning, fine-tuning, and activation editing to improve Chain-of-Thought (CoT) reasoning faithfulness in LLMs.<br>• Tested these strategies across multiple benchmarks to evaluate their effectiveness.<br>• Found only limited success in enhancing CoT faithfulness, highlighting the challenge in achieving truly faithful reasoning in LLMs. |  |  |
| 2024-05-28 | **Knowledge Circuits in Pretrained Transformers**<br>• Introduced "knowledge circuits" in transformers, revealing how specific knowledge is encoded through interaction among attention heads, relation heads, and MLPs.<br>• Analyzed GPT-2 and TinyLLAMA to identify knowledge circuits; evaluated knowledge editing techniques.<br>• Demonstrated how knowledge circuits contribute to model behaviors like hallucinations and in-context learning. |  |  |
| 2024-05-23 | **Linking In-context Learning in Transformers to Human Episodic Memory**<br>• Links in-context learning in Transformer models to human episodic memory, highlighting similarities between induction heads and the contextual maintenance and retrieval (CMR) model.<br>• Analysis of Transformer-based LLMs to demonstrate CMR-like behavior in attention heads.<br>• CMR-like heads emerge in intermediate layers, mirroring human memory biases. |  |  |
| 2024-05-07 | **How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability**<br>• First mechanistic interpretability study on GPT-2 for predicting multi-token acronyms using attention heads.<br>• Identified and interpreted a circuit of 8 attention heads responsible for acronym prediction.<br>• Demonstrated that these 8 heads (~5% of total) concentrate the acronym prediction functionality. |  |  |
| 2024-05-02 | **What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation**<br>• Introduced an optogenetics-inspired causal framework to study induction head (IH) formation in transformers.<br>• Analyzed IH emergence in transformers using synthetic data and identified three underlying subcircuits responsible for IH formation.<br>• Discovered that these subcircuits interact to drive IH formation, coinciding with a phase change in model loss. |  |  |
| 2024-04-24 | **Retrieval Head Mechanistically Explains Long-Context Factuality**<br>• Identified "retrieval heads" in transformer models responsible for retrieving information across long contexts.<br>• Systematic investigation of retrieval heads across various models, including analysis of their role in chain-of-thought reasoning.<br>• Pruning retrieval heads leads to hallucination, while pruning non-retrieval heads doesn't affect retrieval ability. |  |  |
| 2024-03-27 | **Non-Linear Inference Time Intervention: Improving LLM Truthfulness**<br>• Introduced Non-Linear Inference Time Intervention (NL-ITI), enhancing LLM truthfulness by multi-token probing and intervention without fine-tuning.<br>• Evaluated NL-ITI on multiple-choice datasets, including TruthfulQA.<br>• Achieved a 16% relative improvement in MC1 accuracy on TruthfulQA over baseline ITI. |  |  |
| 2024-02-28 | **Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models**<br>• Introduces the PH3 method to prune conflicting attention heads, mitigating knowledge conflicts in language models without parameter updates.<br>• Applied PH3 to control LMs' reliance on internal memory vs. external context and tested its effectiveness on open-domain QA tasks.<br>• PH3 improved internal memory usage by 44.0% and external context usage by 38.5%. |  |  |
| 2024-02-27 | **Information Flow Routes: Automatically Interpreting Language Models at Scale**<br>• Introduces "Information Flow Routes" using attribution for graph-based interpretation of language models, avoiding activation patching.<br>• Experiments with Llama 2, identifying key attention heads and behavior patterns across different domains and tasks.<br>• Uncovered specialized model components; identified consistent roles for attention heads, such as handling tokens of the same part of speech. |  |  |
| 2024-02-20 | **Identifying Semantic Induction Heads to Understand In-Context Learning**<br>• Identifies and studies "semantic induction heads" in large language models (LLMs) that correlate with in-context learning abilities.<br>• Analyzed attention heads for encoding syntactic dependencies and knowledge graph relations.<br>• Found that certain attention heads boost output logits by recalling relevant tokens, a mechanism central to in-context learning in LLMs. |  |  |
| 2024-02-16 | **The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains**<br>• Introduces a Markov Chain sequence modeling task to analyze how in-context learning (ICL) capabilities emerge in transformers, forming "statistical induction heads."<br>• Empirical and theoretical investigation of multi-phase training in transformers on Markov Chain tasks.<br>• Demonstrates phase transitions from unigram to bigram predictions, influenced by transformer layer interactions. |  |  |
| 2024-02-11 | **Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs**<br>• Identifies and explains the "additive motif" in factual recall, where LLMs use multiple independent mechanisms that constructively interfere to recall facts.<br>• Extended direct logit attribution to analyze attention heads and unpacked the behavior of mixed heads.<br>• Demonstrated that factual recall in LLMs results from the sum of multiple, independently insufficient contributions. |  |  |
| 2024-02-05 | **How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning**<br>• Introduces the concept that query and key matrices in in-context heads operate as "two towers" for metric learning, facilitating similarity computation between label features.<br>• Analyzed in-context learning mechanisms; identified specific attention heads crucial for ICL.<br>• Reduced ICL accuracy from 87.6% to 24.4% by intervening in only 1% of these heads. |  |  |
| 2024-01-16 | **Circuit Component Reuse Across Tasks in Transformer Language Models**<br>• The paper demonstrates that specific circuits in GPT-2 can generalize across different tasks, challenging the notion that such circuits are task-specific.<br>• It examines the reuse of circuits from the Indirect Object Identification (IOI) task in the Colored Objects task.<br>• Adjusting four attention heads boosts accuracy from 49.6% to 93.7% in the Colored Objects task. |  |  |
| 2024-01-16 | **Successor Heads: Recurring, Interpretable Attention Heads In The Wild**<br>• The paper introduces "Successor Heads," attention heads in LLMs that increment tokens with natural orderings, like days or numbers.<br>• It analyzes the formation of successor heads across various model sizes and architectures, such as GPT-2 and Llama-2.<br>• Successor heads are found in models ranging from 31M to 12B parameters, revealing abstract, recurring numeric representations. |  |  |
| 2024-01-16 | **Function Vectors in Large Language Models**<br>• The article introduces "Function Vectors (FVs)," compact, causal representations of tasks within autoregressive transformer models.<br>• FVs were tested across diverse in-context learning (ICL) tasks, models, and layers.<br>• FVs can be summed to create vectors that trigger new, complex tasks, demonstrating internal vector composition. |  |  |
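
Several entries above center on induction heads and per-head attention analysis. As a rough, illustrative companion to the list (not taken from any single paper above), the sketch below computes a standard prefix-matching ("induction") score for every attention head of a small Hugging Face model on a repeated random-token sequence; the choice of `gpt2` and the sequence length are assumptions made only for the example.

```python
# Minimal sketch, assuming the `transformers` GPT-2 checkpoint "gpt2";
# any causal LM that exposes per-head attentions would work similarly.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

seq_len = 50
half = torch.randint(0, model.config.vocab_size, (1, seq_len))
tokens = torch.cat([half, half], dim=1)  # repeated random sequence: [A B C ... A B C ...]

with torch.no_grad():
    out = model(tokens, output_attentions=True)

# out.attentions is a tuple with one (batch, n_heads, seq, seq) tensor per layer.
q = torch.arange(seq_len, 2 * seq_len)   # query positions in the second copy
k = q - seq_len + 1                      # the token *after* the earlier occurrence
scores = torch.stack([attn[0, :, q, k].mean(dim=-1) for attn in out.attentions])

layer, head = divmod(int(scores.argmax()), scores.shape[1])
print(f"strongest induction head: layer {layer}, head {head}, score {scores.max():.2f}")
```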