awesome-reasoning

Adding reasoning to your AI? Take these resources, they may help you on your way.

Datasets

AGI/CAUSALITY/FORMAL GRAMMAR
Deepmind Chomsky Hierarchy	Problems crafted for FSM/PDA/TM	`[1]`
automata	a neurallambda tool to generate from grammars	`[1]`
I am a Strange Dataset	Tough for LLMs because of self-references.	`[1]`
DiagGSM8k	NL Reasoning Benchmark	`[1]`
CLadder	Causal reasoning	`[1]`
Cause-Effect Pairs	108 datasets of two-variable dynamics (not NL)	`[1]`
MNLI Entailment	sentence parsing + entailment	`[1]`

AGENT/TOOL
THUDM AgentInstruct	long form dialogs	`[1]`
WANG AgentInstruct	GPT-3 synthesized instructions	`[1]`
KnowLM Tool	prompt + tool call + answer	`[1]`
Glaive Tool Usage	system prompt lists tools + prompt + answer	`[1]`
opentoolformer retrieval	prompt + tool call	`[1]`

CODE
rosetta	same program, many different languages	`[1]`
EvoEval Tool Use	100 prompts + code + tests	`[1]`

MATH/LOGIC
gsm8k	Grade School Math 8k	`[1]`
MetaMath	one-shot math	`[1]`
MetaMathFewShot	few-shot math	`[1]`
MathPile	9B tokens from filtered internet	`[1]`
LogiQA	NL multiple choice, requires abstraction	`[1]`
Logic-LM	a model combining automated theorem provers and LLMs	`[1]`
Coq Facts	270k Coq theorem prover programs	`[1]`

NATURAL LANGUAGE
UltraInteract_sft	GPT generated iterated reasoning dialogs	`[1]`
MUD videogames	(various; could be training data)
Winogrande	ambiguous sentences, fill in 1 word	`[1]`
Winograd_wsc	ambiguous sentences, choose the right word	`[1]`
Contradiction	2 phrases, do they contradict	`[1]`
Recognizing Textual Entailment	2 phrases, do they entail each other	`[1]`
Textual Entailment Pool	more entailment	`[1]`
Answer Validation	2 phrases, does the answer solve question	`[1]`
Monotonicity Entailment	x is true, does y follow	`[1]`
entailment	passage, question -> T/F	`[1]`
Commonsense QA	multiple-choice QA	`[1]`
GLUE	several datasets	`[1]`
custom multi-hop	use wikipedia's graph of articles
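
The custom multi-hop idea above can be sketched as a path search over a link graph. This is a minimal illustration only: `LINKS` is a hand-written toy graph standing in for Wikipedia's article links, and `shortest_path` and `multi_hop_example` are hypothetical helper names, not part of any listed dataset or tool.

```python
from collections import deque

# Toy stand-in for a link graph: article -> articles it links to (made-up data).
LINKS = {
    "Alan Turing": ["Turing machine", "Enigma machine"],
    "Turing machine": ["Automata theory"],
    "Enigma machine": ["Cryptography"],
    "Automata theory": [],
    "Cryptography": [],
}


def shortest_path(links, start, goal):
    """Breadth-first search; returns the articles from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def multi_hop_example(links, start, goal):
    """Build one example only if answering needs at least two hops."""
    path = shortest_path(links, start, goal)
    if path is None or len(path) < 3:
        return None  # unreachable, or a single direct link
    return {"question": f"How is {start} connected to {goal}?", "hops": path}
```

A real pipeline would also need article text for each hop; the sketch only selects which article chains qualify as multi-hop.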

TOY PROBLEMS
Big Bench Hard	23 challenges (only 6k datapoints)	`[1]`
logical entailment dataset	logic strings by deepmind	`[1]`
logical entailment dataset code	(generate it yourself)	`[1]`
FSM Game	generate strings according to grammar
Adaptive Grammar	grammar rule might change
String/Graph Rewriting		`string_rewriting.py`
LibraryOfLogic	generate NL from multiple games	`[1]`
AB-XY Game
word ladder
parser
longest common subsequence
string reversal
wisconsin card sorting
anagram
palindrome
permutation composition
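
Several of the toy problems above need no download because they can be generated on the fly. Here is a minimal sketch of the FSM Game and string reversal, assuming a made-up machine that accepts `(ab)*c`. The `FSM` table and the function names are hypothetical and are not taken from `string_rewriting.py` or any linked repo.

```python
import random

# Hypothetical FSM for illustration: accepts strings matching (ab)*c.
# state -> list of (symbol, next_state); a next_state of None means "accept".
FSM = {
    "start": [("a", "saw_a"), ("c", None)],
    "saw_a": [("b", "start")],
}


def generate(fsm, rng, max_len=20):
    """Random walk through the FSM; retry if it fails to accept within max_len."""
    while True:
        state, out = "start", []
        while state is not None and len(out) < max_len:
            symbol, state = rng.choice(fsm[state])
            out.append(symbol)
        if state is None:
            return "".join(out)


def accepts(fsm, s):
    """Check whether the FSM accepts string s."""
    state = "start"
    for ch in s:
        if state is None:
            return False  # input continues after the accepting symbol
        transitions = dict(fsm[state])
        if ch not in transitions:
            return False
        state = transitions[ch]
    return state is None


def reversal_example(s):
    """String reversal task: one input/target pair."""
    return {"input": s, "target": s[::-1]}
```

Pairing `generate` with `accepts` gives both positive samples and a checker for model outputs; negative samples could be made by corrupting a generated string and keeping it only if `accepts` returns False.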

TOKEN AUGMENTED REASONING
Reasoning tokens	Self-Reasoning Tokens, teaching models to think ahead	`[1]`
Quiet-STaR	LLMs Can Teach Themselves to Think Before Speaking	`[1]`
Multi-token Prediction	Multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities	https://arxiv.org/abs/2404.19737

INDIRECT REASONING (IR)
Contrapositive and Contradiction for Automated Reasoning	use logic of contrapositives and contradictions for factual reasoning and mathematical proofs	https://arxiv.org/pdf/2402.03667

DIRECT REASONING (DR)
Graph of Thoughts (GoT)	Model the information generated by an LLM as an arbitrary graph	https://arxiv.org/abs/2308.09687
Self-Consistency	Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer	https://arxiv.org/abs/2203.11171
Chain of Thoughts	chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning	https://arxiv.org/abs/2201.11903
Chain of thoughts without prompting	CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process	https://arxiv.org/abs/2402.10200
Iterative Reasoning Preference Optimization	Iterated DPO, but for CoT, repeated until performance saturates on reasoning tasks	https://arxiv.org/pdf/2404.19733
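
The Self-Consistency entry above amounts to sampling several reasoning paths and taking a majority vote over their final answers. Here is a minimal sketch, assuming a hypothetical `sample_fn` that stands in for a stochastic LLM call, and assuming each completion ends with an `Answer:` marker. Both are illustration choices, not the paper's code.

```python
from collections import Counter


def extract_answer(completion):
    """Assumed convention: the final answer follows the last 'Answer:' marker."""
    marker = "Answer:"
    if marker not in completion:
        return None
    return completion.rsplit(marker, 1)[1].strip()


def self_consistency(sample_fn, prompt, n=5):
    """Sample n reasoning paths and return the most common final answer."""
    answers = [extract_answer(sample_fn(prompt)) for _ in range(n)]
    answers = [a for a in answers if a is not None]  # drop unparseable samples
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]
```

In practice `sample_fn` would call a model with a nonzero sampling temperature so that the reasoning paths actually differ.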