This directory contains example frontiers of Representation Engineering (RepE). While some of the examples were originally provided by the authors, we encourage and welcome community contributions. If you'd like to contribute, please open a PR, and we will review and merge it promptly.
Example | Description | Code Example | Author |
---|---|---|---|
Honesty | Monitoring and controlling the honesty of a model, using RepE techniques for lie detection, hallucination detection, etc. | honesty | - |
Emotions | Controlling primary emotions in LLMs, illustrating the profound impact of emotions on model behavior. | primary_emotions | - |
Fairness | Reducing bias and increasing fairness in model generations. | fairness | - |
Harmless | Jailbreaking an aligned model by controlling its representation of harmlessness. | harmless_harmful | - |
Memorization | Preventing memorized outputs during generation. | memorization | - |