LLM Steer
A Python module to steer LLM responses towards a certain topic or subject and to enhance capabilities (e.g., making the model answer tricky logic puzzles correctly more often). It is a practical tool for activation engineering: it adds steering vectors to different layers of a Large Language Model (LLM). It is meant to be used together with Hugging Face's transformers library.
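In activation engineering, a steering vector is added to a layer's hidden states, so each activation h becomes h + coeff · v. The toy sketch below shows only that arithmetic on plain Python lists. The function name `add_steering_vector` is hypothetical and this is not llm_steer's implementation, which works on the layers of a transformers model. Adding the vector at every token position is also a simplifying assumption.

```python
from typing import List

Vector = List[float]


def add_steering_vector(
    hidden_states: List[Vector], steering_vector: Vector, coeff: float
) -> List[Vector]:
    """Return new hidden states with coeff * steering_vector added at every token position.

    Toy illustration only: hidden_states holds one activation vector per token
    for a single layer, each with the same length as steering_vector.
    """
    if any(len(h) != len(steering_vector) for h in hidden_states):
        raise ValueError("every hidden state must match the steering vector's dimension")
    return [
        [h_i + coeff * v_i for h_i, v_i in zip(h, steering_vector)]
        for h in hidden_states
    ]


# Two token positions with 3-dimensional activations (made-up numbers).
hidden = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]
direction = [0.0, 2.0, 0.0]
print(add_steering_vector(hidden, direction, 0.5))  # [[1.0, 1.0, 2.0], [0.5, 2.0, -1.0]]
```

A negative `coeff` pushes the activations the opposite way along the same direction.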
Demo
- llm_steer v1: https://colab.research.google.com/github/Mihaiii/llm_steer/blob/main/demo/llm_steer_demo.ipynb
- llm_steer v2: https://mihaiii-llm-steer.hf.space/
Basic usage
Install it: pip install llm_steer
Then use:
from llm_steer import Steer
# model and tokenizer are a transformers model and its tokenizer, loaded beforehand
steered_model = Steer(model, tokenizer)
Add a steering vector to a particular layer of the model, with a given coefficient and text. The coefficient can also be negative.
steered_model.add(layer_idx=20, coeff=0.4, text="logical")
Get all the applied steering vectors:
Remove all steering vectors to revert to the initial model:
steered_model.reset_all()
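To make the add/reset workflow concrete without loading a model, here is a toy stand-in. `ToySteer` and its `apply` method are hypothetical: unlike llm_steer's `Steer`, whose `add` takes `text`, this sketch takes a ready-made vector and never touches a real model. It only shows the bookkeeping: vectors are registered per layer, applied to that layer's activations, and `reset_all()` restores the original behaviour.

```python
from typing import Dict, List, Tuple

Vector = List[float]


class ToySteer:
    """Hypothetical stand-in that mimics an add/reset_all workflow on plain lists."""

    def __init__(self) -> None:
        # layer index -> list of (coeff, vector) entries registered on that layer
        self._entries: Dict[int, List[Tuple[float, Vector]]] = {}

    def add(self, layer_idx: int, coeff: float, vector: Vector) -> None:
        self._entries.setdefault(layer_idx, []).append((coeff, list(vector)))

    def reset_all(self) -> None:
        # Dropping every entry means apply() returns activations unchanged.
        self._entries.clear()

    def apply(self, layer_idx: int, hidden_states: List[Vector]) -> List[Vector]:
        """Return a copy of hidden_states with each coeff * vector for this layer added."""
        result = [list(h) for h in hidden_states]
        for coeff, vector in self._entries.get(layer_idx, []):
            for h in result:
                for i, v_i in enumerate(vector):
                    h[i] += coeff * v_i
        return result


steer = ToySteer()
hidden = [[1.0, 1.0]]
steer.add(layer_idx=20, coeff=0.5, vector=[2.0, 0.0])
print(steer.apply(20, hidden))  # [[2.0, 1.0]]
print(steer.apply(5, hidden))   # [[1.0, 1.0]] (nothing registered on layer 5)
steer.reset_all()
print(steer.apply(20, hidden))  # [[1.0, 1.0]] (back to the original activations)
```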
Q / A
Q: What's the difference between llm_steer and mentioning what you want in the system prompt?
A: I see llm_steer as an enhancer. It can be used together with the system prompt.
Q: How to determine the best parameters to be used?
A: I don't have a method; it's all trial and error. I recommend starting with a middle layer and a small coefficient, then slowly increasing the coefficient.
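That advice can be turned into a simple search order. The sketch below is a hypothetical helper, not part of llm_steer: it lists (layer, coefficient) pairs starting at the middle layer with the smallest coefficient, then moves to nearby layers. The layer count, the coefficient grid and `max_layer_offset` are assumptions to adapt to your model.

```python
from typing import Iterator, List, Tuple


def candidate_settings(
    num_layers: int, coeffs: List[float], max_layer_offset: int = 2
) -> Iterator[Tuple[int, float]]:
    """Yield (layer_idx, coeff) pairs to try: middle layer first, smallest coefficients first.

    Hypothetical helper for trial-and-error tuning; it only orders the attempts,
    and judging each steered output is still up to you.
    """
    middle = num_layers // 2
    layers = [middle]
    # Move outward from the middle, one layer on each side at a time.
    for offset in range(1, max_layer_offset + 1):
        for layer in (middle - offset, middle + offset):
            if 0 <= layer < num_layers:
                layers.append(layer)
    for layer in layers:
        # Within a layer, increase the coefficient's magnitude slowly.
        for coeff in sorted(coeffs, key=abs):
            yield layer, coeff


# Assumed 32-layer model; in transformers this is often model.config.num_hidden_layers.
for layer_idx, coeff in candidate_settings(32, [0.8, 0.2, 0.4], max_layer_offset=1):
    print(layer_idx, coeff)
```

Each pair would then be tried with `steered_model.add(layer_idx=..., coeff=..., text=...)`, calling `steered_model.reset_all()` between attempts so earlier vectors do not accumulate.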
Q: What models are supported?
A: I tested it on multiple architectures, including LLaMa, Mistral, Phi, and StableLM. Keep in mind that llm_steer is meant to be used together with Hugging Face's transformers library, so it won't work with GGUF models, for example.
Q: I applied steering vectors, but the LLM outputs gibberish. What should I do?
A: Try a lower coeff value or another layer.
Q: Can I add multiple steering vectors on the same layer? Can I add the same steering vector on multiple layers? Can I add steering vectors with negative coefficients?
A: Yes, and please do. llm_steer is built for experimenting. See the Colab for examples: https://colab.research.google.com/github/Mihaiii/llm_steer/blob/main/demo/llm_steer_demo.ipynb
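One way to picture combining vectors: if steering is purely additive, every (coeff, vector) entry on the same layer sums into a single offset, so two directions stack and equal-and-opposite coefficients cancel. The sketch below assumes that additive model and uses made-up vectors. The function `combined_offset` is hypothetical, and whether llm_steer combines entries exactly this way is an assumption. Putting the same vector on several layers simply repeats the entry on each of those layers.

```python
from typing import List, Tuple

Vector = List[float]


def combined_offset(entries: List[Tuple[float, Vector]]) -> Vector:
    """Sum coeff * vector over all steering entries registered on one layer."""
    if not entries:
        raise ValueError("need at least one (coeff, vector) entry")
    dim = len(entries[0][1])
    total = [0.0] * dim
    for coeff, vector in entries:
        if len(vector) != dim:
            raise ValueError("all vectors on a layer must share one dimension")
        for i, v_i in enumerate(vector):
            total[i] += coeff * v_i
    return total


logical = [1.0, 0.0, 0.0]  # made-up direction standing in for "logical"
funny = [0.0, 1.0, 0.0]    # made-up direction standing in for "funny"
print(combined_offset([(0.5, logical), (0.25, funny)]))    # [0.5, 0.25, 0.0]
print(combined_offset([(0.5, logical), (-0.5, logical)]))  # [0.0, 0.0, 0.0]
```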
Q: Can I use steering vectors to enhance role-play characteristics (e.g., personas that are funnier or cockier)?
A: Yes.
Q: Can I use negative steering vectors to force it not to say "As an AI language model"?
A: Yes.
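A negative coefficient subtracts the direction, which lowers the activation's component along it. The toy numbers below are made up, `phrase_dir` merely stands in for a vector derived from text such as "As an AI language model", and the function names are hypothetical. This only illustrates the geometry; it does not guarantee the model will avoid the phrase.

```python
import math
from typing import List

Vector = List[float]


def projection(h: Vector, direction: Vector) -> float:
    """Length of h's component along direction (scalar projection)."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(h_i * d_i for h_i, d_i in zip(h, direction)) / norm


def add_scaled(h: Vector, direction: Vector, coeff: float) -> Vector:
    """Return h + coeff * direction."""
    return [h_i + coeff * d_i for h_i, d_i in zip(h, direction)]


h = [1.0, 2.0]           # made-up activation
phrase_dir = [3.0, 4.0]  # made-up stand-in for the unwanted phrase's direction
print(projection(h, phrase_dir))                                # 2.2
print(projection(add_scaled(h, phrase_dir, -0.5), phrase_dir))  # -0.3
```

The component shifts by exactly coeff × |direction|, here -0.5 × 5 = -2.5, which takes 2.2 down to -0.3.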
Credits / Thanks
- DL Explorers for his video on activation engineering, which goes over an article and a Colab notebook he made. The resources mentioned in that video were the starting point of llm_steer.
- Gary Bernhardt for his excellent Python for programmers course. I needed a course that could take me through the basics of Python without treating me like a dev noob (as most beginner-level tutorials treat their audience).
- Andrej Karpathy for his State of GPT video. I always wanted to make an open-source project, but there already was a repo for every idea I had. Not when it comes to tools for LLMs, though!