GitHub - Mihaiii/llm_steer: Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vectors

LLM Steer

A Python module to steer LLM responses towards a certain topic/subject and to enhance capabilities (e.g., making it provide correct responses to tricky logical puzzles more often). It is a practical tool for activation engineering: it adds steering vectors to different layers of a Large Language Model (LLM). It should be used together with the transformers library.

Basic usage

Install it:

pip install llm_steer

Then use:

from llm_steer import Steer

# `model` and `tokenizer` are an already-loaded transformers model and its tokenizer.
steered_model = Steer(model, tokenizer)
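
The snippet above assumes that `model` and `tokenizer` already exist. A minimal sketch of one way to create them with the transformers library follows. `build_steered_model` is a hypothetical helper name, not part of llm_steer, and the model name is left for the caller to supply.

```python
def build_steered_model(model_name):
    """Load a causal LM and its tokenizer, then wrap them with Steer.

    Hypothetical helper, not part of llm_steer. `model_name` is any
    Hugging Face model id or local path the caller chooses.
    """
    # Imported lazily so that defining the helper does not require the packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from llm_steer import Steer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return Steer(model, tokenizer)
```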

Add a steering vector on a particular layer of the model with a given coefficient and text. The coefficient can also be negative.

steered_model.add(layer_idx=20, coeff=0.4, text="logical")
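
Conceptually, activation engineering adds a scaled vector to a layer's hidden state. The toy sketch below shows only that arithmetic on plain Python lists. It is not llm_steer's implementation, `apply_steering` is a hypothetical name, and how the library derives a vector from `text` is not covered here.

```python
def apply_steering(hidden_state, steering_vector, coeff):
    """Return hidden_state + coeff * steering_vector, element-wise."""
    if len(hidden_state) != len(steering_vector):
        raise ValueError("hidden state and steering vector must have the same length")
    return [h + coeff * v for h, v in zip(hidden_state, steering_vector)]


# Made-up numbers, chosen only to make the arithmetic easy to follow.
hidden = [1.0, -2.0, 0.5]
vector = [2.0, 0.0, -1.0]

pushed_towards = apply_steering(hidden, vector, coeff=0.5)   # [2.0, -2.0, 0.0]
pushed_away = apply_steering(hidden, vector, coeff=-0.5)     # [0.0, -2.0, 1.0]
```

A positive coefficient moves the hidden state along the vector, and a negative one moves it the opposite way.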

Get all the applied steering vectors:

Remove all steering vectors to revert to initial model:

steered_model.reset_all()

Q / A

Q: What's the difference between llm_steer and mentioning what you want in the system prompt?

A: I see llm_steer as an enhancer. It can be used together with the system prompt.


Q: How to determine the best parameters to be used?

A: I don't have a method; it's all trial and error. I recommend starting with the middle layers and a small coefficient, then slowly increasing it.
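
Since the search is trial and error, it can help to enumerate candidates in a fixed order. The sketch below is one possible way to do that: it lists layers starting from the middle and moving outwards, each paired with coefficients in increasing order. `candidate_settings` and its default values are illustrative assumptions, not part of llm_steer.

```python
def candidate_settings(num_layers, coeffs=(0.1, 0.2, 0.4), max_layers=3):
    """Yield (layer_idx, coeff) pairs, middle layers and small coefficients first."""
    middle = num_layers // 2
    # Sort layer indices by distance from the middle; ties keep the lower index first.
    layers = sorted(range(num_layers), key=lambda i: abs(i - middle))[:max_layers]
    for layer_idx in layers:
        for coeff in sorted(coeffs, key=abs):
            yield layer_idx, coeff


# 32 layers is an arbitrary example size.
settings = list(candidate_settings(num_layers=32))
```

Each pair can then be passed to `steered_model.add(layer_idx=..., coeff=..., text=...)`, with `steered_model.reset_all()` between tries so that the attempts do not stack.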


Q: What models are supported?

A: I tested it on multiple architectures, including LLaMA, Mistral, Phi, and StableLM. Keep in mind that llm_steer is meant to be used together with Hugging Face's transformers library, so it won't work with GGUF models, for example.


Q: I applied steering vectors, but the LLM outputs gibberish. What should I do?

A: Try a lower coeff value or another layer.
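
One way to act on this advice is to lower the coefficient step by step until the output looks acceptable. The sketch below assumes two caller-supplied callables, `generate_with(coeff)` and `is_acceptable(text)`. Both are hypothetical, as is `back_off_coeff`, and none of them is part of llm_steer.

```python
def back_off_coeff(generate_with, is_acceptable, start_coeff, factor=0.5, max_tries=5):
    """Shrink coeff by `factor` until is_acceptable(generate_with(coeff)) is true.

    Returns (coeff, text) for the first acceptable output, or None if every
    try fails, in which case another layer may be worth trying.
    """
    coeff = start_coeff
    for _ in range(max_tries):
        text = generate_with(coeff)
        if is_acceptable(text):
            return coeff, text
        coeff *= factor
    return None


# Stand-in callables so the sketch runs without a model.
def fake_generate(coeff):
    return "xx xx xx" if abs(coeff) > 0.3 else "a coherent answer"


result = back_off_coeff(fake_generate, lambda text: "xx" not in text, start_coeff=1.0)
```

In real use, `generate_with` would reset the steering vectors, add one with the given coefficient, and return the generated text.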


Q: Can I add multiple steering vectors on the same layer? Can I add the same steering vector on multiple layers? Can I add steering vectors with negative coefficients?

A: Yes, and please do. llm_steer is built for experimenting. See the Colab for examples: https://colab.research.google.com/github/Mihaiii/llm_steer/blob/main/demo/llm_steer_demo.ipynb
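
The three cases in the question can be written as a small plan of `add` calls. The sketch below uses only the `add(layer_idx=..., coeff=..., text=...)` call shown in the basic usage. The helper `apply_plan`, the layer numbers, the coefficients, and the texts are illustrative assumptions, and `RecordingSteer` is a stand-in so the example runs without loading a model.

```python
def apply_plan(steered_model, plan):
    """Call steered_model.add once per (layer_idx, coeff, text) entry."""
    for layer_idx, coeff, text in plan:
        steered_model.add(layer_idx=layer_idx, coeff=coeff, text=text)


class RecordingSteer:
    """Stand-in that records add() calls instead of touching a model."""

    def __init__(self):
        self.calls = []

    def add(self, layer_idx, coeff, text):
        self.calls.append((layer_idx, coeff, text))


plan = [
    (20, 0.4, "logical"),                    # two vectors on the same layer...
    (20, 0.2, "concise"),
    (15, 0.4, "logical"),                    # ...the same text on another layer...
    (20, -0.3, "As an AI language model"),   # ...and a negative coefficient
]

recorder = RecordingSteer()
apply_plan(recorder, plan)
```

With a real model, pass the `Steer` instance in place of `recorder`.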


Q: Can I use steering vectors to enhance role-play characteristics (e.g., personas that are funnier or cockier)?

A: Yes.


Q: Can I use negative steering vectors to force it not to say "As an AI language model"?

A: Yes.

Credits / Thanks

  • DL Explorers for his video on activation engineering, which goes over an article and a Colab he made. The resources mentioned in that video were the starting point of llm_steer.
  • Gary Bernhardt for his excellent Python for programmers course. I needed a course that could help me go through the basics of Python without treating me like a dev noob (like most basic level tutorials treat their audience).
  • Andrej Karpathy for his State of GPT video. I always wanted to make an open-source project, but there already was a repo for every idea I had. Not when it comes to tools for LLMs, though!