Ask HN: Where to learn the cutting edge of prompt engineering?
I'm doing experiments with LLMs and I'm trying to research techniques for grounding. Example prompt templates, for instance. There are lots of generic articles about grounding, but details and specific examples are thin on the ground. I've read the source for langchain to find the prompt template for agent-based reasoning, but that was just one perspective… are there better ways?

Please stop trying to academize and intellectualize and nerdify what is simply questions/conversation. Prompt engineering is a forced meme, that's all it is.

Do customer support for OpenAI, lol. https://community.openai.com/ Answer enough questions, stay active enough, and you'll see the same patterns emerge. You'll probably make a lot of mistakes. You'll be corrected by other regulars, and people you try to help will send you angry messages saying your prompt didn't work when they used it in the real world. It's a good way to learn. As a little bonus, if you do it consistently enough, OpenAI will give you this little "Regular" rank with a secret forum and such.

Langchain feels a little outdated IMO. I feel like OpenAI's built-in tools might be a little ahead of it. It was originally designed to handle memory on the old completion API, but since OpenAI's chat API was released, it's not as useful. There's still good reason to use their completion models, though - they produce higher-quality responses for some creative uses. Agents built on them don't seem very impressive, and OpenAI has their own "assistants" for agent-like stuff: https://platform.openai.com/docs/assistants/how-it-works

> Langchain feels a little outdated IMO.

That's being too generous lol

My opinion: if you want to find out what works best, come up with a bunch of different variations in a context-free environment (so earlier results don't influence later ones), determine some metrics you are targeting, and start prompting away. Then you will find the answer that works for you, and probably one better thought out than 3/4 of the articles you will find on this sort of thing.

Prompt engineering is clearly "a thing", irrespective of whether or not one trains or builds models. LLMs clearly have a wide range of possible outputs for a given prompt (even just from tuning temperature, top_p, top_k), and yet modifying the prompt can lead to significant improvements in the output. It's not a science. It's not really an art either. Certain prompts lead to better outputs than other prompts, and having a systematic way to characterize these differences is going to be important going forward. I personally stay abreast of new models coming out and run an evals set against them to assess their performance vs. other models (say, gpt-2, gpt-3.5-turbo, gpt-4, etc.). In terms of grounding, there is RAG, which can be built any number of ways (Postgres + pgvector, a vector store, a graph DB). I would look at arxiv.org publications to stay on top of SOTA prompting work, as well as adjacent publications (LLMs, scaling, other things).

What kind of eval set do you use?

Homegrown and full of love, like a carefully pruned garden of bonsai trees.
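A concrete version of the "variations + metrics" approach described a couple of comments up might look like the minimal sketch below. It assumes the `openai` Python SDK (v1.x); the prompt variants, test cases, model name, and the crude exact-match metric are all placeholders for your own.

```python
# Sketch of a tiny prompt-evaluation harness: run several prompt variants
# over a small test set, score them with a simple metric, and compare.
# The variants, cases, model name, and metric are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_VARIANTS = {
    "bare":     "Answer the question: {question}",
    "stepwise": "Think step by step, then answer the question: {question}",
    "grounded": "Using only the context below, answer the question.\n"
                "Context: {context}\nQuestion: {question}",
}

TEST_CASES = [  # homegrown eval set: question, context, expected answer
    {"question": "What year was the transistor invented?",
     "context": "The transistor was invented at Bell Labs in 1947.",
     "expected": "1947"},
]

def ask(prompt: str, temperature: float = 0.0) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # swap in whichever model you're evaluating
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

def score(answer: str, expected: str) -> float:
    # crude metric: does the expected string appear in the answer?
    return 1.0 if expected.lower() in answer.lower() else 0.0

for name, template in PROMPT_VARIANTS.items():
    total = sum(
        score(ask(template.format(**case)), case["expected"])
        for case in TEST_CASES
    )
    print(f"{name}: {total}/{len(TEST_CASES)}")
```

The same loop also works for sweeping sampling parameters (temperature, top_p) instead of prompt wording, which is the other axis of variation mentioned above.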
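And since the OP asked about prompt templates for agent-based reasoning: LangChain's classic agents are built around the ReAct pattern (interleaved Thought / Action / Observation steps). A rough, illustrative sketch of what such a template looks like; the wording, tool list, and example question here are made up, not LangChain's exact source:

```python
# A ReAct-style agent prompt template, roughly the shape LangChain's classic
# agents use. Wording and tools are illustrative only.
REACT_TEMPLATE = """Answer the following question as best you can.
You have access to the following tools:

{tool_descriptions}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Begin!

Question: {question}
Thought:"""

# The calling code sends this to the model with a stop sequence on
# "Observation:", runs the requested tool itself, appends the real
# observation, and loops until the model emits "Final Answer:".
prompt = REACT_TEMPLATE.format(
    tool_descriptions="search: look up facts on the web\n"
                      "calculator: evaluate math expressions",
    tool_names="search, calculator",
    question="Who directed the highest-grossing film of 1997?",
)
```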
Here's a great article that links to a lot of research: https://lilianweng.github.io/posts/2023-03-15-prompt-enginee...

> March 15, 2023

Is this obsolete? Does it contain cutting-edge prompt engineering techniques such as saying you'll tip $200 for a correct answer?

This is something that came out last week: https://open.substack.com/pub/aitidbits/p/advanced-prompting Annoying that it's for subscribers only, but if nothing else the graphic representation is good.

Is there a "cutting edge"? The space seems pretty pseudo-sciency.

I'm reading some papers on arxiv right now and trying to implement them in our codebase at work. Those papers usually involve doing some common-sense thing and measuring the results. Anyone could have come up with it, but they did the data science and showed some evidence it worked. If there is a better way, I would love to know lol.

Two cents: any situation involving billions/trillions of variables looks pretty pseudo-sciency, because you can't reduce it down or isolate components very well. People can do studies, add things and take things out, and sort of hint at things and sort of explain things. It is what it is.

Real science is reproducible and provides testable, falsifiable hypotheses. In practice, some models (ChatGPT in particular) are not deterministic. This makes reproducing things harder. Not impossible, but harder.

I'd expect science in this field to look more like economics than physics. You're probably not looking at lab results or anything, but at experiments already done "in the field" with controlled variables, the way an economist assessing minimum wage laws might compare employment in two neighbouring states over a period of time.

Certainly not physics - but shouldn't it be more like biology than economics? You can do experiments in biology (and with LLMs); the problem is you can't quite get to cause and effect. The situation is just so complex. In both cases we are desperate to get an "explanation" that fits in a human brain and makes us feel like we understand, but it's out of reach.

Yeah, biology does seem like a better comparison, especially with "strands" of AI performing differently than they used to after patches. It's kind of like a dream: it's hard to say deterministically where the path is.
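Coming back to the grounding question: RAG, as mentioned further up, is in its simplest form "embed the corpus, retrieve the nearest chunks for the query, and stuff them into the prompt." A minimal in-memory sketch under those assumptions follows; the model names are just examples, and a pgvector or vector-store setup replaces the cosine-similarity loop with a database query.

```python
# Minimal retrieval-augmented generation (RAG) sketch: embed documents,
# retrieve the most similar ones for a question, and ground the answer
# in them. In-memory only; pgvector or a vector store would do the
# similarity search server-side.
import numpy as np
from openai import OpenAI

client = OpenAI()

DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm CET.",
    "Enterprise plans include a dedicated account manager.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(DOCS)

def answer(question: str, k: int = 2) -> str:
    q_vec = embed([question])[0]
    # cosine similarity between the question and every document
    sims = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n".join(DOCS[i] for i in np.argsort(sims)[-k:][::-1])
    prompt = (
        "Answer using only the context below. If the answer isn't there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

print(answer("How long do I have to return a product?"))
```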
- https://arxiv.org/
- https://www.microsoft.com/en-us/research/group/dynamics-insights-apps-artificial-intelligence-machine-learning/articles/prompt-engineering-improving-our-ability-to-communicate-with-an-llm/
- https://cloud.google.com/blog/products/ai-machine-learning/how-to-use-grounding-for-your-llms-with-text-embeddings
- https://amatriain.net/blog/hallucinations
and general resources:
- https://learnprompting.org
- https://www.promptingguide.ai
- https://github.com/dair-ai/Prompt-Engineering-Guide