TSCE Demo 🧠⚡
Why TSCE? In many real-world tasks, LLMs either hallucinate or lose track of complex instructions when forced to answer in one shot. Two-Step Contextual Enrichment (TSCE) addresses this by first producing an "Embedding Space Control Prompt" (ESCP), then guiding a second, focused generation, delivering more faithful answers with no extra training. TSCE is a two-phase mechanistic framework for more reliable LLM answers, validated on OpenAI GPT-3.5/4 and open-weights Llama-3 8B.
Table of Contents
- What is TSCE?
- Repo Highlights
- Prerequisites
- Installation
- Configuration
- Quick Start
- Usage Examples
- How TSCE Works
- Benchmarks & Latest Results
- Troubleshooting
- Extending the Demo
- Contributing
- License
What is TSCE?
Intuition: Imagine you ask a model, “Summarize this 1,000-word legal brief.” In a single pass it might drop key clauses or veer off into a hallucination because it is sampling from a wide distribution of possible vectors. Instead, TSCE’s first pass compresses the potential vector space with an "Embedding Space Control Prompt", and the second pass is then better primed to generate the summary.
| Phase | Purpose | Temp | Output |
|---|---|---|---|
| 1 — Embedding Space Control Prompt | Compresses the entire prompt into a dense latent scaffold (ESCP). | ↑ ≈ 1.0 | opaque token block |
| 2 — Focused Generation | Re-reads System + User + ESCP and answers inside a narrower semantic manifold. | ↓ ≤ 0.1 | final answer |
Outcome: fewer hallucinations, instruction slips, and formatting errors — with no fine-tuning and only one extra call.
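The two phases can be sketched as a single helper wrapped around any chat-completion function. This is a minimal illustration under our own naming, not the repo's actual `tsce_chat.py` API, and the Phase 1 instruction text is a placeholder:

```python
def tsce_answer(chat, system, user):
    """Two-phase TSCE sketch. `chat` is any callable
    (system, user, temperature) -> str, e.g. a thin wrapper
    around an OpenAI, Azure, or Ollama chat-completion call."""
    # Phase 1: high temperature, produce the Embedding Space
    # Control Prompt (ESCP) that compresses the task.
    escp = chat(
        system="Compress the task below into a dense latent scaffold.",
        user=user,
        temperature=1.0,
    )
    # Phase 2: low temperature, answer inside the narrower
    # semantic manifold defined by System + ESCP + User.
    return chat(
        system=system + "\n\n" + escp,
        user=user,
        temperature=0.1,
    )
```

The only cost over a single-shot call is the extra Phase 1 request, which matches the roughly 2× token multiplier reported in the benchmarks below.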
Repo Highlights
| File | Purpose |
|---|---|
| `tsce_agent_demo/` | Harness & task sets that produced the results below. |
| `tsce_agent_demo/tsce_agent_test.py` | Baseline vs TSCE; prints both answers, writes `report.json`. |
| `tsce_agent_demo/tsce_chat.py` | Main TSCE wheel. |
| `tsce_agent_demo/results/` | Entropy, KL, and cosine-violin plots ready to share. |
| `.env.example` | Copy → `.env`, add your keys. |
| `prompts/phase1.txt`, `prompts/phase2.txt` | Default templates for each phase. |
Works with OpenAI Cloud, Azure OpenAI, or any Ollama / vLLM endpoint. ✨ New: we now load the Phase 1 and Phase 2 prompts from prompts/phase1.txt and prompts/phase2.txt, making it easy to swap in your own prompt templates.
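Because the phase prompts live in plain text files, swapping in your own templates is just a matter of pointing at a different directory. A minimal loader sketch (the function name is ours; the repo may load these differently):

```python
from pathlib import Path

def load_phase_prompts(prompt_dir="prompts"):
    """Read the Phase 1 and Phase 2 templates from disk.
    Hypothetical helper for illustration only."""
    base = Path(prompt_dir)
    phase1 = (base / "phase1.txt").read_text(encoding="utf-8")
    phase2 = (base / "phase2.txt").read_text(encoding="utf-8")
    return phase1, phase2
```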
How TSCE Works
- Phase 1 – Embedding Space Control Prompt (ESCP) Construction: compresses the task's region of embedding space and generates an ESCP from the user's input.
- Phase 2 – Guided Answering: re-reads the control prompt alongside your original prompt to craft the final response.
Trade-off Considerations
Compressing natural language always risks dropping nuance, but our benchmarks show that on multi-step reasoning tasks TSCE still gains +30 pp on GPT-3.5 and yields 76 % success on Llama-3 vs. 69 % baseline, so the ESCP's focus outweighs the compression loss.
Benchmarks & Latest Results (Trends hold across more than 10,000 prompt/response pairs)
| Model | Task Suite | One-Shot | TSCE | Token × |
|---|---|---|---|---|
| GPT-3.5-turbo | math ∙ calendar ∙ format | 49 % | 79 % | 1.9× |
| GPT-4.1 | em-dash & policy tests | 50 % viol. | 6 % viol. | 2.0× |
| Llama-3 8B | mixed reasoning pack | 69 % | 76 % | 1.4× |
ESCP alone lifts GPT-3.5 by +30 pp; on the smaller Llama, the embedding space control prompt unlocks chain-of-thought reasoning (+16 pp).
Note: TSCE uses two passes, so raw joules/token cost ≈2× single-shot; we compare against a zero-temp, single-shot oracle.
Key plots (see figures/):
- `entropy_bar.png`: 6× entropy collapse
- `kl_per_position.png`: KL > 10 nats after token 20
- `cosine_violin.png`: answers cluster tighter with an embedding space control prompt
Prerequisites
- Python 3.8+
- OpenAI API key or Azure OpenAI deployment key
- Git (virtualenv optional)
Installation
```bash
git clone https://github.com/<your-username>/tsce_demo.git
cd tsce_demo
cd tsce_agent_demo
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # then edit .env with your creds
```

Configuration

OpenAI Cloud

```env
OPENAI_API_KEY=sk-********************************
# optional
OPENAI_ENDPOINT=https://api.openai.com/v1/chat/completions
MODEL_NAME=gpt-3.5-turbo
```
Azure OpenAI
```env
OPENAI_API_TYPE=azure
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4o              # your deployment name
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_KEY=<azure-key>                # or reuse OPENAI_API_KEY
```
Leave unused keys blank.
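A small helper can decide which backend the configured keys describe. This is a sketch assuming the variable names above; the function itself is hypothetical and not part of the repo:

```python
import os

def resolve_backend(env=os.environ):
    """Return ('azure', deployment) or ('openai', model) depending on
    which credentials are set. Hypothetical helper for illustration."""
    if env.get("OPENAI_API_TYPE") == "azure":
        return "azure", env.get("AZURE_OPENAI_DEPLOYMENT", "")
    return "openai", env.get("MODEL_NAME", "gpt-3.5-turbo")
```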
Quick Start
```bash
python tsce_agent_test.py
```
Sample output:
```text
==========
>>> ENTERING EMBEDDING ANALYTICS
>>> EMBEDDINGS DONE, choosing scatter solver
>>> running t-SNE with {'method': 'barnes_hut', 'perplexity': 30, 'n_iter': 1000}
...
```
For an interactive UI that lets you compare the baseline and TSCE answers, run:
```bash
streamlit run streamlit_chat.py
```
Troubleshooting
| Symptom | Fix |
|---|---|
| `401 Unauthorized` | Wrong or expired key; ensure the key matches the endpoint type. |
| Hangs > 2 min | Slow model; tweak `timeout` in `_chat()` or lower temperature. |
| `ValueError: model not found` | Set `MODEL_NAME` (OpenAI) or `AZURE_OPENAI_DEPLOYMENT` (Azure) correctly. |
Extending the Demo
- Batch runner — loop over a prompt list, save aggregate CSV.
- Visualization — embed t‑SNE plot code from the white‑paper (convex hulls, arrows).
- Guard‑rails — add a self‑critique third pass for high‑risk domains.
- Streamlit UI — drop‑in interactive playground (ask → ESCP → answer).
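The batch-runner idea above can be sketched in a few lines; `chat` stands in for whatever single-prompt function you already have, and all names here are ours, not the repo's:

```python
import csv

def run_batch(chat, prompts, out_path="batch_results.csv"):
    """Run `chat` over a list of prompts and save (prompt, answer)
    rows to a CSV. Sketch only; adapt to tsce_chat as needed."""
    rows = [(p, chat(p)) for p in prompts]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["prompt", "answer"])
        writer.writerows(rows)
    return rows
```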
Open Questions & Next Steps
- Recursive ESCP? Does running Phase 1 on its own ESCP improve results or compound errors?
- Automated Prompt Tuning: Explore integrating dspy for auto-optimizing your prompt templates.
- Benchmark Strategy: We welcome new task sets—suggest yours under benchmark/tasks/.
Pull requests welcome!
Contributing
- Fork the repo
- Create a branch: `git checkout -b feature/my-feature`
- Commit & push
- Open a PR
Please keep new code under MIT license and add a line to README.md if you extend functionality.
License
This project is licensed under the MIT License — free for commercial or private use. See LICENSE for full text.