LLM Investment Benchmark
A tool for benchmarking and tracking Large Language Model (LLM) investment decisions.
Overview
This project provides a framework to create, manage, and track investment portfolios generated by LLMs. It allows you to:
- Create new portfolios
- List current holdings and recent context
- Update portfolios based on model decisions
The model executions and their current context can be seen here.
Automated Weekly Runs
The active model roster is configured in models.json. The weekly GitHub Actions workflow runs the benchmark every Monday, writes model orders under orders/<model>/<date>.json, stores the market snapshot in prices/<date>.csv, updates llm100kbench.db, and regenerates this README's current portfolio section.
The project is intentionally limited to free API tiers. Models that require paid metered API access or subscriptions are archived instead of being run automatically.
Each weekly run also writes a concise decision log under logs/<model>/<date>.md, including the model used, validation status, per-trade rationale, context, and validation notes when a model response is rejected.
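The file layout described above can be sketched as a small path helper. This is only an illustration, not the project's implementation: the function name `run_artifact_paths` is hypothetical, and the ISO `YYYY-MM-DD` form for `<date>` is an assumption based on the date format used in this README's portfolio heading.

```python
from datetime import date
from pathlib import PurePosixPath


def run_artifact_paths(model: str, run_date: date) -> dict[str, PurePosixPath]:
    """Return the per-run files named in this README for one model.

    Hypothetical helper: assumes <date> is written as YYYY-MM-DD.
    """
    stamp = run_date.isoformat()
    return {
        # Orders emitted by the model for this run
        "orders": PurePosixPath("orders") / model / f"{stamp}.json",
        # Market snapshot shared by all models for this run
        "prices": PurePosixPath("prices") / f"{stamp}.csv",
        # Decision log with rationale and validation notes
        "log": PurePosixPath("logs") / model / f"{stamp}.md",
    }


# 2026-05-25 is a Monday, matching the weekly schedule.
for kind, path in run_artifact_paths("deepseek", date(2026, 5, 25)).items():
    print(f"{kind}: {path}")
```

Note that the price snapshot is stored once per date, while orders and logs are stored per model and date.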
Why?
To optimize their portfolios, which is the primary objective set for the LLMs, the models must evaluate risk-reward ratios, form sound assumptions about future market conditions, and draw on tools as well as their understanding of human psychology and financial market dynamics.
This benchmark may be a good proxy for measuring how well LLMs coordinate these efforts.
Notes
- chatgpt, deepseek, and grok are kept as continuing benchmark identities. Their exact backend model IDs are recorded in each new order's metadata.
- perplexity is archived for future runs because its API is paid and the free chat UI is not suitable for unattended automation.
- Claude and other paid-only APIs are not included while the project keeps the free-tier-only restriction.
Project Structure
- cmd: Contains the main command implementations
  - create: Initialize new portfolios
  - list: Display current holdings and context
  - update: Process investment orders and update holdings
  - stocks: Fetch most recent stock prices
Prompt
The most recent prompt, including its guidelines, can be seen here and here.
Current Portfolio (2026-05-27)
Portfolio Value by Model
pie showData
"deepseek" : 194355
"chatgpt" : 128951
"mistral" : 100000
"qwen" : 100000
"gpt-oss" : 99841
"gemini" : 82817
"llama" : 49848
| Model | Ticker | Sum | Quantity |
|---|---|---|---|
| chatgpt | USD | 69 | 69 |
| chatgpt | AAPL | 128882 | 418 |
| deepseek | AMD | 1512 | 3 |
| deepseek | ASML | 163203 | 100 |
| deepseek | MSFT | 2912 | 7 |
| deepseek | SNPS | 26728 | 50 |
| gemini | USD | 6335 | 6335 |
| gemini | AAPL | 12950 | 42 |
| gemini | AMD | 9574 | 19 |
| gemini | ASML | 6528 | 4 |
| gemini | CRDO | 4433 | 20 |
| gemini | GFS | 5398 | 60 |
| gemini | NXPI | 9980 | 30 |
| gemini | ON | 5334 | 42 |
| gemini | QCOM | 11446 | 46 |
| gemini | TSLA | 10840 | 25 |
| mistral | USD | 100000 | 100000 |
| llama | NVDA | 49848 | 232 |
| gpt-oss | USD | 250 | 250 |
| gpt-oss | AAPL | 99591 | 323 |
| qwen | USD | 100000 | 100000 |

| Model | Total Sum | Change |
|---|---|---|
| deepseek | 194355 | — |
| chatgpt | 128951 | — |
| mistral | 100000 | — |
| qwen | 100000 | — |
| gpt-oss | 99841 | — |
| gemini | 82817 | — |
| llama | 49848 | — |
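
The per-model totals appear to be the sum of each model's holding values, cash included. The sketch below reproduces three of the totals from a subset of the holdings rows above. The function name `totals_by_model` and the tuple layout are hypothetical and not taken from the project's code. Because the displayed values look rounded, a recomputed total may differ from the published one by a unit (the gemini rows add up to 82818, while the table lists 82817).

```python
from collections import defaultdict

# Subset of the holdings table above: (model, ticker, value in USD).
holdings = [
    ("chatgpt", "USD", 69),
    ("chatgpt", "AAPL", 128882),
    ("deepseek", "AMD", 1512),
    ("deepseek", "ASML", 163203),
    ("deepseek", "MSFT", 2912),
    ("deepseek", "SNPS", 26728),
    ("gpt-oss", "USD", 250),
    ("gpt-oss", "AAPL", 99591),
]


def totals_by_model(rows):
    """Sum holding values per model, ordered from largest to smallest total."""
    totals = defaultdict(int)
    for model, _ticker, value in rows:
        totals[model] += value
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


for model, total in totals_by_model(holdings).items():
    print(f"{model}: {total}")
# deepseek: 194355
# chatgpt: 128951
# gpt-oss: 99841
```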