GitHub - gqgs/llm100kbench: LLM 100k portfolio management benchmark

LLM Investment Benchmark

A tool for benchmarking and tracking Large Language Model (LLM) investment decisions.

Overview

This project provides a framework to create, manage, and track investment portfolios generated by LLMs. It allows you to:

Create new portfolios
List current holdings and recent context
Update portfolios based on model decisions

The model executions and their current context can be seen here.

Automated Weekly Runs

The active model roster is configured in models.json. The weekly GitHub Actions workflow runs the benchmark every Monday, writes model orders under orders/<model>/<date>.json, stores the market snapshot in prices/<date>.csv, updates llm100kbench.db, and regenerates this README's current portfolio section.

The project is intentionally limited to free API tiers. Models that require paid metered API access or subscriptions are archived instead of being run automatically.

Each weekly run also writes a concise decision log under logs/<model>/<date>.md, including the model used, validation status, per-trade rationale, context, and validation notes when a model response is rejected.
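The file layout described above can be sketched as a small helper. This is only an illustration, not the project's actual code: the function name `run_artifacts` is hypothetical, and the `YYYY-MM-DD` date format is an assumption, since the README only specifies `<date>`.

```python
from datetime import date
from pathlib import PurePosixPath


def run_artifacts(model: str, run_date: date) -> dict[str, PurePosixPath]:
    """Return the repository-relative paths written for one model on one run."""
    # Assumption: <date> is rendered as ISO 8601 (YYYY-MM-DD).
    stamp = run_date.isoformat()
    return {
        "orders": PurePosixPath("orders") / model / f"{stamp}.json",
        "prices": PurePosixPath("prices") / f"{stamp}.csv",  # shared by all models
        "log": PurePosixPath("logs") / model / f"{stamp}.md",
    }


# 2026-05-25 is a Monday, the day the weekly workflow runs.
paths = run_artifacts("deepseek", date(2026, 5, 25))
for name, path in paths.items():
    print(f"{name}: {path}")
```

Keeping the price snapshot outside the per-model directories reflects the description above, where a single market snapshot is stored per run while orders and logs are stored per model.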

Why?

The primary objective defined for the LLMs is to optimize their portfolio. Doing so requires them to evaluate the risk-reward ratio, form sound assumptions about future market conditions, and draw on tools and their understanding of human psychology and financial market dynamics.

This benchmark may be a good proxy for how well LLMs coordinate these efforts.

Notes

chatgpt, deepseek, and grok are kept as continuing benchmark identities. Their exact backend model IDs are recorded in each new order's metadata.
perplexity is archived for future runs because its API is paid and the free chat UI is not suitable for unattended automation.
Claude and other paid-only APIs are not included while the project keeps the free-tier-only restriction.

Project Structure

cmd: Contains the main command implementations
- create: Initialize new portfolios
- list: Display current holdings and context
- update: Process investment orders and update holdings
- stocks: Fetch most recent stock prices

Prompt

The most recent prompt with the clear guidelines can be seen here and here.

Current Portfolio (2026-05-27)

Portfolio Value by Model

pie showData
    "deepseek" : 194355
    "chatgpt" : 128951
    "mistral" : 100000
    "qwen" : 100000
    "gpt-oss" : 99841
    "gemini" : 82817
    "llama" : 49848

Model	Ticker	Sum	Quantity
`chatgpt`	`USD`	69	69
`chatgpt`	`AAPL`	128882	418
`deepseek`	`AMD`	1512	3
`deepseek`	`ASML`	163203	100
`deepseek`	`MSFT`	2912	7
`deepseek`	`SNPS`	26728	50
`gemini`	`USD`	6335	6335
`gemini`	`AAPL`	12950	42
`gemini`	`AMD`	9574	19
`gemini`	`ASML`	6528	4
`gemini`	`CRDO`	4433	20
`gemini`	`GFS`	5398	60
`gemini`	`NXPI`	9980	30
`gemini`	`ON`	5334	42
`gemini`	`QCOM`	11446	46
`gemini`	`TSLA`	10840	25
`mistral`	`USD`	100000	100000
`llama`	`NVDA`	49848	232
`gpt-oss`	`USD`	250	250
`gpt-oss`	`AAPL`	99591	323
`qwen`	`USD`	100000	100000

Model	Total Sum	Change
`deepseek`	194355	—
`chatgpt`	128951	—
`mistral`	100000	—
`qwen`	100000	—
`gpt-oss`	99841	—
`gemini`	82817	—
`llama`	49848	—
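The per-model totals are the sums of each model's holdings rows. A minimal sketch of that aggregation follows, using the `chatgpt` and `deepseek` rows from the holdings table. The starting capital of 100,000 is an assumption taken from the project name and is not stated in the tables; the function name `totals_by_model` is hypothetical.

```python
from collections import defaultdict

# Assumption: every portfolio started with 100,000 (implied by the project name).
STARTING_CAPITAL = 100_000

# (model, symbol, sum, quantity) rows copied from the holdings table.
holdings = [
    ("chatgpt", "USD", 69, 69),
    ("chatgpt", "AAPL", 128882, 418),
    ("deepseek", "AMD", 1512, 3),
    ("deepseek", "ASML", 163203, 100),
    ("deepseek", "MSFT", 2912, 7),
    ("deepseek", "SNPS", 26728, 50),
]


def totals_by_model(rows):
    """Add up the Sum column for each model."""
    totals = defaultdict(int)
    for model, _symbol, value, _quantity in rows:
        totals[model] += value
    return dict(totals)


totals = totals_by_model(holdings)
# Print models from largest to smallest total, with the gain over the assumed start.
for model, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    gain = (total - STARTING_CAPITAL) / STARTING_CAPITAL
    print(f"{model}: {total} ({gain:+.1%})")
```

For these two models the computed totals, 194355 and 128951, match the summary table. Totals for other models can differ from a naive sum by a unit, presumably because individual rows are rounded.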